Data & Open Technologies: A Perfect Combination
Introduction Lecture
Καουκάκης Σταύρος, Information Systems Analyst/Programmer, M.Sc.
Member of the Board of the Postgraduate Studies Alumni Association of Π.Κ.
@kaukakis
11 June 2016
Contents: Main Topics
- Data sources (who produces data?)
- Open source & free software
- Some (future) stats
- We have data. So, what do we need?
- Open (source) software tools & platforms
- Some examples & case studies
- Discussion
Some Topics
Who produces data? All of us. Everybody! The environment, too.
Who owns data?
Why does everyone collect data? When? All the time! To take advantage of data!
Who collects data? Governments, companies, users.
1st Data Driven World
Some Numbers: Big Data
- Over 90% of all data was created in the last 2 years.
- Every 2 days we store as much data as existed digitally up to 2003.
- By 2020 the volume of data will grow tenfold (~40 zettabytes).
- Every year the volume of data almost doubles.
- Internet-connected devices: 13 billion, expected to reach 50 billion by 2020.
- Over 3 billion users.
- A stack of DVDs to the Moon!!! (And back) (Ben Golub, @golubbe)
Every 60 Seconds! (2015 report, source: qmee.com)
(Image source: wikimedia.org)
(Big Linked) Data & Software
Software & tools needed:
- Open source software
- Open hardware
- Open technologies
- Open data platforms
Why Open Source?
- Customizability
- Flexibility
- Agility
- Interoperability
- Big communities
- Freedom
- Try before you buy
- Low cost
- Security
Online community and public directory of free and open source software: https://www.openhub.net/
Tools & Software for Data
- Storage, analysis, cleaning, mining, visualization, integration, publishing, automation
- Programming languages
Open technology is everywhere!
CKAN (Data Publishing)
CKAN is a powerful data management system for publishing, sharing, and using data.
Web: ckan.org
Case study: http://www.data.gov.gr/
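CKAN exposes its catalogue through a JSON "Action API" (version 3), where, for example, the `package_search` action returns matching datasets. The sketch below builds such a request and pulls dataset titles out of the response using only the Python standard library. The helper names (`package_search_url`, `dataset_titles`, `search`) are hypothetical, and whether a particular portal, including the case-study site, still exposes the API under `/api/3/action/` is an assumption to verify first.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def package_search_url(base_url: str, query: str, rows: int = 10) -> str:
    """Build a CKAN Action API (v3) package_search URL (hypothetical helper)."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url.rstrip('/')}/api/3/action/package_search?{params}"


def dataset_titles(response_body: str) -> list[str]:
    """Extract dataset titles from a package_search JSON response body."""
    payload = json.loads(response_body)
    # CKAN wraps every action result as {"success": ..., "result": ...}.
    if not payload.get("success"):
        raise ValueError("CKAN API call did not succeed")
    # Fall back to the dataset's machine name when it has no title.
    return [pkg.get("title") or pkg.get("name", "") for pkg in payload["result"]["results"]]


def search(base_url: str, query: str, rows: int = 10) -> list[str]:
    """Query a live CKAN portal (requires network access)."""
    with urlopen(package_search_url(base_url, query, rows), timeout=30) as resp:
        return dataset_titles(resp.read().decode("utf-8"))
```

For instance, `search("https://demo.ckan.org", "budget", rows=5)` would return up to five dataset titles, assuming that demo instance is reachable. Keeping URL building and response parsing separate from the network call makes both easy to test offline.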
OpenRefine (Data Cleaning)
A free, open source, powerful tool for working with messy data:
- Cleaning it
- Transforming it from one format into another
- Extending it
Web: openrefine.org
An example
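One way OpenRefine cleans messy data is clustering: values that differ only in case, punctuation, accents, or word order are grouped under a shared key. The sketch below is a simplified Python take on that idea, loosely following the "fingerprint" key-collision method described in OpenRefine's documentation. It is not OpenRefine's code, and the names `fingerprint` and `cluster` are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Simplified key-collision fingerprint (hypothetical, inspired by OpenRefine)."""
    value = value.strip().lower()
    # Drop accents: decompose characters, then discard the combining marks.
    value = unicodedata.normalize("NFKD", value)
    value = "".join(ch for ch in value if not unicodedata.combining(ch))
    # Remove ASCII punctuation only (a simplification in this sketch).
    value = value.translate(str.maketrans("", "", string.punctuation))
    # Unique tokens in sorted order, so word order no longer matters.
    return " ".join(sorted(set(value.split())))


def cluster(values):
    """Group values by fingerprint; keep only groups with differing spellings."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [members for members in groups.values() if len(set(members)) > 1]
```

Each returned cluster is a candidate set of spellings to merge into one value; as in OpenRefine, a person should still review the clusters before merging.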
Datawrapper (Data Visualization Web App)
Datawrapper is like having an amazing graphic designer at your fingertips.
- Brings data to life
- Interactive charts
- No coding skills needed
- Limitations in the free edition (PNG export)
Web: datawrapper.de
Examples: https://datawrapper.de/gallery
Data-Driven Documents (for Programmers)
D3.js is a JavaScript library for manipulating documents based on data.
- Brings data to life
- Compatible with modern browsers
- Data-driven approach
Web: d3js.org
Examples: github.com/d3/d3/wiki/gallery
Also have a look at Google Charts.
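D3's data-driven approach rests on the data join: new data is matched by key against the elements already on the page and split into an "enter" set (data with no element yet), an "update" set (data matching an existing element), and an "exit" set (elements whose data is gone). D3 does this in JavaScript through `selection.data(...)` with a key function; the Python function below only models that bookkeeping so the idea is visible outside a browser, and `data_join` and its arguments are hypothetical names.

```python
def data_join(bound_keys, new_data, key):
    """Model of a keyed data join (hypothetical; not D3's API).

    bound_keys: keys of elements already drawn on the page
    new_data:   the incoming data items
    key:        function mapping a data item to its key
    """
    bound = set(bound_keys)
    incoming = {key(d) for d in new_data}
    enter = [d for d in new_data if key(d) not in bound]   # need new elements
    update = [d for d in new_data if key(d) in bound]      # reuse existing elements
    exit_ = [k for k in bound_keys if k not in incoming]   # elements to remove
    return enter, update, exit_
```

In D3 the three sets would then drive appending new elements, restyling existing ones, and removing stale ones, which is why charts can be redrawn as data changes instead of being rebuilt from scratch.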
Lumify (Analysis and Visualization)
Lumify is an open source big data analysis and visualization platform.
- Analyze relationships
- Geographic view
- Share your work in real time
Web: lumify.io
Examples: http://lumify.io/
R Language & Environment
R is a language and environment for statistical computing and graphics.
- Statistical & graphical techniques
- Linear and nonlinear modeling
- Classification, clustering
Web: r-project.org
Examples: http://www.rexamples.com/
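Linear modeling in R is typically a single call such as `lm(y ~ x)`. To show what such a fit computes in the simplest case, here is an ordinary least-squares fit of y = a + b·x written by hand in Python. It is an illustrative sketch (the name `simple_linear_fit` is hypothetical), not a replacement for R's `lm`, which also provides residuals, standard errors, and diagnostics (e.g. via `summary()`).

```python
from statistics import mean


def simple_linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (intercept a, slope b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope
```

The slope is the ratio of the x/y co-variation to the x variation, and the fitted line always passes through the point of means (x̄, ȳ).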
Data Storage, Management & More
Open tools and platforms:
- Hadoop (hadoop.apache.org)
- MongoDB (mongodb.com)
- Talend (talend.com)
- RapidMiner (rapidminer.com)
- Elodina Platform (elodina.net)
- RDBMSs such as MySQL and PostgreSQL
Online community and public directory of free and open source software: https://www.openhub.net/
Open Source Initiative: https://opensource.org/
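Hadoop popularized the MapReduce model: a map step emits key/value pairs, the framework groups ("shuffles") them by key, and a reduce step aggregates each group. The in-process Python sketch below runs the classic word-count example through those three phases. It models the idea only and does not use Hadoop's APIs; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Map phase: emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: aggregate the values of one key."""
    return key, sum(values)


def word_count(lines):
    mapped = chain.from_iterable(mapper(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reducer(k, v) for k, v in sorted(grouped.items()))
```

Because each mapper call sees only one line and each reducer call sees only one key, both phases can be spread across many machines; on a real cluster, Hadoop Streaming can run mapper and reducer scripts that read standard input and write standard output.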
Thank you! Questions?
Καουκάκης Σταύρος
stavroskaukakis@gmail.com
@kaukakis