Advanced Topics in Distributed Systems: Introduction


What is a Distributed System?
- A Distributed System is a software system in which components located on networked computers communicate and coordinate their actions by passing messages. (Wikipedia)
- You know you have a Distributed System when the crash of a computer you have never heard of stops you from getting any work done! (Leslie Lamport)

Brainteaser!
- What if you had to build a system for:
  - Storing Facebook's 250,000,000,000 photos? (TheVerge.com, Sep 2013)
  - Adding 350,000,000 photos per day? (BusinessInsider.com, Sep 2013)
  - Each of them in 4 different sizes
- How would you:
  - Manage Amazon's infrastructure?
  - Compute Google's scores (page ranks), answer queries, calculate statistics on the Web graph, etc.?
- Hint keywords: decentralization, large-scale systems, massively distributed approaches

Large-scale companies' clusters
- How many servers at Google? Shhh, it's a secret! Estimates run in the range:
  - 2006: 200K to 500K; $2M electricity bill per month
  - 2012: 1.5M
  - 2013: 2.3M
  - 2015: ?
- Home-made large-scale distributed systems: no centralization, tailored to the company's needs
- Similar approaches at Facebook, Yahoo!, Amazon, Microsoft, etc.
[Figures: Google's first cluster at Stanford; a modern data center]

Large-scale companies' clusters
- Scale! What are the technologies behind systems of such scale?

Grid computing
- Analogy to the electrical power grid
  - One plugs in, gets power, and pays for what one gets, but does not need to know where the power is coming from
  - One can sell back some power to the network
- Companies/labs aggregate their computing resources (data/computing clusters)
- Usage on demand
- Used for computationally intensive tasks
  - CERN collider experiments analysis
  - Simulations: economics, weather forecast, biology, physics, etc.

Cloud computing
- Amazon, OVH, GreenQlouds, RackSpace, etc., provide many nodes for rent!
- Rent servers and storage space / services
  - No need to know where the code is running
  - Payment based on usage
  - Often based on virtualization (shared servers)
- Applications running on the cloud are not necessarily large-scale distributed applications
- But the cloud services typically are, e.g., Amazon S3 (Simple Storage Service)

Peer-to-peer networks
- Run on normal users' machines
  - Low resources
  - Low trust
  - Low reliability
  - High heterogeneity
  - Not dedicated
- Typical applications
  - File sharing (BitTorrent, eMule, etc.)
  - IP telephony (Skype: 40M users!)
  - Network storage (OceanStore, PAST)
  - Anonymizers (Tor)
[Figure: a view of the Gnutella file sharing network]

Power in Unity!
- Using many small nodes can be more effective than relying on expensive high-end servers

Massive Distributed Applications
- Challenges
  - Increased scale
  - Geographic spread
  - Dynamic membership: node churn
  - Failures: Prob[at least one machine fails at a given time] approaches 100%
  - Multiple administrative domains
- Impact on system design
  - Centralized control infeasible!
  - Administration costs increase!
  - Imposing explicit control more complex!
  - Need to consider failures as the norm
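The claim that some machine is almost always down at scale can be checked with a short calculation. The sketch below assumes that machines fail independently and that each is down with the same probability at a given time; both are simplifying assumptions, and the 0.1% figure is purely illustrative, not a number from the slides.

```python
def prob_at_least_one_failure(num_machines: int, p_fail: float) -> float:
    """P[at least one of num_machines is down], assuming independent failures."""
    return 1.0 - (1.0 - p_fail) ** num_machines


# Illustrative per-machine failure probability (an assumption, not from the slides).
P_FAIL = 0.001

for n in (10, 1_000, 100_000):
    print(f"{n:>7} machines: {prob_at_least_one_failure(n, P_FAIL):.4f}")
```

Even a small per-machine probability pushes the result toward 1 as the number of machines grows, which is why such systems treat failures as the norm.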

Large-scale Decentralized Systems
- Self-configurable
  - Low administration
  - High-level commands, e.g. join cluster A, join application B
  - Plug-and-play
- Self-organized
  - Highly adaptable to changes
  - Flexible in dynamic environments
- Robust, resilient to massive failures
  - Load balancing
  - High availability
  - Self-healing
- High scalability and elasticity
- SIMPLICITY!

Course Layout
- P2P
  - Structured / Unstructured P2P Overlays
  - DHTs
  - Epidemic Protocols
  - Aggregation
  - Slicing
- Cloud Computing
  - MapReduce / Hadoop
  - Distributed File Systems (Google File System, HBase)
  - BigTable
  - Stream Processing
  - Graph Databases

Advanced Topics in Distributed Systems: Peer-to-Peer Systems

Historical perspective
- 1970s-1980s: birth of the Internet
  - Limited reach of the Internet
  - Email, FTP, Telnet
  - Sharing of documents and resources between research centers
  - Central committee to organize and maintain it
- 1990s: tremendous expansion and diffusion
  - Killer apps: WWW and e-commerce
  - Client/Server model
- Late 1990s to today: P2P, an alternative to Client/Server
  - Passive clients become active peers
  - End-computers play a role, contribute, interact

How it all started
- June 1999: Napster is born (1st generation of P2P)
  - Users not only download content but also provide content to others
  - Users establish a virtual network, entirely independent of the physical network and of administrative authorities or restrictions
  - Basis: UDP and TCP connections between the peers
- December 1999: RIAA files a lawsuit against Napster Inc.
  - Target: the central lookup server of Napster
  - Achievement: Napster's popularity skyrocketed!
- February 2001: peak operation, with 26.4M users and 2.79 billion files per month
- July 2001: a judge orders Napster to pull the plug; the Napster network breaks down instantly

How it continued
- March 2000: Nullsoft releases Gnutella as an open source project
  - Fully decentralized
  - In addition to offering files, the peers also take over routing tasks
  - No central lookup server, so no single point to attack
- Later in 2000: superpeer concept
  - Hierarchical routing layer
  - Significantly improves scalability and efficiency
  - FastTrack (Morpheus, KaZaA), eDonkey2000
- 2001-2002
  - KaZaA loses ground (many corrupted files, due to the weak hash keys used to identify files)
  - eDonkey and Gnutella regain popularity
  - eDonkey becomes the most popular file-sharing network: 2-3M online users
  - Gnutella v0.6 adopts the superpeer architecture (ultrapeers in Gnutella terminology)

[Figure: P2P traffic in 2001]

How it took off!
- 2002: first version of BitTorrent released
- 2003: BitTorrent causes the majority of the observed traffic
  - Downloads significantly faster, due to a mechanism against free-riding
- Middle of 2003: new P2P concepts develop
  - Skype is born: a P2P Voice-over-IP application
- In the meantime, more P2P domains are explored: P2P routing, network storage, P2P multicasting, data aggregation, P2P streaming, etc.
- Today: major efforts are made to increase the reliability of P2P systems, to use P2P also in mobile networks, etc.

[Figure: Internet traffic by type (Web, P2P, FTP)]

[Figure: Internet traffic]

Advanced Topics in Distributed Systems: Peer-to-Peer Definition

A simple definition
- Endpoints talk directly to each other, as opposed to client/server
- By this definition: e-mail, IP routing, telephones!
- Napster: based on a centralized server!

What makes P2P interesting?
- End-nodes are promoted to active components! Previously they were just clients.
- Nodes participate, interact, and contribute to the services they use.
- Harness the huge pools of resources accumulated in millions of end-nodes.

Is application XYZ P2P?
- Do nodes contribute to the system?
- Do nodes collectively carry out a service?
- Are variable connectivity and temporary network addresses the norm?
- Do nodes have significant autonomy?
- Can they (generally) be heterogeneous?
- Who owns the hardware?
  - A single administered entity?
  - Distributed among participating users?

A better definition
P2P is a class of systems where:
- Resources available at the edges of the Internet are utilized: storage, CPU cycles, bandwidth, content, human presence
- Service is carried out collectively
- Nodes share both benefits and duties
- Irregularities and dynamicity are treated as the norm

Dual nature: Client & Server
[Figure: workload of a peer acting as both client and server]

Main advantages of P2P
- Inherently scalable: higher demand brings higher contribution!
- Increased (massive) aggregate capacity
- Utilize otherwise wasted resources
- Distribute load and administration
- Designed to be fault tolerant
- Inherently handle dynamic conditions

Advanced Topics in Distributed Systems: Peer-to-Peer Issues

Overlay Networks
- Focus on the application layer
[Figure: an overlay network linking nodes A, B, and C on top of the physical network]

Overlay types
- Unstructured P2P
  - Any two nodes can establish a link
  - Topology evolves at random
  - Topology reflects desired properties of linked nodes
- Structured P2P
  - Topology strictly determined by node IDs

Overlay types
- Unstructured P2P
  - Centralized P2P
    - Central entity necessary to manage the overlay
    - Central entity is some kind of index/group database
    - Example: Napster
  - Pure P2P
    - No central entities
    - Any node can be removed without loss of functionality
    - Examples: Gnutella v0.4, Freenet
  - Hybrid P2P
    - Multiple and dynamic central entities
    - Any node can be removed without loss of functionality
    - Examples: Gnutella v0.6, Freenet
- Structured P2P
  - DHT-based
    - No central entities
    - Fixed links, determined by node IDs
    - Any node can be removed without loss of functionality
    - Examples: Chord, Pastry, CAN

Main Issues in P2P: Overlay Maintenance
- Bootstrapping: how to join the system
- Continuous maintenance: how to handle changes, faults, etc.

Main Issues in P2P: Scalability
- Avoid a central server!
- Distribute load on multiple peers
- Limit load per peer: computing, messaging, storage, state

Main Issues in P2P: Fairness
- Load balancing
  - Distribute load among peers, but how? Evenly? Proportionally to node capacity?
- User behavior!
  - Users are selfish and independent (they maximize their own benefit)
  - Give incentives for fair play: to maximize benefit, abide by the rules!

Main Issues in P2P: Dynamicity and Adaptability
- Changing topology
  - Nodes join and leave: node churn
  - Network partitions
- Changing data
  - Content is changed
  - Files are added / deleted
- Changing profiles
  - Users change interests
  - New semantic categories introduced
- Change in load
  - Load rebalancing

Main Issues in P2P: Fault Tolerance
- Robustness of the overlay
- Resilience to failures
- Resistance to node and link crashes
- Availability

Main Issues in P2P: Self-Organization
- Key for: overlay maintenance, adaptability, fault tolerance, robustness
- No one keeps full state: nodes take local decisions
- Globally smooth operation should emerge from local decisions!
- Self-Management
- Self-Healing: repair problems encountered
- Self-Configuration
- Self-*

Main Issues in P2P: Performance
- Efficiency
  - in searching
  - in routing steps
  - in discovering relationships
  - etc.
- Locality: reduce network latency

Main Issues in P2P: Privacy
- Anonymity
  - Who downloaded a copyrighted movie?
  - Who wrote the bad review about Spyros's course?
- Reputation
- Resistance to censorship

Main Issues in P2P: Security
- Defend against DDoS attacks
- Disseminate worm protection patches: speed is crucial!
- Make P2P systems themselves secure

Main Issues in P2P: Legal issues
- Copyright violation
  - Direct infringement, e.g., downloading or uploading copyrighted files
  - Indirect infringement, e.g., someone offers the means for direct infringement

Main Issues in P2P: SIMPLICITY!
- Things can easily get out of control with thousands of nodes under dynamic conditions!

Advanced Topics in Distributed Systems: Peer-to-Peer Application Areas

P2P Application Areas
High-level grouping of P2P apps based on the shared resource:
- Collaboration: instant messaging, shared whiteboard, co-review/edit/author, gaming
- CPU: Internet/intranet distributed computing, grid computing
- Content: file sharing; information management (discover, aggregate, filter)
- Storage: network storage, caching, replication
- Bandwidth: content distribution, collaborative download, edge services, VoIP

Sharing Content
- Killer deployments: Napster, Gnutella, KaZaA/FastTrack, eDonkey2000, BitTorrent
- Large distributed storage
- Very high variation of content
- Unstable availability
- No guarantees
[Figure: peers sharing content items such as "Blue", "NPR", "Star Wars", "ER", "Hey Jude", and "Magic Flute"]

Network Storage
- OceanStore
- PAST
[Figure, two steps: Store(K54) on a ring of nodes N1, N8, N14, N21, N32, N38, N42, N48, N51, N56]
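The figure residue shows node IDs on a ring and a Store(K54) operation. One common way to place a key on such a ring is successor placement, as in Chord-style DHTs: the key goes to the first node whose ID is at or after the key, wrapping around at the end. The sketch below illustrates that rule only; the 6-bit identifier space and the function name are assumptions, and the slides do not state that OceanStore or PAST place keys exactly this way.

```python
from bisect import bisect_left


def successor(node_ids, key, id_space=64):
    """Return the first node ID at or after `key` on the ring, wrapping around.

    `id_space` (64, i.e. 6-bit identifiers) is an illustrative assumption.
    """
    ring = sorted(node_ids)
    position = bisect_left(ring, key % id_space)
    return ring[position % len(ring)]


# Node IDs taken from the figure labels.
nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
print(successor(nodes, 54))  # key K54
print(successor(nodes, 60))  # a key past the last node wraps to the first
```

Under this rule K54 lands on N56, and any node can compute the same answer locally from the node IDs, with no central directory.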

Contributing Bandwidth
- CDNs (Content Distribution Networks)
- BitTorrent
- File-sharing systems

Contributing Bandwidth
Source server: 100 Mb/s; clients: 10 Mb/s

Scenario                                               Client/Server   Cooperative
1. Antivirus update (100,000 clients, 4 MB file)       9h:52m          52s
2. Daily database update (1000 clients, 600 MB file)   14h:48m         09m:54s
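The client/server times can be approximated with simple arithmetic: the server has to push one copy of the file per client through its own link. The sketch below assumes 1 MB = 8 megabits and an effective throughput of 90% of the nominal link rate; the 90% factor is an assumption that happens to reproduce the slide's client/server times, not something the slides state. For the cooperative case the slides do not give their model, so the code computes only a well-known fluid-model lower bound, further assuming that each client can upload at 10 Mb/s. That bound is expected to be smaller than the slide's cooperative times.

```python
def client_server_seconds(n_clients, file_mb, server_mbps, efficiency=0.9):
    """Time for one server to send the file to every client, one copy each.

    `efficiency` (usable fraction of the link rate) is an assumption.
    """
    total_megabits = n_clients * file_mb * 8
    return total_megabits / (server_mbps * efficiency)


def cooperative_lower_bound_seconds(n_clients, file_mb, server_mbps, client_mbps):
    """Fluid-model lower bound when clients also upload to each other.

    Assumes every client uploads and downloads at `client_mbps`.
    """
    megabits = file_mb * 8
    return max(
        megabits / server_mbps,  # the server must send at least one copy
        megabits / client_mbps,  # each client must download one copy
        n_clients * megabits / (server_mbps + n_clients * client_mbps),
    )


def hms(seconds):
    """Format a duration as h:mm:ss, truncating fractions of a second."""
    s = int(seconds)
    return f"{s // 3600}h:{s % 3600 // 60:02d}m:{s % 60:02d}s"


scenarios = {
    "antivirus update": (100_000, 4),
    "daily database update": (1_000, 600),
}
for name, (clients, size_mb) in scenarios.items():
    cs = client_server_seconds(clients, size_mb, server_mbps=100)
    p2p = cooperative_lower_bound_seconds(clients, size_mb, 100, 10)
    print(f"{name}: client/server {hms(cs)}, cooperative at least {hms(p2p)}")
```

The client/server time grows linearly with the number of clients, while the cooperative bound barely changes, because every new client also adds upload capacity.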

Sharing CPU
- Increasing requirements for High Performance Computing, e.g., in the fields of bio-informatics, logistics, or the financial sector
- Available computing power of endpoints is often unused
- Use P2P to bundle processor cycles:
  - Form a cluster of independent, networked computers that are combined into a single logical computer
  - Achieve computing power which even the most expensive super-computers can scarcely provide
- Grid Computing

Sharing CPU: Examples
- Popular example: SETI@home
  - Calculations during the idle processor cycles of participating peers
- Successors:
  - BOINC (Berkeley): http://boinc.berkeley.edu/
  - World Community Grid (IBM): http://www.worldcommunitygrid.org
  - Areas: biology and medicine, climate simulations, math, astronomy, physics, chemistry
- NOTE: the core of these systems is a classical Client/Server application
- Advanced vision of grid computing: Globus Toolkit, a standardized middleware for grid applications

Presence Information
- Presence information: information about which peers and which resources are available
- Example: Instant Messaging Systems
  - A P2P application which essentially uses presence information
  - Peers pass on information via the network about whether or not they are available for communication

Document Collaboration
- Usually centrally organized
- But in many cases documents are distributed across desktop PCs, with no central repository having any knowledge of their existence
- Solution: P2P networks which create a connected repository from the local data on the individual peers
  - Indexing and categorization of data by each peer on the basis of individually selected criteria
  - Self-organized aggregation of information from areas of knowledge
- http://www.nextpage.com/

Collaboration
- Collaboration: synchronous communication, online meetings, editing shared documents
- Groupware offers functions like IM, file sharing, notification, co-browsing, whiteboards, voice conferences, and databases with real-time synchronization
- Client/Server groupware has to be set up and administered for each working group
- P2P groupware avoids the additional administrative tasks and central data management:
  - All of the data created is stored on each peer and is synchronized automatically
  - Users can set up shared working environments for virtual teams (so-called shared spaces)
  - Users can invite other users to work in these teams
- http://www.groove.net/

Advanced Topics in Distributed Systems: Basics in file-sharing

Napster: Centralized P2P
- Peer-to-peer that relies on a central index, but files don't reside on a central server
- Four steps:
  1. Connect to the Napster server
  2. Upload your list of files (push) to the server
  3. Give the server keywords to search the full list
  4. Select the best of the correct answers (based on pings)
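The four steps above can be sketched as a toy central index. This is a minimal illustration, not Napster's actual protocol: the class name, method names, keyword tokenization, and the ping table are all hypothetical. The point it shows is that the server stores only who has which file, while the files themselves stay on the peers.

```python
import re
from collections import defaultdict


class CentralIndex:
    """Toy central lookup server: stores who has which file, never the files."""

    def __init__(self):
        self._entries = defaultdict(set)  # keyword -> {(peer, filename)}

    def upload_file_list(self, peer, filenames):
        # Step 2: a connected peer pushes its list of shared files.
        for name in filenames:
            for word in re.findall(r"[a-z0-9]+", name.lower()):
                self._entries[word].add((peer, name))

    def search(self, keywords):
        # Step 3: return the entries that match every keyword.
        sets = [self._entries.get(word.lower(), set()) for word in keywords]
        return set.intersection(*sets) if sets else set()


def select_best(matches, ping_ms):
    # Step 4: prefer the peer with the lowest measured ping.
    return min(matches, key=lambda entry: ping_ms[entry[0]], default=None)


index = CentralIndex()
index.upload_file_list("peer-a", ["Hey Jude.mp3"])
index.upload_file_list("peer-b", ["Hey Jude.mp3", "Magic Flute.mp3"])
matches = index.search(["hey", "jude"])
best = select_best(matches, {"peer-a": 120, "peer-b": 35})
print(best)  # the download itself would then go directly to that peer
```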

Napster: Clever Design
- Centralized user and song database
  - Quick searching
  - Faster/better than Gnutella
- Users come and go
  - User/search database continually updated
- Automatic file sharing
  - Easy-to-use file server
- But: a single server to bring down
  - This centralization is ultimately its downfall

Gnutella: Pure P2P
- Focus: a decentralized method of searching, harder to pull the plug
- Search by flooding
  - If you don't have the file you want, query 7 of your partners (neighbors)
  - If they don't have it, they contact 7 of their neighbors, up to a maximum hop count of 10
  - Requests are flooded, which may lead to scalability problems
  - No looping, but packets may be received twice
- The querying node is sent responses with a list of matching files and IP addresses
- File transfer is direct (no anonymity)
[Figure labels: Query, Response, Transfer]
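Flooding search can be sketched as a breadth-first traversal of the overlay with a hop limit. The code below is a simplified model under stated assumptions, not the Gnutella wire protocol: the overlay is a plain adjacency dictionary, each node forwards to all of its listed neighbors (including the one it heard the query from), and responses are collected directly instead of being routed back hop by hop. The function and variable names are hypothetical.

```python
from collections import deque


def flood_search(graph, files, start, wanted, max_hops=10):
    """Flood a query from `start`; return (nodes holding `wanted`, messages sent)."""
    seen = {start}                # nodes that have already handled this query
    frontier = deque([(start, 0)])
    hits, messages = [], 0
    while frontier:
        node, hops = frontier.popleft()
        if wanted in files.get(node, ()):
            hits.append(node)
        if hops == max_hops:
            continue              # hop limit reached: do not forward further
        for neighbor in graph[node]:
            messages += 1         # duplicate deliveries still cost a message
            if neighbor not in seen:
                seen.add(neighbor)    # a node processes a query only once
                frontier.append((neighbor, hops + 1))
    return hits, messages


# Tiny overlay: A-B, A-C, B-D, C-D; only D has the file.
graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
files = {"D": {"hey_jude.mp3"}}
print(flood_search(graph, files, "A", "hey_jude.mp3"))
```

The `seen` set is what prevents looping, while the message counter shows why the same query can still reach a node more than once and why flooding scales poorly.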

Gnutella: Overlay Maintenance
- Plug in to a host and send a broadcast ping
  - Can be any host (hosts are passed on by word of mouth or through host caches)
- The host broadcasts a ping message with a TTL of 7
- Hosts that are not overloaded respond with a routed pong
- Gnutella caches the IP addresses of replying nodes
[Figure labels: New Host, Ping, Pong, Replying Node, Overloaded Node]

Gnutella: Problems
- A 24-hour survey showed:
  - 70% of people shared no files
  - 50% of search responses came from the top 1% of hosts
- Reverting to client/server
  - Suddenly not so hard to shut down!
- Verified hypotheses
  - H1: a significant portion of Gnutella peers are free riders
  - H2: free riders are distributed evenly across domains
  - H3: hosts often share files nobody is interested in
- Non-standard implementation
  - People implement their own Gnutella clients
  - Some clients are dodgier than others

KaZaA: Hybrid P2P
- Software
  - Proprietary
  - Files and control data encrypted
  - Everything in HTTP request and response messages
- Architecture
  - Hierarchical
  - A cross between Napster and Gnutella

KaZaA: Architecture
- Each peer is either a supernode or is assigned to a supernode
  - Nodes with more bandwidth and higher availability are designated as supernodes
- Each supernode knows about many other supernodes (almost a mesh overlay)
- Supernodes act as mini-Napster hubs, tracking the content and IP addresses of their descendants
- Guess: ~10,000 supernodes with 200-500 descendants each
- Dedicated user authentication server and supernode list server
[Figure: supernodes]

KaZaA: Queries
- A node first sends its query to its supernode
  - The supernode responds with matches
  - If x matches are found, done
- Otherwise, the supernode forwards the query to a subset of supernodes
  - If a total of x matches is found, done
- Otherwise, the query is forwarded further
  - Probably by the original supernode rather than recursively
[Figure: supernodes]
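The query steps above can be sketched as follows. Since KaZaA is proprietary, everything here is an illustrative guess in the spirit of the slide: the `Supernode` class, substring matching, and the `batch_size` parameter are hypothetical, and the original supernode contacts further supernodes itself in growing batches, following the slide's remark that forwarding is probably not recursive.

```python
class Supernode:
    """Toy supernode: indexes the files offered by its descendants."""

    def __init__(self, name):
        self.name = name
        self.index = {}      # filename -> set of descendant addresses
        self.neighbors = []  # other supernodes this one knows about

    def register(self, address, filenames):
        for filename in filenames:
            self.index.setdefault(filename, set()).add(address)

    def local_matches(self, keyword):
        return [(filename, address)
                for filename, addresses in sorted(self.index.items())
                for address in sorted(addresses)
                if keyword.lower() in filename.lower()]


def query(origin, keyword, x, batch_size=2):
    """Collect matches until at least x are found or no supernodes remain."""
    matches = origin.local_matches(keyword)
    remaining = list(origin.neighbors)
    while len(matches) < x and remaining:
        # The original supernode forwards to the next subset of supernodes.
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        for supernode in batch:
            matches.extend(supernode.local_matches(keyword))
    return matches


s1, s2, s3 = Supernode("s1"), Supernode("s2"), Supernode("s3")
s1.register("10.0.0.1", ["hey jude.mp3"])
s2.register("10.0.0.2", ["hey jude (live).mp3"])
s3.register("10.0.0.3", ["hey jude.mp3"])
s1.neighbors = [s2, s3]
print(query(s1, "jude", x=1))  # answered locally, no forwarding needed
print(query(s1, "jude", x=3))  # needs matches from other supernodes
```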

KaZaA: Overlay Maintenance
- A list of potential supernodes is included within the software download
- A new peer goes through the list until it finds an operational supernode
  - It connects and obtains a more up-to-date list
  - The node then pings 5 nodes on the list and connects to the one with the smallest RTT
- If its supernode goes down, the node obtains an updated list and chooses a new supernode
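The last selection step, pinging a few candidates and keeping the one with the smallest round-trip time, can be sketched as below. The function name, the `measure_rtt` callback, and the convention that `None` means "no reply" are assumptions made for illustration; real RTT measurement and the list exchange with the supernode are left out.

```python
import random


def choose_supernode(candidates, measure_rtt, sample_size=5, rng=random):
    """Ping up to `sample_size` candidates and return the one with the smallest RTT.

    `measure_rtt(candidate)` is a caller-supplied function returning the RTT in
    milliseconds, or None when the candidate does not reply.
    """
    candidates = list(candidates)
    sample = rng.sample(candidates, min(sample_size, len(candidates)))
    best, best_rtt = None, None
    for supernode in sample:
        rtt = measure_rtt(supernode)
        if rtt is not None and (best_rtt is None or rtt < best_rtt):
            best, best_rtt = supernode, rtt
    return best  # None if nobody replied: fetch a fresh list and retry


# Hypothetical measurements; "sn-c" does not reply.
rtts = {"sn-a": 80.0, "sn-b": 20.0, "sn-c": None}
print(choose_supernode(rtts, rtts.get))
```

The same function can be called again with an updated list when the current supernode goes down.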