ΕΠΛ372 Parallel Processing. Introduction: Technology Trends and the Shift to Parallelism. Γιάννος Σαζεϊδης, Spring Semester 2014. READING: www.elsevierdirect.com/samplechapters/9780123838728/hennessy_chapter%201.pdf (Sections 1.1, 1.2, 1.4, 1.5, 1.6, 1.8, 1.9, 1.10); http://www.youtube.com/watch?v=cnn_ttxabua (How a CPU is made); http://www.cs.utexas.edu/~lin/cs380p/free_lunch.pdf (The free lunch is over). Slides based on Elsevier material. Copyright 2012, Elsevier Inc. All rights reserved.
Sequential Execution: executing the instructions one at a time. Drawing from tutorial at LLNL.
Parallel Execution: multiple instructions at a time (as many as there are CPUs). Performance increases by up to P× (P = number of CPUs). Drawing from tutorial at LLNL.
Parallel Execution. + Finish a problem faster and better. + Easier to build parallel machines than faster ones. + Save money. + Solve bigger problems. + Concurrency (internet, Google, Facebook): many services handle independent requests. - Not always obvious how to parallelize. - Dependences/communication limit parallelism. - System size determines how much communication it can afford. - Writing a parallel program can be tricky.
372 Introduction. The course aims to educate and to answer questions about: Why is parallelism more critical today? Parallel machines/computers. How to write parallel algorithms/programs. Parallel languages. Appreciating the difficulties.
Computer Technology (Introduction). Performance improvements come from: semiconductor technology (feature size per Moore's Law, clock speed); computer architecture (ILP, caching, microarchitecture); system software: high-level languages (HLL) such as C, C++, Java; compilers such as gcc, g++; operating systems (OS) such as Unix, Linux, iOS, Windows.
Single Processor Performance: the move to multiprocessors (Introduction). Plot of RISC processor performance over time.
Bandwidth and Latency (Trends in Technology). Bandwidth or throughput: total work done in a given time; 10,000-25,000× improvement for processors, 300-1200× for memory and disks. Latency or response time: time between start and completion of an event; 30-80× improvement for processors, 6-8× for memory and disks.
Bandwidth and Latency (Trends in Technology). Log-log plot of bandwidth and latency milestones. Historic trend: bandwidth grows faster than latency improves.
Trends in Technology. Integrated circuit technology: transistor density 35%/year; die size 10-20%/year; overall integration 40-55%/year. DRAM capacity: 25-40%/year (slowing). Flash capacity: 50-60%/year, 15-20× cheaper/bit than DRAM. Magnetic disk technology: 40%/year, 15-25× cheaper/bit than Flash, 300-500× cheaper/bit than DRAM.
Reason 1: Non-uniform Scaling of Transistors and Wires (Trends in Technology). Feature size: the minimum size of a transistor or wire in the x or y dimension; from 10 microns in 1971 to 0.014 microns (14 nm) in 2014. Transistor performance scales linearly; wire delay does not improve with feature size! Integration density scales quadratically. Non-uniform scaling favors localized activity and communication: fast compute elements, slow communication.
Reason 2: Cannot Increase Power Density Cheaply (Trends in Power and Energy). Power: the rate of energy consumption (watts, W). Energy: the work needed to complete a task (joules, J). Problem: get power in, get power out, otherwise the chip overheats. The doubling of transistors per new technology node is not matched by a better-than-half reduction in energy per transistor. Thermal Design Power (TDP): characterizes sustained power consumption; used as the target for the power supply and cooling system, with cost implications; lower than peak power, higher than average power consumption; 130 W for servers/desktops vs. a few watts for mobiles/tablets. Data-center racks cannot air-cool more than ~20 kW/rack.
Power Wall (Trends in Power and Energy). The Intel 80386 consumed ~2 W; a 3.3 GHz Intel Core i7 consumes 130 W. That heat must be dissipated from a 1.5 cm × 1.5 cm chip, which is about the limit of what can be cooled by air. The power wall limits single-thread performance.
Dynamic Energy and Power (Trends in Power and Energy). Dynamic energy: consumed when a transistor switches from 0 -> 1 or 1 -> 0; E = ½ × Capacitive load × Voltage²; the total depends on the number of transistors switching. Dynamic power = energy/time; power in a cycle: P = ½ × Capacitive load × Voltage² × Frequency switched. Reducing the clock rate reduces power, not energy; energy per task is often the reported metric.
Reducing Power (Trends in Power and Energy). Techniques for reducing power: do nothing well (for data centers but also for mobile devices); dynamic voltage-frequency scaling; low-power states for DRAM and disks; power gating (turning off cores); parallelism: get the same performance with less power.
Reducing Power with Parallelism (Trends in Power and Energy). Assumptions: the program is parallelizable, and F ~ V (simplified example). We know P = C × V² × F. Case A: one processor running at F burns C × V² × F. Case B: two processors running at F/2 finish at the same time as A but burn 2 × (C × (V/2)² × F/2) = C × V² × F / 4 (25%!), at the cost of 2× the real estate.
Static Power (Trends in Power and Energy). Consumed continuously by processors and DRAM, even when no useful work is being done, due to leakage. Static power consumption = static current × voltage; it scales with the number of transistors. To reduce it: power gating, lower voltage.
Current Trends in Architecture (Introduction). For several years: more ILP (how? see 370). We cannot continue to leverage instruction-level parallelism (ILP): single-processor performance improvement ended in 2003, due to technology and program properties. Other types of parallelism are pursued instead: data-level parallelism (DLP) and task-level parallelism (TLP). These require parallel architectures and explicit restructuring of the applications (e.g. parallel programming).
Classes of Computers Personal Mobile Device (PMD) e.g. smart phones, tablet computers Emphasis on energy efficiency and real-time Desktop Computing Emphasis on price-performance Servers Emphasis on availability, scalability, throughput Clusters/Warehouse Scale Computers Used for Software as a Service (SaaS) also as IaaS, PaaS Emphasis on availability and price-performance Supercomputers, emphasis: floating-point performance and fast internal networks Embedded Emphasis: price Classes of Computers
Current Parallelism in Computer Classes (processors/cores). Personal Mobile Device (PMD): 1, <10; e.g. smart phones, tablet computers; emphasis on energy efficiency and real-time. Desktop computing: 1-2, 10-100; emphasis on price-performance. Servers: 1-10s, 10-1K; emphasis on availability, scalability, throughput. Clusters/warehouse-scale computers: 1K to 100K (×10); used for Software as a Service (SaaS), also IaaS, PaaS; emphasis on availability and price-performance. Supercomputers: 1K to 100K (×10); emphasis on floating-point performance and fast internal networks. Embedded computers: can vary; emphasis on price.
Parallelism (Classes of Computers). Classes of parallelism in applications: Data-Level Parallelism (DLP), Task-Level Parallelism (TLP). Classes of architectural parallelism: Instruction-Level Parallelism (ILP); vector architectures / Graphics Processing Units (GPUs); Thread-Level Parallelism; Request-Level Parallelism (throughput).
Flynn's Taxonomy (1960s) (Classes of Computers). Single instruction stream, single data stream (SISD). Single instruction stream, multiple data streams (SIMD): vector architectures, multimedia extensions, graphics processor units. Multiple instruction streams, single data stream (MISD): no commercial implementation. Multiple instruction streams, multiple data streams (MIMD): tightly-coupled MIMD (multi- and many-cores), loosely-coupled MIMD (clusters, warehouses).
Parallel Processing is not New (Classes of Computers), March 1961:
Parallel Processing is not New (Classes of Computers). Multiprocessors and multicomputers have been around for decades, built from specialized or commodity parts. Parallel programming for certain domains is a very well-developed art. There are many parallel programming languages and frameworks, and numerous efforts at automatic (compiler) parallelization and vectorization.
What is New (Classes of Computers). The lack of other alternatives for moving forward in the more general-purpose market segments: the power wall (non-ideal scaling of technology parameters: wires vs. transistors, voltage vs. transistor area, leakage vs. transistor area). The embrace of multi-/many-cores across industry motivates both the hardware and software industries to develop effective means of exploiting parallel resources. The emergence of large-scale RLP (more throughput). No silver bullet for all applications: specialization and heterogeneity (small cores, big cores, GPUs, accelerators, etc.).
The Course. An introduction to the basic concepts of parallel processing: computers, algorithmic thinking, and programming. Teaching method: lectures Δ-Π 9:00-10:30, room ΧΩΔ01 004; tutorial Τ 11:00-12:00, room ΘΕΕ01 103. Prerequisites: 121, 131, 221, 222. Bibliography: selected articles/videos; Principles of Parallel Programming, Calvin Lin & Lawrence Snyder; Computer Architecture: A Quantitative Approach, 5th ed. Assessment: assignments/project/presentation 35% (assignments ~20%, project ~15%), midterm exam 20%, final exam 45%.
Why Take This Course. Parallel computers are everywhere: multi-cores/GPUs in phones, notebooks, desktops, game consoles (e.g. 8 cores in the AMD Bulldozer, 4 cores in the iPad and in phones); servers, clusters, data centers, cloud computing, supercomputers; and they will become even more widespread. How do we write parallel programs? We parallelize a program to get faster execution: for(i=0;i<n;++i) => forall(i in 0 to n-1 step 1). What limits/affects performance? How do we improve performance? Scalable and portable.
Syllabus. Basics/introduction. Parallel computers: multicores, GPUs, data centers, supercomputers. Measuring/comparing performance. Parallel abstraction: basic parallel control and data structures. Scalable algorithmic techniques. Parallel programming languages: threads (p-threads, OpenMP), message passing (MPI), CUDA. Current state and modern trends: transactions, MapReduce.