Pipelining
Pedro Trancoso
H&P Appendix A

Automobile Industry
[Figure sequence: automobile assembly line; cars 1 to 5 advance through the stations one slide at a time]

Let's Do Laundry! The Sequential Approach

[Figure: timeline from 6 PM to midnight; loads A, B, C, D in task order, each taking 30, 40, and 20 minutes in turn]

- Sequential laundry takes 6 hours for 4 loads.
- If they learned pipelining, how long would laundry take?

Using Pipelining...

[Figure: timeline from 6 PM to midnight; loads A, B, C, D overlapped, with segment times 30, 40, 40, 40, 40, 20 minutes]

- Pipelined laundry takes 3.5 hours for 4 loads.
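
The two laundry totals can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes each load passes through three stages of 30, 40, and 20 minutes (read off the timeline figures) with loads allowed to wait between stages.

```python
# Stage durations in minutes, read off the timeline figure (assumed: three stages per load).
STAGE_MINUTES = [30, 40, 20]

def sequential_minutes(loads, stages=STAGE_MINUTES):
    """Each load runs through all stages before the next load starts."""
    return loads * sum(stages)

def pipelined_minutes(loads, stages=STAGE_MINUTES):
    """Simulate a pipeline in which each stage handles one load at a time."""
    stage_free = [0] * len(stages)   # time at which each stage next becomes free
    finish = 0
    for _ in range(loads):
        t = 0                        # every load is ready at time 0
        for s, duration in enumerate(stages):
            start = max(t, stage_free[s])   # wait for the stage to be free
            t = start + duration
            stage_free[s] = t
        finish = t
    return finish

print(sequential_minutes(4) / 60, "hours sequential")
print(pipelined_minutes(4) / 60, "hours pipelined")
```

With four loads the slowest stage (40 minutes) sets the rate, so the pipelined total is 30 + 4 × 40 + 20 minutes.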

Lessons Learned

[Figure: pipelined laundry timeline, 6 PM to 9 PM, loads A, B, C, D with segment times 30, 40, 40, 40, 40, 20 minutes]

- Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
- Pipeline rate is limited by the slowest pipeline stage
- Multiple tasks operate simultaneously
- Potential speedup = number of pipe stages
- Unbalanced lengths of pipe stages reduce speedup
- Time to fill the pipeline and time to drain it reduce speedup

Instruction Pipelining

- We execute billions of instructions, so throughput is what matters
- What is desirable in the ISA for pipelining?
  - Variable-length instructions vs. all instructions the same length?
  - Memory operands part of any operation vs. memory operands only in loads or stores?
  - Register operands in many places in the instruction format vs. registers located in the same place?

Execution Cycle

- Instruction Fetch: obtain instruction from program storage
- Instruction Decode: determine required actions and instruction size
- Operand Fetch: locate and obtain operand data
- Execute: compute result value or status
- Result Store: deposit results in storage for later use
- Next Instruction: determine successor instruction

DLX Pipeline Stages (1)

1. Instruction Fetch (IF)
2. Instruction Decode / Register Fetch (ID)
3. Execution / Effective Address (EX)
4. Memory Access / Branch Completion (MEM)
5. Write-back (WB)
(go to 1!)

DLX Pipeline Stages (2)

1. Instruction Fetch (IF)
   IR ← Mem[PC]
   NPC ← PC + 4
2. Instruction Decode / Register Fetch (ID)
   A ← Regs[IR6..10]
   B ← Regs[IR11..15]
   Imm ← ((IR16)^16 ## IR16..31)
3. Execution / Effective Address (EX)
   Mem ref: ALUoutput ← A + Imm
   Reg-reg ALU (op): ALUoutput ← A op B
   Reg-imm ALU (op): ALUoutput ← A op Imm
   Branch: ALUoutput ← NPC + Imm; cond ← (A op 0)

DLX Pipeline Stages (3)

4. Memory Access / Branch Completion (MEM)
   Mem access: LMD ← Mem[ALUoutput] or Mem[ALUoutput] ← B
   Branch: if (cond) PC ← ALUoutput else PC ← NPC
5. Write-back (WB)
   Reg-reg ALU inst: Regs[IR16..20] ← ALUoutput
   Reg-imm ALU inst: Regs[IR11..15] ← ALUoutput
   Load inst: Regs[IR11..15] ← LMD
(go to 1!)
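
The immediate computed in the ID stage, (IR16)^16 ## IR16..31, is a sign extension: the sign bit is replicated 16 times and concatenated with the 16-bit field. A minimal sketch follows, assuming DLX's convention of numbering bits from the most significant end (so IR16..31 are the low-order 16 bits of the word); the function names are hypothetical.

```python
def sign_extend_imm16(ir):
    """Return the 32-bit sign-extended immediate of a 32-bit instruction word.

    Assumes DLX bit numbering (bit 0 is the most significant bit), so
    IR16..31 is the low-order half of the word and IR16 is its sign bit.
    """
    imm16 = ir & 0xFFFF              # IR16..31
    sign = (imm16 >> 15) & 1         # IR16
    upper = 0xFFFF if sign else 0    # (IR16)^16: the sign bit repeated 16 times
    return (upper << 16) | imm16     # "##" is concatenation

def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return value - (1 << 32) if value & (1 << 31) else value

# Example words are arbitrary; only their low 16 bits matter here.
print(to_signed32(sign_extend_imm16(0x20010004)))
print(to_signed32(sign_extend_imm16(0x2001FFFC)))
```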

DLX Pipelining

[Figure sequence: the pipeline timing table for instructions I to I+4 is built up one clock cycle at a time, from cycle 1 to cycle 6]

DLX Pipeline

Cycle number       1    2    3    4    5    6    7    8    9
Instruction I      IF   ID   EX   MEM  WB
Instruction I+1         IF   ID   EX   MEM  WB
Instruction I+2              IF   ID   EX   MEM  WB
Instruction I+3                   IF   ID   EX   MEM  WB
Instruction I+4                        IF   ID   EX   MEM  WB

Example: MIPS

Register-Register
  Bit positions: 31 26 | 25 21 | 20 16 | 15 11 | 10 6 | 5 0
  Fields:        Op, Rs1, Rs2, Rd, Opx

Register-Immediate
  Bit positions: 31 26 | 25 21 | 20 16 | 15 0
  Fields:        Op, Rs1, Rd, immediate

Branch
  Bit positions: 31 26 | 25 21 | 20 16 | 15 0
  Fields:        Op, Rs1, Rs2/Opx, immediate

Jump / Call
  Bit positions: 31 26 | 25 0
  Fields:        Op, target
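
Because every field sits at a fixed bit position, decoding reduces to shifts and masks. The sketch below extracts the fields of the Register-Immediate format listed above; the function name and the example field values are hypothetical and do not correspond to any particular opcode.

```python
def decode_reg_imm(word):
    """Split a 32-bit word into the Register-Immediate fields (bit 31 is the MSB)."""
    return {
        "op": (word >> 26) & 0x3F,      # bits 31..26
        "rs1": (word >> 21) & 0x1F,     # bits 25..21
        "rd": (word >> 16) & 0x1F,      # bits 20..16
        "immediate": word & 0xFFFF,     # bits 15..0
    }

# Build an arbitrary word from made-up field values, then decode it again.
word = (8 << 26) | (2 << 21) | (1 << 16) | 100
print(decode_reg_imm(word))
```

The register fields can be read before the opcode is fully decoded, which is what lets register fetch overlap with decode in the ID stage.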

5 Steps of the MIPS Datapath

[Figure: datapath in five steps: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Components shown: Next PC, adder (+4), Next SEQ PC, instruction memory, register file (RS1, RS2, RD), sign extend (Imm), ALU, Zero? test, data memory, LMD, MUXes, WB Data]

5 Steps of the MIPS Datapath

[Figure: the same datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB between the steps; RD is carried along through the pipeline registers; datapath and control path are both marked]

5 Steps of the MIPS Datapath

[Figure: the pipelined datapath again, with Inst 1, Inst 2, and Inst 3 marked in successive stages of the datapath and control path]

Pipelined Execution

[Figure: instructions in program order (vertical axis) against time in clock cycles, Cycle 1 to Cycle 7 (horizontal axis)]
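
The stage-per-cycle pattern of an ideal pipeline can be generated mechanically: instruction i occupies stage s in cycle i + s. The following sketch reproduces the five-instruction DLX timing table under the assumption of no stalls; the helper names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_table(num_instructions, stages=STAGES):
    """Return one row per instruction listing the stage held in each cycle."""
    total_cycles = num_instructions + len(stages) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name          # instruction i is in stage s during cycle i + s (0-based)
        rows.append(row)
    return rows

def format_table(rows):
    """Render the rows as fixed-width text with a cycle-number header."""
    header = "".join(f"{c:<5}" for c in range(1, len(rows[0]) + 1))
    lines = [" " * 8 + header]
    for i, row in enumerate(rows):
        label = "I" if i == 0 else f"I+{i}"
        lines.append(f"{label:<8}" + "".join(f"{cell:<5}" for cell in row))
    return "\n".join(lines)

print(format_table(pipeline_table(5)))
```

Five instructions finish in 5 + 5 - 1 = 9 cycles, against 25 cycles if each one ran to completion before the next began.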

Limits of Pipelining

Hazards: circumstances that would cause incorrect execution if the next instruction were launched

- Structural hazards: attempting to use the same hardware to do two different things at the same time
- Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
- Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)

Example: One Memory Port

[Figure: Load, Instr 1, Instr 2, Instr 3, Instr 4 in program order against clock cycles 1 to 7, with a structural hazard marked]

Solutions for Structural Hazards

Defn: an attempt to use the same hardware for two different things at the same time

- Solution 1: Wait
  - must detect the hazard
  - must have a mechanism to stall
- Solution 2: Throw more hardware at the problem

Detecting and Resolving a Structural Hazard

[Figure: Load, Instr 1, Instr 2, Stall, Instr 3 in program order against clock cycles 1 to 7; the stall appears as a row of bubbles moving through the pipeline]
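
The single-memory-port example can be mimicked with a small model. This sketch assumes a five-stage pipeline in which a load fetched in cycle c uses memory again in its MEM stage at cycle c + 3, and in which only instruction fetch is delayed on a conflict. The function name and its flag are hypothetical.

```python
def fetch_cycles(is_load, shared_memory_port=True):
    """Return the 1-based cycle in which each instruction is fetched.

    is_load: list of booleans, one per instruction in program order.
    With a shared port, a fetch cannot share a cycle with an earlier load's MEM stage.
    """
    fetch = []
    port_busy = set()                 # cycles in which a load occupies the memory port
    cycle = 1
    for load in is_load:
        while shared_memory_port and cycle in port_busy:
            cycle += 1                # Solution 1: wait (a bubble enters the pipeline)
        fetch.append(cycle)
        if load:
            port_busy.add(cycle + 3)  # IF at c, ID at c+1, EX at c+2, MEM at c+3
        cycle += 1
    return fetch

program = [True, False, False, False, False]   # Load, Instr 1, Instr 2, Instr 3, Instr 4
print(fetch_cycles(program))                            # one memory port
print(fetch_cycles(program, shared_memory_port=False))  # Solution 2: separate ports
```

With one port, Instr 3 slips from cycle 4 to cycle 5, matching the stall row in the figure; with separate instruction and data ports no fetch is delayed.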

Resolving Structural Hazards in the Design

[Figure: pipelined datapath with a separate instruction cache and data cache in place of a single memory; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB; datapath and control path marked]

Importance of the ISA in Resolving Structural Hazards

- Simple to determine the sequence of resources used by an instruction: the opcode tells it all
- Uniformity in resource usage
- Compare MIPS to IA32?
- MIPS approach => all instructions flow through the same 5-stage pipeline

Data Hazards

[Figure: instructions in program order against time in clock cycles, each passing through IF, ID/RF, EX, MEM, WB]

  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or  r8,r1,r9
  xor r10,r1,r11

Three Kinds of Data Hazards

Read After Write (RAW)
InstrJ tries to read an operand before InstrI writes it.

  I: add r1,r2,r3
  J: sub r4,r1,r3

- Caused by a data dependence.
- This hazard results from an actual need for communication.

Three Kinds of Data Hazards

Write After Read (WAR)
InstrJ writes an operand before InstrI reads it.

  I: sub r4,r1,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7

- Called an anti-dependence by compiler writers. It results from reuse of the name r1.
- Can it happen in the MIPS 5-stage pipeline? No, because:
  - all instructions take 5 stages, and
  - reads are always in stage 2, and
  - writes are always in stage 5

Three Kinds of Data Hazards

Write After Write (WAW)
InstrJ writes an operand before InstrI writes it.

  I: sub r1,r4,r3
  J: add r1,r2,r3
  K: mul r6,r1,r7

- Called an output dependence by compiler writers. It also results from reuse of the name r1.
- Can it happen in the MIPS 5-stage pipeline? No, because:
  - all instructions take 5 stages, and
  - writes are always in stage 5
- We will see WAR and WAW in later, more complicated pipes
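
The three definitions amount to comparing the destination and source registers of an earlier instruction I with those of a later instruction J. The sketch below classifies the dependences for the examples on these slides; the Instr type and function name are hypothetical. It reports register-name dependences only, and whether one becomes a real hazard depends on the pipeline, as noted above.

```python
from typing import NamedTuple, Tuple

class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]

def dependences(i, j):
    """Return the set of dependence kinds from earlier instruction i to later instruction j."""
    found = set()
    if i.dest in j.srcs:
        found.add("RAW")   # j reads what i writes (true data dependence)
    if j.dest in i.srcs:
        found.add("WAR")   # j writes what i reads (anti-dependence)
    if i.dest == j.dest:
        found.add("WAW")   # both write the same register (output dependence)
    return found

print(dependences(Instr("add", "r1", ("r2", "r3")), Instr("sub", "r4", ("r1", "r3"))))
print(dependences(Instr("sub", "r4", ("r1", "r3")), Instr("add", "r1", ("r2", "r3"))))
print(dependences(Instr("sub", "r1", ("r4", "r3")), Instr("add", "r1", ("r2", "r3"))))
```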
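
Forwarding to Avoid Data Hazards

[Figure: instructions in program order against time in clock cycles, with results forwarded between pipeline stages]

  add r1,r2,r3
  sub r4,r1,r3
  and r6,r1,r7
  or  r8,r1,r9
  xor r10,r1,r11

Hardware Changes for Forwarding

[Figure: NextPC, registers, ID/EX, input muxes feeding the ALU, EX/MEM, data memory, MEM/WR, and an immediate mux]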

Data Hazards Even with Forwarding

[Figure: instructions in program order against time in clock cycles]

  lw  r1, 0(r2)
  sub r4,r1,r6
  and r6,r1,r7
  or  r8,r1,r9

Resolving the Load Hazard

- Adding hardware? ... not
- Detection?
- Compilation techniques?
- What is the cost of load delays?

Resolving the Load Hazard

[Figure: the same sequence with bubbles inserted after the load]

  lw  r1, 0(r2)
  sub r4,r1,r6
  and r6,r1,r7
  or  r8,r1,r9

How is this different from the instruction issue stall?

Software Scheduling to Avoid Load Hazards

Try producing fast code for
  a = b + c;
  d = e - f;
assuming a, b, c, d, e, and f are in memory.

Slow code:
  LW   Rb,b
  LW   Rc,c
  (stall)
  ADD  Ra,Rb,Rc
  SW   a,Ra
  LW   Re,e
  LW   Rf,f
  (stall)
  SUB  Rd,Re,Rf
  SW   d,Rd

Fast code:
  LW   Rb,b
  LW   Rc,c
  LW   Re,e
  ADD  Ra,Rb,Rc
  LW   Rf,f
  SW   a,Ra
  SUB  Rd,Re,Rf
  SW   d,Rd
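
The benefit of the reordering can be checked by counting load-use stalls in both sequences. The sketch below assumes a pipeline with forwarding and a one-cycle load delay, so a stall occurs only when an instruction uses a register loaded by the instruction immediately before it. The parser handles just the LW/SW/ADD/SUB forms used on this slide, and all names are hypothetical.

```python
def parse(line):
    """Split 'OP a,b,c' into (op, dest, sources) for the toy LW/SW/ADD/SUB subset."""
    op, operands = line.split()
    args = operands.split(",")
    if op == "LW":                    # LW Rdest,address
        return op, args[0], []
    if op == "SW":                    # SW address,Rsource
        return op, None, [args[1]]
    return op, args[0], args[1:]      # ADD/SUB Rdest,Rsrc1,Rsrc2

def load_use_stalls(program):
    """Count instructions that use a register loaded by the preceding instruction."""
    stalls = 0
    prev_op, prev_dest = None, None
    for line in program:
        op, dest, srcs = parse(line)
        if prev_op == "LW" and prev_dest in srcs:
            stalls += 1
        prev_op, prev_dest = op, dest
    return stalls

slow = ["LW Rb,b", "LW Rc,c", "ADD Ra,Rb,Rc", "SW a,Ra",
        "LW Re,e", "LW Rf,f", "SUB Rd,Re,Rf", "SW d,Rd"]
fast = ["LW Rb,b", "LW Rc,c", "LW Re,e", "ADD Ra,Rb,Rc",
        "LW Rf,f", "SW a,Ra", "SUB Rd,Re,Rf", "SW d,Rd"]

print(load_use_stalls(slow), load_use_stalls(fast))
```

Both versions execute the same eight instructions; the scheduled version moves an independent instruction into each load delay.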

Relation to the ISA

What is exposed about this organizational hazard in the instruction set?

- k cycle delay? Bad: CPI is not part of the ISA
- k instruction slot delay: a load should not be followed by a use of the value in the next k instructions
- Nothing, but code can reduce run-time delays

MIPS did the transformation in the assembler.

Control Hazard on Branches => Three-Stage Stall

  10: beq r1,r3,36
  14: and r2,r3,r5
  18: or  r6,r1,r7
  22: add r8,r1,r9

  36: xor r10,r1,r11

Example: Branch Stall Impact

- If 30% of instructions are branches, a 3-cycle stall is significant (CPI = ?)
- Two-part solution:
  - determine whether the branch is taken or not sooner, AND
  - compute the taken-branch address earlier
- A MIPS branch tests whether a register = 0 or ≠ 0
- MIPS solution:
  - move the Zero test to the ID/RF stage
  - add an adder to calculate the new PC in the ID/RF stage
  - 1 clock cycle penalty for a branch versus 3

Pipelining the MIPS Datapath

[Figure: pipelined datapath with the branch-target adder and Zero? test moved into the Instr. Decode / Reg. Fetch stage; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]

Data stationary control: local decode for each instruction phase / pipeline stage
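
As a worked answer to the "(CPI = ?)" prompt in the branch stall example, assuming an ideal CPI of 1 and that every branch pays the full penalty:

```latex
\[
\text{CPI}_{\text{3-cycle stall}} = 1 + 0.30 \times 3 = 1.9
\qquad
\text{CPI}_{\text{1-cycle penalty}} = 1 + 0.30 \times 1 = 1.3
\]
```

Under these assumptions, moving the branch decision into the ID/RF stage removes two thirds of the added stall cycles.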

Four Options for Control Hazards

#1: Stall until the branch direction is clear

#2: Predict Branch Not Taken
- Execute successor instructions in sequence
- "Squash" instructions in the pipeline if the branch is actually taken
- Advantage of late pipeline state update
- 47% of MIPS branches are not taken on average (CPI = ?)
- PC+4 is already calculated, so use it to get the next instruction

#3: Predict Branch Taken
- 53% of MIPS branches are taken on average
- But the branch target address has not yet been calculated in MIPS
  - MIPS still incurs a 1-cycle branch penalty
  - Other machines: branch target known before outcome

Four Options for Control Hazards

#4: Delayed Branch
- Define the branch to take place AFTER a following instruction

    branch instruction
    sequential successor 1
    sequential successor 2
    ...
    sequential successor n
    branch target if taken

  (branch delay of length n)
- A 1-slot delay allows the proper decision and branch target address in the 5-stage pipeline
- MIPS uses this

Delayed Branch

- Where to get instructions to fill the branch delay slot?
  - From before the branch instruction
  - From the target address: only valuable when the branch is taken
  - From the fall-through path: only valuable when the branch is not taken
  - Canceling branches allow more slots to be filled
- Compiler effectiveness for a single branch delay slot:
  - Fills about 60% of branch delay slots
  - About 80% of instructions executed in branch delay slots are useful in computation
  - About 50% (60% x 80%) of slots usefully filled
- Delayed branch downside: 7-8 stage pipelines, multiple instructions issued per clock (superscalar)

Delayed Branch

[Figure: not recovered]

Recall: Speedup Equation for Pipelining

\[ \text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Average stall cycles per instruction} \]

\[ \text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}} \]

For a simple RISC pipeline, CPI = 1:

\[ \text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}} \]

Example: Evaluating Branch Alternatives

\[ \text{Pipeline speedup} = \frac{\text{Pipeline depth}}{1 + \text{Branch frequency} \times \text{Branch penalty}} \]

Assume: conditional & unconditional branches = 14% of instructions, 65% of them change the PC

Scheduling scheme    Branch penalty    CPI     Speedup v. stall
Stall pipeline       3                 1.42    1.0
Predict taken        1                 1.14    1.26
Predict not taken    1                 1.09    1.29
Delayed branch       0.5               1.07    1.31
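
The CPI column of the table follows directly from CPI = 1 + branch frequency × branch penalty. The sketch below recomputes it. It assumes that the predict-not-taken scheme pays its penalty only on the 65% of branches that change the PC (my reading of how 1.09 arises), and that the pipeline is 5 deep with equal cycle times. All names are hypothetical.

```python
BRANCH_FREQ = 0.14      # conditional + unconditional branches (from the slide)
TAKEN_FRACTION = 0.65   # fraction of branches that change the PC (from the slide)

# Average stall cycles per instruction contributed by each scheme.
STALL_CPI = {
    "Stall pipeline":    BRANCH_FREQ * 3,
    "Predict taken":     BRANCH_FREQ * 1,
    "Predict not taken": BRANCH_FREQ * TAKEN_FRACTION * 1,   # assumed: penalty only when taken
    "Delayed branch":    BRANCH_FREQ * 0.5,
}

def cpi(stall_cpi, ideal_cpi=1.0):
    """CPI pipelined = ideal CPI + average stall cycles per instruction."""
    return ideal_cpi + stall_cpi

def speedup(stall_cpi, depth=5):
    """Pipeline speedup for ideal CPI = 1 and equal cycle times (assumed depth of 5)."""
    return depth / (1.0 + stall_cpi)

baseline = cpi(STALL_CPI["Stall pipeline"])
for scheme, stall in STALL_CPI.items():
    print(f"{scheme:<18} CPI={cpi(stall):.2f}  speedup={speedup(stall):.2f}  "
          f"vs stall={baseline / cpi(stall):.2f}")
```

The ratios against the stall scheme are computed here from unrounded CPIs, so they can differ from the table in the second decimal place.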

Other Difficulties...

- Cache latency
- Interrupts and exceptions
- ISA (e.g. ADD3 42(R1),56(R1)+,@(R1))
- Multicycle operations

Relationship between Cache and Pipelining

[Figure: pipelined datapath in which the instruction cache (I-$) and data cache (D-$) sit between main memory and the pipeline; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB]

Multicycle Operations

[Figures: two slides, content not recovered]

Example: MIPS R4000

Stages (8): IF, IS, RF, EX, DF, DS, TC, WB

Other Examples

- PowerPC G4 (4): FETCH, DECODE / DISPATCH, EXECUTE, COMPLETE / WRITE-BACK
- PowerPC G4e (7): FETCH, FETCH, DECODE / DISP, ISSUE, EXE, COMPL, WRITE-BACK
- Pentium 4 (20): TC NIP, TC NIP, TC F, TC F, DRIVE, A&R, A&R, A&R, Q, SCHE, SCHE, SCHE, DISP, DISP, REG, REG, EXE, FLAGS, BRAN, DRIVE