Pipelining
Pedro Trancoso
Appendix A (and slides from Prof. David Culler, Berkeley)

Automobile Industry
[Figure: assembly-line animation. Cars 1 through 5 enter the line one after another and advance one station per step, so several cars are in progress at once.]

Let's Do the Laundry! Sequential Method

[Figure: timeline from 6 PM to midnight, task order A, B, C, D. Each load passes through three stages of 30, 40, and 20 minutes, one load after another.]

- Sequential laundry takes 6 hours for 4 loads
- If they learned pipelining, how long would laundry take?

Using Pipelining...

[Figure: timeline from 6 PM to midnight, task order A, B, C, D. Overlapped stages of 30, 40, 40, 40, 40, and 20 minutes.]

- Pipelined laundry takes 3.5 hours for 4 loads
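
The two laundry totals can be checked with a small simulation. This is only a sketch: the stage times (30, 40, 20 minutes) come from the slide, while the function names and the assumption that a finished load can wait between stages are illustrative.

```python
def sequential_finish_time(stage_minutes, loads):
    """Each load runs all stages before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_finish_time(stage_minutes, loads):
    """Minute at which the last load leaves the last stage.

    Assumption: every load is ready at time 0 and may wait between
    stages until the next stage is free.
    """
    free_at = [0] * len(stage_minutes)  # when each stage is next available
    finish = 0
    for _ in range(loads):
        t = 0  # time at which this load is ready for its next stage
        for s, duration in enumerate(stage_minutes):
            start = max(t, free_at[s])
            t = start + duration
            free_at[s] = t
        finish = t
    return finish


stages = [30, 40, 20]  # minutes per stage, as on the slide
print(sequential_finish_time(stages, 4) / 60)  # 6.0 hours
print(pipelined_finish_time(stages, 4) / 60)   # 3.5 hours
```

The pipelined total is 30 + 4 x 40 + 20 minutes: the 40-minute stage is the bottleneck, so it sets the rate once the pipeline is full.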

What We Learned...

[Figure: pipelined laundry timeline starting at 6 PM, task order A, B, C, D, stage times 30, 40, 40, 40, 40, 20 minutes.]

- Pipelining doesn't help the latency of a single task; it helps the throughput of the entire workload
- Pipeline rate is limited by the slowest pipeline stage
- Multiple tasks operate simultaneously
- Potential speedup = number of pipe stages
- Unbalanced lengths of pipe stages reduce speedup
- Time to fill the pipeline and time to drain it reduce speedup

Instruction Pipelining

- We execute billions of instructions, so throughput is what matters
- What is desirable in the ISA for pipelining?
  - Variable-length instructions vs. all instructions the same length?
  - Memory operands as part of any operation vs. memory operands only in loads or stores?
  - Register operands in many places in the instruction format vs. registers located in the same place?
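
The two claims from "What We Learned" (potential speedup equals the number of stages, and fill and drain time reduce it) can be written as one formula. The symbols are assumptions introduced here: k balanced stages of duration t each, and n tasks.

```latex
\begin{align*}
T_{\text{sequential}} &= n \, k \, t \\
T_{\text{pipelined}}  &= (k + n - 1) \, t \\
\text{Speedup} &= \frac{n \, k}{k + n - 1} \;\longrightarrow\; k
  \quad \text{as } n \to \infty
\end{align*}
```

For example, with k = 5 stages and only n = 5 tasks the speedup is 25/9, about 2.8, because 4 of the 9 time slots are spent filling and draining the pipeline.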

Execution Cycle

- Instruction Fetch: obtain instruction from program storage
- Instruction Decode: determine required actions and instruction size
- Operand Fetch: locate and obtain operand data
- Execute: compute result value or status
- Result Store: deposit results in storage for later use
- Next Instruction: determine successor instruction

DLX Pipeline Stages (1)

1. Instruction Fetch (IF)
2. Instruction Decode / Register Fetch (ID)
3. Execution / Effective Address (EX)
4. Memory Access / Branch Completion (MEM)
5. Write-back (WB)
(go to 1!)

DLX Pipeline Stages (2)

1. Instruction Fetch (IF)
   IR ← Mem[PC]
   NPC ← PC + 4
2. Instruction Decode / Register Fetch (ID)
   A ← Regs[IR6..10]
   B ← Regs[IR11..15]
   Imm ← ((IR16)^16 ## IR16..31)
3. Execution / Effective Address (EX)
   Mem ref: ALUoutput ← A + Imm
   Reg-Reg (ALU op): ALUoutput ← A op B
   Reg-Imm (ALU op): ALUoutput ← A op Imm
   Branch: ALUoutput ← NPC + Imm; cond ← (A op 0)

DLX Pipeline Stages (3)

4. Memory Access / Branch Completion (MEM)
   Mem access: LMD ← Mem[ALUoutput] or Mem[ALUoutput] ← B
   Branch: if (cond) PC ← ALUoutput else PC ← NPC
5. Write-back (WB)
   Reg-Reg instr: Regs[IR16..20] ← ALUoutput
   Reg-Imm instr: Regs[IR11..15] ← ALUoutput
   Load instr: Regs[IR11..15] ← LMD
(go to 1!)
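
The ID-stage transfers above can be sketched in a few lines. This is an illustration only: it assumes the DLX convention that bit 0 is the most significant bit of the 32-bit IR, and the helper names (`bits`, `sign_extend16`, `decode`) are hypothetical.

```python
def bits(word, first, last, width=32):
    """Return IR[first..last] with DLX numbering (bit 0 = most significant)."""
    length = last - first + 1
    shift = width - 1 - last
    return (word >> shift) & ((1 << length) - 1)


def sign_extend16(value):
    """(IR16)^16 ## IR16..31, returned as a signed Python integer."""
    return value - 0x10000 if value & 0x8000 else value


def decode(ir, regs):
    """ID stage: read both register operands and the sign-extended immediate."""
    a = regs[bits(ir, 6, 10)]
    b = regs[bits(ir, 11, 15)]
    imm = sign_extend16(bits(ir, 16, 31))
    return a, b, imm


# Hypothetical instruction word: opcode 0, IR6..10 = 2, IR11..15 = 1, immediate = -4
ir = (0 << 26) | (2 << 21) | (1 << 16) | 0xFFFC
regs = [10 * i for i in range(32)]  # made-up register contents
print(decode(ir, regs))  # (20, 10, -4)
```

Because the register fields sit at fixed positions, both operands can be read in ID before the opcode has been fully interpreted.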

Pipelining the DLX

[Animation: the DLX pipeline diagram is built up one clock cycle at a time, cycles 1 through 6, for instructions I through I+4. The completed diagram follows.]

DLX Pipeline

Cycle number       1    2    3    4    5    6    7    8    9
Instruction I      IF   ID   EX   MEM  WB
Instruction I+1         IF   ID   EX   MEM  WB
Instruction I+2              IF   ID   EX   MEM  WB
Instruction I+3                   IF   ID   EX   MEM  WB
Instruction I+4                        IF   ID   EX   MEM  WB

Example: MIPS

Register-Register
    Bits:   31-26  25-21  20-16  15-11  10-0
    Field:  Op     Rs1    Rs2    Rd     Opx

Register-Immediate
    Bits:   31-26  25-21  20-16  15-0
    Field:  Op     Rs1    Rd     immediate

Branch
    Bits:   31-26  25-21  20-16    15-0
    Field:  Op     Rs1    Rs2/Opx  immediate

Jump / Call
    Bits:   31-26  25-0
    Field:  Op     target
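
The DLX pipeline diagram can be regenerated for any number of instructions. A minimal sketch, assuming an ideal pipeline with no stalls; the function name is illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions):
    """Rows of an ideal (stall-free) pipeline diagram, one row per instruction."""
    total_cycles = num_instructions + len(STAGES) - 1
    rows = []
    for i in range(num_instructions):
        row = [""] * total_cycles
        for s, name in enumerate(STAGES):
            row[i + s] = name  # instruction i is in stage s during cycle i + s + 1
        rows.append(row)
    return rows


for i, row in enumerate(pipeline_diagram(5)):
    print(f"I+{i}", " ".join(f"{cell:>3}" for cell in row))
```

Five instructions finish in 5 + 5 - 1 = 9 cycles, which matches the nine columns of the diagram.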

5 Steps of the MIPS Datapath

[Figure: unpipelined datapath in five steps: Instruction Fetch; Instr. Decode / Reg. Fetch; Execute / Addr. Calc; Memory Access; Write Back. Labeled components: Next PC, adder (+4), Next SEQ PC, instruction memory (Address, Inst), register file (RS1, RS2, RD), sign extend (Imm), Zero? test, MUXes, data memory, LMD, WB Data.]

5 Steps of the MIPS Datapath

[Figure: the same datapath with pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB between the steps. RD is carried along through the pipeline registers. The figure distinguishes Datapath from Control Path.]

5 Steps of the MIPS Datapath

[Figure: the pipelined datapath with Inst 1, Inst 2, and Inst 3 occupying successive stages at the same time.]

Pipelined Execution

[Figure: instruction order (vertical) against time in clock cycles (horizontal), Cycle 1 through Cycle 7.]

Limits of Pipelining

- Hazards: circumstances that would cause incorrect execution if the next instruction were launched
  - Structural hazards: attempting to use the same hardware to do two different things at the same time
  - Data hazards: an instruction depends on the result of a prior instruction still in the pipeline
  - Control hazards: caused by the delay between the fetching of instructions and decisions about changes in control flow (branches and jumps)

Example: One Memory Port

[Figure: instruction order (Load, Instr 1, Instr 2, Instr 3, Instr 4) against time, Cycle 1 through Cycle 7. A structural hazard is marked where the Load's memory access and a later instruction fetch need the single memory port in the same cycle.]

Solutions for Structural Hazards

- Definition: an attempt to use the same hardware for two different things at the same time
- Solution 1: Wait
  - must detect the hazard
  - must have a mechanism to stall
- Solution 2: Throw more hardware at the problem

Detecting and Resolving a Structural Hazard

[Figure: instruction order (Load, Instr 1, Instr 2, Stall, Instr 3) against time, Cycle 1 through Cycle 7. The stall appears as a row of five bubbles, one per pipeline stage.]
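
"Solution 1: Wait" can be sketched as a fetch scheduler for a machine with one memory port. This is a simplified model under stated assumptions: only this one hazard is modeled, an instruction fetched in cycle c reaches MEM in cycle c + 3, and the function name is hypothetical.

```python
def fetch_cycles(uses_memory):
    """Cycle (1-based) in which each instruction is fetched.

    uses_memory[i] is True when instruction i accesses data memory in MEM.
    A fetch is delayed (a bubble is inserted) whenever an earlier
    instruction needs the single memory port in that cycle.
    """
    port_busy = set()  # cycles in which the port is taken by a MEM stage
    fetch = []
    cycle = 1
    for is_mem in uses_memory:
        while cycle in port_busy:
            cycle += 1  # stall: wait for the memory port
        fetch.append(cycle)
        if is_mem:
            port_busy.add(cycle + 3)  # IF in cycle c means MEM in cycle c + 3
        cycle += 1
    return fetch


# Load followed by four instructions that do not touch data memory
print(fetch_cycles([True, False, False, False, False]))  # [1, 2, 3, 5, 6]
```

The fetch of Instr 3 slips from cycle 4 to cycle 5, which is the stall row in the figure. Solution 2 (separate instruction and data memories) removes the conflict entirely.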

Eliminating Structural Hazards at Design Time

[Figure: pipelined datapath with a separate instruction cache (Instr Cache) and data cache (Data Cache), pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB, and the Datapath and Control Path marked.]

Role of the ISA in Resolving Structural Hazards

- Simple to determine the sequence of resources used by an instruction
  - the opcode tells it all
- Uniformity in resource usage
- Compare MIPS to IA32?
- MIPS approach => all instructions flow through the same 5-stage pipeline

Data Hazards

[Figure: instruction order against time (clock cycles), stages IF, ID/RF, EX, MEM, WB.]
    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or  r8,r1,r9
    xor r10,r1,r11

Three Kinds of Data Hazards

- Read After Write (RAW): Instr J tries to read an operand before Instr I writes it
    I: add r1,r2,r3
    J: sub r4,r1,r3
- Caused by a data dependence. This hazard results from an actual need for communication.

Three Kinds of Data Hazards

- Write After Read (WAR): Instr J writes an operand before Instr I reads it
    I: sub r4,r1,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
- Called an anti-dependence by compiler writers. This results from reuse of the name r1.
- Can it happen in the MIPS 5-stage pipeline?
- Can't happen in the MIPS 5-stage pipeline because:
  - All instructions take 5 stages, and
  - Reads are always in stage 2, and
  - Writes are always in stage 5

Three Kinds of Data Hazards

- Write After Write (WAW): Instr J writes an operand before Instr I writes it
    I: sub r1,r4,r3
    J: add r1,r2,r3
    K: mul r6,r1,r7
- Called an output dependence by compiler writers. This also results from reuse of the name r1.
- Can it happen in the MIPS 5-stage pipeline?
- Can't happen in the MIPS 5-stage pipeline because:
  - All instructions take 5 stages, and
  - Writes are always in stage 5
- We will see WAR and WAW in later, more complicated pipes
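
The three definitions can be expressed as a small classifier over register names. A sketch only: the `Instr` tuple and the `dependences` function are illustrative, and the result names the dependence between two instructions; whether it becomes an actual hazard depends on the pipeline, as the slides note for WAR and WAW.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def dependences(i, j):
    """Dependences from earlier instruction i to later instruction j."""
    found = set()
    if i.dest in j.srcs:
        found.add("RAW")  # j reads what i writes
    if j.dest in i.srcs:
        found.add("WAR")  # j overwrites what i still has to read
    if i.dest == j.dest:
        found.add("WAW")  # both write the same register
    return found


# The I/J pairs from the three slides
print(dependences(Instr("add", "r1", ("r2", "r3")), Instr("sub", "r4", ("r1", "r3"))))  # {'RAW'}
print(dependences(Instr("sub", "r4", ("r1", "r3")), Instr("add", "r1", ("r2", "r3"))))  # {'WAR'}
print(dependences(Instr("sub", "r1", ("r4", "r3")), Instr("add", "r1", ("r2", "r3"))))  # {'WAW'}
```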

Forwarding to Avoid Data Hazards

[Figure: instruction order against time (clock cycles); the result of the add is forwarded to the instructions that depend on it.]
    add r1,r2,r3
    sub r4,r1,r3
    and r6,r1,r7
    or  r8,r1,r9
    xor r10,r1,r11

Hardware Changes for Forwarding

[Figure: NextPC, Registers, ID/EX, three muxes, EX/MEM, Data Memory, MEM/WR, Immediate.]
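
The selection made by the forwarding muxes can be sketched as follows. This is the usual textbook priority rule, not something spelled out on the slide: the newest matching result wins. The function name, the string labels, and the treatment of r0 as a constant zero are assumptions.

```python
def forward_from(src_reg, ex_mem_dest, mem_wb_dest):
    """Where an ALU input should come from for the instruction now in EX.

    ex_mem_dest / mem_wb_dest are the destination registers held in the
    EX/MEM and MEM/WB pipeline registers (None if that instruction
    writes no register).
    """
    if src_reg == "r0":
        return "REGFILE"  # assumption: r0 is hardwired to zero
    if src_reg == ex_mem_dest:
        return "EX/MEM"   # result of the instruction one ahead
    if src_reg == mem_wb_dest:
        return "MEM/WB"   # result of the instruction two ahead
    return "REGFILE"


# sub r4,r1,r3 in EX while add r1,r2,r3 is in MEM
print(forward_from("r1", "r1", None))  # EX/MEM
# and r6,r1,r7 in EX: sub (dest r4) is in MEM, add (dest r1) is in WB
print(forward_from("r1", "r4", "r1"))  # MEM/WB
```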

Data Hazards Even with Forwarding

[Figure: instruction order against time (clock cycles).]
    lw  r1, 0(r2)
    sub r4,r1,r6
    and r6,r1,r7
    or  r8,r1,r9

Resolving the Load Hazard

- Adding hardware? ... not
- Detection?
- Compilation techniques?
- What is the cost of load delays?

Resolving the Load Hazard

[Figure: instruction order against time (clock cycles); a bubble is inserted after the load, delaying sub, and, and or by one cycle.]
    lw  r1, 0(r2)
    sub r4,r1,r6
    and r6,r1,r7
    or  r8,r1,r9

- How is this different from the instruction issue stall?

Software Scheduling to Avoid Load Hazards

Try producing fast code for
    a = b + c;
    d = e - f;
assuming a, b, c, d, e, and f are in memory.

Slow code:
    LW  Rb,b
    LW  Rc,c
    (STALL)
    ADD Ra,Rb,Rc
    SW  a,Ra
    LW  Re,e
    LW  Rf,f
    (STALL)
    SUB Rd,Re,Rf
    SW  d,Rd

Fast code:
    LW  Rb,b
    LW  Rc,c
    LW  Re,e
    ADD Ra,Rb,Rc
    LW  Rf,f
    SW  a,Ra
    SUB Rd,Re,Rf
    SW  d,Rd
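
The difference between the slow and fast schedules can be counted mechanically. A sketch under a simplifying assumption: with forwarding, the only stall is one cycle when an instruction reads a register loaded by the LW immediately before it. The tuple encoding and function name are illustrative.

```python
def load_use_stalls(program):
    """Count one-cycle load-use stalls in a list of (op, dest, sources) tuples."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_srcs = cur
        if prev_op == "LW" and prev_dest in cur_srcs:
            stalls += 1
    return stalls


slow = [
    ("LW", "Rb", ()), ("LW", "Rc", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("SW", None, ("Ra",)),
    ("LW", "Re", ()), ("LW", "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
fast = [
    ("LW", "Rb", ()), ("LW", "Rc", ()), ("LW", "Re", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Rf", ()),
    ("SW", None, ("Ra",)), ("SUB", "Rd", ("Re", "Rf")),
    ("SW", None, ("Rd",)),
]
print(load_use_stalls(slow), load_use_stalls(fast))  # 2 0
```

Both schedules contain the same eight instructions; the fast one only moves independent loads into the slots that would otherwise be bubbles.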

Relationship to the ISA

- What is exposed about this organizational hazard in the instruction set?
- k cycle delay?
  - Bad: CPI is not part of the ISA
- k instruction slot delay
  - A load should not be followed by a use of the value in the next k instructions
- Nothing, but code can reduce run-time delays
- MIPS did the transformation in the assembler

Control Hazard on Branches => Three Stage Stall

    10: beq r1,r3,36
    14: and r2,r3,r5
    18: or  r6,r1,r7
    22: add r8,r1,r9
    36: xor r10,r1,r11

Example: Branch Stall Impact

- If 30% of instructions are branches, a 3-cycle stall is significant (CPI = ?)
- Two-part solution:
  - Determine whether the branch is taken or not sooner, AND
  - Compute the taken-branch address earlier
- MIPS branch tests whether a register = 0 or ≠ 0
- MIPS solution:
  - Move the Zero test to the ID/RF stage
  - Add an adder to calculate the new PC in the ID/RF stage
  - 1 clock cycle penalty for a branch versus 3

Pipelined MIPS Datapath

[Figure: pipelined datapath with an additional adder and the Zero? test moved into the Instr. Decode / Reg. Fetch stage; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB.]

- Data stationary control: local decode for each instruction phase / pipeline stage
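
The "CPI = ?" question on the Branch Stall Impact slide can be worked out directly, assuming an ideal CPI of 1 and that every branch pays the full penalty (both are assumptions, not stated on the slide):

```latex
\begin{align*}
\text{CPI}_{\text{3-cycle stall}} &= 1 + 0.30 \times 3 = 1.9 \\
\text{CPI}_{\text{1-cycle penalty}} &= 1 + 0.30 \times 1 = 1.3
\end{align*}
```

Under these assumptions, moving the zero test and the target adder into ID/RF cuts the branch overhead from 0.9 to 0.3 cycles per instruction.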

Four Alternatives for Control Hazards

#1: Stall until the branch direction is clear
#2: Predict Branch Not Taken
- Execute successor instructions in sequence
- Squash instructions in the pipeline if the branch is actually taken
- Advantage of late pipeline state update
- 47% of MIPS branches are not taken on average (CPI = ?)
- PC+4 is already calculated, so use it to get the next instruction
#3: Predict Branch Taken
- 53% of MIPS branches are taken on average
- But the branch target address has not yet been calculated in MIPS
  - MIPS still incurs a 1-cycle branch penalty
  - Other machines: branch target known before outcome

Four Alternatives for Control Hazards

#4: Delayed Branch
- Define the branch to take place AFTER a following instruction

    branch instruction
    sequential successor 1
    sequential successor 2
    ...
    sequential successor n      (successors 1..n: branch delay of length n)
    ...
    branch target if taken

- A 1-slot delay allows a proper decision and branch target address in the 5-stage pipeline
- MIPS uses this

Delayed Branch

- Where to get instructions to fill the branch delay slot?
  - From before the branch instruction
  - From the target address: only valuable when the branch is taken
  - From the fall-through path: only valuable when the branch is not taken
  - Canceling branches allow more slots to be filled
- Compiler effectiveness for a single branch delay slot:
  - Fills about 60% of branch delay slots
  - About 80% of instructions executed in branch delay slots are useful in computation
  - About 50% (60% x 80%) of slots are usefully filled
- Delayed branch downside: deeper pipelines (7-8 stages) and multiple instructions issued per clock (superscalar) lengthen the branch delay beyond a single slot

Delayed Branch

[Figure]

Recall: Speedup Equation for Pipelining

$$\text{CPI}_{\text{pipelined}} = \text{Ideal CPI} + \text{Average stall cycles per instruction}$$

$$\text{Speedup} = \frac{\text{Ideal CPI} \times \text{Pipeline depth}}{\text{Ideal CPI} + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$

For a simple RISC pipeline, CPI = 1:

$$\text{Speedup} = \frac{\text{Pipeline depth}}{1 + \text{Pipeline stall CPI}} \times \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Cycle Time}_{\text{pipelined}}}$$

Example: Evaluating Different Implementations of the Branch Instruction

$$\text{Pipeline speedup} = \frac{\text{Pipeline depth}}{1 + \text{Branch frequency} \times \text{Branch penalty}}$$

Assume: conditional and unconditional branches = 14% of instructions; 65% of them change the PC

Scheduling scheme    Branch penalty   CPI    Speedup v. stall
Stall pipeline       3                1.42   1.0
Predict taken        1                1.14   1.26
Predict not taken    1                1.09   1.29
Delayed branch       0.5              1.07   1.31
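
The CPI column of the table can be reproduced from the stated assumptions. This is a sketch: the function name is illustrative, and it assumes that under predict-not-taken only the 65% of branches that change the PC pay the penalty, which is the reading that yields the slide's 1.09.

```python
def branch_cpi(branch_freq, penalty, fraction_paying=1.0, ideal_cpi=1.0):
    """CPI = ideal CPI + branch frequency x fraction paying x penalty."""
    return ideal_cpi + branch_freq * fraction_paying * penalty


BRANCH_FREQ = 0.14  # conditional + unconditional branches (from the slide)
TAKEN = 0.65        # fraction of branches that change the PC (from the slide)

schemes = {
    "Stall pipeline": branch_cpi(BRANCH_FREQ, 3),
    "Predict taken": branch_cpi(BRANCH_FREQ, 1),
    # assumption: only taken branches are mispredicted and pay the penalty
    "Predict not taken": branch_cpi(BRANCH_FREQ, 1, fraction_paying=TAKEN),
    "Delayed branch": branch_cpi(BRANCH_FREQ, 0.5),
}

for name, cpi in schemes.items():
    print(f"{name:18s} CPI = {cpi:.2f}")
```

The printed values are 1.42, 1.14, 1.09, and 1.07, matching the CPI column.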

Other Difficulties...

- Cache latency
- Interrupts and exceptions
- ISA (e.g. ADD3 42(R1),56(R1)+,@(R1))
- Multicycle operations

Relationship between Cache and Pipeline

[Figure: pipelined MIPS datapath in which the instruction memory and data memory are an instruction cache (I-$) and a data cache (D-$), both backed by main memory.]

Multicycle Operations

[Figures]

Example: MIPS R4000

Stages (8): IF, IS, RF, EX, DF, DS, TC, WB

Other Examples

- PowerPC G4 (4): FETCH, DECODE / DISPATCH, EXECUTE, COMPLETE / WRITE-BACK
- PowerPC G4e (7): FETCH, FETCH, DECODE / DISP, ISSUE, EXE, COMPL, WRITE-BACK
- Pentium 4 (20): TCNIP, TCNIP, TCF, TCF, DRIVE, A&R, A&R, A&R, Q, SCHE, SCHE, SCHE, DISP, DISP, REG, REG, EXE, FLAGS, BRAN, DRIVE