UNIVERSITY OF THESSALY, SCHOOL OF ENGINEERING, DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

ON THE EXPLORATION AND OPTIMIZATION OF CACHES UNDER PARAMETRIC VARIATION

Master's Thesis

Charalampos G. Antoniadis

Supervisors: Nestoras Evmorfopoulos, Assistant Professor; Georgios Stamoulis, Professor; Panagiota Tsompanopoulou, Assistant Professor

Volos, July 2014

Approved by the three-member examination committee on 11 July 2014: N. Bellas, Associate Professor; P. Tsompanopoulou, Assistant Professor; Ch. D. Antonopoulos, Assistant Professor.

Master's thesis submitted for the Postgraduate Diploma of Specialization in "Science and Technology of Computers, Telecommunications and Networks" of the University of Thessaly, within the Postgraduate Studies Programme of the Department of Computer, Telecommunications and Network Engineering of the University of Thessaly.

Charalampos Antoniadis, Diploma in Computer, Telecommunications and Network Engineering, University of Thessaly

Copyright Charalampos Antoniadis, 2014. All rights reserved. Copying, storing and distributing this work, in whole or in part, for commercial purposes is prohibited. Reprinting, storing and distributing it for non-profit, educational or research purposes is permitted, provided that the source is acknowledged and this notice is retained. Questions concerning the use of this work for commercial purposes should be addressed to the author.



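Abbreviations: SRAM (static random-access memory), PDF (probability density function), EVT (extreme value theory), CMOS (complementary metal-oxide-semiconductor), MOSFET (metal-oxide-semiconductor field-effect transistor), FinFET (fin field-effect transistor), CMP, PF, MC (Monte Carlo), EMC, IS (importance sampling), LUT (look-up table), WID (within-die).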

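Subthreshold leakage current of a transistor:

$$I_{leak} = \mu_{eff} C_{ox} \frac{W}{L} u_t^2 \left(1 - e^{-V_{dd}/u_t}\right) e^{\frac{-V_{th} - V_{off}}{n u_t}}$$

where $\mu_{eff}$ is the effective mobility, $C_{ox}$ the gate-oxide capacitance, $W/L$ the aspect ratio, $u_t$ the thermal voltage, $V_{th}$ the threshold voltage, $V_{off}$ the offset voltage and $n$ the subthreshold swing coefficient. The transistor is in the off state, $V_{gs} = 0$ and $V_{ds} = V_{dd}$.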
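As a quick numerical check of the leakage expression, the following Python sketch evaluates $I_{leak}$ in the off state and shows its exponential sensitivity to $V_{th}$. The function name and every device value are hypothetical, chosen only for illustration, not technology data from this work.

```python
import math

def subthreshold_leakage(mu_eff, c_ox, w, l, v_dd, v_th, v_off, n, u_t=0.0259):
    """Off-state leakage (V_gs = 0, V_ds = V_dd) following the expression above.

    u_t defaults to the thermal voltage kT/q near 300 K (an assumption).
    """
    drain_term = 1.0 - math.exp(-v_dd / u_t)            # (1 - e^(-V_dd/u_t))
    gate_term = math.exp((-v_th - v_off) / (n * u_t))   # e^((-V_th - V_off)/(n u_t))
    return mu_eff * c_ox * (w / l) * u_t ** 2 * drain_term * gate_term

# Hypothetical device values, for illustration only
device = dict(mu_eff=0.03, c_ox=0.02, w=90e-9, l=45e-9, v_dd=1.1, v_off=-0.08, n=1.5)
i_nominal = subthreshold_leakage(v_th=0.35, **device)
i_shifted = subthreshold_leakage(v_th=0.40, **device)   # +50 mV threshold shift
print(f"I_leak: {i_nominal:.3e} A -> {i_shifted:.3e} A")
```

A 50 mV increase of $V_{th}$ scales the leakage by $e^{-0.05/(n u_t)}$, which is why threshold variation dominates the spread of leakage power.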

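Leakage of an inverter and of a two-input NAND gate:

$$P_{leak,inv} = \frac{I_{leak,pmos} + I_{leak,nmos}}{2}\, V_{DD}$$

$$P_{leak,nand2} = I_{leak,nmos}\, SF_{nand2}\, V_{DD}$$

where $SF_{nand2}$ is the stacking factor of the series NMOS pair. The total leakage of the cache is

$$P_{leak} = P_{leak,network} + P_{leak,peripheral\,circuitry} + P_{leak,mem\,array}$$

$$P_{peripheral\,circuitry} = (P_{leak,decoder} + P_{leak,senseamps})\, N_{banks}\, N_{subbanks}\, N_{mats\,in\,subbank}$$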

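$$P_{leak,mem\,cells} = N_{subarr\,rows}\, N_{subarr\,cols}\, P_{mem\,cell}$$

$$P_{mem\,cell} = V_{DD}\, I_{cell\,leakage}$$

$$I_{cell\,leakage} = (n_N I_N + n_P I_P)\, k_{design}$$

where $n_N$ and $n_P$ are the numbers of NMOS and PMOS transistors in the cell, $I_N$ and $I_P$ their leakage currents, and $k_{design}$ a design-dependent factor.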
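The leakage bookkeeping above can be sketched in Python as follows. This is a minimal illustration, assuming the per-mat decoder and sense-amplifier leakage values are already known; the function names and the numbers in the usage lines (including $k_{design} = 0.5$) are hypothetical.

```python
def inverter_leakage(i_leak_pmos, i_leak_nmos, v_dd):
    # Average over the two input states: either the off PMOS or the off NMOS leaks
    return (i_leak_pmos + i_leak_nmos) / 2 * v_dd

def nand2_leakage(i_leak_nmos, sf_nand2, v_dd):
    # SF_nand2 models the reduced leakage of the stacked NMOS pair
    return i_leak_nmos * sf_nand2 * v_dd

def peripheral_leakage(p_decoder, p_senseamps, n_banks, n_subbanks, n_mats_in_subbank):
    # Per-mat decoder + sense-amp leakage scaled by the total number of mats
    return (p_decoder + p_senseamps) * n_banks * n_subbanks * n_mats_in_subbank

def cell_array_leakage(n_rows, n_cols, v_dd, n_n, i_n, n_p, i_p, k_design):
    i_cell = (n_n * i_n + n_p * i_p) * k_design   # I_cell_leakage
    p_cell = v_dd * i_cell                          # P_mem_cell
    return n_rows * n_cols * p_cell                 # P_leak_mem_cells

def total_leakage(p_network, p_peripheral, p_mem_array):
    return p_network + p_peripheral + p_mem_array

# Hypothetical 256 x 128 subarray of 6T cells (4 NMOS, 2 PMOS)
p_cells = cell_array_leakage(256, 128, 1.0, n_n=4, i_n=1e-9, n_p=2, i_p=0.5e-9, k_design=0.5)
print(f"{p_cells:.3e} W")
```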

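Each transistor is modeled by an effective resistance $R_{eff}$, a drain capacitance $C_{drain}$ and a gate capacitance $C_{gate}$:

$$C_{gate}(W) = W L_{eff} C_{gate} + L_{poly} L_{eff} C_{polywire}$$

$$C_{drain}(W) = A_D C_{diffarea} + P_D C_{diffside} + W C_{diffgate}, \qquad A_D = W L_D, \quad P_D = W + 2 L_D$$

where $W$ is the transistor width, $L_{eff}$ the effective channel length, $L_{poly}$ the poly-wire length, $C_{gate}$ and $C_{polywire}$ on the right-hand side the unit (per-area) capacitances, $A_D$ and $P_D$ the drain area and perimeter, and $C_{diffarea}$, $C_{diffside}$, $C_{diffgate}$ the area, sidewall and gate-side junction capacitances. The resistance is obtained from the ratio $V_{ds}/I_{ds}$.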
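A short Python sketch of the two capacitance models, assuming the unit capacitances come from a technology file; the names and the normalized test values are illustrative only.

```python
def gate_capacitance(w, l_eff, l_poly, c_gate_unit, c_polywire_unit):
    # C_gate(W) = W * L_eff * C_gate + L_poly * L_eff * C_polywire
    return w * l_eff * c_gate_unit + l_poly * l_eff * c_polywire_unit

def drain_capacitance(w, l_d, c_diffarea, c_diffside, c_diffgate):
    a_d = w * l_d        # drain area A_D = W * L_D
    p_d = w + 2 * l_d    # perimeter P_D = W + 2 L_D; the gate-side edge is covered by C_diffgate
    return a_d * c_diffarea + p_d * c_diffside + w * c_diffgate

# Normalized units: both capacitances grow linearly with the width W
print(drain_capacitance(1.0, 2.0, 1.0, 1.0, 1.0), drain_capacitance(2.0, 2.0, 1.0, 1.0, 1.0))
```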

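$$R_{eff} = \frac{V_{DD}}{I_{eff}}, \qquad I_{eff} = \frac{I_H + I_L}{2}$$

$$I_H = I_{DS}(V_{GS} = V_{DD},\, V_{DS} = V_{DD}/2), \qquad I_L = I_{DS}(V_{GS} = V_{DD}/2,\, V_{DS} = V_{DD})$$

Figure: equivalent RC circuit of a gate driving node $x$ ($R_{eq}$, $C_{eq}$, node voltage $V_x$).

The delay of a gate with a rising input is given by the Horowitz approximation:

$$delay = t_f \sqrt{\left[\ln\left(\frac{u_{th}}{V_{dd}}\right)\right]^2 + \frac{2 t_{rise}\, b \left(1 - \frac{u_{th}}{V_{dd}}\right)}{t_f}}$$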

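where $u_{th}$ is the switching threshold, $t_{rise}$ the rise time of the input, $t_f = R_{eq} C_{eq}$ the output time constant and $b = 0.5$. For a falling input with fall time $t_{fall}$:

$$delay = t_f \sqrt{\left[\ln\left(1 - \frac{u_{th}}{V_{dd}}\right)\right]^2 + \frac{2 t_{fall}\, b\, u_{th}}{V_{dd}\, t_f}}$$

with $b = 0.4$. With two switching thresholds $u_{th1}$ and $u_{th2}$, the rising-input and falling-input delays become

$$delay = t_f \sqrt{\left[\ln\left(\frac{u_{th1}}{V_{dd}}\right)\right]^2 + \frac{2 t_{rise}\, b \left(1 - \frac{u_{th1}}{V_{dd}}\right)}{t_f}} + t_f \left[\ln\left(\frac{u_{th1}}{V_{dd}}\right) - \ln\left(\frac{u_{th2}}{V_{dd}}\right)\right]$$

$$delay = t_f \sqrt{\left[\ln\left(1 - \frac{u_{th1}}{V_{dd}}\right)\right]^2 + \frac{2 t_{fall}\, b\, u_{th1}}{V_{dd}\, t_f}} + t_f \left[\ln\left(1 - \frac{u_{th1}}{V_{dd}}\right) - \ln\left(1 - \frac{u_{th2}}{V_{dd}}\right)\right]$$

The step-response time of the bitline for a swing from $u_{start}$ to $u_{end}$ is

$$T_{step} = \left[R_2 C_2 + (R_1 + R_2) C_1\right] \ln\left(\frac{u_{start}}{u_{end}}\right), \qquad u_{start} > u_{end}$$

where $u_{start} = u_{pre}$ (the precharge voltage) and $u_{end} = u_{pre} - u_{sense}$ (the swing required by the sense amplifier). With $m$ the slope of the wordline input, the bitline delay is

$$delay = \begin{cases} \sqrt{\dfrac{2 T_{step} (V_{DD} - V_{th})}{m}}, & T_{step} \le 0.5\, \dfrac{V_{DD} - V_{th}}{m} \\[2ex] T_{step} + \dfrac{V_{DD} - V_{th}}{2m}, & T_{step} > 0.5\, \dfrac{V_{DD} - V_{th}}{m} \end{cases}$$

Figure: bitline RC model with $R_1$, $C_1$, $R_2$, $C_2$.

The access time of the cache is

$$T_{access} = T_{request\,network} + T_{mat} + T_{reply\,network}$$

$$T_{mat} = \max(T_{row\,dec\,path},\, T_{col\,dec\,path})$$

$$T_{row\,dec\,path} = T_{row\,dec} + T_{bitline} + T_{senseamp}, \qquad T_{col\,dec\,path} = T_{col\,dec}$$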
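The delay expressions above translate into the Python sketch below. It follows the equations as written, with $b = 0.5$ for a rising and $b = 0.4$ for a falling input; the function names and argument order are hypothetical.

```python
import math

def horowitz(t_f, t_in, u_th1, u_th2, v_dd, rising=True):
    """Gate delay per the Horowitz expressions above.

    t_in is t_rise for a rising input and t_fall for a falling one;
    pass u_th2 == u_th1 to obtain the single-threshold form.
    """
    if rising:
        b = 0.5
        x1, x2 = u_th1 / v_dd, u_th2 / v_dd
        slope_term = 2 * t_in * b * (1 - x1) / t_f
    else:
        b = 0.4
        x1, x2 = 1 - u_th1 / v_dd, 1 - u_th2 / v_dd
        slope_term = 2 * t_in * b * (u_th1 / v_dd) / t_f
    return t_f * math.sqrt(math.log(x1) ** 2 + slope_term) + t_f * (math.log(x1) - math.log(x2))

def step_time(r1, c1, r2, c2, u_start, u_end):
    # T_step = [R2 C2 + (R1 + R2) C1] ln(u_start / u_end)
    if u_start <= u_end:
        raise ValueError("requires u_start > u_end")
    return (r2 * c2 + (r1 + r2) * c1) * math.log(u_start / u_end)

def bitline_delay(t_step, v_dd, v_th, m):
    knee = 0.5 * (v_dd - v_th) / m   # boundary between the two regimes
    if t_step <= knee:
        return math.sqrt(2 * t_step * (v_dd - v_th) / m)
    return t_step + (v_dd - v_th) / (2 * m)
```

The two branches of `bitline_delay` meet at the knee, both giving $(V_{DD} - V_{th})/m$, so the piecewise model is continuous.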

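$$T_{cycle\,time} = \max(T_{row\,dec\,path} + T_{wordline\,reset} + T_{bl\,restore},\; T_{between\,buffers\,htree\,network},\; T_{col\,dec\,path})$$

The wire resistance and capacitance per unit length are

$$R_{wire} = \frac{\rho}{d\,(thickness - barrier)(width - 2\,barrier)}, \qquad d < 1$$

$$C_{wire} = \varepsilon_0 \left(2 K \varepsilon_{horiz} \frac{thickness}{spacing} + 2 \varepsilon_{vert} \frac{width}{layerspacing}\right) + fringe(\varepsilon_{horiz}, \varepsilon_{vert})$$

where $\rho$ is the resistivity and $K$ the Miller coupling factor.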
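A Python sketch of the wire model and of how the access and cycle times combine the path delays. The `fringe` term is not specified here, so it is passed in as a precomputed value (an assumption); all names are illustrative.

```python
EPS0 = 8.854e-12  # vacuum permittivity in F/m

def wire_resistance(rho, d, thickness, width, barrier):
    # Resistance per unit length; d < 1 scales down the effective cross-section
    return rho / (d * (thickness - barrier) * (width - 2 * barrier))

def wire_capacitance(k_miller, eps_horiz, eps_vert, thickness, width,
                     spacing, layer_spacing, fringe):
    # Capacitance per unit length; `fringe` stands for fringe(eps_horiz, eps_vert)
    return EPS0 * (2 * k_miller * eps_horiz * thickness / spacing
                   + 2 * eps_vert * width / layer_spacing) + fringe

def access_and_cycle_time(t_request, t_reply, t_row_dec, t_bitline, t_senseamp,
                          t_col_dec, t_wordline_reset, t_bl_restore, t_htree_buffers):
    t_row_path = t_row_dec + t_bitline + t_senseamp   # T_row_dec_path
    t_col_path = t_col_dec                             # T_col_dec_path
    t_mat = max(t_row_path, t_col_path)                # T_mat
    t_access = t_request + t_mat + t_reply             # T_access
    t_cycle = max(t_row_path + t_wordline_reset + t_bl_restore,
                  t_htree_buffers, t_col_path)         # T_cycle_time
    return t_access, t_cycle

# Normalized example: the H-tree buffer stage limits the cycle time here
print(access_and_cycle_time(1, 1, 2, 3, 1, 4, 1, 1, 10))
```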

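Delay per unit length of a wire driven by repeaters of size $s$ placed every $l$:

$$\tau = \frac{1}{l}\, R_o (C_0 + C_p) + \frac{R_o}{s}\, C_{wire} + R_{wire}\, s\, C_0 + 0.5\, R_{wire} C_{wire}\, l$$

where $C_0$, $C_p$ and $R_o$ are the input capacitance, output parasitic capacitance and output resistance of a minimum-size repeater, and $R_{wire}$, $C_{wire}$ the wire resistance and capacitance per unit length. Setting the derivatives with respect to $l$ and $s$ to zero gives

$$L_{optimal} = \sqrt{\frac{2 R_o (C_0 + C_p)}{R_{wire} C_{wire}}}, \qquad S_{optimal} = \sqrt{\frac{R_o C_{wire}}{R_{wire} C_0}}$$

$$delay = 0.693\, \tau\, len$$

where $len$ is the wire length and $\tau$ is evaluated at $L_{optimal}$ and $S_{optimal}$. The repeater leakage is

$$P_{leakage} = \frac{1 + \beta}{2}\, V_{DD}\, I_{leak}$$

where $\beta$ is the PMOS-to-NMOS width ratio and $I_{leak}$ the leakage of an NMOS of width $W = W_{min.nmos}\, S_{optimal}$.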
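The repeater sizing can be checked numerically with the Python sketch below, which evaluates $\tau$ and confirms that $L_{optimal}$ and $S_{optimal}$ minimize it. Parameter values are hypothetical, and modelling $I_{leak}$ as a per-unit-width current multiplied by $W$ is an assumption.

```python
import math

def repeater_tau(l, s, r_o, c_0, c_p, r_wire, c_wire):
    # Delay per unit length with size-s repeaters every l
    return (r_o * (c_0 + c_p) / l + r_o * c_wire / s
            + r_wire * s * c_0 + 0.5 * r_wire * c_wire * l)

def optimal_repeaters(r_o, c_0, c_p, r_wire, c_wire):
    l_opt = math.sqrt(2 * r_o * (c_0 + c_p) / (r_wire * c_wire))
    s_opt = math.sqrt(r_o * c_wire / (r_wire * c_0))
    return l_opt, s_opt

def repeated_wire_delay(length, r_o, c_0, c_p, r_wire, c_wire):
    l_opt, s_opt = optimal_repeaters(r_o, c_0, c_p, r_wire, c_wire)
    return 0.693 * repeater_tau(l_opt, s_opt, r_o, c_0, c_p, r_wire, c_wire) * length

def repeater_leakage(beta, v_dd, i_leak_per_width, w_min_nmos, s_opt):
    w = w_min_nmos * s_opt  # W = W_min.nmos * S_optimal
    return (1 + beta) / 2 * v_dd * i_leak_per_width * w

# Hypothetical repeater and wire parameters (ohm, F, ohm/m, F/m)
params = dict(r_o=1e3, c_0=1e-15, c_p=0.5e-15, r_wire=1e5, c_wire=2e-10)
l_opt, s_opt = optimal_repeaters(**params)
tau_opt = repeater_tau(l_opt, s_opt, **params)
print(l_opt, s_opt, repeated_wire_delay(5e-3, **params))
```

Because $\tau$ is a sum of separable convex terms in $l$ and $s$, moving away from either optimum in any direction increases the delay.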

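Figure: repeated wire, with repeaters ($R_o$, $C_p$, $C_0$) spaced $l$ apart along a wire with resistance $R_{wire}$ and capacitance $C_{wire}$ per unit length.

$$t_{delay} = nand_{delay} + inverter_{delay} + driver_{delay}$$

$$P_{leakage} = 4\,(P_{leak,inv} + P_{leak,nand2})$$

where $nand_{delay}$, $inverter_{delay}$ and $driver_{delay}$ are the delays of the NAND, inverter and driver stages, and $P_{leak,inv}$, $P_{leak,nand2}$ the inverter and NAND2 leakage powers.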

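$$t_{driver} = R_{driver}(C_{wire} + C_{drain}) + \frac{R_{wire} C_{wire}}{2} + (R_{driver} + R_{wire})\, C_{senseamp}$$

where $C_{wire}$ and $R_{wire}$ are the wire capacitance and resistance, $R_{driver}$ the driver resistance and $C_{drain}$ its drain capacitance. The corresponding leakage is

$$P_{leak} = 4\, P_{leak,inv}$$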
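A minimal Python sketch of the driver-path formulas above; the function names and the normalized test values are hypothetical.

```python
def driver_delay(r_driver, c_wire, c_drain, r_wire, c_senseamp):
    # t_driver = R_driver (C_wire + C_drain) + R_wire C_wire / 2 + (R_driver + R_wire) C_senseamp
    return (r_driver * (c_wire + c_drain) + r_wire * c_wire / 2
            + (r_driver + r_wire) * c_senseamp)

def path_delay(nand_delay, inverter_delay, t_driver):
    # t_delay = nand_delay + inverter_delay + driver_delay
    return nand_delay + inverter_delay + t_driver

def path_leakage(p_leak_inv, p_leak_nand2):
    # P_leakage = 4 (P_leak,inv + P_leak,nand2)
    return 4 * (p_leak_inv + p_leak_nand2)

print(path_delay(1.0, 2.0, driver_delay(1.0, 2.0, 1.0, 2.0, 1.0)))
```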

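Each process parameter $P$ (here $V_{th}$ and $L_{eff}$) deviates from its nominal value by

$$\delta p = \delta p_{D2D} + \delta p_{WID} = \delta p_{D2D} + \delta p_{rand} + \delta p_{sys}$$

that is, a die-to-die (D2D) component plus a within-die (WID) component, which is split into a random and a systematic part. The random parts $\delta v_{th,rand}$ and $\delta l_{eff,rand}$ are characterized by their standard deviations (e.g. $\sigma_{\delta v_{th,rand}}$), and the device parameters become

$$L_{eff} = L_{eff,nominal} + \delta l_{eff}, \qquad V_{th} = V_{th,nominal} + \delta v_{th}$$

These perturbations change $I_{eff}$ and therefore $R_{eff}$, as well as the gate and diffusion capacitances $C_{g/diff}$.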
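One way to generate Monte Carlo samples that respect this decomposition is sketched below in Python: the D2D term is drawn once per die and the random WID term once per device. Reducing $\delta p_{sys}$ to a constant offset, and the sigma values used, are simplifying assumptions, not the settings of this work.

```python
import random

def sample_dies(nominal, sigma_d2d, sigma_rand, n_dies, devices_per_die,
                sys_offset=0.0, seed=0):
    """Sample p = nominal + dp_D2D + dp_rand + dp_sys for every device of every die.

    dp_D2D is drawn once per die, dp_rand once per device; dp_sys is
    reduced to a constant offset here (a simplification).
    """
    rng = random.Random(seed)
    dies = []
    for _ in range(n_dies):
        dp_d2d = rng.gauss(0.0, sigma_d2d)   # shared by all devices on this die
        dies.append([nominal + dp_d2d + rng.gauss(0.0, sigma_rand) + sys_offset
                     for _ in range(devices_per_die)])
    return dies

# Hypothetical sigmas: V_th in volts, L_eff in metres
vth_dies = sample_dies(0.35, sigma_d2d=0.01, sigma_rand=0.02, n_dies=200, devices_per_die=50, seed=1)
leff_dies = sample_dies(45e-9, sigma_d2d=1e-9, sigma_rand=1.5e-9, n_dies=200, devices_per_die=50, seed=2)
```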

The delay model uses the alpha-power law for $I_{DS}$ and for $I_{on}$, with $\alpha = 1.3$.

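Figure: Bitline delay distribution (occurrences versus delay); µ = 6.74E-10 s, σ = 2.74E-12 s.

Alpha-power (Sakurai-Newton) MOSFET model:

$$I_{DS} = \begin{cases} 0, & V_{gs} \le V_{th} \\[1ex] \dfrac{W}{L_{eff}} \dfrac{P_c}{P_v} (V_{gs} - V_{th})^{\alpha/2}\, V_{ds}, & V_{ds} < V_{d0} \\[2ex] \dfrac{W}{L_{eff}} P_c (V_{gs} - V_{th})^{\alpha}, & V_{ds} \ge V_{d0} \end{cases}$$

$$V_{d0} = P_v (V_{gs} - V_{th})^{\alpha/2}$$

where $P_c$ and $P_v$ are fitting parameters and $V_{d0}$ is the saturation voltage. Variation enters the model through $\delta v_{th}$ and $\delta l_{eff}$, via $V_{th}$ and $L_{eff}$.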
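The alpha-power model and the definition $R_{eff} = V_{DD}/I_{eff}$ combine as in the Python sketch below; $P_c$, $P_v$ and the device values are hypothetical fitting parameters, and $\alpha = 1.3$ is the default.

```python
def ids_alpha_power(v_gs, v_ds, v_th, w, l_eff, p_c, p_v, alpha=1.3):
    """Alpha-power drain current: cut-off, linear and saturation regions."""
    if v_gs <= v_th:
        return 0.0                                    # cut-off
    v_ov = v_gs - v_th
    v_d0 = p_v * v_ov ** (alpha / 2)                  # saturation voltage V_d0
    if v_ds < v_d0:
        return (w / l_eff) * (p_c / p_v) * v_ov ** (alpha / 2) * v_ds   # linear
    return (w / l_eff) * p_c * v_ov ** alpha          # saturation

def effective_resistance(v_dd, **device):
    """R_eff = V_DD / I_eff with I_eff = (I_H + I_L) / 2."""
    i_h = ids_alpha_power(v_dd, v_dd / 2, **device)   # V_GS = V_DD,   V_DS = V_DD/2
    i_l = ids_alpha_power(v_dd / 2, v_dd, **device)   # V_GS = V_DD/2, V_DS = V_DD
    return v_dd / ((i_h + i_l) / 2)

# Hypothetical nominal device and a slow corner (+30 mV V_th, +2 nm L_eff)
nominal = dict(v_th=0.35, w=90e-9, l_eff=45e-9, p_c=3e-4, p_v=0.8)
r_nom = effective_resistance(1.1, **nominal)
r_slow = effective_resistance(1.1, **{**nominal, "v_th": 0.35 + 0.03, "l_eff": 45e-9 + 2e-9})
print(f"R_eff: {r_nom:.1f} ohm -> {r_slow:.1f} ohm")
```

The linear and saturation expressions coincide at $V_{ds} = V_{d0}$, so $I_{DS}$ is continuous and decreases monotonically as $V_{th}$ or $L_{eff}$ grows.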

Figure: $I_{ds}$ versus $V_{ds}$ at $V_{gs} = 1.1$ V (logarithmic current axis), comparing the alpha-power model, the Shockley model and HSPICE.

Figure: Max bitline delay distribution (occurrences versus delay); µ = 1.94E-10 s, σ = 7.98E-13 s.

$$L_{eff} = L_{eff,nominal} + \delta l_{eff}, \qquad V_{th} = V_{th,nominal} + \delta v_{th}$$

Figure: Total leakage power distribution (occurrences versus power); µ = 7.2E-3 W, sd = 0.036E-3 W.

The dishing parameter (dish) is treated as random, with mean 0 and σ = 6.

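The parametric yield is

$$Y = \int \cdots \int f(\mathbf{x})\, C(h(\mathbf{x}))\, dx_1\, dx_2 \cdots dx_n$$

where $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ collects the varying parameters with joint PDF $f(\mathbf{x}) = f_{X_1}(x_1) f_{X_2}(x_2) \cdots f_{X_n}(x_n)$, $y = h(\mathbf{x})$ is the circuit response with PDF $f_o(h(\mathbf{x}))$, and $C(h) = 1$ when $h_1(\mathbf{x}) < 10^{-9}$ s and $h_2(\mathbf{x}) < 10^{-9}$ W (otherwise $C(h) = 0$). With samples $\mathbf{x}_i$, $i = 1, 2, \ldots, N$, drawn from $f(\mathbf{x})$, the Monte Carlo estimate is

$$Y \approx \frac{1}{N} \sum_{i=1}^{N} C(h(\mathbf{x}_i))$$

and the CDF $F(h(\mathbf{x}))$ gives the probability $P[Y \le y_c]$ for a critical value $y_c$. Importance sampling rewrites an expectation under $f_X$ as one under a sampling PDF $g$:

$$\int \theta f_X(x)\, dx = \int \frac{\theta f_X(x)}{g(x)}\, g(x)\, dx = E_g\!\left[\frac{\theta f_X(x)}{g(x)}\right]$$

The EMC sampling function is

$$g(\mathbf{x}) := f^{1-\gamma}(\mathbf{x}) = f^{1-\gamma}_{X_1}(x_1)\, f^{1-\gamma}_{X_2}(x_2) \cdots f^{1-\gamma}_{X_n}(x_n), \qquad \gamma > 0$$

so that $g_X(x)$ is a flattened version of $f_X(x)$: $\gamma = 0$ recovers $f_X(x)$ (plain Monte Carlo) and $\gamma = 1$ gives a flat sampling function.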
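In one dimension, the plain Monte Carlo and importance-sampling yield estimators can be sketched with Python's `statistics.NormalDist`. The performance function `h`, the limit and the sampling PDF `g` are hypothetical; the sketch illustrates the estimators only, not the cache model.

```python
from statistics import NormalDist

def mc_yield(h, limit, f, n, seed=1):
    """Plain Monte Carlo: fraction of samples x_i ~ f with C(h(x_i)) = 1."""
    xs = f.samples(n, seed=seed)
    return sum(1 for x in xs if h(x) < limit) / n

def is_yield(h, limit, f, g, n, seed=1):
    """Importance sampling: x_i ~ g, each passing sample weighted by f(x_i)/g(x_i)."""
    xs = g.samples(n, seed=seed)
    return sum(f.pdf(x) / g.pdf(x) for x in xs if h(x) < limit) / n

def h(x):
    return x  # hypothetical performance function

f = NormalDist(mu=0.0, sigma=1.0)   # natural PDF of the varying parameter
g = NormalDist(mu=0.0, sigma=2.0)   # hypothetical, wider sampling PDF
print(mc_yield(h, 1.0, f, 20000), is_yield(h, 1.0, f, g, 20000), f.cdf(1.0))
```

Both estimates approach the exact value $\Phi(1)$; the weighted estimator stays unbiased for any `g` that is positive wherever `f` is.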

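Sampling from $g$ and weighting each sample by $f/g$ gives the EMC estimate

$$Y \approx \frac{1}{N} \sum_{i=1}^{N} C(h(\mathbf{x}_i))\, \frac{f(\mathbf{x}_i)}{g(\mathbf{x}_i)} = \frac{1}{N} \sum_{i=1}^{N} C(h(\mathbf{x}_i))\, f^{\gamma}(\mathbf{x}_i)$$

For a normally distributed parameter, $f_X(x) \sim N(\mu, \sigma)$ and $g_X(x) = f_X^{(1-\gamma)}(x)$:

$$f(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

$$g(x) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^{1-\gamma} e^{-\frac{(x-\mu)^2 (1-\gamma)}{2\sigma^2}} = \frac{1}{\sigma^{1-\gamma} (\sqrt{2\pi})^{-\gamma}\, \sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2 (1-\gamma)}{2\sigma^2}}$$

which suggests defining

$$\sigma' = \sigma^{1-\gamma} (\sqrt{2\pi})^{-\gamma}$$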

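$$g(x) = \frac{1}{\sigma'\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2 (1-\gamma)}{\sigma^{2\gamma} (\sqrt{2\pi})^{2\gamma}\, 2\sigma'^2}} = \frac{1}{\sigma'\sqrt{2\pi}}\, e^{-\frac{((x-\mu)\kappa)^2}{2\sigma'^2}}, \qquad \kappa = \frac{\sqrt{1-\gamma}}{\sigma^{\gamma} (\sqrt{2\pi})^{\gamma}}$$

Substituting $x' = \kappa x$ and $\mu' = \kappa\mu$:

$$g(x) = \frac{1}{\sigma'\sqrt{2\pi}}\, e^{-\frac{(x'-\mu')^2}{2\sigma'^2}}$$

so samples are drawn as $x' \sim N(\mu', \sigma')$ and mapped back with $x = x'/\kappa$.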
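The change of variables above gives a direct way to sample the EMC function for one Gaussian parameter, sketched here in Python under the assumption $0 \le \gamma < 1$; names and values are illustrative.

```python
import math
import random

def emc_parameters(mu, sigma, gamma):
    """sigma', kappa and mu' for g = f^(1 - gamma) with f = N(mu, sigma), 0 <= gamma < 1."""
    root_2pi = math.sqrt(2 * math.pi)
    sigma_p = sigma ** (1 - gamma) * root_2pi ** (-gamma)
    kappa = math.sqrt(1 - gamma) / (sigma ** gamma * root_2pi ** gamma)
    return sigma_p, kappa, kappa * mu

def emc_samples(mu, sigma, gamma, n, seed=0):
    """Draw x' ~ N(mu', sigma') and map each sample back with x = x' / kappa."""
    sigma_p, kappa, mu_p = emc_parameters(mu, sigma, gamma)
    rng = random.Random(seed)
    return [rng.gauss(mu_p, sigma_p) / kappa for _ in range(n)]

sigma_p, kappa, mu_p = emc_parameters(mu=0.0, sigma=0.5, gamma=0.6)
xs = emc_samples(mu=0.0, sigma=0.5, gamma=0.6, n=20000)
```

Because $\sigma'/\kappa = \sigma/\sqrt{1-\gamma}$, the mapped samples follow $N(\mu, \sigma/\sqrt{1-\gamma})$: the EMC sampling function is a widened copy of the natural PDF that places more samples in the tails, and $\gamma = 0$ recovers plain Monte Carlo.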

Figure: the natural sampling function $f(x)$ and the EMC sampling function $g(x)$, both plotted for $X$ between -5 and 5.

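The design space is spanned by the parameters $P_i$, $i = 1, \ldots, I$, where $P_i$ takes $M_i$ values (for example $M_{mats.in.a.subbank}$, $M_{cols.in.a.subarray}$ and $M_{rows.in.a.subarray}$ values for the mats in a subbank, the columns in a subarray and the rows in a subarray). A design point is $p_j = (z_1, \ldots, z_I)$, $j = 1, \ldots, M$, where each $z_i$ is a value of $P_i$, and $P = P_0 + \sum_{i=1}^{I} (z_i)$.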
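Assuming every combination of parameter values is a valid organization, the design points $p_j$ can be enumerated as a Cartesian product; the Python sketch below uses hypothetical value sets, not the ranges explored in this work.

```python
from itertools import product

# Hypothetical value sets P_i (illustrative only)
design_parameters = {
    "mats.in.a.subbank": [1, 2, 4],
    "rows.in.a.subarray": [64, 128, 256, 512],
    "cols.in.a.subarray": [64, 128, 256],
}

def design_points(params):
    """Yield every design point p_j = (z_1, ..., z_I), each z_i taken from P_i."""
    names = list(params)
    for values in product(*(params[name] for name in names)):
        yield dict(zip(names, values))

points = list(design_points(design_parameters))
print(len(points))  # 3 * 4 * 3 = 36
```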

$$cost(arch) = \frac{P_{dyn}(arch)}{\text{min.dynamic.power}}\,\text{en.for.dyn.power.opt} + \frac{T_{access}(arch)}{\text{min.delay}}\,\text{en.for.delay.opt} + \frac{P_{leak}(arch)}{\text{min.leakage.power}}\,\text{en.for.leak.power.opt} + \frac{T_{cycle.time}(arch)}{\text{min.cycle.time}}\,\text{en.for.cycle.time.opt}$$

subject to

$$P_{dyn}(arch) \le (1+a)\,\text{min.dyn.power}, \quad P_{leak}(arch) \le (1+b)\,\text{min.leak.power}, \quad T_{access}(arch) \le (1+c)\,\text{min.delay}, \quad T_{cycle.time}(arch) \le (1+d)\,\text{min.cycle.time}$$

where $arch = \{N_{banks}, N_{subbanks}, N_{mats.in.subbank}, N_{subarr.rows}, N_{subarr.cols}, \ldots\}$ is a candidate cache organization.
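A minimal Python sketch of this selection: compute the per-metric minima, discard organizations outside the $(1+a)$, $(1+b)$, $(1+c)$, $(1+d)$ bounds, and return the lowest-cost one. The candidate data and names are hypothetical, and the `en.for.*` values are treated as numeric weights.

```python
METRICS = ("p_dyn", "t_access", "p_leak", "t_cycle")

def select_design(candidates, weights, tolerances):
    """candidates: {name: {metric: value}}; tolerances maps a, b, c, d to metrics."""
    minima = {m: min(c[m] for c in candidates.values()) for m in METRICS}
    best_name, best_cost = None, float("inf")
    for name, c in candidates.items():
        # Discard organizations exceeding (1 + tolerance) * minimum on any metric
        if any(c[m] > (1 + tolerances[m]) * minima[m] for m in METRICS):
            continue
        cost = sum(c[m] / minima[m] * weights[m] for m in METRICS)
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost

# Hypothetical, normalized metrics for three organizations
candidates = {
    "cfgA": {"p_dyn": 1.0, "t_access": 2.0, "p_leak": 5.0, "t_cycle": 1.0},
    "cfgB": {"p_dyn": 1.2, "t_access": 1.5, "p_leak": 4.0, "t_cycle": 1.1},
    "cfgC": {"p_dyn": 3.0, "t_access": 1.4, "p_leak": 9.0, "t_cycle": 0.9},
}
weights = {m: 1.0 for m in METRICS}
tolerances = {"p_dyn": 0.5, "p_leak": 0.5, "t_access": 0.5, "t_cycle": 0.5}  # a, b, c, d
print(select_design(candidates, weights, tolerances))
```

Here cfgC has the fastest access but exceeds the dynamic-power bound, so the choice falls between cfgA and cfgB on cost.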

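Here min.dynamic.power, min.leakage.power, min.delay and min.cycle.time are the minimum values found over all candidate organizations, $a$, $b$, $c$ and $d$ are the allowed deviations from those minima, and en.for.dyn.power.opt, en.for.delay.opt, en.for.leak.power.opt and en.for.cycle.time.opt enable (weight) the corresponding terms. Under variation, $P_{dyn}(arch)$, $P_{leak}(arch)$, $T_{access}(arch)$ and $T_{cycle.time}(arch)$ become distributions, and they are replaced by their maxima $\max(P_{dyn}(arch))$, $\max(P_{leak}(arch))$, $\max(T_{access}(arch))$ and $\max(T_{cycle.time}(arch))$ over the Monte Carlo samples.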

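$$cost(arch) = \frac{\max(P_{dyn}(arch))}{\text{min.dynamic.power}}\,\text{en.for.dyn.power.opt} + \frac{\max(T_{access}(arch))}{\text{min.delay}}\,\text{en.for.delay.opt} + \frac{\max(P_{leak}(arch))}{\text{min.leakage.power}}\,\text{en.for.leak.power.opt} + \frac{\max(T_{cycle.time}(arch))}{\text{min.cycle.time}}\,\text{en.for.cycle.time.opt} + \frac{\text{max.yield}}{Yield(arch)}\,\text{en.for.yield.opt}$$

subject to

$$\max(P_{dyn}(arch)) \le (1+a)\,\text{min.dyn.power}, \quad \max(P_{leak}(arch)) \le (1+b)\,\text{min.leak.power}, \quad \max(T_{access}(arch)) \le (1+c)\,\text{min.delay}, \quad \max(T_{cycle.time}(arch)) \le (1+d)\,\text{min.cycle.time}$$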
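Under variation, each organization carries Monte Carlo samples of its metrics and a yield. The Python sketch below applies the cost above; taking the minima over the worst-case (maximum) values of all candidates is an assumption, and all data are hypothetical.

```python
METRICS = ("p_dyn", "t_access", "p_leak", "t_cycle")

def select_design_under_variation(candidates, weights, tolerances):
    """candidates: {name: {"samples": {metric: [Monte Carlo values]}, "yield": float > 0}}."""
    worst = {name: {m: max(c["samples"][m]) for m in METRICS}
             for name, c in candidates.items()}                     # max(.) over samples
    minima = {m: min(w[m] for w in worst.values()) for m in METRICS}
    max_yield = max(c["yield"] for c in candidates.values())
    best_name, best_cost = None, float("inf")
    for name, c in candidates.items():
        if any(worst[name][m] > (1 + tolerances[m]) * minima[m] for m in METRICS):
            continue
        cost = sum(worst[name][m] / minima[m] * weights[m] for m in METRICS)
        cost += max_yield / c["yield"] * weights["yield"]           # yield term
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost

candidates = {
    "cfg1": {"samples": {"p_dyn": [1.0, 1.1], "t_access": [2.0, 2.1],
                         "p_leak": [4.0, 4.2], "t_cycle": [1.0, 1.0]}, "yield": 0.99},
    "cfg2": {"samples": {"p_dyn": [1.0, 1.1], "t_access": [1.9, 2.0],
                         "p_leak": [4.0, 4.2], "t_cycle": [1.0, 1.0]}, "yield": 0.80},
}
weights = {"p_dyn": 1.0, "t_access": 1.0, "p_leak": 1.0, "t_cycle": 1.0, "yield": 1.0}
tolerances = {m: 0.2 for m in METRICS}
print(select_design_under_variation(candidates, weights, tolerances))
```

In this example cfg2 has the better worst-case access time, but its lower yield raises its cost above that of cfg1.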

Figure: Bitline delay distribution of cfg1 (occurrences versus delay); µ = 6.6387E-9 s, σ = 5.7098E-12 s.

Figure: Bitline delay distribution of cfg2; µ = 1.5256E-9 s, σ = 5.3615E-11 s.

Figure: Decoder + wordline delay distribution of cfg1; µ = 9.9119E-10 s, σ = 2.5040E-12 s.

Figure: Decoder + wordline delay distribution of cfg2; µ = 7.0047E-10 s, σ = 3.6974E-12 s.

Figure: H-tree network delay distribution of cfg3; µ = 4.9032E-9 s, σ = 2.6216E-10 s.

Figure: Total access time distribution of cfg1; µ = 1.7875E-9 s, σ = 1.1985E-12 s.

Figure: Total access time distribution of cfg2; µ = 2.6741E-9 s, σ = 2.6788E-12 s.

Figure: Total access time distribution of cfg3; µ = 5.71E-9 s, σ = 4.6116E-12 s.