CMOS Technology for Computer Architects
Iakovos Mavroidis, Giorgos Passas, Manolis Katevenis
Lecture 13: On-Chip SRAM Technology
FORTH-ICS / EURECCA & UoC, GREECE
Sense Amplifiers: role & consequences
- Sense amplifiers significantly speed up read access time
  - they sense 0-contents soon after bit-line discharge has started
- Sense amplifiers (SA) are large in size
  - can fit only one SA per 8 columns (sometimes per 4 columns?)
  - analog multiplexors before the SAs select the columns to be read
  - digital multiplexors after the SAs are needed for narrow port widths
  - as a result, large blocks are slower when the port is too narrow
- Sense amplifiers consume significant energy when activated
  - only activate the block when read data are actually needed
  - power consumption is proportional to access frequency
  - power consumption is proportional to the number of amplifiers (which increases with port width, or with the bit capacity of the SRAM)
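The column-sharing rule above can be sketched numerically. This is a minimal model, not a description of any real SRAM compiler: the function name `sense_amp_plan`, its parameters, and the 512-column array are hypothetical, and the only input taken from the slide is the "one SA per 8 columns" ratio.

```python
import math

def sense_amp_plan(columns: int, port_width: int, cols_per_sa: int = 8):
    """Return (number of sense amplifiers, post-SA digital mux ratio).

    Assumption: each sense amplifier is shared by `cols_per_sa` adjacent
    columns through an analog column multiplexer (slide: one SA per 8
    columns, sometimes perhaps 4).
    """
    if columns % cols_per_sa != 0:
        raise ValueError("columns must be a multiple of cols_per_sa")
    num_sa = columns // cols_per_sa
    if port_width > num_sa:
        raise ValueError("port is wider than the row of sense amplifiers")
    # A port narrower than the SA row needs a digital mux after the SAs
    # that picks port_width bits out of the num_sa sensed bits.
    post_mux_ratio = math.ceil(num_sa / port_width)
    return num_sa, post_mux_ratio

# Hypothetical 512-column array with a 64-bit and a 16-bit port.
print(sense_amp_plan(512, 64))  # (64, 1): no digital mux needed
print(sense_amp_plan(512, 16))  # (64, 4): 4-to-1 digital mux after the SAs
```

In this model the 16-bit port still pays for 64 sense amplifiers and adds a 4-to-1 multiplexing stage, which is the mechanism behind the "large blocks are slower when the port is too narrow" remark.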
Area
[Figure: two panels, 1-port and 2-port. Area (mm²/Mbit, axis 0 to 6) versus block capacity (2 to 512 Kbits), one curve per port width: 16-bit, 32-bit, 64-bit, 128-bit.]
Comments on area
- Slightly old (2009); values are µm²/bit or mm²/Mbit
- Large blocks are more area-efficient than small ones
  - peripheral overhead (decoders, muxes, sense amplifiers) is amortized over a larger core
- Port width costs a lot for small blocks
  - more sense amplifiers
  - large blocks need many SAs in any case
- Two-port area is about 2× larger than one-port area
  - large bit cell
- Two-port blocks: both ports are rd/wr
  - not one wr-port & one rd-port
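The two area units quoted above are interchangeable, which a short calculation makes explicit. The sketch below is illustrative only: the helper names are hypothetical, and the density of 0.76 is borrowed from the buffer example later in this lecture. Note the assumption that the equivalence is exact only when 1 Mbit is taken as 10^6 bits.

```python
UM2_PER_MM2 = 1_000_000  # 1 mm^2 = 10^6 um^2

def um2_per_bit_to_mm2_per_mbit(density_um2_per_bit: float,
                                bits_per_mbit: int = 1_000_000) -> float:
    """Convert an area density from um^2/bit to mm^2/Mbit."""
    return density_um2_per_bit * bits_per_mbit / UM2_PER_MM2

def block_area_mm2(capacity_bits: int, density_um2_per_bit: float) -> float:
    """Area of a block of `capacity_bits` at the given density."""
    return capacity_bits * density_um2_per_bit / UM2_PER_MM2

# With a decimal Mbit the two units give the same number.
print(round(um2_per_bit_to_mm2_per_mbit(0.76), 4))             # 0.76
# With a binary Mbit (2**20 bits) the figure is about 5% larger.
print(round(um2_per_bit_to_mm2_per_mbit(0.76, 2**20), 4))      # 0.7969
# Area of a 128 Kbit block (128 * 1024 bits) at 0.76 um^2/bit.
print(round(block_area_mm2(128 * 1024, 0.76), 4))              # 0.0996
```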
Power
[Figure: two panels, 1-port and 2-port. Typical power (µW/MHz, axis 10 to 60) versus block capacity (4 to 256 Kbits), one curve per port width: 16-bit, 32-bit, 64-bit, 128-bit.]
Comments on power
- Slightly old (2009); values are µW/MHz
- Typical-case consumption: 1.2 V, 25 °C
  - all cycles active, all address and data bits switching
- Consumption increases with block size
  - due to increasing word-line and bit-line capacitance
- Consumption is dominated by port width
  - actually, by the number of SAs
- 2-port consumption is per-port
  - 2-port consumption is about 2 × 1-port consumption
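A µW/MHz figure is easier to use once it is read as energy per cycle: 1 µW/MHz equals 1 pJ per active cycle. The sketch below turns such a figure into block power. The function names and the `activity` parameter are assumptions for illustration; scaling by activity follows the earlier remark that consumption is proportional to access frequency, and the 53 µW/MHz and 280 MHz inputs are taken from the buffer example later in this lecture.

```python
def energy_per_access_pj(uw_per_mhz: float) -> float:
    """1 uW/MHz = 1e-6 W / 1e6 Hz = 1e-12 J, i.e. 1 pJ per active cycle."""
    return uw_per_mhz

def block_power_mw(uw_per_mhz: float, freq_mhz: float,
                   activity: float = 1.0, active_ports: int = 1) -> float:
    """Power of one block in mW.

    activity: fraction of cycles in which the block is actually accessed
              (assumption: idle cycles cost nothing in this model).
    active_ports: the 2-port figures are per port, so both ports add up.
    """
    return uw_per_mhz * freq_mhz * activity * active_ports / 1000.0

print(energy_per_access_pj(53))                  # 53 (pJ per access)
print(block_power_mw(53, 280))                   # 14.84
print(block_power_mw(53, 280, activity=0.25))    # 3.71
```

The last line shows why activating a block only when its data are needed pays off: in this simple model, power scales directly with the fraction of active cycles.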
Cycle time
[Figure: worst-case cycle time (ns, axis 2.0 to 3.6), 1-port, versus block capacity (4 to 256 Kbits), one curve per port width: 16-bit, 32-bit, 64-bit, 128-bit.]
Comments on cycle time
- Slightly old (2009); values are ns
- Worst-case cycle time: 1.08 V, 125 °C
  - blocks compiled for power
- Small is Fast: small blocks are faster than large blocks
  - bit-line (and word-line) capacitance increases with length
- For large blocks, narrow ports increase the read latency
  - extra multiplexors after the sense amplifiers
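Cycle time translates directly into a maximum access rate and a peak per-port bandwidth. The following sketch shows the conversion; the function names are hypothetical, and the 2.0 ns and 3.6 ns inputs are simply the two ends of the plotted cycle-time axis, not data points for any particular block.

```python
def max_frequency_mhz(cycle_time_ns: float) -> float:
    """One access per cycle: f = 1 / t, with 1 / ns = 1000 MHz."""
    if cycle_time_ns <= 0:
        raise ValueError("cycle time must be positive")
    return 1000.0 / cycle_time_ns

def peak_bandwidth_gbps(port_width_bits: int, cycle_time_ns: float) -> float:
    """Bits per nanosecond equals Gb/s."""
    if cycle_time_ns <= 0:
        raise ValueError("cycle time must be positive")
    return port_width_bits / cycle_time_ns

for t_ns in (2.0, 3.6):
    print(f"{t_ns} ns -> {max_frequency_mhz(t_ns):.0f} MHz, "
          f"{peak_bandwidth_gbps(128, t_ns):.1f} Gb/s on a 128-bit port")
```

A 3.6 ns cycle corresponds to roughly 278 MHz, which is in the same range as the 280 M accesses/s used in the buffer example of this lecture.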
65nm CMOS On-Chip SRAM Buffer Example
- 512-bits-wide example
  - Width = 1 min-size packet = a few cache lines = 512 bits = 4 blocks × 128 bits/block
  - One-port: 2048 packets × 128 bits/pack = 256 Kb
- Area = 4 banks × 128 Kb/bank × 0.76 mm²/Mb ≈ 0.4 mm²
- Throughput = 512 bits × 280 Maccesses/s ≈ 145 Gb/s
- Power Consumption = 4 banks × 53 µW/MHz × 280 MHz ≈ 60 mW
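The three estimates in this example can be reproduced with a few lines of arithmetic. The sketch below uses only the numbers quoted on the slide; the function and key names are hypothetical, and it assumes 1 Mb = 1024 Kb. The unrounded products come out at about 0.38 mm², 143 Gb/s and 59 mW, which the slide rounds to 0.4 mm², 145 Gb/s and 60 mW.

```python
def buffer_estimates(banks: int = 4,
                     bank_capacity_kb: float = 128,
                     area_mm2_per_mb: float = 0.76,
                     width_bits: int = 512,
                     access_rate_mhz: float = 280,
                     power_uw_per_mhz: float = 53) -> dict:
    """Area, throughput and power of the 512-bit-wide buffer example."""
    # Assumption: 1 Mb = 1024 Kb when converting bank capacity to Mb.
    capacity_mb = banks * bank_capacity_kb / 1024
    return {
        "area_mm2": capacity_mb * area_mm2_per_mb,
        # bits per access * million accesses per second = Mb/s; /1000 -> Gb/s
        "throughput_gbps": width_bits * access_rate_mhz / 1000,
        # uW/MHz * MHz = uW per bank; /1000 -> mW, summed over all banks
        "power_mw": banks * power_uw_per_mhz * access_rate_mhz / 1000,
    }

for name, value in buffer_estimates().items():
    print(f"{name}: {value:.2f}")
# area_mm2: 0.38
# throughput_gbps: 143.36
# power_mw: 59.36
```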
Next: Lecture 14 (Iakovos)