Area Minimization of Redundant CORDIC Pipeline Architectures

A. Wassatsch, S. Dolling, D. Timmermann
Rostock, Germany

Austin, Texas (USA), October 5, 1998

Outline

- Motivation
- Introduction to CORDIC algorithm
- Previous approaches
- Area reduction method for add&shift algorithms
- Application to CORDIC
- Benefits of area reduction
- Conclusion

Motivation

- redundant arithmetic: fast, delay independent of wordlength, but very chip area consuming
- observation: similarity of result generation to digit-online algorithms (MSD-first)
- investigation of the behavior of transfer digits in redundant adder arrays

[Figure: area-delay product vs. wordlength for sample adder types RCA, CSKA, CSEL, CSUM, CLA, CSA, RBA; y-axis: normalized area-delay product (0 to 1), x-axis: wordlength (10 to 60)]

Introduction to CORDIC algorithm

Rotation mode drives $z_n \to 0$, vectoring mode drives $y_n \to 0$.

trigonometric ($m = 1$)
- rotation: $x_n = k_1 (x_0 \cos z_0 - y_0 \sin z_0)$, $y_n = k_1 (y_0 \cos z_0 + x_0 \sin z_0)$
- vectoring: $x_n = k_1 \sqrt{x_0^2 + y_0^2}$, $z_n = z_0 + \tan^{-1}(y_0 / x_0)$

linear ($m = 0$)
- rotation: $x_n = x_0$, $y_n = x_0 z_0 + y_0$
- vectoring: $x_n = x_0$, $z_n = z_0 + y_0 / x_0$

hyperbolic ($m = -1$)
- rotation: $x_n = k_{-1} (x_0 \cosh z_0 + y_0 \sinh z_0)$, $y_n = k_{-1} (y_0 \cosh z_0 + x_0 \sinh z_0)$
- vectoring: $x_n = k_{-1} \sqrt{x_0^2 - y_0^2}$, $z_n = z_0 + \tanh^{-1}(y_0 / x_0)$

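The six closed-form results can serve as a golden model when testing a CORDIC implementation. The following is a minimal sketch, not part of the original slides: the function name `cordic_reference`, the string-valued `mode` argument, and passing the scale factor as a parameter `k` are illustrative assumptions. It also assumes $x_0 > 0$ in vectoring mode.

```python
import math

def cordic_reference(m, mode, x0, y0, z0, k=1.0):
    """Closed-form CORDIC results (x_n, y_n, z_n).

    m: 1 (trigonometric), 0 (linear), -1 (hyperbolic).
    k: scale factor k_m, assumed to be supplied by the caller.
    """
    if mode == "rotation":            # z is driven to zero
        if m == 1:
            return (k * (x0 * math.cos(z0) - y0 * math.sin(z0)),
                    k * (y0 * math.cos(z0) + x0 * math.sin(z0)), 0.0)
        if m == 0:
            return (x0, x0 * z0 + y0, 0.0)
        if m == -1:
            return (k * (x0 * math.cosh(z0) + y0 * math.sinh(z0)),
                    k * (y0 * math.cosh(z0) + x0 * math.sinh(z0)), 0.0)
    if mode == "vectoring":           # y is driven to zero, x0 > 0 assumed
        if m == 1:
            return (k * math.hypot(x0, y0), 0.0, z0 + math.atan(y0 / x0))
        if m == 0:
            return (x0, 0.0, z0 + y0 / x0)
        if m == -1:
            return (k * math.sqrt(x0 * x0 - y0 * y0), 0.0,
                    z0 + math.atanh(y0 / x0))
    raise ValueError("unsupported m or mode")
```

For example, `cordic_reference(0, "vectoring", 2.0, 1.0, 0.0)` models a division and returns the quotient in the z component.
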
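Introduction to CORDIC algorithm (cont')

iteration:
$x_{i+1} = x_i - m \sigma_i 2^{-S(m,i)} y_i$
$y_{i+1} = y_i + \sigma_i 2^{-S(m,i)} x_i$
$z_{i+1} = z_i - \sigma_i \alpha_{m,i}$

$\alpha_{m,i}$: rotation angle, $\alpha_{m,i} = \frac{1}{\sqrt{m}} \tan^{-1}(\sqrt{m}\, 2^{-S(m,i)})$
$\sigma_i$: rotation direction
total rotation angle: $\sum_{i=0}^{n-1} \sigma_i \alpha_{m,i}$

scaling:
$x = x_n k_m^{-1}$
$y = y_n k_m^{-1}$
with $k_m = \prod_{i=0}^{n-1} k_{m,i}$, $k_{m,i} = \frac{\|(x_{i+1}, y_{i+1})\|}{\|(x_i, y_i)\|} = \sqrt{1 + m \sigma_i^2 2^{-2S(m,i)}}$
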
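The iteration and the final scaling can be sketched in floating point as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `shift_sequence`, `alpha` and `cordic_rotate` are hypothetical, the direction is chosen as $\sigma_i = \mathrm{sign}(z_i)$ (rotation mode, non-redundant), and the shift sequences (starting at 0 for $m = 1$, at 1 otherwise, with the hyperbolic iterations 4, 13, 40, ... repeated) follow the common textbook choice rather than anything stated on the slides.

```python
import math

def shift_sequence(m, n):
    """Assumed shift amounts S(m, i) for n iterations."""
    if m == 1:
        return list(range(n))
    seq, i, repeat = [], 1, 4
    while len(seq) < n:
        seq.append(i)
        if m == -1 and i == repeat and len(seq) < n:
            seq.append(i)              # repeated step for hyperbolic convergence
            repeat = 3 * repeat + 1
        i += 1
    return seq

def alpha(m, s):
    """Elementary rotation angle for shift amount s."""
    if m == 1:
        return math.atan(2.0 ** -s)
    if m == -1:
        return math.atanh(2.0 ** -s)
    return 2.0 ** -s                   # linear mode

def cordic_rotate(m, x, y, z, n=32):
    """Rotation mode: drive z to zero, then divide out the scale factor k_m."""
    k = 1.0
    for s in shift_sequence(m, n):
        sigma = 1 if z >= 0 else -1
        t = 2.0 ** -s
        x, y = x - m * sigma * t * y, y + sigma * t * x
        z -= sigma * alpha(m, s)
        k *= math.sqrt(1 + m * sigma * sigma * t * t)
    return x / k, y / k, z
```

With these assumptions, `cordic_rotate(1, 1.0, 0.0, 0.5)` approximates (cos 0.5, sin 0.5), and `cordic_rotate(0, 2.0, 0.0, 0.5)` approximates the product 2 · 0.5 in the y component.
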
Introduction to CORDIC algorithm (cont')

- based on add & shift operations
- intense communication between two of the three datapaths
- build a regular array of add&shift for CORDIC pipeline

[Figure: pipeline principle (z-datapath not shown): the X and Y datapaths pass through iteration 0, iteration 1, iteration 2, ...; each iteration i consists of a shift by i, an add/sub unit controlled by σ_i, and a register]

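The pipeline principle can be modelled in a few lines. The sketch below is only an illustration with assumed names (`stage`, `run_pipeline`): it uses plain integers instead of redundant digits, takes $m = 1$, and, since the z-datapath is not shown, expects the direction sequence $\sigma_0, \dots, \sigma_{n-1}$ to be supplied with every sample.

```python
def stage(i, x, y, sigma):
    """Iteration i: shifters feed the cross-coupled add/sub units (m = 1)."""
    return x - sigma * (y >> i), y + sigma * (x >> i)

def run_pipeline(samples, n):
    """Clocked model of an n-stage pipeline.

    samples: list of (x, y, sigmas) with len(sigmas) == n.
    One sample enters per clock; regs[i] is the register behind stage i.
    """
    regs = [None] * n
    results = []
    pending = list(samples)
    while pending or any(r is not None for r in regs):
        incoming = pending.pop(0) if pending else None
        sources = [incoming] + regs[:-1]   # operand seen by each stage this clock
        regs = [None if s is None else (*stage(i, s[0], s[1], s[2][i]), s[2])
                for i, s in enumerate(sources)]
        if regs[-1] is not None:
            results.append(regs[-1][:2])
    return results
```

Because every stage only talks to its neighbour through a register, several samples are in flight at once and one result leaves the array per clock after the initial latency of n clocks.
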
Previous approaches

area reduction by algorithmic modifications
- integration of scaling into the iteration; optimization of the scaling operation [Schmidt, et al., 1986]
- special $\sigma_i$ estimation [Takagi et al., 1991], [Lee and Lang, 1992]
- reducing the number of iteration repetitions [Timmermann et al., 1992]
- Booth recoding of $\sigma_i$ [Timmermann et al., 1992], [Antelo et al., 1996]

Previous approaches (cont')

area reduction on bit-level
- nonredundant architectures [Timmermann and Sundsbø, 1992]
- redundant-zero adder utilizes increasing shifts

[Figure: redundant adder cell (RR) and the redundant zero adder cell derived from it; cell ports: operand digits X_i^s, X_i^d, Y_i^s, Y_i^d, sum digits S_i^s, S_i^d, transfer digits P_i, V_i in and P_{i-1}, V_{i-1} out]

Previous approaches (cont')

[Figure: reducing the x-datapath using redundant zero adder cells. Iterations 4, 5 and 6 of an 8-digit x-datapath (digit positions 0 to 7, x_{4,7} to x_{7,0} ... x_{7,7}): the shifted operands 2^{-4} y_{4,j} (j = 0..3), 2^{-5} y_{5,j} (j = 0..2) and 2^{-6} y_{6,j} (j = 0..1) are added under control of s_4, s_5, s_6 in RR cells; the remaining leading positions are marked Trunc.]

A closer look at redundant addition

$S = A + B + c_{in}$ with $s_i, a_i \in \{\bar{1}, 0, 1\}$, $b_i = 0$, $c_{in} \in \{\bar{1}, 0, 1\}$, $i = 0, 1, 2, 3$

$\{(a_3, a_2, a_1), (a_2, a_1, a_0)\} \neq \{(111), (\bar{1}\bar{1}\bar{1})\}$
results in $s_4 = 0$ and $\{(s_3, s_2, s_1), (s_2, s_1, s_0)\} \neq \{(111), (\bar{1}\bar{1}\bar{1})\}$

$-13 \le A \le 13 \;\rightarrow\; -14 \le S \le 14$
$s_4 \neq 0$ with recoding $1\bar{1} \rightarrow 01$, $\bar{1}1 \rightarrow 0\bar{1}$, $10\bar{1} \rightarrow 011$, $\bar{1}01 \rightarrow 0\bar{1}\bar{1}$

[Figure: three successive add stages, each followed by a shift >= 1; the leading digit positions of each adder are built from CAB and 0 cells, the remaining positions from RR cells]

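The digit-pattern condition can be explored by brute force. The sketch below is an illustration only; it assumes radix-2 signed digits from {-1, 0, 1} written most significant digit first, and the names `value`, `allowed` and `RECODINGS` are hypothetical. It enumerates all 4-digit operands whose two leading triples are neither 111 nor its negative, adds a carry-in, and checks that the listed recodings preserve the numeric value.

```python
from itertools import product

DIGITS = (-1, 0, 1)
FORBIDDEN = {(1, 1, 1), (-1, -1, -1)}

def value(digits):
    """Value of a signed-digit vector, most significant digit first."""
    v = 0
    for d in digits:
        v = 2 * v + d
    return v

def allowed(d):
    """d = (d3, d2, d1, d0): neither (d3,d2,d1) nor (d2,d1,d0) is forbidden."""
    return d[0:3] not in FORBIDDEN and d[1:4] not in FORBIDDEN

# All admissible operands A and all sums S = A + c_in (B contributes zeros).
operands = [a for a in product(DIGITS, repeat=4) if allowed(a)]
max_a = max(abs(value(a)) for a in operands)
sums = {value(a) + c for a in operands for c in DIGITS}

# Everything representable with four signed digits, i.e. with s_4 = 0.
four_digit_values = {value(s) for s in product(DIGITS, repeat=4)}

# Recodings that clear the leading digit without changing the value.
RECODINGS = {(1, -1): (0, 1), (-1, 1): (0, -1),
             (1, 0, -1): (0, 1, 1), (-1, 0, 1): (0, -1, -1)}
```

Under these assumptions the largest admissible operand magnitude is 13, every sum stays within the range of a 4-digit signed-digit number, and each recoding maps a pattern to an equal-valued pattern with a zero in the leading position.
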
Redundant addition of leading zeros

Each of the three additions produces result digits s_4 s_3 s_2 s_1 s_0 in the leading positions.

  a_{0,n-1} a_{0,n-2} a_{0,n-3} a_{0,n-4} a_{0,n-5} a_{0,n-6} a_{0,n-7}
+ 0 0 0 0 b_{0,n-1} b_{0,n-2} b_{0,n-3}

  a_{1,n-1} a_{1,n-2} a_{1,n-3} a_{1,n-4} a_{1,n-5} a_{1,n-6} a_{1,n-7}
+ 0 0 0 0 b_{1,n-1} b_{1,n-2}

  a_{1,n-1} a_{2,n-2} a_{2,n-3} a_{2,n-4} a_{2,n-5} a_{2,n-6} a_{2,n-7}
+ 0 0 0 0 b_{2,n-1}

  a_{1,n-1} a_{2,n-2} a_{3,n-3} a_{3,n-4} a_{3,n-5} a_{3,n-6} a_{3,n-7}

(bold face on the slide = fixed values, here the digits repeated from the row above)

New cells for area reduction

- suppression of pseudo-overflows
- absorbs any possible carry, stops the flow of transfer digits P and V

[Figure: Redundant zero 0 cell (0) and Carry absorber cell (CAB); cell ports: operand digits X_i^s, X_i^d, X_{i-1}^d, sum digits S_i^s, S_i^d, transfer digits P_i, V_i in and P_{i-1}, V_{i-1} out]

Area reduction application: rotation mode CORDIC

- reduction depends on operation mode, different implementation for each mode
- delayed start of reduction method due to CORDIC-specific double iteration
- two alternative implementations possible
  - CAB-0--
  - CAB-REC
- advantage: starts one iteration before
- disadvantage: small increase in computing time (technology dependent)

Chip area reduction rotation mode CORDIC

[Figure: floorplan of the rotation mode pipeline from inputs X_0, Y_0, Z_0 to outputs X_n, Y_n with the σ signals between the datapaths; legend: logic (RR), logic (CAB, 0, ), memory (only register), saved logic area; annotation: 0.4n]

Area reduced datapath

[Figure: novel area reduced x-datapath by using special adder cells. Iterations 4, 5 and 6 over digit positions 1 to 8 (x_{7,n-1} ... x_{7,n-8} at the output): in each iteration the shifted operand digits 2^{-4} y_{4,j}, 2^{-5} y_{5,j}, 2^{-6} y_{6,j} are added under control of s_4, s_5, s_6 in RR cells, preceded by a CAB cell and a 0 cell in the leading positions]

Area reduced datapath, second version

[Figure: novel area reduced x-datapath by using 3 digit adder cells. Same arrangement over digit positions 1 to 8 and iterations 4, 5 and 6, with a CAB cell and a REC cell in front of the RR cells of each iteration]

Area reduction for CORDIC vectoring mode

modified iteration:
$x_{i+1} = x_i - m \sigma_i 2^{-2S(m,i)} y_i$
$y_{i+1} = 2 (y_i + \sigma_i x_i)$
$z_{i+1} = z_i - \sigma_i \alpha_{m,i}$

- larger hardware savings due to larger right shift in x-datapath
- only registers for x required after iteration $i \ge \lceil n/2 \rceil$
- y resembles the situation in the z-path for rotation mode
- only a small strip of special cells in the z-path after $\lceil n/3 \rceil$

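The modified iteration follows from the standard one by tracking the scaled variable $w_i = 2^{S(m,i)} y_i$ in place of $y_i$. The floating-point sketch below illustrates this for the trigonometric case only; it assumes $m = 1$, $S(1,i) = i$, $x_0 > 0$ and the direction rule $\sigma_i = -\mathrm{sign}(y_i)$, and the function names are hypothetical.

```python
import math

def vectoring_standard(x, y, n):
    """Standard vectoring iteration (m = 1, S = i), y driven to zero."""
    z = 0.0
    for i in range(n):
        sigma = -1 if y >= 0 else 1
        t = 2.0 ** -i
        x, y = x - sigma * t * y, y + sigma * t * x
        z -= sigma * math.atan(t)
    return x, y, z

def vectoring_modified(x, y, n):
    """Modified iteration: w_i = 2^i * y_i is kept instead of y_i."""
    z = 0.0
    w = y
    for i in range(n):
        sigma = -1 if w >= 0 else 1
        # x sees a right shift by 2i, w is doubled after an unshifted add/sub
        x, w = x - sigma * 2.0 ** (-2 * i) * w, 2.0 * (w + sigma * x)
        z -= sigma * math.atan(2.0 ** -i)
    return x, w * 2.0 ** -n, z
```

Both functions make the same direction decisions, so they return the same x and z. The operand added in the x-datapath is shifted by 2i instead of i, which is the larger right shift named in the first bullet above.
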
Chip area reduction vectoring mode CORDIC

[Figure: floorplan of the vectoring mode pipeline from inputs X_0, Y_0, Z_0 to outputs X_n, Z_n with the σ signals between the datapaths; legend: logic (RR), logic (CAB, 0, ), memory (only register), saved logic area; annotations: 0.5n, 0.4n]

Area comparison of redundant CORDIC architectures

[Figure: two plots, rotation mode and vectoring mode; y-axis: relation of area effort to standard redundant (0.5 to 1.2), x-axis: wordlength n (10 to 60); curves: full redundant, redundant zero, add&shift]

Benefits of area reduction

power minimization
- static power decreases proportionally with cell area
- dynamic power decreases as well

speed improvement
- shorter wire length with reduced chip area results in smaller capacity load for standard cell layouts

[Figure: estimated normalized power consumption (0 to 1) vs. wordlength n (10 to 60); curves: full redundant, redundant zero, add&shift]

Conclusion

- Area reduction method is applicable in general add&shift architectures
- Up to 40% area savings possible through successive savings of adder cells
- Results checked by sample synthesized layouts
- With optimized full custom cells for CAB, 0, the results can be improved

References

[Antelo et al., 1996] Antelo, E., Bruguera, J., and Zapata, E. (1996). Unified mixed radix 2-4 redundant CORDIC processor. IEEE Trans. on Computers, 45(9):1068-1073.

[Lee and Lang, 1992] Lee, J.-A. and Lang, T. (1992). Constant-factor redundant CORDIC for angle calculation and rotation. IEEE Trans. on Computers, 41(8):1016-1025.

[Schmidt, et al., 1986] Schmidt, et al. (1986). Parameter optimization of the CORDIC-algorithm and implementation in a CMOS-chip. In Proc. EUSIPCO-86, vol. 2, pages 1219-1222, Hague, Netherlands.

[Takagi et al., 1991] Takagi, N., Asada, T., and Yajima, S. (1991). Redundant CORDIC methods with a constant scale factor for sine and cosine computation. IEEE Trans. on Computers, 40(9):989-995.

[Timmermann et al., 1992] Timmermann, D., Hahn, H., and Hosticka, B. (1992). Low latency time CORDIC algorithms. IEEE Trans. on Computers, 41(8):1010-1015.

[Timmermann and Sundsbø, 1992] Timmermann, D. and Sundsbø, I. (1992). Area and latency efficient CORDIC architectures. In Proc. ISCAS'92, pages 1093-1096, San Diego.