Automatic generation of Network-on-Chip topology under link length and latency constraint

THE INSTITUTE OF ELECTRONICS, INFORMATION AND COMMUNICATION ENGINEERS
TECHNICAL REPORT OF IEICE

Hideo TANIDA, Hiroaki YOSHIDA, Takeshi MATSUMOTO, and Masahiro FUJITA

Dept. of Electrical Engineering and Information Systems, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-8656 Japan
VLSI Design and Education Center, The University of Tokyo, 2-11-16 Yayoi, Bunkyo-ku, Tokyo, 113-0032 Japan
CREST, Japan Science and Technology Agency
E-mail: {tanida,hiroaki,matsumoto}@cad.t.u-tokyo.ac.jp, fujita@ee.t.u-tokyo.ac.jp

Abstract With wire delay becoming dominant over transistor delay in the deep-submicron era, SoC performance is increasingly determined by the interconnect. Although many NoC (Network-on-Chip) architectures that improve interconnect performance have been proposed, automatically finding the most efficient one for a given application and mapping the function blocks onto it is still an open issue. This paper proposes a method for generating a custom NoC that meets communication link-length and latency requirements. Adding constraints for floorplanning and interconnect architecture generation to an existing integer-linear-programming-based approach [5] enables the link-length and latency requirements to be met in the generated NoC architecture.

Key words Network-on-Chip, integer linear programming, guaranteed performance, floor planning

1. SoC (System-on-a-Chip)

NoC (Network-on-Chip) [2]

2. Network-on-Chip

Fig. 1: NoC (CPU, DSP, Memory)

2.1 NoC [1]

Fig. 2: NoC topologies [4]: (a) SPIN, (b) CLICHÉ (2-D mesh), (c) Torus, (d) Folded torus, (e) Octagon, (f) BFT

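Fig. 3: Communication trace graph [5]

Figs. 4, 5, 6: [5]

3. Network-on-Chip [5]

3.1

The input is a communication trace graph (CTG, Fig. 3). Each node $v_i$ has a width $W_i$ and a height $H_i$; each edge $e$ carries a bandwidth requirement $\omega(e)$ and a latency requirement $\sigma(e)$.

3.2

Given the CTG of Fig. 3, floorplanning (Fig. 4) minimizes the objective

$$\alpha \Bigl[ \sum_{(u,v) \in E} \mathrm{dist}(u,v) \, \Psi_l \, \frac{\omega(e)}{\sigma^2(e)} \Bigr] + \beta \, [X_{max} + Y_{max}] \tag{1}$$

where $\mathrm{dist}(u, v)$ is the distance between nodes $u$ and $v$, $\Psi_l$ is the link coefficient of [5], $X_{max}$ and $Y_{max}$ are the extents of the layout, and $\alpha$, $\beta$ weight the two terms.

Node $v_i$ occupies the rectangle from $(X_{i,min}, Y_{i,min})$ to $(X_{i,max}, Y_{i,max})$, with width $W_i$ and height $H_i$. For any two distinct nodes $v_i, v_j \in V$ the rectangles must not overlap, so at least one of the following must hold:

$$X_{i,min} \geq X_{j,max}, \quad X_{j,min} \geq X_{i,max}, \quad Y_{i,min} \geq Y_{j,max}, \quad Y_{j,min} \geq Y_{i,max} \tag{2}$$

Every node must lie inside the layout:

$$X_{i,max} \leq X_{max}, \quad Y_{i,max} \leq Y_{max} \tag{3}$$

3.3

CTG, bounding box [5] (Figs. 5, 6)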
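The floorplanning formulation of Section 3.2 can be checked numerically for a candidate placement. The following is a minimal sketch, not the authors' implementation: it evaluates objective (1) and the non-overlap test (2). The function names, the two-block example, and the choice of Manhattan distance between block centers for dist(u, v) are assumptions made for illustration, since the definition of dist(u, v) did not survive in this text.

```python
def separated(a, b):
    """Constraint (2): rectangles given as (xmin, ymin, xmax, ymax) do not
    overlap if at least one of the four inequalities holds."""
    return a[0] >= b[2] or b[0] >= a[2] or a[1] >= b[3] or b[1] >= a[3]


def center_distance(a, b):
    # ASSUMPTION: dist(u, v) is the Manhattan distance between block centers.
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return abs(ax - bx) + abs(ay - by)


def floorplan_cost(rects, edges, alpha, beta, psi_l):
    """Objective (1). `edges` maps (u, v) to the pair (omega, sigma)."""
    x_max = max(r[2] for r in rects.values())
    y_max = max(r[3] for r in rects.values())
    wire = sum(
        center_distance(rects[u], rects[v]) * psi_l * omega / sigma ** 2
        for (u, v), (omega, sigma) in edges.items()
    )
    return alpha * wire + beta * (x_max + y_max)


# Hypothetical two-block placement, not taken from the paper.
rects = {"u": (0, 0, 2, 2), "v": (2, 0, 4, 2)}
edges = {("u", "v"): (8, 2)}  # (omega, sigma)
cost = floorplan_cost(rects, edges, alpha=1.0, beta=1.0, psi_l=1.0)
```

Because the wire term divides by the square of the latency requirement, edges with a tight latency bound dominate the sum and are pulled closer together by the optimizer.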

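The formulation uses the following symbols.

$r_i$: router $i$.
$p_{i,j}$: port $j$ of router $r_i$, $0 \leq j < \nu$, where $\nu$ is the number of ports per router.
$NR_{k,i,j}$: 1 if node $v_k$ is connected to port $p_{i,j}$, 0 otherwise.
$RR_{i,j,k,l}$: 1 if ports $p_{i,j}$ and $p_{k,l}$ are connected, 0 otherwise.
$O_{i,j,k,l}$: 1 if the traffic from $v_i$ to $v_j$ leaves through port $p_{k,l}$, 0 otherwise.
$I_{i,j,k,l}$: 1 if the traffic from $v_i$ to $v_j$ enters through port $p_{k,l}$, 0 otherwise.
$BO_{k,l}$, $BI_{k,l}$: outgoing and incoming bandwidth of port $p_{k,l}$:

$$BO_{k,l} = \sum_{e_m = (v_i, v_j) \in E} \omega(e_m) \, O_{i,j,k,l} \tag{4}$$

$$BI_{k,l} = \sum_{e_m = (v_i, v_j) \in E} \omega(e_m) \, I_{i,j,k,l} \tag{5}$$

$Z_{i,j,k,l,m,n}$: 1 if the traffic from $v_i$ to $v_j$ passes from port $p_{k,l}$ to port $p_{m,n}$, 0 otherwise:

$$Z_{i,j,k,l,m,n} = O_{i,j,k,l} \cdot RR_{k,l,m,n} \tag{6}$$

The product in (6) is expressed by two linear constraints:

$$O_{i,j,k,l} + RR_{k,l,m,n} \geq 2 \, Z_{i,j,k,l,m,n} \tag{7}$$

$$O_{i,j,k,l} + RR_{k,l,m,n} \leq Z_{i,j,k,l,m,n} + 1 \tag{8}$$

The objective is to minimize

$$P_R + P_L \tag{9}$$

where $P_R$ is the router term and $P_L$ is the link term:

$$P_R = \sum_{r_i \in R} \sum_{p_{i,j}} \Psi_i \, BI_{i,j} + \sum_{r_i \in R} \sum_{p_{i,j}} \Psi_o \, BO_{i,j} \tag{10}$$

$$P_L = \Psi_L \Bigl( \sum_{i,j,k,l,m,n} \omega(i,j) \, RD_{k,m} \, Z_{i,j,k,l,m,n} + \sum_{i,j,k,l} ND_{i,k} \, \omega(i,j) \, NR_{i,k,l} + \sum_{i,j,k,l} ND_{j,k} \, \omega(i,j) \, NR_{j,k,l} \Bigr) \tag{11}$$

Here $\Psi_i$, $\Psi_o$, and $\Psi_L$ are coefficients, $RD_{k,m}$ is the distance between routers $k$ and $m$, and $ND_{i,k}$ is the distance between node $i$ and router $k$.

A route from source $v_s$ to destination $v_d$ is a path (Fig. 5)

$$p = \{(v_s, r_a), (r_a, r_b), \ldots, (r_z, v_d)\} \tag{12}$$

4. NoC

4.1

Intel TeraFLOPS Processor [6]; [5]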
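The linearization in (7) and (8) can be verified by brute force: for every 0-1 assignment of O and RR, the only value of Z that satisfies both inequalities should be the product in (6). The sketch below performs that check; the function names are illustrative and do not come from the paper or from any solver API.

```python
from itertools import product


def satisfies_linearization(o, rr, z):
    """Inequalities (7) and (8) for one choice of the 0-1 variables."""
    return o + rr >= 2 * z and o + rr <= z + 1


def feasible_z(o, rr):
    """All values of Z in {0, 1} that (7) and (8) allow for given O and RR."""
    return [z for z in (0, 1) if satisfies_linearization(o, rr, z)]


# Enumerate the four 0-1 combinations and record the admissible Z values.
table = {(o, rr): feasible_z(o, rr) for o, rr in product((0, 1), repeat=2)}
```

Constraint (7) forces Z to 0 unless both O and RR are 1, and constraint (8) forces Z to 1 when both are 1, so each entry of `table` holds exactly one value.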

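4GHz

4.2 NoC

4.2.1

Table 1

Block   Dimensions
a       1, 0.5
b       2, 1
d       4, 1
e       2, 1
f       4, 1

Table 2

Source   Destination   ω(e)   σ(e)
a        e             3000   6
d        e             3000   6
d        f             3000   6
b        d             3000   6
e        f             3000   6
b        a             10     1
b        e             3000   6
f        e             3000   6

The floorplanning step is given the additional constraint

$$\forall (u,v) \in E: \quad \mathrm{dist}(u,v) \leq D_{max} \, \sigma(e_{u,v}) \tag{13}$$

where $D_{max}$ is the link-length bound per unit of latency and $\sigma(e_{u,v})$ is the latency requirement between $u$ and $v$.

4.2.2

In the formulation of Section 3.3, the 0-1 variables $RR_{i,j,k,l}$ and $NR_{i,j,k}$ are fixed to 0 for connections longer than $D_{max}$. For every router $r_i$ and node $v_k$ that are farther apart than $D_{max}$, and every port $p_{i,j}$:

$$NR_{k,i,j} = 0 \tag{14}$$

For every pair of routers $r_m$, $r_n$ that are farther apart than $D_{max}$, and all ports $p_{m,i}$, $p_{n,j}$:

$$RR_{m,i,n,j} = 0 \tag{15}$$

5. NoC

ILP solver: lp_solve 5.5.0.14 [3]. CPU: Intel Xeon X5470, 3.33 GHz. Input: Tables 1 and 2 (b, a: 1). Fig. 7.

5.1

b, d, e, f: 1. ILP: 4.75, 4.68 (Figs. 7, 8). b, a: 1.5, 0.

5.2

Figs. 7, 10. $\Psi_i$, $\Psi_o$: 328, 66, 11. $\Psi_L$: 80 [5].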
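The link-length constraints of Section 4.2 amount to simple threshold tests once distances are known. The sketch below illustrates the idea rather than the authors' code: it lists the CTG edges that violate (13) and the router pairs whose RR variables (15) would fix to 0. The latency values are taken from Table 2. The distances, the value of D_max, and the router names in the example are hypothetical.

```python
def violated_edges(dist, sigma, d_max):
    """Edges (u, v) with dist(u, v) > d_max * sigma(e_uv), i.e. violating (13)."""
    return sorted(e for e in sigma if dist[e] > d_max * sigma[e])


def forbidden_router_links(router_dist, d_max):
    """Router pairs farther apart than d_max; (15) sets their RR variables to 0."""
    return sorted(pair for pair, d in router_dist.items() if d > d_max)


# Latency requirements from Table 2 for two of the flows.
sigma = {("b", "a"): 1, ("a", "e"): 6}

# ASSUMPTION: illustrative distances and D_max, not values from the paper.
d_max = 1.0
dist_before = {("b", "a"): 1.5, ("a", "e"): 3.0}
dist_after = {("b", "a"): 0.0, ("a", "e"): 3.0}

bad_before = violated_edges(dist_before, sigma, d_max)
bad_after = violated_edges(dist_after, sigma, d_max)

# Hypothetical router-to-router distances.
router_dist = {("r0", "r1"): 0.5, ("r0", "r2"): 2.0}
pruned = forbidden_router_links(router_dist, d_max)
```

Fixing variables to 0 before solving, as (14) and (15) do, also shrinks the search space that the ILP solver has to explore.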

Figs. 8, 9, 10 (block labels: a, b, d, e, f; switch labels: SW0, SW8)

ILP: 824.21, 1800. Fig. 9: a, b: 0; d, e, f: 8. Fig. 10, Tables 3 and 4. Fig. 9: a, b: 0; d, e, f: 8; 0-8.

Table 3

Source   Destination   Route
a        e             a, SW0, e
d        e             d, SW0, e
d        f             d, SW0, f
b        d             b, SW0, d
e        f             e, SW0, f
b        a             b, SW0, a
b        e             b, SW0, e
f        e             f, SW0, e

Table 4

Source   Destination   Route
a        e             a, SW0, e
d        e             d, SW8, e
d        f             d, SW8, f
b        d             b, SW0, SW8, d
e        f             e, SW8, f
b        a             b, SW0, a
b        e             b, SW0, SW8, e
f        e             f, SW8, e

6. NoC

Fig. 10

References

[1] M. Dall'Osso, G. Biccari, L. Giovannini, D. Bertozzi, and L. Benini. xpipes: a Latency Insensitive Parameterized Network-on-chip Architecture For Multi-Processor SoCs. Proceedings of the 21st International Conference on Computer Design, pp. 536-539, 2003.
[2] W.J. Dally and B. Towles. Route packets, not wires: on-chip interconnection networks. Proceedings of the 38th Design Automation Conference (DAC), pp. 684-689, 2001.
[3] lp_solve. http://lpsolve.sourceforge.net/5.5/.
[4] P.P. Pande, C. Grecu, M. Jones, A. Ivanov, and R. Saleh. Performance Evaluation and Design Trade-Offs for Network-on-Chip Interconnect Architectures. IEEE Transactions on Computers, pp. 1025-1040, 2005.
[5] K. Srinivasan, K.S. Chatha, and G. Konjevod. Linear-programming-based techniques for synthesis of network-on-chip architectures. IEEE Trans. Very Large Scale Integr. Syst., Vol. 14, No. 4, pp. 407-420, 2006.
[6] S.R. Vangal, J. Howard, G. Ruhl, S. Dighe, H. Wilson, J. Tschanz, D. Finan, A. Singh, T. Jacob, S. Jain, V. Erraguntla, C. Roberts, Y. Hoskote, N. Borkar, and S. Borkar. An 80-Tile Sub-100-W TeraFLOPS Processor in 65-nm CMOS. IEEE Journal of Solid-State Circuits, Vol. 43, No. 1, pp. 29-41, Jan. 2008.