DEWS2007 D3-6

A Clustering Algorithm of a Undirected Graph by the Modularity using Tabu Search

Keiichi TAMURA†, Makoto TAKAKI††, Yasuma MORI†, Susumu KUROKI†, and Hajime KITAKAMI†

† Faculty of Information Sciences, Hiroshima City University
†† Graduate School of Information Sciences, Hiroshima City University / JSPS Research Fellow
Ozuka-Higashi 123, Asa-Minami-ku, Hiroshima, 731-3194 Japan
E-mail: †{ktamura,mori,kuroki,kitakami}@its.hiroshima-cu.ac.jp, ††makoto@db.its.hiroshima-cu.ac.jp

Abstract  Newman has proposed a clustering algorithm based on modularity (Newman's algorithm) as a technique for dividing the vertex set of an undirected graph into dense substructures called clusters. Newman's algorithm is a greedy algorithm and obtains a clustering quickly. However, a greedy algorithm may get stuck in a local optimum, so a good clustering may not be obtained. This paper proposes a modularity-based clustering algorithm for undirected graphs that uses tabu search. By using tabu search, the proposed algorithm is expected to obtain a more accurate clustering than Newman's algorithm. To evaluate the proposed algorithm, experiments were run on network data and on graph data built from trackback data. In these experiments, the proposed algorithm obtained clusterings with higher accuracy than Newman's algorithm. This paper explains the proposed algorithm and reports the experimental results.

Key words  Clustering, Graph Data, Optimization

1.
[...] VLSI [...] [1]-[3]
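[...] Newman [...] Modularity [...] [4], [5] [...] $Q$ [...] greedy algorithm [6] [...] Tabu Search [7] [...]

2.
[...] Newman [...]

2.1
Let $G = (V, E)$ be a graph, where $V$ is the vertex set and $E$ is the edge set. The $i$-th cluster is written $C_i$ ($1 \le i \le n$), and a clustering $C$ is a set of clusters:

$$C = \{C_1, C_2, \ldots, C_n\} \quad (C_i \subset V) \tag{1}$$

$$V = \bigcup_{i=1}^{n} C_i, \qquad C_i \cap C_j = \emptyset \quad (i \neq j)$$

With an evaluation function $F(G, C)$, clustering is the problem of finding the clustering $C$ that maximizes $F(G, C)$:

$$\max_{C \in S} F(G, C) \tag{2}$$

where $S$ is the set of clusterings of $G$ [...]

2.2
[...] Newman [...] $Q$ [...]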
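[...] [8] [...] $Q$ is defined from $e_{ij}$ and $e_{ii}$ as

$$Q = \sum_i \left(e_{ii} - a_i^2\right) \tag{3}$$

where $m$ is the number of edges, $e_{ii}$ is the fraction of the $m$ edges that lie inside cluster $i$, and $a_i = \sum_j e_{ij}$ [...] The term $(e_{ii} - a_i^2)$ is written $Q_i$ [...] In the example, cluster 1 gives $Q_1 = 4/14 - 81/28^2$, cluster 2 gives $Q_2 = 3/14 - 64/28^2$, and cluster 3 gives $Q_3 = 4/14 - 100/28^2$, so $Q = 11/14 - 245/28^2$.

2.3
[...] Newman [...] $Q$ [...] $2m$ [...]

Algorithm 1 Newman
Input: G(V, E)
Output: C
 1: C := ∅;
 2: for all v ∈ V do
 3:   C := C ∪ {{v}};   /* each vertex starts as one cluster */
 4: end for
 5: while (1) do
 6:   ΔQ := ∅;   /* ΔQ is a |C| × |C| matrix */
 7:   ΔQ := CALC_DQ(G, C);   /* CALC_DQ(G, C) computes ΔQ_ij for clusters i and j of C */
 8:   {max_dq, i, j} := GET_MAX_DELTA_Q(ΔQ);   /* GET_MAX_DELTA_Q returns the largest ΔQ_ij as max_dq together with {i, j} */
 9:   if max_dq > 0 then
10:     C := RECLUSTERING(C, i, j);   /* RECLUSTERING(C, i, j) merges C_i and C_j of C into one cluster */
11:   else
12:     return C;
13:   end if
14: end while

CALC_DQ computes $\Delta Q_{ij}$ for two clusters $i$ and $j$, and GET_MAX_DELTA_Q selects the largest $\Delta Q_{ij}$ [...] $\Delta Q_{ij}$ is computed as

$$\Delta Q_{ij} = 2\left(e_{ij} - a_i a_j\right) \tag{4}$$

[...] $\Delta Q_{13} = 2(1/14 - 9/28 \cdot 9/28) < 0$ [...] $\Delta Q_{13}$ [...] $\Delta Q_{23}$ [...] $Q$ [...]

2.4
[...] Newman [...] $Q$ [...]

[Figure/table residue: vertices $v_1$ to $v_{20}$ in groups $\{v_1, v_2, v_3, v_4\}$, $\{v_5, v_6, v_7, v_8\}$, $\{v_9, v_{10}, v_{11}, v_{12}\}$, $\{v_{13}, v_{14}, v_{15}, v_{16}\}$, $\{v_{17}, v_{18}, v_{19}, v_{20}\}$]

[...] $v_{17}$, $v_{18}$, $v_{19}$, $v_{20}$ [...] $Q$ [...] Newman [...]

3.
[...] Newman [...]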
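Equations (3) and (4) and the loop of Algorithm 1 can be illustrated with a short sketch. This is not the authors' implementation: the function names (`cluster_stats`, `modularity`, `merge_delta_q`, `newman_greedy`) and the six-vertex example graph are hypothetical, and the sketch assumes the convention of [4] in which $e_{ij}$ for $i \neq j$ is half the fraction of edges joining clusters $i$ and $j$. For simplicity it recomputes every $\Delta Q_{ij}$ in each round rather than updating a matrix.

```python
from itertools import combinations


def cluster_stats(edges, clusters):
    """Count edges between/inside clusters and edge ends per cluster."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    n = len(clusters)
    between = [[0] * n for _ in range(n)]  # between[i][i]: edges inside cluster i
    degree = [0] * n                       # edge ends attached to cluster i
    for u, v in edges:
        i, j = label[u], label[v]
        between[i][j] += 1
        if i != j:
            between[j][i] += 1
        degree[i] += 1
        degree[j] += 1
    return between, degree


def modularity(edges, clusters):
    """Equation (3): Q = sum_i (e_ii - a_i^2)."""
    m = len(edges)
    between, degree = cluster_stats(edges, clusters)
    return sum(between[i][i] / m - (degree[i] / (2 * m)) ** 2
               for i in range(len(clusters)))


def merge_delta_q(edges, clusters, i, j):
    """Equation (4), assuming e_ij = (edges between i and j) / (2m)."""
    m = len(edges)
    between, degree = cluster_stats(edges, clusters)
    e_ij = between[i][j] / (2 * m)
    return 2 * (e_ij - (degree[i] / (2 * m)) * (degree[j] / (2 * m)))


def newman_greedy(vertices, edges):
    """Greedy merging in the spirit of Algorithm 1."""
    clusters = [{v} for v in vertices]          # every vertex is its own cluster
    while len(clusters) > 1:
        max_dq, i, j = max((merge_delta_q(edges, clusters, i, j), i, j)
                           for i, j in combinations(range(len(clusters)), 2))
        if max_dq <= 0:                         # no merge increases Q: stop
            break
        clusters[i] |= clusters[j]              # merge C_j into C_i
        del clusters[j]
    return clusters


# Hypothetical example graph: two triangles joined by a single edge.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
result = newman_greedy(range(6), EDGES)
print(sorted(sorted(c) for c in result), modularity(EDGES, result))
```

On this small graph the greedy merges end with the two triangles as clusters; the sketch is only meant to make the roles of $e_{ii}$, $a_i$ and $\Delta Q_{ij}$ concrete.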
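[...] Newman [...]

3.1
[...] For a clustering $C$, $N(C)$ denotes its neighborhood, and $C' \in N(C)$ is a neighboring clustering of $C$ [...]

3.2
[...] A neighboring clustering is obtained by moving one vertex $v$ of cluster $C_i$ to cluster $C_j$ [...] Figure 3(a) [...] Figure 3(b) [...] Figure 3(c) [...] $|C|$ [...] When a vertex $v$ of $C_i$ is moved to $C_j$, the change $\Delta Q_{ij}(v)$ in $Q$ is

$$\Delta Q_{ij}(v) = \Delta e_{ij}(v) + \frac{(a_i - a_j)\,k_v}{m} - 2\left(\frac{k_v}{2m}\right)^2 \tag{5}$$

where $\Delta e_{ij}(v)$ is the change in $\sum_i e_{ii}$ caused by moving $v$ from $C_i$ to $C_j$, $a_i$ and $a_j$ are the values before the move, and $k_v$ is the degree of $v$.

3.3

Algorithm 2 TABU_CLUSTERING
Input: G(V, E)
Output: C
 1: C := INIT(G);   /* INIT(G) generates the initial solution */
 2: T := ∅;
 3: Q := Q(G, C);   /* Q(G, C) computes Q */
 4: C_best := C; Q_best := Q;   /* keep the best solution so far */
 5: while (termination condition not met) do
 6:   {C', dq, v, i, j} := GET_BEST_NEIGHBOR(G, C, T);   /* GET_BEST_NEIGHBOR(G, C, T) returns the neighbor C' with the largest change dq in Q and the moved vertex v with clusters i and j; see Algorithm 3 */
 7:   C := C'; Q := Q + dq;   /* move to the neighbor */
 8:   if dq >= 0 then
 9:     if Q >= Q_best then
10:       C_best := C; Q_best := Q;   /* update the best solution */
11:     end if
12:   else
13:     UPDATE_TABULIST(T, v, i, j);   /* update the tabu list */
14:   end if
15: end while
16: return C_best;

Algorithm 3 GET_BEST_NEIGHBOR
Input: G(V, E), C, T
Output: C', max_dq, v, i, j
 1: for all c ∈ C do
 2:   for each vertex v of c, compute the change dq in Q of each move that is not in the tabu list T
 3: end for
 4: apply the move with the largest dq to C to obtain C'
 5: return C', max_dq, v, i, j

[...] INIT [...] GET_BEST_NEIGHBOR [...] $C$ [...]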
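The control flow of Algorithms 2 and 3 can be sketched as follows. Several details are assumptions made only for illustration, because they cannot be recovered from the text: the initial solution is one cluster per vertex, a vertex may only move to the cluster of one of its neighbors, the tabu list forbids moving a vertex back into the cluster it just left for `tenure` iterations, the loop stops after a fixed number of `iterations`, and line 8 of Algorithm 2 is read as a test for a non-worsening move. The names `modularity`, `get_best_neighbor`, `tabu_clustering`, `iterations` and `tenure` are hypothetical, and `dq` is obtained by recomputing $Q$ rather than by Equation (5).

```python
def modularity(edges, label):
    """Q = sum_i (e_ii - a_i^2) for a vertex -> cluster-id mapping."""
    m = len(edges)
    inside, degree = {}, {}
    for u, v in edges:
        cu, cv = label[u], label[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())


def get_best_neighbor(edges, adj, label, tabu, step, q):
    """Best non-tabu single-vertex move as (dq, v, i, j), or None."""
    best = None
    for v in sorted(label):
        i = label[v]
        # Assumption: only clusters adjacent to v are candidate targets.
        for j in sorted({label[u] for u in adj[v]} - {i}):
            if tabu.get((v, j), -1) >= step:    # move is currently forbidden
                continue
            label[v] = j                        # try the move
            dq = modularity(edges, label) - q
            label[v] = i                        # undo it
            if best is None or dq > best[0]:
                best = (dq, v, i, j)
    return best


def tabu_clustering(vertices, edges, iterations=50, tenure=5):
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    label = {v: k for k, v in enumerate(vertices)}  # assumed INIT: singletons
    tabu = {}                                       # (vertex, cluster) -> last forbidden step
    q = modularity(edges, label)
    best_label, best_q = dict(label), q
    for step in range(iterations):
        move = get_best_neighbor(edges, adj, label, tabu, step, q)
        if move is None:                            # no admissible move left
            break
        dq, v, i, j = move
        label[v] = j                                # always move, even if Q drops
        q += dq
        if dq >= 0:
            if q >= best_q:
                best_label, best_q = dict(label), q
        else:
            tabu[(v, i)] = step + tenure            # forbid moving v back to i
    return best_label, best_q


# Hypothetical example graph: two triangles joined by a single edge.
EDGES = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels, q_best = tabu_clustering(list(range(6)), EDGES)
print(labels, q_best)
```

Unlike the greedy merge loop, this search keeps moving after no improving move remains, so the current solution may get worse while `best_label` retains the best clustering visited.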
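[Figure 3: panels (a), (b), (c)]

[...] $Q$ [...] $dq$ [...] $C$ [...]

3.4
[...]

3.5
[...]

4.
[...] Newman [...] [9] [...] mixi [...] [10] [...] [11] [...] Newman [...] [12] [...] Newman [...] [1] [...] Newman [...] [11] [...] Newman [...] [11] [...] [13] [...] Newman [...] $\Delta Q_{ij}$ [...] $Q$ [...] [14] [...] Newman [...]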
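Table 1
Data   Vertices   Edges   Note
1      16         17
2      16         30
3      20         28
4      28         30
5      332        2125    1997
6      4432       28733   2006/6
7      3147       18986   2006/7
8      3951       23966   2006/8
9      2284       10760   2006/9

[Figure 4: Data 1 and 2; panels (a) Data 1, (b) Data 2]

5.
[...] Newman [...]

[Figure 5: Data 3; panels (a) Newman, (b)]

5.1
[...] Data 1 to 9 [...] Newman [...] Data 1, 2, 3, 4 [...] Newman [...] Data 5 [...] 1997 [...] Data 6, 7, 8, 9 [...] Newman [...]

5.2
CPU: Pentium D 2.8 GHz, Memory: 2 GB, Disk: 250 GB [...] 5000 [...]

5.3
[...] Newman [...] $Q$ [...] 0.565744 [...] 0.548889 [...] Newman [...]

5.4
[...]

[Figure 6: Data 4; panels (a) Newman, (b)]

[...] $Q$ [...] $(e_{ii} - a_i^2)$ [...] $Q$ [...] Newman [...] $Q$ [...] Newman [...] $Q$ 0.143487 [...] $Q$ 0.178874 [...] Newman [...]

5.5
[...]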
Table 2  Data 5
Method    [...]   Q
Newman    7       0.320392
[...]     10      0.350094

Table 3  Data 6
Method    [...]   Q
Newman    44      0.579466
[...]     166     0.602565

Table 4  Data 7
Method    [...]   Q
Newman    44      0.515584
[...]     151     0.562549

Table 5  Data 8
Method    [...]   Q
Newman    49      0.58815
[...]     201     0.614505

Table 6  Data 9
Method    [...]   Q
Newman    52      0.636598
[...]     124     0.638274

[...] Newman [...] $Q$ [...] Figures 9, 10, 11, 12 [...] $Q$ [...] Newman [...] $Q$ [...] 200 [...] Newman [...] $Q$ [...] Newman [...] 600 [...] $Q$ [...] Newman [...] tf-idf [...] tf-idf [...] Newman [...] Figure 13(a) [...] Figure 13(b) [...] Figure 13(c) [...]

6.
[...] Newman [...]
[Figure 13: Data 8; panels (a), (b), (c)]
[Figure 9: $Q$, Data 6]
[Figure 10: $Q$, Data 7]
[Figure 11: $Q$, Data 8]
[Figure 12: $Q$, Data 9]

[...] NTCIR [...] SA [...] C17500097 [...] 18700094 [...]

[1] E. Hartuv and R. Shamir: A clustering algorithm based on graph connectivity, Information Processing Letters, 76, 4-6, pp. 175-181 (2000).
[2] M. Brinkmeier: Communities in graphs, IICS, pp. 20-35 (2003).
[3] U. Brandes, M. Gaertler and D. Wagner: Experiments on graph clustering algorithms (2003).
[4] M. E. J. Newman: Fast algorithm for detecting community structure in networks, Physical Review E, 69, p. 066133 (2004).
[5] A. Clauset, M. E. J. Newman and C. Moore: Finding community structure in very large networks, Physical Review E, 70, p. 066111 (2004).
[6] T. H. Cormen, C. E. Leiserson and R. L. Rivest: Introduction to Algorithms, MIT Press/McGraw-Hill (1990).
[7] F. Glover and M. Laguna: Tabu Search, Kluwer Academic Publishers, Norwell, MA, USA (1997).
[8] M. E. J. Newman: Modularity and community structure in networks, Proc. Natl. Acad. Sci. USA, 103, p. 8577 (2006).
[9] [...], 47, 3, pp. 865-874 (2006).
[10] [...] Altafm-Amm [...], No.2006-BIO-005, pp. 9-15 (2006).
[11] [...] WWW [...], No.2006-ICS-142, pp. 115-122 (2006).
[12] G. Flake, S. Lawrence and C. L. Giles: Efficient identification of web communities, Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, pp. 150-160 (2000).
[13] [...] Web [...], pp. 109-114 (2006).
[14] [...], No.2006-BIO-005, pp. 59-64 (2006).