DEIM Forum 2011 D3-1

Efficient Top-k Search for Random Walk with Restart

Yasuhiro FUJIWARA, Makoto NAKATSUJI, Makoto ONIZUKA, and Masaru KITSUREGAWA

NTT Cyber Space Laboratories, 1-1 Hikarinooka, Yokosuka, Kanagawa, 230 047 Japan
NTT Cyber Solution Laboratories, 1-1 Hikarinooka, Yokosuka, Kanagawa, 230 047 Japan
Institute of Industrial Science, The University of Tokyo, Komaba 4-6-1, Meguro, Tokyo 263 505 Japan
E-mail: {fujiwara.yasuhiro,nakatsuji.makoto,onizuka.makoto}@lab.ntt.co.jp, kitsure@tkl.iis.u-tokyo.ac.jp

Abstract: Random walk with restart (RWR) is a measure of proximity between the nodes of a graph. This paper addresses the search for the K nodes with the highest RWR proximity to a query node.

Keywords: Random walk with restart, Top-k

1. Introduction

Measuring the proximity between nodes of a graph has been studied widely [1], [2], [3]. Random walk with restart (RWR) [4] is one such measure. Given a query node q, the RWR proximity of a node u is the probability that a random walk that starts at q, and keeps restarting at q, is found at u. RWR has been used in many applications [5], [6], [7], [8], and approximate methods have been proposed to compute it quickly [8], [9]. This paper considers the following problem: given a query node q, find the K nodes with the highest RWR proximity to q.
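The rest of this paper is organized as follows. Section 2 reviews related work, Section 3 describes RWR, Section 4 presents the proposed method, Section 5 reports the experiments, and Section 6 concludes.

2. Related work

Pan et al. applied RWR to automatic image captioning [5] and reported an improvement of 10%. Konstas et al. applied RWR to recommendation in social networks [6] and compared it with collaborative filtering [10]. Sun et al. proposed an approximate method for RWR [8] that exploits the observation that most RWR proximities are close to 0. Tong et al. proposed the approximate methods B_LIN and NB_LIN [9]. The methods of Sun et al. and Tong et al. return approximate RWR proximities, and NB_LIN requires O(n^2) space and O(n^2) time.

3. Random walk with restart

Table 1 lists the main symbols. RWR measures the proximity of every node to a query node q [4]: a random walk starts at q and, at each step, returns to q with probability c.

Table 1: Main symbols
q    Query node
K    Number of answer nodes
n    Number of nodes
m    Number of edges
c    Restart probability
p    n x 1 proximity vector; $p_u$ is the proximity of node u
q    n x 1 query vector; the q-th element is 1 and all others are 0
A    Column-normalized adjacency matrix; $A_{u,v}$ is the element for nodes u and v

The proximity vector satisfies

$$p = (1 - c)Ap + cq \tag{1}$$

The RWR proximities are the steady-state vector p, where $p_u$ is the probability that the walk from q is at node u. Computing p by iterating equation (1) t times takes O(mt) time. This iterative approach is the usual way of computing RWR, as in [6].

4. Proposed method

Section 4.1 outlines the ideas, Section 4.2 describes the computation based on LU decomposition, Section 4.3 describes the estimation of proximities, and Section 4.4 gives the search algorithm.

4.1 Overview

Solving equation (1) of Section 3 directly through an inverse matrix requires O(n^2) space. The proposed method is instead based on LU decomposition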
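[11]. To find the top K nodes, an upper bound on each RWR proximity is estimated in O(1) time.

4.2 Computation with sparse matrices

4.2.1 LU decomposition

Equation (1) can be rewritten as

$$p = c\{I - (1 - c)A\}^{-1} q = cW^{-1}q \tag{2}$$

where I is the identity matrix and $W = I - (1 - c)A$. Instead of computing the inverse $W^{-1}$ of W directly [12], W is decomposed by LU decomposition as $W = LU$, which gives

$$p = cU^{-1}L^{-1}q \tag{3}$$

where $L^{-1}$ and $U^{-1}$ are the inverses of the lower triangular matrix L and the upper triangular matrix U. The elements of $L^{-1}$ and $U^{-1}$ are computed from L and U as follows [12]:

$$L^{-1}_{ij} = \begin{cases} 0 & (i < j) \\ 1/L_{ij} & (i = j) \\ -\dfrac{1}{L_{ii}} \sum_{k=j}^{i-1} L_{ik} L^{-1}_{kj} & (i > j) \end{cases} \tag{4}$$

$$U^{-1}_{ij} = \begin{cases} 0 & (i > j) \\ 1/U_{ij} & (i = j) \\ -\dfrac{1}{U_{ii}} \sum_{k=i+1}^{j} U_{ik} U^{-1}_{kj} & (i < j) \end{cases} \tag{5}$$

L and U are computed from W as follows [12]:

$$L_{ij} = \begin{cases} 0 & (i < j) \\ 1 & (i = j) \\ \dfrac{1}{U_{jj}} \left( W_{ij} - \sum_{k=1}^{j-1} L_{ik} U_{kj} \right) & (i > j) \end{cases} \tag{6}$$

$$U_{ij} = \begin{cases} 0 & (i > j) \\ W_{ij} & (i \le j \wedge i = 1) \\ W_{ij} - \sum_{k=1}^{i-1} L_{ik} U_{kj} & (i \le j \wedge i \ne 1) \end{cases} \tag{7}$$

Equations (4), (5), (6), and (7) show how $L^{-1}$, $U^{-1}$, L, and U depend on one another: $L^{-1}_{ij}$ is computed from the elements of L and the previously computed elements of $L^{-1}$, and $L_{ij}$ is computed from the elements of W, L, and U. Equations (4), (5), (6), and (7) imply that (1) the elements $L^{-1}_{ij}$ and $U^{-1}_{ij}$ are 0 when the corresponding elements of L and U are 0, (2) the elements of L and U are 0 when the corresponding elements of W are 0, and (3) the elements of W are 0 when the corresponding elements of A are 0. The numbers of non-zero elements in $L^{-1}_{ij}$ and $U^{-1}_{ij}$ therefore depend on where the non-zero elements of A are placed, that is, on the order of the nodes.

One way to order the nodes uses Newman clustering [13]. The nodes are divided into $\kappa$ partitions by Newman clustering, and a ($\kappa$+1)-th partition is created for the nodes that have edges to other partitions. In the reordered matrix A, the nodes of partitions 1 to $\kappa$ have edges only within their own partition and to partition $\kappa$+1.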
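As a sanity check on equations (2) to (7), the following minimal sketch computes the proximity vector through $p = cU^{-1}L^{-1}q$ and compares it with the iteration of equation (1). It is an illustration under stated assumptions, not the authors' implementation: the function names (`column_normalize`, `lu_doolittle`, `rwr_by_lu`, `rwr_by_iteration`) and the four-node graph are hypothetical, dense NumPy arrays replace the sparse matrices of the paper, and `np.linalg.inv` is used for the triangular inverses in place of equations (4) and (5).

```python
import numpy as np


def column_normalize(adj):
    """Scale each column of a non-negative adjacency matrix to sum to 1."""
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # leave all-zero columns unchanged
    return adj / col_sums


def lu_doolittle(W):
    """LU decomposition without pivoting, following equations (6) and (7)."""
    n = W.shape[0]
    L = np.eye(n)
    U = np.zeros((n, n))
    for i in range(n):
        # Row i of U: U_ij = W_ij - sum_{k<i} L_ik U_kj for j >= i.
        U[i, i:] = W[i, i:] - L[i, :i] @ U[:i, i:]
        # Column i of L: L_ri = (W_ri - sum_{k<i} L_rk U_ki) / U_ii for r > i.
        L[i + 1:, i] = (W[i + 1:, i] - L[i + 1:, :i] @ U[:i, i]) / U[i, i]
    return L, U


def rwr_by_lu(A, c, query):
    """Proximity vector from equation (3): p = c U^{-1} L^{-1} q."""
    n = A.shape[0]
    W = np.eye(n) - (1.0 - c) * A
    L, U = lu_doolittle(W)
    q = np.zeros(n)
    q[query] = 1.0
    return c * (np.linalg.inv(U) @ (np.linalg.inv(L) @ q))


def rwr_by_iteration(A, c, query, iters=200):
    """Proximity vector from repeated application of equation (1)."""
    n = A.shape[0]
    q = np.zeros(n)
    q[query] = 1.0
    p = q.copy()
    for _ in range(iters):
        p = (1.0 - c) * (A @ p) + c * q
    return p


# Hypothetical four-node undirected graph, used only for illustration.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
A = column_normalize(adj)
p_lu = rwr_by_lu(A, c=0.95, query=0)
p_it = rwr_by_iteration(A, c=0.95, query=0)
```

Pivoting is omitted because W = I - (1 - c)A is strictly diagonally dominant by columns when A is column-normalized, so the factorization of equations (6) and (7) exists. In a sparse setting the two inverses would be stored once and reused for every query, which is the point of equation (3).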
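Theorem 1. Finding the node order of A that minimizes the number of non-zero elements in the inverse matrices is NP-complete [11].

4.3 Estimation of proximities

4.3.1 Notation

The nodes are arranged in a tree rooted at the query node. The root is layer 0, the nodes adjacent to it form layer 1, and layer i holds the nodes that are i hops from the root. V is the set of all nodes, $V_s$ is the set of selected nodes, and $l_u$ is the layer number of node u. $V(l_u)$ is the set of selected nodes whose layer number is $l_u$:

$$V(l_u) = \{v : (v \in V_s) \wedge (l_v = l_u)\}$$

$A_{max}$ is the maximum element of A, and $A_{max}(u)$ is the maximum element of the column of A for node u:

$$A_{max} = \max\{A_{ij} : i, j \in V\}$$

$$A_{max}(u) = \max\{A_{iu} : i \in V\}$$

$A_{max}$ and $A_{max}(u)$ can be computed in advance.

4.3.2 Definition of the estimate

Definition 1. If node u is not the query node q, the estimate $\bar{p}_u$ of the proximity $p_u$ is

$$\bar{p}_u = c' \left( \sum_{v \in V(l_u - 1)} p_v A_{max}(v) + \sum_{v \in V(l_u)} p_v A_{max}(v) + \Bigl( 1 - \sum_{v \in V_s} p_v \Bigr) A_{max} \right) \tag{8}$$

where $c' = (1 - c)/(1 - A_{uu} + cA_{uu})$. If u is the query node, $\bar{p}_u = 1$.

Evaluating Definition 1 directly takes O(n) time, because $V(l_u - 1)$, $V(l_u)$, and $V_s$ can each hold O(n) nodes. Section 4.3.3 shows how to obtain the estimate in O(1) time.

Lemma 1. For any node u, $\bar{p}_u \ge p_u$.

Lemma 2. If $l_u \le l_v$ for nodes u and v, then $\bar{p}_u \ge \bar{p}_v$.

4.3.3 Incremental computation

Definition 1 of Section 4.3.2 takes O(n) time. The estimate is therefore computed incrementally from the node u' that was selected immediately before node u, using three terms $\bar{p}_{u,1}$, $\bar{p}_{u,2}$, and $\bar{p}_{u,3}$ that correspond to the three terms of equation (8):

$$\bar{p}_u = c' (\bar{p}_{u,1} + \bar{p}_{u,2} + \bar{p}_{u,3})$$

Definition 2. Let u' be the node selected immediately before node u. The three terms are

$$\bar{p}_{u,1} = \begin{cases} \bar{p}_{u',1} & \text{if } l_u = l_{u'} \\ \bar{p}_{u',2} + p_{u'} A_{max}(u') & \text{otherwise} \end{cases}$$

$$\bar{p}_{u,2} = \begin{cases} \bar{p}_{u',2} + p_{u'} A_{max}(u') & \text{if } l_u = l_{u'} \\ 0 & \text{otherwise} \end{cases} \tag{9}$$

$$\bar{p}_{u,3} = \left( \bar{p}_{u',3}/A_{max} - p_{u'} \right) A_{max}$$

If u' is the query node, $\bar{p}_{u,1} = p_q A_{max}(q)$, $\bar{p}_{u,2} = 0$, and $\bar{p}_{u,3} = (1 - p_q) A_{max}$.

Lemma 3. With Definition 2, the estimate of node u is computed in O(1) time.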
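To make the role of the estimate concrete, here is a minimal sketch of a top-K search that visits nodes in ascending layer order, maintains the three terms of Definition 2 incrementally, and stops once the estimate falls below the K-th best proximity found so far. Several parts are assumptions for illustration only: the names `bfs_layers` and `top_k_rwr` are hypothetical, the layers are taken from a breadth-first search over the non-zero elements of A, a dense `np.linalg.inv` of W stands in for the precomputed $U^{-1}L^{-1}$, and early termination relies on Lemmas 1 and 2 (the example graph has no self-loops, so c' is the same for every node).

```python
import heapq
from collections import deque

import numpy as np


def bfs_layers(A, query):
    """Layer number of each node in a breadth-first tree rooted at the query."""
    n = A.shape[0]
    layer = [n] * n  # n marks nodes that the walk cannot reach
    layer[query] = 0
    queue = deque([query])
    while queue:
        v = queue.popleft()
        # The walk moves from v to u when A[u, v] is non-zero.
        for u in map(int, np.nonzero(A[:, v])[0]):
            if layer[u] == n:
                layer[u] = layer[v] + 1
                queue.append(u)
    return layer


def top_k_rwr(A, c, query, K):
    """Return ([(node, proximity), ...], number of exact proximities computed)."""
    n = A.shape[0]
    # Stand-in for the precomputed inverses: W^{-1} = U^{-1} L^{-1}.
    W_inv = np.linalg.inv(np.eye(n) - (1.0 - c) * A)
    a_max_col = A.max(axis=0)  # A_max(u): largest element of column u
    a_max = A.max()            # A_max: largest element of A
    layer = bfs_layers(A, query)
    order = sorted(range(n), key=lambda u: layer[u])  # query node comes first

    answers = []               # min-heap of (proximity, node), at most K entries
    theta = 0.0                # K-th best proximity so far (0 until K are found)
    t1, t2, t3 = 0.0, 0.0, a_max
    prev, p_prev, computed = None, 0.0, 0
    for u in order:
        if prev is not None:
            # Definition 2: fold the previously selected node into the terms.
            contrib = p_prev * a_max_col[prev]
            if layer[u] == layer[prev]:
                t2 += contrib
            else:
                t1, t2 = t2 + contrib, 0.0
            t3 = max(t3 - p_prev * a_max, 0.0)
        if u == query:
            estimate = 1.0
        else:
            c_prime = (1.0 - c) / (1.0 - A[u, u] + c * A[u, u])
            estimate = c_prime * (t1 + t2 + t3)
        if estimate < theta:
            break              # no remaining node can enter the top K
        p_u = c * W_inv[u, query]
        computed += 1
        if len(answers) < K:
            heapq.heappush(answers, (p_u, u))
        elif p_u > answers[0][0]:
            heapq.heapreplace(answers, (p_u, u))
        if len(answers) == K:
            theta = answers[0][0]
        prev, p_prev = u, p_u
    ranked = sorted(answers, reverse=True)
    return [(u, float(p)) for p, u in ranked], computed
```

For example, on a path graph with the query in the middle, `top_k_rwr(A, 0.95, query, 3)` would be expected to stop after only a few exact computations, since the proximities decay quickly with the distance from the query. The heap replaces the K dummy nodes with proximity 0: as long as fewer than K nodes have been found, theta stays at 0 and nothing is pruned.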
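Algorithm 1
Input: q, query node; K, number of answer nodes; $L^{-1}$, inverse of L; $U^{-1}$, inverse of U
Output: $V_a$, set of answer nodes
1: $\theta = 0$;
2: $V_s = \emptyset$;
3: $V_a = \emptyset$;
4: add K dummy nodes to $V_a$;
5: construct the tree rooted at q;
6: while $V_s \ne V$ do
7:   $u := \mathrm{argmin}(l_v \mid v \in V \setminus V_s)$;
8:   compute the estimate $\bar{p}_u$ of node u;
9:   if $\bar{p}_u < \theta$ then
10:     return $V_a$;
11:   else
12:     compute the proximity $p_u$ from $L^{-1}$ and $U^{-1}$;
13:     if $p_u > \theta$ then
14:       $v := \mathrm{argmin}(p_w \mid w \in V_a)$;
15:       remove v from $V_a$;
16:       add u to $V_a$;
17:       $\theta := \min(p_w \mid w \in V_a)$;
18:     end if
19:   end if
20:   add u to $V_s$;
21: end while
22: return $V_a$;

Each estimate of an RWR proximity used in the search is obtained in O(1) time.

4.4 Search algorithm

Algorithm 1 shows the search for the top K nodes. $\theta$ is the lowest proximity among the nodes in $V_a$. $V_a$ initially holds K dummy nodes whose proximity is 0, so $\theta$ starts at 0. By Lemma 1, a node whose estimate is lower than $\theta$ cannot be an answer node, and by Lemma 2 the same holds for every node selected after it, so the search terminates at that point. Otherwise the exact proximity is computed, and $V_a$ and $\theta$ are updated.

5. Experimental evaluation

The proposed method is compared with NB_LIN by Tong et al. [9], which is an approximate method. The method of Sun et al. [8] and B_LIN by Tong et al. are not evaluated. The restart probability c is set to 0.95, as in [9], [14].

Figure 2: Wall clock time [s] on the Dictionary, Internet, and Citation datasets for Proposed(5), Proposed(25), Proposed(50), NB_LIN(100), and NB_LIN(1,000).

The following datasets are used.

Dictionary (footnote 1): built from FOLDOC (footnote 2). An edge from u to v indicates that term v is used in the description of term u. It has 13,356 nodes and 120,238 edges.

Internet (footnote 3): built from the BGP tables of the Oregon Route Views Project (footnote 4). It has 22,963 nodes and 48,436 edges.

Citation (footnote 5): built from the Condensed Matter E-Print archive (footnote 6). It has 31,163 nodes and 120,029 edges.

The experiments were run on a Linux machine with an Intel Xeon Quad-Core 3.33GHz CPU and 32GB of memory. The methods were compiled with GCC.

1 http://vlado.fmf.uni-lj.si/pub/networks/data/dic/foldoc/foldoc.zip
2 http://foldoc.org/
3 http://www-personal.umich.edu/~mejn/netdata/as-22july06.zip
4 http://routeviews.org/
5 http://www-personal.umich.edu/~mejn/netdata/cond-mat-2003.zip
6 http://arxiv.org/archive/cond-mat

5.1 Search time

Figure 2 compares the search time of the proposed method with that of NB_LIN. Proposed(K) denotes the proposed method with K answer nodes. NB_LIN(100) and NB_LIN(1,000) denote NB_LIN with the target rank set to 100 and 1,000. The search time of NB_LIN does not depend on K.

5.2 Exactness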
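Figure 3: Precision versus target rank of SVD (100 to 1000) for NB_LIN and Proposed.

Figure 4: Wall clock time [s] versus target rank of SVD (100 to 1000) for NB_LIN and Proposed.

Figure 5: Number of non-zero elements for Degree, Cluster, Hybrid, and Random on the Dictionary, Internet, and Citation datasets.

Figure 6: Wall clock time [s] of Proposed and Without pruning on the Dictionary, Internet, and Citation datasets.

Figures 3 and 4 compare the proposed method with NB_LIN on the Dictionary dataset for different target ranks of NB_LIN. In Figure 3 the precision of the proposed method is 1, whereas NB_LIN returns approximate results. Figure 4 shows the search time of NB_LIN at each target rank against that of the proposed method.

5.3 Effectiveness of each approach

5.3.1 Reordering

Figure 5 shows the number of non-zero elements in the inverse matrices under the Degree, Clustering, and Hybrid reordering approaches and under a random order (Random). The resulting space cost is discussed in terms of O(m).

5.3.2 Pruning

Figure 6 shows the effect of the estimation of Section 4.3. Without pruning denotes the proposed method with the estimation-based pruning disabled. A difference of up to 1,020 times is reported.

5.4 Case study

The results of the proposed method and NB_LIN are compared for query terms related to operating systems (OS) on the Dictionary dataset, with the target rank of NB_LIN set to 1,000. Table 2 shows the results. For Microsoft Windows, the proposed method returns W2K, Windows/386, Windows 3.0, and Windows 3.11, all of which are Microsoft operating systems. For Mac OS, the proposed method returns Macintosh user interface, the GUI of Apple's personal computers, and Macintosh file system. For Linux, both the proposed method and NB_LIN return Linux Documentation Project.

6. Conclusions

This paper addressed efficient top-K search for RWR.

References

[1] Y. Koren, S. C. North and C. Volinsky: Measuring and extracting proximity in networks, KDD, pp. 245-255 (2006).
[2] H. Tong, C. Faloutsos and Y. Koren: Fast direction-aware proximity for graph mining, KDD, pp. 747-756 (2007).
[3] D. Lizorkin, P. Velikhov, M. N. Grinev and D. Turdakov: Accuracy estimate and optimization techniques for simrank computation, PVLDB, 1, 1, pp. 422-433 (2008).
[4] H. Tong and C. Faloutsos: Center-piece subgraphs: problem definition and fast solutions, KDD, pp. 404-413 (2006).
[5] J.-Y. Pan, H.-J. Yang, C. Faloutsos and P. Duygulu: Automatic multimedia cross-modal correlation discovery, KDD, pp. 653-658 (2004).
[6] I. Konstas, V. Stathopoulos and J. M. Jose: On social networks and collaborative recommendation, SIGIR, pp. 195-202 (2009).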
[7] D. Liben-Nowell and J. M. Kleinberg: The link prediction problem for social networks, CIKM, pp. 556-559 (2003).
[8] J. Sun, H. Qu, D. Chakrabarti and C. Faloutsos: Neighborhood formation and anomaly detection in bipartite graphs, ICDM, pp. 418-425 (2005).
[9] H. Tong, C. Faloutsos and J.-Y. Pan: Fast random walk with restart and its applications, ICDM, pp. 613-622 (2006).
[10] J. L. Herlocker, J. A. Konstan, A. Borchers and J. Riedl: An algorithmic framework for performing collaborative filtering, SIGIR, pp. 230-237 (1999).
[11] T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein: Introduction to Algorithms, The MIT Press (2009).
[12] W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery: Numerical Recipes 3rd Edition, Cambridge University Press (2007).
[13] A. Clauset, M. E. J. Newman and C. Moore: Finding community structure in very large networks, Physical Review E, pp. 1-6 (2004).
[14] J. He, M. Li, H. Zhang, H. Tong and C. Zhang: Manifold-ranking based image retrieval, ACM Multimedia, pp. 9-16 (2004).

Table 2: Top five nodes returned by the proposed method and NB_LIN for the query nodes Microsoft Windows, Mac OS, and Linux.

Query: Microsoft Windows
  Proposed: 1. Microsoft Windows, 2. W2K, 3. Windows/386, 4. Windows 3.0, 5. Windows 3.11
  NB_LIN: 1. Microsoft Windows, 2. Microsoft Networking, 3. Microsoft Network, 4. W2K, 5. Thumb

Query: Mac OS
  Proposed: 1. Mac OS, 2. Macintosh user interface, 3. Macintosh file system, 4. multitasking, 5. Macintosh Operating System
  NB_LIN: 1. Mac OS, 2. Rhapsody, 3. SORCERER, 4. Macintosh Operating System, 5. PowerOpen Association

Query: Linux
  Proposed: 1. Linux, 2. Linux Documentation Project, 3. Unix, 4. lint, 5. Linux Network Administrators Guide
  NB_LIN: 1. Linux, 2. Linux Documentation Project, 3. SL5, 4. debianize, 5. SLANG