An Efficient Calculation of Set Expansion using Zero-Suppressed Binary Decision Diagrams

22 27 2 SP-C 2012 2011 Short Paper ZDD An Eicient Calculation o Set Expansion using Zero-Suppressed Binary Decision Diagrams Masaaki Nishino Norihito Yasuda Toru Kobayashi NTT Cyber Solutions Laboratories, NTT Corporation nishino.masaaki@lab.ntt.co.jp yasuda.n@lab.ntt.co.jp kobayashi.toru@lab.ntt.co.jp keywords: set expansion, ZDD, matrix multiplication Summary Set Expansion is a method or inding similar sets o items rom a seed set. It is useul on examining a large set o items to ind hidden relationships among the items. In this paper, we propose a method that can reduce the number o calculations needed on executing Bayesian Sets, which is one o the most popular set expansion algorithms. The key point o our method lies in the use o Zero-suppressed Binary Decision Diagrams (ZDD) to express a binary value sparse matrix in a compressed orm and executing needed calculations directly on constructed ZDD. We show a method or expressing a binary value matrix with ZDD, and also show some techniques or reducing the size o ZDD. We conirm the eectiveness o our method with experiments on both synthesis and real data. 1. (Set Expansion) D (seed) D elephant, horse, bat [Wang 07, Ghahramani 06, Vickrey 10] D D D Bayesian Sets[Ghahramani 06] (ZDD: Zero-suppressed Binary Decision Diagrams)[Minato 93] ZDD ZDD ZDD ZDD 2 3 Bayesian Sets ZDD 4 ZDD 5 ZDD 6 7 2. Bayesian Sets Bayesian Sets Ghahramani D

ZDD 23 x D ( ) D c D c D Bayesian Sets D c x D score(x) score(x)= p(x D c) p(x) x i D c Bernoulli M x i =(x i1,x i2,...,x im ) x ij = {0,1} M θ( θ i [0,1]) p(x i θ) p(x i θ)= M (1) θ x ij j (1 θ j ) 1 x ij (2) θ Bernoulli Beta p(θ α, β)= M Γ(α j + β j ) Γ(α j )Γ(β j ) θα j 1 j (1 θ j ) βj 1 (3) α β Beta (1) score(x) M ( ) ( ) 1 x j α j + β x j j αj βj score(x)= (4) α j + β j + K K D c α j = α j + K i=1 x ij, β j = β j + K K i=1 x ij (4) M log score(x)=c + q j x j (5) x c,q j M c = log (α j + β j ) α j β j log (α j + β j + K)+log β j log β j (6) q j =log α j log α j log β j +logβ j (7) D N M X s = c + Xq (8) s N c X X q D c D Bayesian Sets Xq 1 x x g b c F a b 0 1 {ab, bc} ZDD x (a) 3. ZDD 2 Share g x ZDD 0 (b) Jump ZDD[Minato 93] (BDD: Binary Decision Diagrams) [Bryant 86] ZDD 0-1- ZDD 0-1- 2 a,b,c {ab,bc} ZDD 1 F ZDD ZDD ZDD ZDD ZDD 2 2 2 ZDD ZDD ZDD 1-0- 3 ZDD B() ZDD 3 B() k v k,1-0- h k,l k

24 27 2 SP-C 2012 1 0 0 0 1 1 0 1 0 F r 1 r 2 r 3 e 1 e 2 e 3 0 1 Input: q, X ZDD X Output: s 1: or k {1,2} do 2: value(k) 0 /**/ 3: or k =3to B( X ) do 4: i v k {e 1,...,e M } then 5: value(k) value(l k )+value(h k )+q η(vk ) 6: else 7: s η(vk ) value(h k ) /*v k {r 1,...,r N }*/ 8: return s 3 ZDD 4 ZDD 4. ZDD ZDD ZDD ZDD ZDD D M N D N M X X S S = {r 1,...,r N,e 1,...,e M } r 1,...,r N e 1,...,e M M X Z X N Z X = {r i e ai (1)...e ai (b i )} (9) i=1 a i (j) i x i x i1,...,x im 1 j b i x i1,...,x im 1 1 a i (1) < < a i (b i ) M 3(a) {r 1 e 1 e 3,r 2 e 3,r 3 e 2 } ZDD 3(b) r 1 <r 2 < <r N <e 1 <e 2 < <e M ZDD ZDD N + N i=1 b i r 1,...,r N 1 2 e 1,...,e N ZDD 3 1 r 1 e 1 e 3 2 r 2 e 3 e 3 ZDD 4 ZDD X ZDD M q value η(v k ) η(e 3 )=3,η(r 1 )=1 2 value 3 7 value 3 5 v k e 1,...,e M value k value(k) k q η(v k ) ZDD 5 7 ZDD ZDD ZDD 3(b) ZDD q = {1,1,1} e 3,e 2,e 1 value 1,1,2 s = {2,1,1} 5. 5 1 4 ZDD

ZDD 25 1 1 1 1 0 1 0 1 1 0 0 1 1 0 0 0 1 0 F r 1 r 2 r 3 e 1 e 1 e 4 e 5 e 2 e 2 e 3 1 e 6 r i L L 6 L 5 m m 2 m m ZDD 1 L (9) M/L N Z X = {r i e aij (1)...e aij (b ij )} (10) i=1 a ij (k) x i (j 1)L +1 min(jl,m) 1 k b ij x i (j 1)L +1 min(jl,m) 1 5 3 6 L =3 ZDD 1 e 6 e 3 2 3 3 ZDD 4 L 5(b) 1,2,3 1 1 2 e 2 e 3 r 2 1- e 2 4,5,6 0- e 2 0-0- e 1,...,e M 0-1 5(b) 0-0- 0-5 2 ZDD ZDD 4 ZDD e 1,...,e M N M X i j 1 i<j M X (i,j) M q i j q (i,j) X (i,j) q (i,j) Xq i j e 1,...e M e 1 < <e i < <e j < <e M e 1 < < e j < <e i < <e M X X (i,j) e 1,...,e M [Iwasaki 07, Loekito 07] ZDD (1) (2) (1) 2 1 5 1 6 Y (i) Y i N Y (i) =(Y (1) 1,Y (1) 2,...,Y (1) N )

26 27 2 SP-C 2012 Input: N M X L Output: X N M Y 1: Y X 2: or i {1,..., M/L } do 3: r {(i 1)L +1,...,M} 4: or j {(i 1)L +1,...,M} do 5: dist j N (i) Y k=1 k Y (j) k 6: Y ((i 1)L+1),...,Y (M) dist j L 7: Y (i 1)L + il L 8: return Y 7 6 6. 6 1 1 Bernoulli p(x ij ρ)=ρ x ij (1 ρ) 1 x ij 10000 10000 [Ghahramani 06] NIPS NIPS(Neural Inormation Processing Systems) [Ghahramani 06] 2037 13649 2037 ZDD Scala 90 NIPS 20 C GCC(4.4.5) -O2 2.4GHz Intel Xeon CPU 32GB Linux( 2.6.32) 6 2 ZDD 7 L 1 ZDD 10% 70% 8 1 NIPS.615.540.532(.003) ZDD 1 L L NIPS 8 L =30 60% NIPS 0.04 L 7 ρ =0.1 ρ =0.01 L L 6 3 5 2 NIPS 2 ZDD L L =3 10

ZDD 27 2 ( ) ( ) ρ ZDD CRS 0.1.479 22.1 15.9 0.3.307 37.0 39.1 0.5.199 42.1 60.7 0.8.118 40.0 98.0 ZDD VSOP 1 10 0.075 2 0.008 6 4 Bayesian Sets 3 1000 Bayesian Sets 1000 (Compressed Row Storage, CRS) ρ 0.1,0.3,0.5,0.8 10000 10000 2 ZDD L ρ =0.1 CRS 7. [Bryant 86] Bryant, R. E.: Graph-Based Algorithms or Boolean Function Manipulation, IEEE Trans. Comput, Vol. C-35, No. 8, pp. 677 691 (1986) [Ghahramani 06] Ghahramani, Z. and Heller, K. A.: Bayesian Sets, in Proc. o NIPS 06 (2006) [Iwasaki 07] Iwasaki, H., Minato, S., and Zeugmann, T.: A Method o Variable Ordering or Zero-suppressed Binary Decision Diagrams in Data Mining Applications, in Proc. o the 3rd IEEE International Workshop on Databases or Next-Generation Researchers, pp. 85 90 (2007) [Loekito 07] Loekito, E. and Bailey, J.: Are Zero-suppresed Binary Decision Diagrams Good or Mining Frequent Patterns in High Dimensional Datasets?, in Proc. o 6th Australasian Data Mining Conerence, pp. 135 146 (2007) [Minato 93] Minato, S.: Zero-Suppressed BDDs or Set Manipulation in Combinatiorial Problems, in Proc. o DAC 93, pp. 272 277 (1993) [Vickrey 10] Vickrey, D., Kipersztok, O., and Koller, D.: An Active Learning Approach to Finding Related Terms, in Proc. o ACL 10, pp. 371 376 (2010) [Wang 07] Wang, R. C. and Cohen, W. W.: Language-Independent Set Expansion o Named Entities using the Web, in Proc. o ICDM 07, pp. 342 350 (2007) 2011 8 20 2006 2008 ( ) NTT 1997 1999 ( ) 2009. NTT,, () Bayesian Sets ZDD ZDD ZDD 1985 1987 NTT NTT IEEE