Original Paper
Transactions of the Japanese Society for Artificial Intelligence, Vol. 25, No. 2 A (2010)

Semantic Drift in Espresso-style Bootstrapping: Graph-theoretic Analysis and Evaluation in Word Sense Disambiguation

Mamoru Komachi (Nara Institute of Science and Technology, mamoru-k@is.naist.jp, http://cl.naist.jp/~mamoru-k/)
Taku Kudo (Google Inc., taku@google.com)
Masashi Shimbo (Nara Institute of Science and Technology, shimbo@is.naist.jp, http://cl.naist.jp/~shimbo/)
Yuji Matsumoto (Nara Institute of Science and Technology, matsu@is.naist.jp, http://cl.naist.jp/staff/matsu/home.html)

Keywords: Bootstrapping, Link Analysis, HITS, Regularized Laplacian, von Neumann Kernel, Word Sense Disambiguation, Semi-supervised Learning

Summary

Bootstrapping has a tendency, called semantic drift, to select instances unrelated to the seed instances as the iteration proceeds. We demonstrate that the semantic drift of Espresso-style bootstrapping has the same root as the topic drift of Kleinberg's HITS, using a simplified graph-based reformulation of bootstrapping. We confirm that two graph-based algorithms, the von Neumann kernel and the regularized Laplacian, can reduce the effect of semantic drift in the task of word sense disambiguation (WSD) on the Senseval-3 English Lexical Sample Task. The proposed algorithms achieve superior performance to Espresso and previous graph-based WSD methods, even though they have fewer parameters and are easy to calibrate.

1. Introduction

Bootstrapping is a semi-supervised method that grows a small set of seed instances into a large one by iteratively acquiring extraction patterns and new instances [Yarowsky 95, Abney 04]. For example, to harvest instance pairs of the is-a relation, one may start from a seed pair such as (cat, animal); a pattern such as "Y such as X" matches the seed in a corpus and, in turn, extracts new pairs. Bootstrapping has been applied to word sense disambiguation [Yarowsky 95], named entity classification [Collins 99], and the acquisition of semantic lexicons and relations [Hearst 92, Riloff 99, Pantel 06]. As the iteration proceeds, however, bootstrapping tends to acquire instances unrelated to the seeds, a phenomenon known as semantic drift [Curran 07].
This paper analyzes semantic drift in Espresso [Pantel 06], a state-of-the-art bootstrapping algorithm. We first strip Espresso's heuristics down to a minimal variant, Simplified Espresso, and show that Simplified Espresso is a variant of Kleinberg's HITS [Kleinberg 99]: its instance scores converge to the principal eigenvector of an instance similarity matrix regardless of the seeds. Semantic drift in Espresso-style bootstrapping is thus the same phenomenon as topic drift in HITS. Based on this analysis, we propose replacing the bootstrapping iteration with two graph kernels that resist such drift.

The rest of the paper is organized as follows. Section 2 reviews bootstrapping. Section 3 reformulates Espresso as Simplified Espresso and relates it to HITS. Section 4 introduces the two proposed methods, the von Neumann kernel and the regularized Laplacian. Section 5 evaluates them on the Senseval-3 English Lexical Sample Task, and Section 6 concludes.

2. Bootstrapping

Pattern-based bootstrapping goes back to Hearst's acquisition of hyponyms (the is-a relation) from large corpora [Hearst 92]. Yarowsky applied bootstrapping to word sense disambiguation [Yarowsky 95], and Abney gave a theoretical analysis of the Yarowsky algorithm [Abney 04]. Curran et al. proposed Mutual Exclusion Bootstrapping to minimize semantic drift [Curran 07]. A bootstrapping algorithm of this family [Yarowsky 95, Abney 04, Pantel 06, Curran 07] takes as input a corpus and a small number of seed instances, e.g., the pair (cat, animal) for the is-a relation (X is-a Y); the seed (cat, animal) is matched in the corpus by patterns such as "Y such as X".
From a pattern like "Y such as X", new instance pairs such as (sparrow, bird) are then extracted. A generic bootstrapping iteration thus alternates between instances and patterns: (i) score candidate patterns with the current instances; (ii) select the top-scoring patterns; (iii) score candidate instances with the selected patterns; (iv) select the top-scoring instances and add them to the instance pool; (v) repeat from (i) [Ng 03].

Espresso [Pantel 06] instantiates this template with reliability scores for patterns and instances. The reliability r_π(p) of a pattern p and the reliability r_ι(i) of an instance i are defined by mutual recursion:

    r_π(p) = (1/|I|) Σ_{i∈I} ( pmi(i,p) / max_pmi ) · r_ι(i)    (1)
    r_ι(i) = (1/|P|) Σ_{p∈P} ( pmi(i,p) / max_pmi ) · r_π(p)    (2)
    pmi(i,p) = log ( |i,p| / ( |i,*| |*,p| ) )    (3)

Here P and I are the sets of patterns and instances, |i,p| is the number of times instance i co-occurs with pattern p, * is a wildcard, and max_pmi is the maximum pointwise mutual information over all patterns and instances. Reliable instances are those that co-occur with reliable patterns, and vice versa; Espresso computes Eq. (1) in phase (i) and Eq. (2) in phase (iii) of each iteration.
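The reliability computation of Eqs. (1)–(3) can be sketched in a few lines of NumPy. The co-occurrence counts below are toy values for illustration, not data from the paper, and unseen pairs are simply assigned zero pmi:

```python
import numpy as np

# Toy co-occurrence counts between instances (rows) and patterns
# (columns); in practice these come from a corpus.
counts = np.array([[8.0, 2.0, 0.0],
                   [3.0, 5.0, 1.0],
                   [0.0, 1.0, 6.0]])

total = counts.sum()
p_ip = counts / total                      # joint probability of (i, p)
p_i = p_ip.sum(axis=1, keepdims=True)      # marginal of instance i
p_p = p_ip.sum(axis=0, keepdims=True)      # marginal of pattern p

with np.errstate(divide="ignore"):
    pmi = np.log2(p_ip / (p_i * p_p))      # Eq. (3)
pmi[np.isneginf(pmi)] = 0.0                # zero out unseen pairs
max_pmi = pmi.max()

r_i = np.ones(counts.shape[0])             # initial instance reliabilities
# One update of Eq. (1): pattern reliability from instance reliability.
r_p = (pmi / max_pmi).T @ r_i / counts.shape[0]
# One update of Eq. (2): instance reliability from pattern reliability.
r_i = (pmi / max_pmi) @ r_p / counts.shape[1]
```

Iterating the last two lines gives the mutual recursion of Espresso's scoring phase.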
3. Graph-theoretic Reformulation of Espresso

3.1 Simplified Espresso

Algorithm 1 shows Simplified Espresso, a bare-bones variant of Espresso.

Algorithm 1: Simplified Espresso
  Input: seed instance vector i0, pattern-instance matrix M
  Output: instance score vector i, pattern score vector p
  1: i = i0
  2: repeat
  3:   p = M i
  4:   normalize p
  5:   i = M^T p
  6:   normalize i
  7: until i and p converge
  8: return i and p

The seed vector i0 holds 1 for seed instances and 0 elsewhere, and M is the |P| × |I| pattern-instance matrix whose (p,i)-element [M]_pi encodes the association between pattern p and instance i. Lines 2–8 alternately compute pattern scores p from instance scores i and vice versa, mirroring Eqs. (1) and (2). Setting

    [M]_pi = pmi(i,p) / max_pmi    (4)

and taking the normalization of lines 4 and 6 to be

    p ← p / |I|   and   i ← i / |P|,    (5)

Algorithm 1 computes exactly the reliability scores of Espresso, Eqs. (1) and (2), without Espresso's additional heuristics; we call this variant Simplified Espresso.

3.2 Simplified Espresso as HITS

Unfolding the recursion, the instance score vector after the n-th iteration is

    i_n = A^n i0,    (6)
    A = (1 / (|I||P|)) M^T M.    (7)

As n grows, i_n (suitably normalized) converges to the principal eigenvector of A, which is determined by A alone and is independent of the seed vector i0. Since A is M^T M up to a constant factor, this is exactly the authority computation of Kleinberg's HITS [Kleinberg 99] on the pattern-instance graph: Simplified Espresso is HITS, with the seeds affecting only the starting point of the iteration. The topic drift of HITS [Bharat 98], its tendency to rank densely connected but off-topic nodes highly, therefore manifests itself as the semantic drift of Simplified Espresso: whatever the seeds, the scores drift toward the globally dominant instances [Yarowsky 95, Riloff 99].

3.3 Espresso vs. Simplified Espresso

The full Espresso algorithm [Pantel 06] adds several heuristics to this skeleton: (1) the matrix M is re-estimated at each iteration as new patterns are acquired; (2) the instance vector i is renormalized differently; (3) only the top-k patterns are retained at each iteration; and (4) only the top-k instances are retained.
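Algorithm 1 amounts to a power iteration on A = M^T M / (|I||P|). A minimal sketch, assuming a NumPy matrix M laid out as |P| × |I| as in the text:

```python
import numpy as np

def simplified_espresso(M, i0, n_iter=20):
    """One reading of Algorithm 1: alternate pattern and instance
    scoring with the normalization of Eq. (5).

    M  : |P| x |I| matrix with [M]_pi = pmi(i, p) / max_pmi  (Eq. 4)
    i0 : seed instance vector (1 for seeds, 0 otherwise)
    """
    n_p, n_i = M.shape
    i = i0.astype(float)
    for _ in range(n_iter):
        p = M @ i / n_i        # pattern scores, cf. Eq. (1)
        i = M.T @ p / n_p      # instance scores, cf. Eq. (2)
    return i, p
```

By construction the returned instance vector equals A^n i0 with A = M^T M / (|I||P|), Eqs. (6)–(7); its direction converges to the principal eigenvector of A, which does not depend on i0 — the algebraic source of semantic drift.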
Heuristics (3) and (4), the selection of only the top-k patterns and instances at each iteration, distinguish Espresso most clearly from Simplified Espresso: they turn the pure power iteration into a reranking over a restricted set.

To apply bootstrapping to WSD, the instances harvested for each sense are treated as labeled examples, and the remaining instances are classified by the k-nearest neighbor rule [Cover 76] (k = 3 in our experiments); instance-based methods of this kind have a long history in WSD [Ng 97]. Other classifiers, such as Support Vector Machines, could be used in their place.

We first illustrate semantic drift on the Senseval-3 English Lexical Sample (S3LS) data (http://www.senseval.org/senseval3/data.html), using the noun bank, which has 394 test instances in S3LS. We seed each algorithm with instances of the most frequent sense of bank and observe which instances are selected as the iteration proceeds.

[Figure: per-iteration behavior of Simplified Espresso and Espresso on bank]

Under Simplified Espresso the selected instances drift away from the seed sense. The heuristics of Espresso [Pantel 06] slow the drift but do not stop it: by the 20th iteration drifted instances dominate the output, so early stopping is essential for Espresso.
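The k-nearest neighbor classification step mentioned above can be sketched as follows; the function name and the similarity-matrix interface are assumptions made for illustration, not the paper's implementation:

```python
import numpy as np

def knn_label(K, seed_idx, seed_sense, k=3):
    """Hypothetical k-NN labeling step: assign each instance the
    majority sense among its k most similar labeled seeds.

    K          : instance-instance similarity matrix (e.g., a kernel)
    seed_idx   : indices of the labeled seed instances
    seed_sense : sense label of each seed, aligned with seed_idx
    """
    labels = []
    for i in range(K.shape[0]):
        sims = K[i, seed_idx]                 # similarity to each seed
        nearest = np.argsort(-sims)[:k]       # k most similar seeds
        senses = [seed_sense[j] for j in nearest]
        labels.append(max(set(senses), key=senses.count))  # majority vote
    return labels
```

With the kernels of Section 4, K would be K_β or R_β restricted to the instances of the target word.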
4. Proposed Methods

The analysis in Sections 2 and 3 suggests that semantic drift in Espresso-style bootstrapping, like topic drift in HITS, stems from the convergence of the score vector to the principal eigenvector of A = M^T M. We therefore propose to score instances with two graph kernels whose rankings can be kept away from this eigenvector.

4.1 The von Neumann Kernel

Let A = M^T M be the instance similarity matrix induced by the pattern-instance matrix M, let λ be the principal eigenvalue of A, and let β (0 ≤ β < λ^{-1}) be a diffusion factor. The von Neumann kernel [Kandola 02] is defined by

    K_β = A Σ_{n=0}^{∞} β^n A^n = A (I − βA)^{-1}.    (8)

The (i,j)-element of K_β measures the proximity between instances i and j: the (i,j)-element of A^n (= (M^T M)^n) counts weighted paths of length n between i and j in the similarity graph of A, and the n = 1 term gives plain direct co-occurrence similarity. As β → 0, K_β reduces to A, i.e., one-step relatedness to the seeds; as β → λ^{-1}, the ranking induced by K_β approaches that of the principal eigenvector of A, i.e., the "importance" ranking of HITS and hence of Simplified Espresso and Espresso [Ito 05]. The parameter β thus interpolates between relatedness and importance, and a small β suppresses semantic drift.

4.2 The Regularized Laplacian

The von Neumann kernel still requires β to be chosen carefully below λ^{-1}. The regularized Laplacian kernel [Smola 03, Chebotarev 98] is known to be far less sensitive to the choice of β.
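Eq. (8) has the closed form A(I − βA)^{-1}, so the kernel can be computed without summing the series. A direct NumPy sketch (dense inversion, adequate for illustration only):

```python
import numpy as np

def von_neumann_kernel(A, beta):
    """Von Neumann kernel K_beta = A (I - beta A)^{-1} of Eq. (8).

    Requires 0 <= beta < 1/lambda, where lambda is the principal
    eigenvalue of A, so that the geometric series converges.
    """
    n = A.shape[0]
    return A @ np.linalg.inv(np.eye(n) - beta * A)
```

At beta = 0 this returns A itself (pure relatedness); as beta approaches the upper limit, the induced ranking approaches the HITS authority ranking.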
Espresso 239 β [Ito 05] G A G L L = I D 1/2 AD 1/2 (9), D i [D] ii i [A] ij A (i,j) [D] ii = j [A] ij (8) A L A R β = β n ( L) n = (I + βl) 1 (10) n=0 β( 0) von Neumann A n = (M T M) n ( L) n = (D 1/2 AD 1/2 I) n ) von Neumann A D 1/2 AD 1/2 5. von Neumann S3LS M Simplified Espresso (4) 2 bag-of-words : S3LS bag-of-words 1 1 i w 1 Porter Stemmer 9 : interest sale of * interest in * ** sale of their interest in Mandarin Oriental 9 http://tartarus.org/ martin/porterstemmer/def.txt 1 bank Simplified Espresso 100.0 0.0 Espresso 100.0 30.2 Espresso 94.4 67.4 von Neumann (β = 10 5 ) 92.1 65.1 (β = 10 2 ) 92.1 62.8 ±3 β von Neumann 10 5 10 2 10 5 1 1: von Neumann 3 3 bank 3 3 3 3 3 1 Espresso Simplified Espresso 3 3 3 Espresso bank 7 ; 2 Espresso Espresso Simplified Espresso Espresso 5 2 2: bank S3LS 10 4 von Neumann 5 3 von Neumann
5.3 WSD Accuracy

Table 2: WSD accuracy (%) on bank and on the full S3LS data.

  Algorithm                          bank   S3LS (all words)
  Most frequent sense (baseline)     54.5      55.2
  HyperLex [Véronis 04]               --       64.6
  PageRank [Agirre 06]                --       64.5
  Simplified Espresso                44.1      42.8
  Espresso                           46.9      59.1
  Espresso (early stopping)          66.5      63.6
  Von Neumann kernel (β = 10^-5)     67.2      64.9
  Regularized Laplacian (β = 10^-2)  67.1      65.4

Table 2 compares the Espresso family and the proposed kernels with the most-frequent-sense baseline and with the graph-based WSD systems of Agirre et al. [Agirre 06], namely HyperLex [Véronis 04] and PageRank [Brin 98]. Simplified Espresso and Espresso run to convergence (20 iterations) fall below the baseline because of semantic drift. Espresso with early stopping recovers, but both proposed methods outperform it as well as HyperLex and PageRank, the regularized Laplacian by 2.5 points over early-stopped Espresso on the S3LS average, even though the proposed methods have essentially a single parameter β to calibrate.

5.4 Sensitivity to the Parameter β

[Figure: S3LS WSD accuracy as a function of β]

The accuracy of the regularized Laplacian is stable over the whole range of β tested, so β requires little tuning in practice.
[Figure: S3LS WSD accuracy of the von Neumann kernel as a function of β]

The von Neumann kernel behaves differently: as β grows, its ranking approaches that of HITS (= Simplified Espresso), and accuracy degrades in the range 10^-5 < β < 10^-4. A related use of graph regularization with iterative labeling is the self-training over regularized click graphs of Li et al. [Li 08]. The admissible range of β for the von Neumann kernel is bounded above by 1/λ, the reciprocal of the principal eigenvalue of A, and its behavior changes sharply near that bound [Ito 05]; learning the spectral transformation of the graph from data [Kunegis 09] may remove the need to tune β.

6. Conclusion

We gave a graph-theoretic analysis of semantic drift in Espresso-style bootstrapping, showing that it has the same root as topic drift in Kleinberg's HITS, and proposed two kernel-based alternatives, the von Neumann kernel and the regularized Laplacian. On the Senseval-3 English Lexical Sample task, both outperformed Espresso and previous graph-based WSD methods while having fewer parameters to calibrate. Future directions include seed selection [Curran 07, Vyas 09], other graph kernels and random-walk similarities [Kondor 02, Nadler 06, Saerens 04, Fouss 07], and extending the analysis to co-training [Blum 98] and the Yarowsky algorithm [Abney 04].

References

[Abney 04] Abney, S.: Understanding the Yarowsky Algorithm, Computational Linguistics, Vol. 30, No. 3, pp. 365–395 (2004)
[Agirre 06] Agirre, E., Martínez, D., Lacalle, de O. L., and Soroa, A.: Two graph-based algorithms for state-of-the-art WSD, in Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pp. 585–593 (2006)
[Bharat 98] Bharat, K. and Henzinger, M. R.: Improved Algorithms for Topic Distillation in a Hyperlinked Environment, in Proceedings of the 21st ACM SIGIR Conference, pp. 104–111 (1998)
[Blum 98] Blum, A. and Mitchell, T.: Combining Labeled and Unlabeled Data with Co-Training, in Proceedings of the Workshop on Computational Learning Theory (COLT), pp. 92–100 (1998)
[Brin 98] Brin, S. and Page, L.: The Anatomy of a Large-scale Hypertextual Web Search Engine, Computer Networks and ISDN Systems, Vol. 30, No. 1-7, pp. 107–117 (1998)
[Chebotarev 98] Chebotarev, P. Y. and Shamis, E. V.: On Proximity Measures for Graph Vertices, Automation and Remote Control, Vol. 59, No. 10, pp. 1443–1459 (1998)
[Collins 99] Collins, M. and Singer, Y.: Unsupervised Models for Named Entity Classification, in Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pp. 100–110 (1999)
[Cover 76] Cover, T. M. and Hart, P. E.: Nearest Neighbor Pattern Classification, IEEE Transactions on Information Theory, Vol. 13, pp. 21–27 (1967)
[Curran 07] Curran, J. R., Murphy, T., and Scholz, B.: Minimising Semantic Drift with Mutual Exclusion Bootstrapping, in Proceedings of the 10th Conference of the Pacific Association for Computational Linguistics, pp. 172–180 (2007)
[Fouss 07] Fouss, F., Yen, L., Dupont, P., and Saerens, M.: Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation, IEEE Transactions on Knowledge and Data Engineering, Vol. 19, No. 3, pp. 355–369 (2007)
[Hearst 92] Hearst, M.: Automatic Acquisition of Hyponyms from Large Text Corpora, in Proceedings of the Fourteenth International Conference on Computational Linguistics, pp. 539–545 (1992)
[Ito 05] Ito, T., Shimbo, M., Kudo, T., and Matsumoto, Y.: Application of Kernels to Link Analysis, in Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 586–592 (2005)
[Kandola 02] Kandola, J., Shawe-Taylor, J., and Cristianini, N.: Learning Semantic Similarity, in Advances in Neural Information Processing Systems 15, pp. 657–664 (2002)
[Kleinberg 99] Kleinberg, J.: Authoritative Sources in a Hyperlinked Environment, Journal of the ACM, Vol. 46, No. 5, pp. 604–632 (1999)
[Kondor 02] Kondor, R. I. and Lafferty, J.: Diffusion Kernels on Graphs and Other Discrete Input Spaces, in Proceedings of the 19th International Conference on Machine Learning (ICML-2002), pp. 315–322 (2002)
[Kunegis 09] Kunegis, J. and Lommatzsch, A.: Learning Spectral Graph Transformations for Link Prediction, in Proceedings of the 26th Annual International Conference on Machine Learning, pp. 561–568 (2009)
[Li 08] Li, X., Wang, Y.-Y., and Acero, A.: Learning Query Intent from Regularized Click Graphs, in Proceedings of SIGIR '08: the 31st Annual ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 339–346 (2008)
[Nadler 06] Nadler, B., Lafon, S., Coifman, R., and Kevrekidis, I.: Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators, in Advances in Neural Information Processing Systems 18, pp. 955–962 (2006)
[Ng 97] Ng, H. T.: Exemplar-Based Word Sense Disambiguation: Some Recent Improvements, in Proceedings of the Second Conference on Empirical Methods in Natural Language Processing, pp. 208–213 (1997)
[Ng 03] Ng, V. and Cardie, C.: Weakly Supervised Natural Language Learning Without Redundant Views, in Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pp. 94–101 (2003)
[Pantel 06] Pantel, P. and Pennacchiotti, M.: Espresso: Leveraging Generic Patterns for Automatically Harvesting Semantic Relations, in Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pp. 113–120 (2006)
[Riloff 99] Riloff, E. and Jones, R.: Learning Dictionaries for Information Extraction by Multi-Level Bootstrapping, in Proceedings of the Sixteenth National Conference on Artificial Intelligence (AAAI-99), pp. 474–479 (1999)
[Saerens 04] Saerens, M., Fouss, F., Yen, L., and Dupont, P.: The Principal Component Analysis of a Graph, and its Relationship to Spectral Clustering, in Proceedings of the European Conference on Machine Learning (ECML 2004), pp. 371–383, Springer (2004)
[Smola 03] Smola, A. J. and Kondor, R. I.: Kernels and Regularization on Graphs, in Proceedings of the 16th Annual Conference on Learning Theory, pp. 144–158 (2003)
[Véronis 04] Véronis, J.: HyperLex: Lexical Cartography for Information Retrieval, Computer Speech & Language, Vol. 18, No. 3, pp. 223–252 (2004)
[Vyas 09] Vyas, V., Pantel, P., and Crestan, E.: Helping Editors Choose Better Seed Sets for Entity Set Expansion, in Proceedings of the ACM Conference on Information and Knowledge Management (CIKM-2009), pp. 225–234 (2009)
[Yarowsky 95] Yarowsky, D.: Unsupervised Word Sense Disambiguation Rivaling Supervised Methods, in Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, pp. 189–196 (1995)

(Received July 6, 2009)