DEIM Forum 2014 A8-1, 606 8501 E-mail: {tsukuda,ohshima,kato,tanaka}@dl.kuis.kyoto-u.ac.jp 1 2,, 1. Google 1 Yahoo 2 Bing 3 Web Web BM25 [1] HITS [2] PageRank [3] Web 1 [4] 1http://www.google.com 2http://www.yahoo.com 3http://www.bing.com 4 4 5 5 O = {t o1, t o2,, t on } O t a O t a t oi O t a
[5] [6] [7] [8] [9] [10] [11] 5 25 2. 3. 4. 5. 6. 7. 2. 2. 1 [12] [13] [14] [15]Noda [15] Wikipedia 4 (Wikipedia 1 2) 3 Wikipedia 2 1 Wikipedia Nadamoto [14] SNS 1 Wikipedia Wikipedia Liu [12] 2 Web 4http://ja.wikipedia.org/ Web Web Web Mejova [13] 1 2. 2 WebJaccard WebDiceWebOverlap NGDWWebPMI Web Bollegala [6] 2 [7] 1 Web 1 [8] Wikipedia [9] [10] [11]Strube [9] Wikipedia 2 2 Gabrilovich [10] Wikipedia Milne [11] 2 Wikipedia Wikipedia Wikipedia Wikipedia 2 Milne Strube Gabrilovich Gabrilovich 3. (t 1, t 2) t 1 t 2 c c T c = {t o1, t o2,, t on, } t a t oi T c t a t oi T c c T c t a t oi T c t a 1.
3. 1 1 3. 2 2 4. 3. 2 4. 1 4. 2 4. 3 4. 4 4. 1 3 4. 1. 1 WLM 1 Milne [11] t 1 t 2 Wikipedia t 1 t 2 t 1 t 2 TF-IDF s t s t ( ) W w(s t) = log if s B t, 0 otherwise. (1) B t B t t W Wikipedia t 1 t 2 1 s t 1 t 2 t 1 t 2 F t1 F t2 t F t1 F t2 t 1 t 2 sim forward (t 1, t 2) t 1 t 2 sim backward (t 1, t 2 ) = log (max ( B t 1, B t2 )) log ( B t1 B t2 ). log ( W ) log (min ( B t1, B t2 )) (2) 2 t 1 t 2 wlm(t 1, t 2 ) wlm(t 1, t 2) = 1 2 sim forward(t 1, t 2) + 1 2 sim backward(t 1, t 2). 4. 1. 2 WebPMI (3) 2 Web 2 WebPMI [6] [5] WebPMI t 1 t 2 0 if hit(t 1, t 2 ) < = c web pmi(t 1, t 2 ) = ( ) hit(t log 1 t 2 )/N 2 otherwise. (hit(t 1 )/N) (hit(t 2 )/N) hit(t) t Web Clue Web 09 5 Bollegala [6] c = 5 N Web Clue Web 09 5http://lemurproject.org/clueweb09/ (4)
N = 67, 337, 717 WebJaccard WebDiceWebOverlap NGD WebPMI [5], [6] 4. 1. 3 Noda 3 Web 2 [16] t 1 t 2 noda(t 1, t 2 ) = hit(t1 t2)2 hit(t 1 ) hit(t 2 ). (5) WebPMI 2 4. 2 3. 1 1 t oi t a rel pop(t oi, t a ) rel pop(t oi, t a ) = rel(t oi, t a ) pop(t oi ) (6) rel(t oi, t a) 4. 1 pop(t oi ) Wikipedia t oi t oi Wikipedia [17] 4. 3 4. 3. 1 t oi t oi [18] [19] ALAGIN 6 Wikipedia 20 245 t oi t oi 1 HITS [2] t oi t oi t oj coordinate(t oi, t oj ) 4. 3. 2 3. 2 2 6https://alaginrc.nict.go.jp/hyponymy/index.html biased PageRank [20] biased PageRank biased PageRank 1 2 1 2 2 2 Biased PageRank T c = {t o1, t o2,, t on } (i, j) coordinate(t oj, t oi ) n 1 (i, j) a i,j a i,j = coordinate(t oj, t oi ) t ok T c coordinate(t oj, t ok ). (7) t oi t a rel crd(t oi, t a) rel crd l+1 (t oi, t a ) =α 1< = j< = n + (1 α) a i,j rel crd l (t oi, t a ) rel(t oi, t a ) t oj T c rel(t oj, t a). 8 1 2 biased PageRank 12α 0 < = α < = 1 α α (8) 4. 4 t oi t a biased PageRank 4. 2 rel pop crd l+1 (t oi, t a ) =α 5. 1< = j< = n + (1 α) a i,j rel pop crd l (t oi, t a ) rel pop (t oi, t a ) t oj T c rel pop(t oj, t a). (9) 2 1
1 169 117 C 155 481 157 MVP 2 40 39 13 30 32 2 5. 1 5 5 4. 1. 1 WLM 4. 3. 1 Wikipedia 1 Wikipedia 2008 6 24 Wikipedia 1 W 1, 342, 098 5. 2 4. 7 Wikipedia 40 1 10 10 5 5 2 1 2 7http://www.lancers.jp/ 1 Web 1 1 20 20 5. 3 4. 1 3 WLM WebPMI Noda 4. 2 WLM+pop WebPMI+pop Noda+pop 4. 3 WLM+BPR WebPMI+BPR Noda+BPR 4. 4 WLM+pop+BPR WebPMI+pop+BPR Noda+pop+BPR Biased PageRank 89 0.1 0.9 0.1 5. 2 5. 4 5. 3 2
2 α ****** original 10%5%1% pop pop+bpr BPR pop+bpr 3 Biased PageRank original WLM WebPMI Noda Biased PageRank WLM WebPMI Noda 2 pop 2 BPR 3. 1 2 WLM Noda 2 pop+bpr 1 biased PageRank 6
3 Noda Noda+BPR α = 0.9 1 1 1 2 2 2 3 7 3 4 3 4 5 6 6 6-13 7-8 8 8 10 9 - - 10 13 17 0.457 0.698 4 WebPMI WebPMI+pop+BPR α = 0.9 1 9 1 2 8 3 3 4 4 4 2 2 5 5 5 6 6 10 7-7 8 5 6 9 3 11 10 1 9 0.400 0.714 3 3 3 Noda Noda+BPR α = 0.9 10 - Noda 10 Noda Noda+BPR 17 26 4 4 5 1 { } 55 { } 3 { } 136 { } 1 { } 28 { } 1 { } 190 { } WebPMI WebPMI+pop+BPR α = 0.9 10 5 5 Bollegala [6] Gabrilovich [10] 6. 6. 1 c m t a c m t a c m O m = {o 1, o 2,, o m} t a unexp(c, O m, t a) = (1 rel pop cred(o i, t a)) (10) o i O m m = 2 rel pop cred(o i, t a) 5. Noda+pop+BPR 0.5 5 5
6. 2 2 1 2 { } 2 { } 10 Wikipedia 2 Wikipedia 7. 1 2 5 25 2 242400132468000812J03993 [1] S. E. Robertson and S. Walker: Some simple effective approximations to the 2-poisson model for probabilistic weighted retrieval, Proc. of ACM SIGIR 1994, pp. 232 241 (1994). [2] J. M. Kleinberg: Authoritative sources in a hyperlinked environment, J. ACM, 46, pp. 604 632 (1999). [3] S. Brin and L. Page: The anatomy of a large-scale hypertextual web search engine, Proc. of WWW 2007, pp. 107 117 (1998). [4] K. Tsukuda, H. Ohshima, M. Yamamoto, H. Iwasaki and K. Tanaka: Discovering unexpected information on the basis of popularity/unpopularity analysis of coordinate objects and their relationships, Proc. of SAC 2013, pp. 878 885 (2013). [5] G. Lu, P. Huang, L. He, C. Cu and X. Li: A new semantic similarity measuring method based on web search engines, W. Trans. on Comp., 9, 1, pp. 1 10 (2010). [6] D. Bollegala, Y. Matsuo and M. Ishizuka: A web search engine-based approach to measure semantic similarity between words, IEEE Trans. on Knowl. and Data Eng., 23, 7, pp. 977 990 (2011). [7] M. Sahami and T. D. Heilman: A web-based kernel function for measuring the similarity of short text snippets, Proc. of WWW 2006, pp. 377 386 (2006). [8] H.-H. Chen, M.-S. Lin and Y.-C. Wei: Novel association measures using web search with double checking, Proc. of ACL-44, pp. 1009 1016 (2006). [9] M. Strube and S. P. Ponzetto: Wikirelate! computing semantic relatedness using wikipedia, Proc. of AAAI 2006, pp. 1419 1424 (2006). [10] E. Gabrilovich and S. Markovitch: Computing semantic relatedness using wikipedia-based explicit semantic analysis, Proc. of IJCAI 2007, pp. 1606 1611 (2007). [11] D. Milne and I. Witten: An effective, low-cost measure of semantic relatedness obtained from wikipedia links, Proc. of WIKIAI 2008, pp. 25 30 (2008). [12] B. Liu, Y. Ma and P. S. Yu: Discovering unexpected information from your competitors web sites, Proc. of ACM SIGKDD 2001, pp. 144 153 (2001). [13] Y. Mejova, I. Bordino, M. Lalmas and A. Gionis: Searching for interestingness in wikipedia and yahoo!: answers, Proc. of WWW 2013, pp. 145 146 (2013). [14] A. Nadamoto, E. Aramaki, T. Abekawa and Y. Murakami: Content hole search in community-type content, Proc. of WWW 2009, pp. 1223 1224 (2009). [15] Y. Noda, Y. Kiyota and H. Nakagawa: Discovering serendipitous information from wikipedia by using its network structure, Proc. of ICWSM 2010, pp. 299 302 (2010). [16],,,, web, DBSJ Letters, 5, 2, pp. 69 72 (2006). [17],,,, wikipedia, p. to appear (2013). [18],, Web,., 47, 19, pp. 98 112 (2006). [19],,, Web (WebDB Forum 2013) (2013). [20] T. H. Haveliwala: Topic-sensitive pagerank, Proc. of WWW 2002, pp. 517 526 (2002).