DEIM Forum 2009 A7-1

Web
606-8501
E-mail: {nakatani,adam,ohshima,tanaka}@dl.kuis.kyoto-u.ac.jp

Web Web Web Web Wikipedia Web Wikipedia

1. Web Nakamura 2007 1000 [1] (46%) (ii) 40 (36.8%) 2 Web Web (i) Web Web (ii) (i) Web Web Web Web Web 1 Easiest-first Web 1 2 2 Web (i) 2 3 4 Wikipedia (ii) 5 (i) (ii) Web 6 7

1 http://ja.wikipedia.org/wiki/
2 http://www.tmin.ac.jp/medical/01/parkinson1.html

1 (i) Parkinson's disease
2. Web

1 Web Web Wikipedia Wikipedia Web LexRank [3] Web Web Wikipedia Xin [2] Web MeSH 3 Xin

3. Web
3.1 Web Gunning-Fog Index ARI [4] Dale-Chall Readability Index [5] Gray [6] Wikipedia 1 Wikipedia 2 3 4 4 Web Web 1 Web Web Xin [2]

3 http://www.nlm.nih.gov/pubs/factsheets/mesh.html

Xin
Web Web [7] [8]

3.2 Wikipedia Wikipedia 4 Wikipedia 2008 6 Wikipedia 240 2 Wikipedia Wikipedia Wikipedia $q$ Wikipedia Nature $C_q = \{c_1, \ldots, c_m\}$ Wikipedia Britannica [9] Milne [10] Agrovoc Wikipedia Wikipedia Strube [11] Wikipedia Ito [12] Wikipedia

4.1.2 Wikipedia $C_q$ $c_i$ $D_q$ $t$ Koru [13] Wikipedia $q$ $t$ Wikipedia Mihalcea Wikify! [14] $D_q$ Wikipedia 2 $t_1$ $D_q$ Wikipedia $t_1$ $t_2$ $t_3$ $D_q$ Wikipedia Web (KLD) $D_q$

4. Wikipedia $P(t)$ Wikipedia $t$

4.1 Wikipedia $P(\lnot t)$ Wikipedia $t$ $P(t \mid D_q)$ $D_q$ Wikipedia $t$ $P(\lnot t \mid D_q)$ $D_q$ Wikipedia $t$ Wikipedia KLD

$$\mathrm{KLD}(t; D_q) = P(t) \log \frac{P(t)}{P(t \mid D_q)} + P(\lnot t) \log \frac{P(\lnot t)}{P(\lnot t \mid D_q)} \tag{1}$$

$\mathrm{KLD}(t; D_q) \geq \theta_{KLD}$ $t$ $q$

4.2 Wikipedia
4.1.1 $q$ Wikipedia $q$ 4 5

4 http://en.wikipedia.org/wiki/special:statistics

$t$ Wikipedia $c_i$
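The thresholding step around equation (1) can be sketched in Python. This is a minimal illustration, not the authors' implementation. It assumes that $P(t)$ is the share of all Wikipedia articles containing term $t$, that $P(t \mid D_q)$ is the same share within the query-related article set $D_q$, and that equation (1) is the binary KL divergence between those two distributions. The function names, the clamping constant `eps`, and the sample probabilities are hypothetical.

```python
import math


def kld(p_t: float, p_t_given_dq: float, eps: float = 1e-9) -> float:
    """Binary KL divergence of equation (1), read as KL(P || P(.|D_q)).

    p_t          -- P(t): share of all Wikipedia articles that contain t
    p_t_given_dq -- P(t | D_q): share of the articles in D_q that contain t
    eps          -- clamping constant (an assumption) that avoids log(0)
    """
    p = min(max(p_t, eps), 1.0 - eps)
    q = min(max(p_t_given_dq, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def select_technical_terms(stats: dict, theta_kld: float = 0.01) -> list:
    """Keep the terms whose divergence reaches the threshold theta_KLD."""
    return [t for t, (p_t, p_dq) in stats.items() if kld(p_t, p_dq) >= theta_kld]


# Hypothetical probabilities, for illustration only.
stats = {
    "event horizon": (0.001, 0.20),  # rare overall, frequent in D_q
    "year": (0.05, 0.05),            # equally frequent everywhere
}
print(select_technical_terms(stats))
```

With these made-up numbers only the first term passes the threshold, because a term that is equally frequent inside and outside $D_q$ has zero divergence.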
2
149 0.7584
19 0.7895
74 0.8649
83 0.9157
77 0.7922
80.4 0.8184

86 0.7326
42 0.7143
69 0.7826
11 0.7273
12 0.4167
45 0.7273
28 0.7500
SQL 25 0.8400
31 0.7419
138 0.7174
60 0.8833
56.4 0.7695

22 0.5000
46 0.5870
11 0.6364
44 0.6364
20 0.6000
28.6 0.5944

Wikipedia

$$C_{readability}(p) = \mathrm{Obi}(p)^{-1} \tag{2}$$

5 $\mathrm{Obi}(p)$ $C_{readability}$ $\theta_{KLD} = 0.01$ 2

5.1.2 70% 80% Web Web Wikipedia 3

$$C_{technical}(p, q) = \exp\left(-\frac{n_t(p, q)}{\log |p|}\right) \tag{3}$$

3,,,

5.
5.1
5.1.1 Web 3.1 Web Sato [15] 5 13 1 1 13 Web $n_t(p, q)$ $p$ $q$

5 http://kotoba.nuee.nagoya-u.ac.jp/sc/readability/obi.html
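The two page-level scores in equations (2) and (3) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code. It assumes that Obi returns an integer grade level from 1 (easiest) to 13 (hardest), that $n_t(p, q)$ is the number of technical terms for query $q$ found in page $p$, and that the page length $|p|$ is measured in characters. The function and parameter names are hypothetical.

```python
import math


def c_readability(obi_grade: int) -> float:
    """Equation (2): the inverse of the Obi grade level of a page.

    The 1..13 range is an assumption about the Obi output scale.
    """
    if not 1 <= obi_grade <= 13:
        raise ValueError("Obi grade is assumed to lie in 1..13")
    return 1.0 / obi_grade


def c_technical(n_terms: int, page_length: int) -> float:
    """Equation (3), read as exp(-n_t(p, q) / log|p|).

    n_terms     -- n_t(p, q): technical terms of the query found in the page
    page_length -- |p|: page length (assumed to be a character count)
    """
    if page_length < 2:
        raise ValueError("log|p| must be positive, so |p| must be at least 2")
    return math.exp(-n_terms / math.log(page_length))


# Hypothetical pages, for illustration only.
print(c_readability(4), c_technical(3, 2000))
print(c_readability(11), c_technical(40, 2000))
```

Both scores lie in (0, 1] and grow as a page becomes easier: a lower grade level raises the first score, and fewer technical terms relative to the page length raise the second.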
3 LexRank $p$ $p$ $C_{technical}$

5.2 Web
5.2.2 Web

$$C(p, q) = (1 - \alpha) \cdot C_{readability}(p) + \alpha \cdot C_{technical}(p, q) \tag{5}$$

$\alpha$ $0 \leq \alpha \leq 1$ $C$ LexRank [3] $C$

6.
5.2.1 LexRank
6.1 LexRank Web PageRank LexRank 4 3 Web API LexRank Web Wikipedia [3] Web Web HTML Web Web tfidf [16] Yahoo!JAPAN Web API 6 Wikipedia 2008 7 24 $s_{i,j}$ $i$ $j$ Wikipedia $S$ 7

6.2 PageRank LexRank(LR) 100 4

$$LR = d \cdot S \cdot LR + (1 - d) \cdot p, \quad \text{where } p = \left[\frac{1}{n}\right]_{n \times 1} \tag{4}$$

$n$ $S$ $S$ $d$ damping factor $d = 0.85$ LexRank

6 http://developer.yahoo.co.jp/webapi/search/websearch/v1/websearch.html
7 http://download.wikimedia.org/jawiki/

4 $\alpha = 0.5$ 1 4
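Equations (4) and (5) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation. It assumes that $S$ is a non-negative similarity matrix given as a list of rows, that it is column-normalised before iterating so that the scores keep summing to 1, and that equation (4) is solved by power iteration. The function names, the tolerance, and the toy matrix are hypothetical.

```python
def combined_score(c_read: float, c_tech: float, alpha: float = 0.5) -> float:
    """Equation (5): linear mix of the readability and technical scores."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (1.0 - alpha) * c_read + alpha * c_tech


def lexrank(S, d: float = 0.85, tol: float = 1e-10, max_iter: int = 1000):
    """Equation (4): LR = d * S * LR + (1 - d) * p with p = [1/n].

    S is column-normalised first (an assumption); an all-zero column is
    replaced by a uniform one so that the iteration stays well defined.
    """
    n = len(S)
    col_sums = [sum(S[i][j] for i in range(n)) for j in range(n)]
    M = [
        [S[i][j] / col_sums[j] if col_sums[j] > 0 else 1.0 / n for j in range(n)]
        for i in range(n)
    ]
    lr = [1.0 / n] * n
    for _ in range(max_iter):
        new = [
            d * sum(M[i][j] * lr[j] for j in range(n)) + (1.0 - d) / n
            for i in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new, lr)) < tol:
            return new
        lr = new
    return lr


# Toy similarity matrix: item 0 is similar to items 1 and 2.
S = [[0.0, 1.0, 1.0],
     [1.0, 0.0, 0.0],
     [1.0, 0.0, 0.0]]
print(lexrank(S))
print(combined_score(0.25, 0.6, alpha=0.5))
```

In the toy matrix the item connected to both others receives the highest score. The defaults $d = 0.85$ and $\alpha = 0.5$ are the values stated in the text.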
4 Web

1 (23) Cat Chat:Dr. :? http://www.tbs.co.jp/catchat/friendpark/universe/que_blackhole.html
2 (46) -!goo http://oshiete1.goo.ne.jp/qa390129.html
3 (42) http://park1.wakwak.com/~yumemaru/blackhole.html
4 (98) Yahoo! - - http://contents.kids.yahoo.co.jp/hoshizora/encyclopedia/dic_blackhole.html
5 (72) ? http://www.h7.dion.ne.jp/~natsuume/visual/bh1.html
41 (2) SPACE INFORMATION CENTER : http://spaceinfo.jaxa.jp/ja/black_holes.html
44 (1) - Wikipedia http://ja.wikipedia.org/wiki/
(3) BLACK HOLE http://www.h3.dion.ne.jp/~black.h/

1 (33) Neuroinfo Japan: http://square.umin.ac.jp/neuroinf/patient/502.html
2 (84) DBS NouProblem.jp http://www.nouproblem.jp/dbs/index.html
3 (93) asahi.com : : http://www.asahi.com/health/soudan/jh030430.html
4 (3) http://www.niigata-nh.go.jp/nanbyo/pd/pdindex.htm
5 (73) [.com] http://www.peisuke.com/parkinson/top.htm
50 (2) (3)... http://www.nanbyou.or.jp/sikkan/089.htm
(1) Parkinson's Disease http://www.parkinson.gr.jp/

Web Web 2 QA 2 5

7. 1 Wikipedia 2 Web Wikipedia Web 3 Wikipedia Web 1 2 Wikipedia 5 1 Web

COE Web IT
1809041 NICT (B) Adam Jatowt 18700111

[1] S. Nakamura, S. Konishi, A. Jatowt, H. Ohshima, H. Kondo, T. Tezuka, S. Oyama and K. Tanaka: Trustworthiness analysis of web search results, Proceedings of the 11th ECDL (2007).
[2] X. Yan, D. Song and X. Li: Concept-based document readability in domain specific information retrieval, CIKM '06: Proceedings of the 15th ACM international conference on Information and knowledge management, New York, NY, USA, ACM, pp. 540-549 (2006).
[3] G. Erkan and D. R. Radev: LexRank: Graph-based lexical centrality as salience in text summarization, Journal of Artificial Intelligence Research, 22, pp. 457-479 (2004).
[4] E. A. Smith and R. J. Senter: Automated readability index, AMRL-TR-66-22 (1967).
[5] E. Dale and J. Chall: Readability Revisited: The New Dale-Chall Readability Formula, Brookline Books/Lumen Editions (1995).
[6] W. S. Gray and B. Leary: What makes a book readable, Chicago University Press (1935).
[7] M. Y. Ivory and M. A. Hearst: Statistical profiles of highly-rated web sites, CHI '02: Proceedings of the SIGCHI conference on Human factors in computing systems, New York, NY, USA, ACM, pp. 367-374 (2002).
[8] T. Mandl: Implementation and evaluation of a quality-based search engine, HYPERTEXT '06: Proceedings of the seventeenth conference on Hypertext and hypermedia, New York, NY, USA, ACM, pp. 73-84 (2006).
[9] J. Giles: Internet encyclopaedias go head to head, Nature, 438 (2005).
[10] D. Milne, O. Medelyan and I. H. Witten: Mining domain-specific thesauri from Wikipedia: A case study, International Conference on Web Intelligence (2006).
[11] M. Strube and S. P. Ponzetto: WikiRelate! Computing semantic relatedness using Wikipedia, Proceedings of National Conference for Artificial Intelligence (2006).
[12] M. Ito, K. Nakayama, T. Hara and S. Nishio: Association thesaurus construction methods based on link co-occurrence analysis for Wikipedia, CIKM '08: Proceeding of the 17th ACM conference on Information and knowledge management, New York, NY, USA, ACM, pp. 817-826 (2008).
[13] D. N. Milne, I. H. Witten and D. M. Nichols: A knowledge-based search engine powered by Wikipedia, Proceedings of the sixteenth ACM conference on CIKM, New York, NY, USA, ACM (2007).
[14] R. Mihalcea and A. Csomai: Wikify!: linking documents to encyclopedic knowledge, Proceedings of the sixteenth ACM conference on CIKM, ACM (2007).
[15] S. Sato, S. Matsuyoshi and Y. Kondoh: Automatic assessment of Japanese text readability based on a textbook corpus, Proceedings of the Sixth International Language Resources and Evaluation (LREC '08) (Ed. by European Language Resources Association (ELRA)), Marrakech, Morocco (2008).
[16] G. Salton and C. Buckley: Term-weighting approaches in automatic text retrieval, Inf. Process. Manage., 24, 5, pp. 513-523 (1988).