DEIM Forum 2014 A5-2

Twitter
565-0871 1-5
E-mail: {shirakawa.masumi,hara,nishio}@ist.osaka-u.ac.jp

9 24 Twitter

Twitter,,, Wikipedia, Explicit Semantic Analysis,

1.

political leaning Twitter Cision 2013 ^1 90% 9 59% Lissted ^2 Muck Rack ^3 Twitter curation Lissted Muck Rack Togetter ^4 2010 12 Arab Spring ^5 Twitter [2] Twitter 9 24 Twitter Twitter ^6 Twitter

^1 http://us.cision.com/thought-leadership/2013-social-journalism/
^2 http://lissted.com/
^3 http://muckrack.com/
^4 http://togetter.com/
^5 http://www.aljazeera.net/
^6 http://twitter.com/who_to_follow/interests

2.

Sáez-Trumper et al. [1] EC Joint Research Centre Europe Media Monitor (EMM) ^7 [3], [4] EMM EU 50 2,200 Web EMM NewsBrief ^8 10 NewsExplorer 19 NewsExplorer Wikipedia Wikipedia EMM NLP Columbia Newsblaster ^9 [5]

[6] [7] [8] [9]

^7 http://emm.newsbrief.eu/overview.html
^8 Web
^9 2014 1 Web

1

3.

1 1

(1) 2013 12 12
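(2) 2013 12 12 9 2013 12 10 9 2013 12 12
(3) 2013 12 11 2013 12 12

4.

4.1 CL-ESA

Song et al. [10] Bag-of-Words LDA [11] Wikipedia Explicit Semantic Analysis (ESA) [12] ESA Wikipedia TFIDF [13] Okapi BM25 [14] Wikipedia Wikipedia ESA Wikipedia Cross-Lingual Explicit Semantic Analysis (CL-ESA) [15] CL-ESA ESA Wikipedia 2 Wikipedia CL-ESA Wikipedia Wikipedia

Let $T$ be a text, $t$ a term in $T$, $L$ the language of $T$, and $e^{(L)}$ an entity of the language-$L$ Wikipedia. The ESA vector $V^{(L)}_{\mathrm{ESA}}(T)$ of $T$ is

$$ V^{(L)}_{\mathrm{ESA}}(T) = \sum_{t_k \in T} v^{(L)}(t_k) \tag{1} $$

where $t_k$ is a term in $T$ and $v^{(L)}(t_k)$ is a vector whose element for $e^{(L)}$ is the BM25 score $S_{\mathrm{BM25}}(t_k, e^{(L)})$ between $t_k$ and $e^{(L)}$:

$$ v^{(L)}(t_k)[e^{(L)}] = S_{\mathrm{BM25}}(t_k, e^{(L)}) \tag{2} $$

BM25 TFIDF

The CL-ESA vector $V_{\mathrm{CLESA}}(T, L)$ takes, for each English entity $e^{(\mathrm{English})}$, the corresponding element of $V^{(L)}_{\mathrm{ESA}}(T)$:

$$ V_{\mathrm{CLESA}}(T, L)[e^{(\mathrm{English})}] = V^{(L)}_{\mathrm{ESA}}(T)[e^{(L)}] \tag{3} $$

where $e^{(\mathrm{English})}$ is the English Wikipedia entity that corresponds to $e^{(L)}$ in the language-$L$ Wikipedia. A text in any language $L$ is thus represented by $V_{\mathrm{CLESA}}(T, L)$.

4.2 dp-means [16]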
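The clustering in this section operates on the CL-ESA vectors of Section 4.1. As a minimal sketch of Eqs. (1) to (3), the code below assumes that the per-term BM25 scores and the cross-language entity mapping are already available as plain dictionaries. The names `esa_vector`, `clesa_vector`, `bm25_ja`, and `ja_to_en`, as well as the toy scores, are hypothetical and do not come from the paper.

```python
from collections import defaultdict

def esa_vector(terms, bm25_scores):
    """Eqs. (1)-(2): sum the per-term entity vectors v(t_k) over the terms of T.

    bm25_scores[t][e] is assumed to hold S_BM25(t, e) for entity e of the
    text's own-language Wikipedia (a hypothetical precomputed table).
    """
    vec = defaultdict(float)
    for t in terms:
        for entity, score in bm25_scores.get(t, {}).items():
            vec[entity] += score
    return dict(vec)

def clesa_vector(terms, bm25_scores, to_english):
    """Eq. (3): re-index the ESA vector by the corresponding English entity.

    Entities without an English counterpart are dropped, and scores of
    entities mapping to the same English entity are added; both choices
    are assumptions of this sketch.
    """
    out = {}
    for entity, score in esa_vector(terms, bm25_scores).items():
        english = to_english.get(entity)
        if english is not None:
            out[english] = out.get(english, 0.0) + score
    return out

# Toy Japanese-side data, made up for illustration only.
bm25_ja = {
    "選挙": {"選挙": 2.0, "政治": 1.0},
    "首相": {"内閣総理大臣": 3.0, "政治": 0.5},
}
ja_to_en = {"選挙": "Election", "政治": "Politics", "内閣総理大臣": "Prime minister"}

v = clesa_vector(["選挙", "首相"], bm25_ja, ja_to_en)
print(v)
```

Because every language is projected onto the same English entity index, vectors built from texts in different languages can be compared directly.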
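one-pass dp-means dp-means k-means dp-means dp-means [16] one-pass dp-means

dp-means
(1) Input: data $x_1, \ldots, x_n$ and a threshold $\lambda$.
(2) Initialize: set $z_i = 1$ for every $x_i$, with $l_1 = \{x_1, \ldots, x_n\}$ and $K = 1$.
(3) For each $x_i$, compute the similarity $r$ to the most similar cluster $l$. If $r > \lambda$, set $z_i$ to the index of $l$; if $r \leq \lambda$, set $z_i = K + 1$ and $K = K + 1$.
(4) Update the clusters $l_1, \ldots, l_K$ according to $z_i$.
(5) Repeat (3) and (4) until convergence.
(6) Output $l_1, \ldots, l_K$.

one-pass dp-means
(1) Input: a datum $x$, a threshold $\lambda$, and the existing clusters $l_1, \ldots, l_K$.
(2) Compute the similarity $r$ between $x$ and the most similar cluster $l$. If $r > \lambda$, add $x$ to $l$; if $r \leq \lambda$, create $l_{K+1} = \{x\}$ and set $K = K + 1$.
(3) Output $l_1, \ldots, l_K$.

one-pass dp-means dp-means one-pass dp-means

With CL-ESA, a text $x$ in language $L$ is represented by $V_{\mathrm{CLESA}}(x, L)$ and a cluster $l$ by $W_{\mathrm{CLESA}}(l)$:

$$ W_{\mathrm{CLESA}}(l) = \sum_{x_i \in l} V_{\mathrm{CLESA}}(x_i, L_i) \tag{4} $$

where $L_i$ is the language of $x_i$. The similarity $r$ between $x$ and $l$ is computed from $V_{\mathrm{CLESA}}(x, L)$ and $W_{\mathrm{CLESA}}(l)$. 1 (4)

$$ W_{\mathrm{CLESA}}(l) = V_{\mathrm{CLESA}}(x_{\mathrm{first}}, L_{\mathrm{first}}) \tag{5} $$

where $x_{\mathrm{first}}$ is the first text assigned to $l$ and $L_{\mathrm{first}}$ is the language of $x_{\mathrm{first}}$. (5)

4.3 CL-ESA

CL-ESA $\lambda_q$ $A$
(1) Input: a query $q$ in language $L$ and a threshold $\lambda_q$.
(2) Initialize $A$.
(3) Compute $V_{\mathrm{CLESA}}(q, L)$.
(4) From the entities $e^{(\mathrm{English})}$ of $V_{\mathrm{CLESA}}(q, L)$, obtain the candidate clusters $C$.
(5) For each $l_k \in C$, compute the similarity $r_k$ between $V_{\mathrm{CLESA}}(q, L)$ and $W_{\mathrm{CLESA}}(l_k)$; if $r_k > \lambda_q$, add $l_k$ to $A$.
(6) Output $A$.

2

5.

4.2

5.1

Twitter Streaming API ^10 Twitter ^11 API User streams Twitter 9 24 613 CL-ESA CL-ESA Wikipedia ESA 200 1 one-pass dp-means one-pass dp-means 0.25 CL-ESA Wikipedia 200 0.25

^10 http://dev.twitter.com/docs/streaming-apis
^11 http://twitter.com/mljournalism

5.2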
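The one-pass dp-means assignment of Section 4.2 and the cluster retrieval of Section 4.3 can be sketched together as follows. This is an illustration under stated assumptions, not the authors' implementation: vectors are sparse dictionaries, the similarity is taken to be cosine (the measure is not preserved in the source), each cluster is represented by its first member as in Eq. (5), the candidate-cluster step (4) of Section 4.3 is simplified to a scan over all clusters, and 0.25 is assumed to be the threshold $\lambda$ of Section 5.1. The names `cosine`, `one_pass_assign`, and `retrieve` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def one_pass_assign(x, clusters, lam=0.25):
    """Assign vector x to the most similar cluster, or open a new one.

    Each cluster is a dict {"rep": vector, "members": [vectors]}; "rep" is
    the first member, following Eq. (5). Returns the index of the cluster used.
    """
    best, best_r = None, -1.0
    for i, c in enumerate(clusters):
        r = cosine(x, c["rep"])
        if r > best_r:
            best, best_r = i, r
    if best is not None and best_r > lam:
        clusters[best]["members"].append(x)
        return best
    clusters.append({"rep": x, "members": [x]})  # l_{K+1} = {x}, K = K + 1
    return len(clusters) - 1

def retrieve(q, clusters, lam_q):
    """Return indices of clusters whose similarity to query q exceeds lam_q."""
    return [k for k, c in enumerate(clusters) if cosine(q, c["rep"]) > lam_q]

# Toy CL-ESA vectors over English entities, made up for illustration only.
clusters = []
stream = [
    {"Election": 2.0, "Politics": 1.0},
    {"Election": 1.0, "Politics": 1.0},
    {"Football": 3.0},
]
labels = [one_pass_assign(x, clusters) for x in stream]
hits = retrieve({"Politics": 1.0}, clusters, lam_q=0.25)
print(labels, hits)
```

Each incoming text is compared once against the current cluster representatives and never revisited, which is what distinguishes the one-pass variant from the iterative dp-means loop.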
1

6.

9 24 Twitter Web ^12 ESA [17] ESA [18]

IT IT 2012 2016

[1] D. Sáez-Trumper, C. Castillo and M. Lalmas: Social Media News Communities: Gatekeeping, Coverage, and Statement Bias, Proceedings of ACM International Conference on Information and Knowledge Management (CIKM), pp. 1679-1684 (2013).
[2] , (2012).
[3] M. Atkinson and E. van der Goot: Near Real Time Information Mining in Multilingual News, Proceedings of International World Wide Web Conference (WWW), pp. 1153-1154 (2009).
[4] R. Steinberger, B. Pouliquen and E. van der Goot: An Introduction to the Europe Media Monitor Family of Applications, Proceedings of SIGIR Workshop on Information Access in a Multilingual World, pp. 1-8 (2009).
[5] D. K. Evans, J. L. Klavans and K. R. McKeown: Columbia Newsblaster: Multilingual News Summarization on the Web, Proceedings of Human Language Technology Conference of North American Chapter of the Association for Computational Linguistics (HLT-NAACL): Demonstration Papers, pp. 1-4 (2004).
[6] B. Mathieu, R. Besançon and C. Fluhr: Multilingual Document Clusters Discovery, Proceedings of RIAO Conference, pp. 116-125 (2004).
[7] S. Montalvo, R. Martínez, A. Casillas and V. Fresno: Multilingual Document Clustering: An Heuristic Approach Based on Cognate Named Entities, Proceedings of International Conference on Computational Linguistics and Annual Meeting of the Association for Computational Linguistics (COLING-ACL), pp. 1145-1152 (2006).
[8] C.-P. Wei, C. C. Yang and C.-M. Lin: A Latent Semantic Indexing-based Approach to Multilingual Document Clustering, Decision Support Systems, 45, 3, pp. 606-620 (2008).
[9] D. Yogatama and K. Tanaka-Ishii: Multilingual Spectral Clustering Using Document Similarity Propagation, Proceedings of Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 871-879 (2009).
[10] Y. Song, H. Wang, Z. Wang, H. Li and W. Chen: Short Text Conceptualization Using a Probabilistic Knowledgebase, Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), pp. 2330-2336 (2011).
[11] D. M. Blei, A. Y. Ng and M. I. Jordan: Latent Dirichlet Allocation, Journal of Machine Learning Research, 3, pp. 993-1022 (2003).
[12] E. Gabrilovich and S. Markovitch: Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis, Proceedings of International Joint Conference on Artificial Intelligence (IJCAI), pp. 1606-1611 (2007).
[13] G. Salton and C. Buckley: Term-weighting Approaches in Automatic Text Retrieval, Information Processing and Management, 24, 5, pp. 513-523 (1988).
[14] S. E. Robertson, S. Walker, S. Jones, M. Hancock-Beaulieu and M. Gatford: Okapi at TREC-3, Proceedings of Third Text REtrieval Conference (TREC-3) (1994).
[15] P. Sorg and P. Cimiano: Cross-lingual Information Retrieval with Explicit Semantic Analysis, Working Notes for the CLEF 2008 Workshop (2008).
[16] B. Kulis and M. I. Jordan: Revisiting k-means: New Algorithms via Bayesian Nonparametrics, Proceedings of International Conference on Machine Learning (ICML), pp. 513-520 (2012).
[17] M. Shirakawa, K. Nakayama, T. Hara and S. Nishio: Probabilistic Semantic Similarity Measurements for Noisy Short Texts Using Wikipedia Entities, Proceedings of ACM International Conference on Information and Knowledge Management (CIKM), pp. 903-908 (2013).
[18] ,,, Wikipedia,, 12, 2, pp. 7-12 (2013).

^12 http://mljournalism.com/