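DEIM Forum 2016 B5-2

305-8573 1-1-1
E-mail: sei0024@kde.cs.tsukuba.ac.jp, yuto@kde.cs.tsukuba.ac.jp, kitagawa@cs.tsukuba.ac.jp

1. Introduction
Twitter has become an important channel for news. Kwak et al. [1] reported that more than 85% of Twitter's trending topics are headline news, and Petrovic et al. [2] examined whether Twitter can replace newswire as a source of breaking news. Connecting news articles with related tweets in real time has also been studied as a top-k publish/subscribe problem [3].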
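Figure 1: …

This paper speeds up the real-time matching of news articles and tweets proposed in [4]. In the experiments, the proposed method reduced the processing time by 54.8% compared with the baseline DAAT/VAT [4].

2. Related Work
2.1 Analyzing Tweets
Twitter data have been used for sentiment analysis and opinion mining [5], [6], [7], [8], and for summarization and newsworthy-topic detection [9], [10], [11], [12].
2.2 Twitter as a News Medium
Kwak et al. [1] showed that more than 85% of Twitter's trending topics are headline news. Petrovic et al. [2] compared Twitter with newswire as a source of breaking news, and Osborne and Dredze [13] compared Facebook, Google Plus, and Twitter for breaking news and found Twitter to be the preferred medium.
2.3 Linking News and Tweets
Shi et al. [14] connect news articles to relevant Twitter conversations. Guo et al. [15] link tweets to news in order to enrich short social-media texts. Stajner et al. [16] automatically select social media responses to news. Phuvipadawat and Murata [17] detect and track breaking news on Twitter.

3. Problem Definition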
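Let m denote a tweet, s a news article, and w a word. M is the set of tweets and S is the set of news articles. n(m, w) is the number of occurrences of w in m, and n(s, w) is the number of occurrences of w in s. Table 1 summarizes the notation.

Table 1: Notation
  w        word
  m        tweet
  M        set of tweets
  n(m, w)  number of occurrences of w in m
  s        news article
  S        set of news articles
  n(s, w)  number of occurrences of w in s

Given a tweet m, the task is to find the news articles in S that are related to m:
  Input: a tweet m and the set of news articles S
  Output: the news articles in S related to m

4. Previous Method [4]
In [4], the relevance of a news article s to a tweet m is scored as

$$ \mathrm{sim}(s, m) = \sum_{w \in s \cap m} \mathrm{idf}(w)^2 \, n(s, w) \qquad (1) $$

where the sum runs over the words w that occur in both s and m, and idf(w) is the inverse document frequency of w. The method of [4], outlined in Figure 2, consists of three steps, (a), (b), and (c), which are described in Sections 4.1, 4.2, and 4.3; Section 4.2 relies on TF-IDF weighting.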
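Eq. (1) and the thresholded decision it feeds can be made concrete with a brute-force matcher. The sketch below is a minimal illustration, not the implementation of [4], and it rests on three assumptions. Tweets and articles are already tokenized into word lists. idf(w) is taken as log(N / df(w)) over the article collection, because the exact idf definition of [4] is not restated here. Each article s has a precomputed threshold θ_s, as in the Θ input of Algorithm 1. The names `idf_table`, `sim`, and `match_naive` are hypothetical. Eq. (1) has no n(m, w) factor, so a word repeated in the tweet is counted once.

```python
import math
from collections import Counter


def idf_table(articles: dict[int, list[str]]) -> dict[str, float]:
    """Hypothetical idf(w) = log(N / df(w)) over the news articles (an assumption)."""
    n_articles = len(articles)
    df = Counter(w for words in articles.values() for w in set(words))
    return {w: math.log(n_articles / d) for w, d in df.items()}


def sim(article_words: list[str], tweet_words: list[str], idf: dict[str, float]) -> float:
    """Eq. (1): sum over w in s ∩ m of idf(w)^2 * n(s, w)."""
    n_s = Counter(article_words)  # n(s, w)
    return sum(idf.get(w, 0.0) ** 2 * n_s[w] for w in set(tweet_words) if w in n_s)


def match_naive(tweet_words: list[str], articles: dict[int, list[str]],
                thresholds: dict[int, float], idf: dict[str, float]) -> set[int]:
    """Return the IDs of articles s with sim(s, m) >= theta_s by scoring every article."""
    return {
        s for s, words in articles.items()
        if sim(words, tweet_words, idf) >= thresholds[s]
    }
```

This matcher scores every article for every tweet. That per-tweet cost is what the inverted-index methods of Section 5 aim to reduce.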
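Figure 2: …

5. Proposed Method
This section presents a DAAT-based matching method (Section 5.1) and its extension with upper-bound pruning (Section 5.2).

5.1 DAAT-based Matching
Following [3], each news article is assigned an ID. For every word w, an inverted index holds a posting list L_w of the articles that contain w, sorted by ID. The posting of article s in L_w stores the partial score

$$ ps(s, w) = \mathrm{idf}(w)^2 \, n(s, w) $$

so sim(s, m) in Eq. (1) is the sum of ps(s, w) over the words w of m that occur in s.
The posting lists are traversed Document-at-a-time (DAAT) [18]. DAAT reads the posting lists of the words in m in parallel and handles one article at a time, in ascending ID order:
1. Select the smallest article ID among the current positions of the lists.
2. Sum the partial scores of that article over all lists to obtain sim(s, m). If the sum reaches the threshold θ_s, s is matched to m.
3. Advance the lists past that ID and repeat until all lists are exhausted.

Figure 3 shows an example with news articles s1, ..., s5, words w1, ..., w5, and a tweet m. DAAT visits the articles in ID order.
Step 1: The smallest ID is s1. sim(s1, m) = 2 is below θ_{s1} = 5, so s1 is not matched to m.
Step 2: The next ID is s2. sim(s2, m) = 4 is below θ_{s2} = 12, so s2 is not matched to m.
Step 3: The next ID is s3. sim(s3, m) = 10 reaches θ_{s3} = 8, so s3 is matched to m.
Articles s4 and s5 are processed in the same way, and s3 is the only article matched to m.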
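The DAAT traversal of Section 5.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Article IDs are integers, and the index is a plain dict that maps each word to its posting list of (article ID, ps(s, w)) pairs sorted by ID. `build_index`, `daat_match`, `example_index`, and `example_thresholds` are hypothetical names. The example posting lists are invented: only the scores and thresholds of s1 to s3 mirror the Figure 3 walkthrough, and the thresholds of s4 and s5 are made up.

```python
from collections import Counter


def build_index(articles: dict[int, list[str]],
                idf: dict[str, float]) -> dict[str, list[tuple[int, float]]]:
    """Inverted index: word w -> posting list L_w of (article ID, ps(s, w)), sorted by ID."""
    index: dict[str, list[tuple[int, float]]] = {}
    for s in sorted(articles):  # ascending IDs keep every posting list sorted
        for w, count in Counter(articles[s]).items():
            index.setdefault(w, []).append((s, idf.get(w, 0.0) ** 2 * count))  # ps(s, w)
    return index


def daat_match(tweet_words: list[str], index, thresholds: dict[int, float]) -> set[int]:
    """Document-at-a-time: visit articles in ascending ID order and fully score each one."""
    lists = [index[w] for w in set(tweet_words) if w in index]
    cursors = [0] * len(lists)
    matched: set[int] = set()
    while True:
        # Article IDs under the cursors of the lists that are not yet exhausted.
        current = [lst[c][0] for lst, c in zip(lists, cursors) if c < len(lst)]
        if not current:
            return matched
        s = min(current)  # step 1: smallest current ID
        score = 0.0
        for i, lst in enumerate(lists):  # step 2: sum ps(s, w) over the lists positioned at s
            if cursors[i] < len(lst) and lst[cursors[i]][0] == s:
                score += lst[cursors[i]][1]
                cursors[i] += 1  # step 3: advance past s
        if score >= thresholds[s]:
            matched.add(s)


# Hypothetical posting lists: sim(s1)=2, sim(s2)=4, sim(s3)=10, sim(s4)=3, sim(s5)=3.
example_index = {
    "w1": [(1, 2.0), (3, 3.0), (5, 1.0)],
    "w2": [(2, 4.0), (3, 4.0)],
    "w3": [(3, 3.0), (4, 2.0)],
    "w4": [(4, 1.0)],
    "w5": [(5, 2.0)],
}
example_thresholds = {1: 5, 2: 12, 3: 8, 4: 6, 5: 7}  # theta_s4, theta_s5 invented
print(daat_match(["w1", "w2", "w3", "w4", "w5"], example_index, example_thresholds))  # prints {3}
```

DAAT computes a full score for every article that appears in any posting list of the tweet's words, here all five articles. Section 5.2 targets exactly this cost.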
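Figure 3: Example of DAAT with news articles s1, ..., s5, words w1, ..., w5, and tweet m. DAAT scans the IDs from s1 to s5, and s3 is matched to m.

5.2 Pruning with Upper Bounds
DAAT computes the full score of every article that shares a word with m. Following [19], the proposed method skips articles whose scores cannot reach their thresholds. For each word w, let

$$ UB(w) = \max_{s \in L_w} ps(s, w) $$

be the largest partial score in L_w. The score of an article s is then bounded by

$$ UB(s, m) = \sum_{w \in s \cap m} UB(w) $$

that is, sim(s, m) ≤ UB(s, m). Using these bounds, the method repeats three steps:
1. Sort the posting lists of the words in m in ascending order of their current IDs.
2. Accumulate UB(w) over the sorted lists. The pivot is the first list L_wp whose accumulated bound reaches the threshold of its current article, i.e., the smallest p satisfying

$$ \sum_{i=1}^{p} UB(w_i) \ge \theta_{L_{w_p}.cur} \qquad (3) $$

   where w_1, w_2, ... are the words of m in sorted order.
3. If no list satisfies Eq. (3), stop. Otherwise, if the lists before the pivot do not yet point to the pivot article, move them forward to it. If they do, compute the exact score of the pivot article and compare it with its threshold.

Algorithm 1 shows the procedure. Its inputs are the tweet m and the thresholds Θ = [θ_s1, θ_s2, ..., θ_sn] of the news articles. Its output is S, the set of articles matched to m. L_w.cur is the ID of the article at the current position of L_w, L_w.curp is the partial score stored at that position, and L_w1, L_w2, ..., L_wn are the posting lists of the words in m. The lines of Algorithm 1 do the following:
- Lines 1-3 prepare the posting lists, and line 5 sorts them.
- The for loop accumulates the upper bounds (line 9) and selects the pivot with Eq. (3) (line 10).
- Lines 13-14 terminate when no pivot exists.
- Line 15 checks whether the first list already points to the pivot article. If it does not, line 17 moves the preceding lists forward.
- Otherwise, lines 19-24 compute the exact score, and lines 25-26 add the article to S if the score reaches its threshold.

Figure 4 shows an example with news articles s1, ..., s5, words w1, ..., w5, and tweet m.
Step 1: s3 is selected as the pivot, because the accumulated upper bound reaches θ_{s3} = 8 at the list pointing to s3. The lists whose current ID precedes s3 (s1) are moved forward to s3, so the score of s1 is never computed.
Step 2: s3 is again selected as the pivot by its upper bound.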
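This time every list before the pivot already points to s3, so the exact score is computed: sim(s3, m) = 10 ≥ θ_{s3} = 8, and s3 is matched to m.
Step 3: The lists are advanced past s3, and the search continues from there.

Algorithm 1
Input: tweet m, Θ = [θ_s1, θ_s2, ..., θ_sn]
Output: S
1: let L_w1, L_w2, ..., L_wn be the posting lists of the words in m
2: for w_i ∈ m do
3:     … L_wi
4: while true do
5:     sort L_w1, L_w2, ..., L_wn in ascending order of cur
6:     p ← null
7:     UB ← 0
8:     for i ∈ [1, 2, ..., |m|] do
9:         UB ← UB + max_{s ∈ L_wi} ps(s, w_i)
10:        if θ_{L_wi.cur} ≤ UB then
11:            p ← i
12:            break
13:    if p = null then
14:        return S
15:    if L_w1.cur ≠ L_wp.cur then
16:        for w_i ∈ [w_1, w_2, ..., w_{p-1}] do
17:            move L_wi forward to the first posting with ID ≥ L_wp.cur
18:    else
19:        sim ← 0; s ← L_wp.cur
20:        i ← 1
21:        while L_wi.cur = s do
22:            sim ← sim + L_wi.curp
23:            move L_wi forward by one posting
24:            w_i ← w_{i+1}
25:        if sim ≥ θ_s then
26:            S ← S ∪ {s}

6. Experiments
The experiments were run on a machine with an Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz and 32GB of memory. Two datasets were used:
  NEWS: 1037 news articles collected on 2014/6/9 from Yahoo!Japan News Flash (http://news.yahoo.co.jp/flash).
  TWEET: 2,575,198 tweets collected from 2014/6/8 to 2014/6/11 with the Twitter Streaming API (Sample).
Two methods were compared: DAAT/VAT [4] and the proposed method (Proposed, Section 5). The number of news articles taken from NEWS was varied over 100, 200, ..., 1000 (10 settings). For each setting, 100,000 tweets from TWEET were processed, and the processing time per tweet was measured. The parameters were set to p = 0.004, h = 10, and δ = 0.1, following [4].
Figure 5 shows the results. The x-axis is the number of news articles, and the y-axis is the processing time per tweet. Proposed was faster than DAAT/VAT and reduced the processing time by 54.8%.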
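The pivot-based loop of Section 5.2 and Algorithm 1 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' code. It reuses a hypothetical index format: a dict that maps each word to (article ID, ps(s, w)) postings sorted by ID. `wand_match`, `example_index`, and `example_thresholds` are illustrative names, and the example lists are invented.

There is one deliberate simplification. Line 10 of Algorithm 1 compares the accumulated bound with θ_{L_wi.cur}, the threshold of the list's current article. This sketch instead compares it with the smallest threshold in Θ, which is the classic single-threshold WAND rule of [19]. That rule prunes less, but it never skips an article that could reach its own threshold. The function also returns how many articles were fully scored, which makes the effect of pruning visible.

```python
def wand_match(tweet_words: list[str], index,
               thresholds: dict[int, float]) -> tuple[set[int], int]:
    """WAND-style matching with UB(w) bounds; returns (matched IDs, number of fully scored articles)."""
    lists = [index[w] for w in set(tweet_words) if w in index]
    ub = [max(ps for _, ps in lst) for lst in lists]  # UB(w) = max over L_w of ps(s, w)
    theta_min = min(thresholds.values())  # simplification: one global threshold, as in [19]
    cursors = [0] * len(lists)
    matched: set[int] = set()
    scored = 0
    while True:
        # Step 1: lists that are not exhausted, sorted by the article ID under each cursor.
        live = sorted(
            (i for i in range(len(lists)) if cursors[i] < len(lists[i])),
            key=lambda i: lists[i][cursors[i]][0],
        )
        # Step 2: the pivot is the first article at which the accumulated bound reaches theta_min.
        acc, pivot = 0.0, None
        for i in live:
            acc += ub[i]
            if acc >= theta_min:
                pivot = lists[i][cursors[i]][0]
                break
        if pivot is None:  # no remaining article can reach any threshold
            return matched, scored
        if lists[live[0]][cursors[live[0]]][0] != pivot:
            # Step 3a: move the lists before the pivot forward, skipping articles unscored.
            for i in live:
                while cursors[i] < len(lists[i]) and lists[i][cursors[i]][0] < pivot:
                    cursors[i] += 1
            continue
        # Step 3b: every list up to the pivot points at it, so compute sim(pivot, m) exactly.
        scored += 1
        score = 0.0
        for i in live:
            if lists[i][cursors[i]][0] == pivot:
                score += lists[i][cursors[i]][1]
                cursors[i] += 1
        if score >= thresholds[pivot]:
            matched.add(pivot)


# Hypothetical posting lists: sim(s1)=2, sim(s2)=4, sim(s3)=10, sim(s4)=3, sim(s5)=3.
example_index = {
    "w1": [(1, 2.0), (3, 3.0), (5, 1.0)],
    "w2": [(2, 4.0), (3, 4.0)],
    "w3": [(3, 3.0), (4, 2.0)],
    "w4": [(4, 1.0)],
    "w5": [(5, 2.0)],
}
example_thresholds = {1: 5, 2: 12, 3: 8, 4: 6, 5: 7}  # invented thresholds
print(wand_match(["w1", "w2", "w3", "w4", "w5"], example_index, example_thresholds))  # prints ({3}, 2)
```

On these invented lists, only s3 and s5 are fully scored. The DAAT traversal of Section 5.1 would score all five articles. Precomputing UB(w) once per posting list, instead of once per tweet as in this sketch, is the natural choice when the index is static.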
Figure 4: Example of the proposed method with news articles s1, ..., s5, words w1, ..., w5, and tweet m. s3 is matched to m.

Figure 5: Processing time per tweet (y-axis, 0.00 to 0.18) versus the number of news articles (x-axis, 100 to 1000) for Proposed and DAAT/VAT.

7. Conclusion
This paper proposed a method that speeds up the real-time matching of news articles and tweets by skipping articles whose upper-bound scores cannot reach their thresholds. In the experiments, it reduced the processing time by 54.8% compared with DAAT/VAT.

Acknowledgments
This work was supported by JSPS Grant-in-Aid for Scientific Research (B) 26280037.

References
[1] H. Kwak, C. Lee, H. Park, and S. B. Moon, What is twitter, a social network or a news media?, in Proceedings of the 19th International Conference on World Wide Web, WWW 2010, pp. 591–600, Raleigh, North Carolina, USA, April 26-30, 2010.
[2] S. Petrovic, M. Osborne, R. McCreadie, C. Macdonald, I. Ounis, and L. Shrimpton, Can twitter replace newswire for breaking news?, in Proceedings of the Seventh International Conference on Weblogs and Social Media, ICWSM 2013, Cambridge, Massachusetts, USA, July 8-11, 2013.
[3] A. Shraer, M. Gurevich, M. Fontoura, and V. Josifovski, Top-k publish-subscribe for social annotation of news, PVLDB, vol. 6, no. 6, pp. 385–396, 2013.
[4] S. Onishi, Y. Yamaguchi, and H. Kitagawa, Real-time relevance matching of news and tweets, in On the Move to Meaningful Internet Systems: OTM 2015 Conferences - Confederated International Conferences: CoopIS, ODBASE, and C&TC 2015, pp. 109–126, Rhodes, Greece, October 26-30, 2015.
[5] A. Pak and P. Paroubek, Twitter as a corpus for sentiment analysis and opinion mining, in Proceedings of the International Conference on Language Resources and Evaluation, LREC 2010, Valletta, Malta, May 17-23, 2010.
[6] A. Agarwal, B. Xie, I. Vovsha, O. Rambow, and R. Passonneau, Sentiment analysis of twitter data, in Proceedings of the Workshop on Languages in Social Media, LSM 2011, pp. 30–38, Portland, Oregon, 2011.
[7] Z. Luo, M. Osborne, and T. Wang, Opinion retrieval in twitter, in Proceedings of the Sixth International Conference on Weblogs and Social Media, Dublin, Ireland, June 4-7, 2012.
[8] G. Amati, M. Bianchi, and G. Marcone, Sentiment estimation on twitter, in Proceedings of the 5th Italian Information Retrieval Workshop, pp. 39–50, Roma, Italy, January 20-21, 2014.
[9] B. Sharifi, M. Hutton, and J. K. Kalita, Experiments in microblog summarization, in Proceedings of the 2010 IEEE Second International Conference on Social Computing, SocialCom / IEEE International Conference on Privacy, Security, Risk and Trust, PASSAT 2010, pp. 49–56, Minneapolis, Minnesota, USA, August 20-22, 2010.
[10] D. Chakrabarti and K. Punera, Event summarization using tweets, in Proceedings of the Fifth International Conference on Weblogs and Social Media, Barcelona, Catalonia, Spain, July 17-21, 2011.
[11] J. Nichols, J. Mahmud, and C. Drews, Summarizing sporting events using twitter, in 17th International Conference on Intelligent User Interfaces, IUI 2012, pp. 189–198, Lisbon, Portugal, February 14-17, 2012.
[12] S. V. Canneyt, M. Feys, S. Schockaert, T. Demeester, C. Develder, and B. Dhoedt, Detecting newsworthy topics in twitter, in Proceedings of the SNOW 2014 Data Challenge co-located with 23rd International World Wide Web Conference (WWW 2014), pp. 25–32, Seoul, Korea, April 8, 2014.
[13] M. Osborne and M. Dredze, Facebook, twitter and google plus for breaking news: Is there a winner?, in Proceedings of the Eighth International Conference on Weblogs and Social Media, ICWSM 2014, Ann Arbor, Michigan, USA, June 1-4, 2014.
[14] B. Shi, G. Ifrim, and N. Hurley, Be in the know: Connecting news articles to relevant twitter conversations, CoRR, vol. abs/1405.3117, 2014.
[15] W. Guo, H. Li, H. Ji, and M. T. Diab, Linking tweets to news: A framework to enrich short text data in social media, in Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013, Volume 1: Long Papers, pp. 239–249, Sofia, Bulgaria, August 4-9, 2013.
[16] T. Stajner, B. Thomee, A. Popescu, M. Pennacchiotti, and A. Jaimes, Automatic selection of social media responses to news, in The 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013, pp. 50–58, Chicago, IL, USA, August 11-14, 2013.
[17] S. Phuvipadawat and T. Murata, Breaking news detection and tracking in twitter, in Proceedings of the 2010 IEEE/WIC/ACM International Conference on Web Intelligence and International Conference on Intelligent Agent Technology - Workshops, pp. 120–123, Toronto, Canada, August 31 - September 3, 2010.
[18] H. Turtle and J. Flood, Query evaluation: Strategies and optimizations, Inf. Process. Manage., vol. 31, pp. 831–850, Nov. 1995.
[19] A. Z. Broder, D. Carmel, M. Herscovici, A. Soffer, and J. Zien, Efficient query evaluation using a two-level retrieval process, in Proceedings of the Twelfth International Conference on Information and Knowledge Management, CIKM 2003, pp. 426–434, New Orleans, LA, USA, 2003.