Twitter 1,a) 1 1 2013 8 26, 2013 9 27 Latent Dirichlet Allocation LDA Twitter LDA 1 1 LDA 1 1 1 1 Twitter-LDA Twitter-LDA Twitter-LDA Twitter 2 Twitter-LDA 1 2 Topic Tracking Model TTM Twitter Twitter A Proposal of Online Topic Model for Twitter Considering Temporal Dynamics of User Interests and Topic Trends Kentaro Sasaki 1,a) Tomohiro Yoshikawa 1 Takeshi Furuhashi 1 Received: August 26, 2013, Accepted: September 27, 2013 Abstract: Latent Dirichlet Allocation (LDA) is a topic model which has been applied to various fields. It has been also applied to user profiling or event summarization on Twitter. In the application of LDA to tweet collection, it generally treats aggregated all tweets of a user as a single document. On the other hand, Twitter-LDA which assumes a single tweet consists of a single topic has been proposed and showed that it is superior to the former way in topic semantic coherence. However, Twitter-LDA has a problem that it is not capable of online inference. In this paper, we extend Twitter-LDA in the following two points. First, we model the generation process of tweets more accurately by estimating the ratio between topic words and general words for each user. Second, we enable it to estimate temporal dynamics of user interests and topic trends in online based on Topic Tracking Model (TTM) which models consumer purchase behaviors. Keywords: topic model, Twitter, time evolution, online learning 1. 1 Graduate School of Engineering, Nagoya University, Nagoya, Aichi 464 8603, Japan a) sasaki@cmplx.cse.nagoya-u.ac.jp Twitter Twitter 140 140 c 2014 Information Processing Society of Japan 53
[6], [14], [15], [19] Latent Dirichlet Allocation LDA [4] Twitter LDA LDA [14] [19] LDA LDA 1 1 LDA 1 1 [9], [14], [19] Zhao Twitter 1 1 Twitter-LDA [21] Twitter-LDA LDA Twitter Twitter-LDA Topic Tracking Model TTM [10] Twitter-TTM Twitter 2. Twitter-LDA 2.1 Twitter-LDA 1 Twitter-LDA 1 1 LDA Twitter-LDA φ u φ u u φ u 1 θ B k θ k y y =0 θ B y =1 θ k y π Twitter-LDA y π 2 y π π u 2.2 2.1 LDA Twitter-LDA LDA 1 1 1 Twitter-LDA Fig. 1 Graphical model of Twitter-LDA. c 2014 Information Processing Society of Japan 54
1 10 Table 1 Perplexity of each model in 10 runs. LDA Twitter-LDA 50 943.5(9.8) 1,120.4(24.4) 725.9(10.0) 100 950.6(12.5) 928.2(16.1) 630.4(12.5) 150 958.1(6.2) 825.3(10.8) 580.3(9.9) 200 963.0(12.1) 742.8(10.5) 536.0(9.5) 250 973.4(9.9) 695.4(11.9) 507.4(7.7) 2 Fig. 2 Graphical model of improved-model. 2013 5 1 16,411 493,462 2013 4 23 1,000 30 *1 16,170 371,199 5,708 Collapsed [7] 500 [13] 2.2.1 D (test) ( perplexity =exp 1 ) log p(w u D (test) ) (1) N u N w u u K 50 250 50 10 1 10 () 1 Twitter-LDA *1 LDA 1 1 2.2.2 [12] z p(v z) M V (z) =(v (z) 1,...,v(z) M ) C(z; V (z) )= M m 1 m=2 l=1 log D(v(z) m,v (z) l )+1 D(v (z) l ) (2) (2) D(v) v D(v, v ) v v 1 1 1 C(k; V (k) ) (3) K k K 50 250 50 10 (3) 3 (2) M 20 3 LDA Twitter-LDA LDA K =50 LDA c 2014 Information Processing Society of Japan 55
3 10 Fig. 3 Coherence of each model averaged in 10 runs. Twitter-LDA 4 DTM Fig. 4 Graphical model of DTM. 3. 3.1 Topic Tracking Model 2.1 Topic Tracking Model TTM [10] TTM TTM Dynamic Topic Model DTM [3] Topic over Time ToT [17] DTM DTM 4 4 DTM ToT TTM 5 TTM TTM 1 t u φ t,u 1 ˆφ t 1,u α t,u 5 Fig. 5 p(φ t,u ˆφ t 1,u,α t,u ) k TTM Graphical model of TTM. φ αt,u ˆφ t 1,u,k 1 t,u,k (4) (4) φ t,u,k φ t,u k α t,u u t 1 t α t,u t k θ t,k 1 ˆθ t 1,k β t,u (5) θ t,k,v θ t,k v p(θ t,k ˆθ t 1,k,β t,k ) v θ β t,k ˆθ t 1,k,v 1 t,k,v (5) TTM 2.1 Twitter-TTM TTM φ t,u θ t,k 1 θ B c 2014 Information Processing Society of Japan 56
1 ˆΦ t 1 ˆΘ t 1 t z y D t Z t Y t 6 Fig. 6 Generation process of tweets in proposed model. 7 Twitter-TTM Fig. 7 Graphical model of Twitter-TTM. π u 1 6 7 Twitter [10] 1 3.2 Collapsed [7] EM [16] t Φ t = { ˆφ t,u } U u=1 Θ t = {ˆθ t,k } K k=1 θ t,b π t,u α t = {α t,u } U u=1 β t = {β t,k } K k=1 t p(d t,y t,z t ˆΦ t 1, ˆΘ t 1, α t, β t,λ,γ) ( Γ(2γ) ) U Γ(γ + n t,u,b )Γ(γ + n t,u,k ) = Γ(γ) 2 Γ(2γ + n t,u ) u Γ(Vλ) Γ(n t,b,v + λ) v Γ(λ) V Γ(n t,b + Vλ) Γ(β t,k ) Γ(n t,k,v + β t,k ˆθt 1,k,v ) Γ(n t,k + β t,k ) Γ(β k v t,k ˆθt 1,k,v ) Γ(α t,u ) Γ(c t,u,k + α t,u ˆφt 1,u,k ) Γ(c t,u + α t,u ) Γ(α u t,u ˆφt 1,u,k ) k (6) n t,u,b n t,u,k t u n t,b,v v t n t,k,v v t k c t,u,k t u k n t,u = n t,u,b + n t,u,k n t,b = v n t,b,v n t,k = k n t,k = k v n t,k,v c t,u = k c t,u,k (6) z p(z i = k D t,y t,z t\i, ˆΦ t 1, ˆΘ t 1, α t, β t ) c t,u,k\i + α t,u ˆφt 1,u,k c t,u\i + α t,u Γ(n t,k\i + β t,k ) Γ(n t,k,v + β t,k ˆθt 1,k,v ) Γ(n t,k + β t,k ) Γ(n t,k,v\i + β t,k ˆθt 1,k,v ) v (7) i =(t, u, s) z i t u s \i i z i = k y p(y j =0 D t,y t\j,z t,λ,γ) n t,b,v\j + λ n t,u,b\j + γ n t,b\j + Vλ n t,u\j +2γ p(y j =1 D t,y t\j,z t, ˆΘ t 1, β t,γ) n t,k,v\j + β t,k ˆθt 1,k,v n t,k\j + β t,k n t,u,k\j + γ n t,u\j +2γ (8) (9) j =(t, u, s, n) y j t u s n \j j c 2014 Information Processing Society of Japan 57
α t β t [13] α t,u α new t,u = α t,u (10) ˆφ t 1,u,k (Ψ(c t,u,k +α t,u ˆφt 1,u,k ) Ψ(α t,u ˆφt 1,u,k )) k Ψ(c t,u + α t,u ) Ψ(α t,u ) β t,k β new t,k = β t,k (11) ˆθ t 1,k,v (Ψ(n t,k,v +β t,k ˆθt 1,k,v ) Ψ(β t,k ˆθt 1,k,v )) v Ψ(n t,k + β t,k ) Ψ(β t,k ) φ t,u,k θ t,k,v MAP ˆφ t,u,k = c t,u,k + α t,u ˆφt 1,u,k c t,u + α t,u, (12) ˆθ t,k,v = n t,k,v + β t,k ˆθt 1,k,v n t,k + β t,k (13) t +1 4. Twitter Diao Twitter-LDA [6] Yan 2 biterm topic model BTM [20] Chua 2 [5] [18] Wang TM-LDA TM-LDA TM-LDA Lau Online LDA OLDA [2] Twitter [11] LDA [8] LDA 1 1 1 1 5. 2013 5 1 5 15 19,764 2013 4 23 1,000 10,244,572 30 19,729 8,257,368 63,030 LDA TTM K 100 3.2 500 [13] 1 2.2 LDA 8 5 2 5 15 x t =1 5 2 t =2 5 3... t =14 5 15 8 Fig. 8 Perplexity for each time. c 2014 Information Processing Society of Japan 58
t 1 t t =1 TTM LDA 8 TTM t =2 TTM LDA LDA 1 6. Twitter-LDA TTM Twitter Twitter-TTM [1] Twitter [1] Ahmed, A. and Xing, E.P.: Timeline: A dynamic hierarchical Dirichlet process model for recovering birth/death and evolution of topics in text stream, Proc. UAI 10 (2010). [2] AlSumait, L., Barbara, D. and Domeniconi, C.: Online LDA: Adaptive topic models for mining text streams with applications to topic detection and tracking, Proc. ICDM 08, pp.3 12, IEEE (2008). [3] Blei, D.M. and Lafferty, J.D.: Dynamic topic models, Proc. ICML 06, pp.113 120, ACM (2006). [4] Blei, D.M., Ng, A.Y. and Jordan, M.: Latent Dirichlet allocation, The Journal of Machine Learning Research, Vol.3, pp.993 1022 (2003). [5] Chua, F. and Asur, S.: Automatic Summarization of Events from Social Media, Technical report, HP Labs. [6] Diao, Q., Jiang, J., Zhu, F. and Lim, E.: Finding bursty topics from microblogs, Proc. ACL 12, pp.536 544, ACL (2012). [7] Griffiths, T.L. and Steyvers, M.: Finding scientific topics, Proc. National Academy of Sciences of the United States of America, Vol.101, No.Suppl 1, pp.5228 5235 (2004). [8] Hoffman, M., Bach, F.R. and Blei, D.M.: Online learning for latent Dirichlet allocation, Proc. NIPS 10, pp.856 864 (2010). [9] Hong, L. and Davison, B.D.: Empirical study of topic modeling in twitter, Proc. SOMA 10, pp.80 88, ACM (2010). [10] Iwata, T., Watanabe, S., Yamada, T. and Ueda, N.: Topic Tracking Model for Analyzing Consumer Purchase Behavior, Proc. IJCAI 09, Vol.9, pp.1427 1432 (2009). [11] Lau, J.H., Collier, N. and Baldwin, T.: On-line Trend Analysis with Topic Models:# twitter Trends Detection Topic Model Online, Proc. COLING 12, pp.1519 1534 (2012). [12] Mimno, D., Wallach, H.M., Talley, E., Leenders, M. and McCallum, A.: Optimizing semantic coherence in topic models, Proc. EMNLP 11, pp.262 272, ACL (2011). [13] Minka, T.: Estimating a Dirichlet distribution, Technical report, MIT (2000). [14] Pennacchiotti, M. and Popescu, A.M.: A Machine Learning Approach to Twitter User Classification, Proc. ICWSM 11, pp.281 288 (2011). [15] Sakaki, T., Okazaki, M. and Matsuo, Y.: Earthquake shakes Twitter users: Real-time event detection by social sensors, Proc. WWW 10, pp.851 860, ACM (2010). [16] Wallach, H.M.: Topic modeling: Beyond bag-of-words, Proc. ICML 06, pp.977 984, ACM (2006). [17] Wang, X. and McCallum, A.: Topics over time: A non- Markov continuous-time model of topical trends, Proc. KDD 06, pp.424 433, ACM (2006). [18] Wang, Y., Agichtein, E. and Benzi, M.: TM-LDA: Efficient Online Modeling of the Latent Topic Transitions in Social Media, Proc. KDD 12, pp.123 131, ACM (2012). [19] Weng, J., Lim, E.P., Jiang, J. and He, Q.: Twitterrank: Finding topic-sensitive influential twitterers, Proc. WSDM 10, pp.261 270, ACM (2010). [20] Yan, X., Guo, J., Lan, Y. and Cheng, X.: A Biterm Topic Model for Short Texts, Proc. WWW 13, pp.1445 1456, International World Wide Web Conferences Steering Committee (2013). [21] Zhao, W.X., Jiang, J., Weng, J., He, J., Lim, E.P., Yan, H. and Li, X.: Comparing twitter and traditional media using topic models, Proc. Advances in Information Retrieval, pp.338 349, Springer (2011). c 2014 Information Processing Society of Japan 59
2013 3 4 1997 1998 2005 COE 2006 10 IEEE 1985 2004 1996 IEEE c 2014 Information Processing Society of Japan 60