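DEIM Forum 2 D3-6

E-mail: kawamoto@inf.kyushu-u.ac.jp, tawara@db.soc.i.kyoto-u.ac.jp, {asano,yoshikawa}@i.kyoto-u.ac.jp

1.

Review sites such as Amazon.com and The Internet Movie Database (IMDb)(1) are exposed to anomalous reviewers such as social spammers and crowd turfing workers, including on Amazon.com. This paper builds on Repeated improvement [1]. Existing studies include [2–5] and [6–8], among them Lauw et al. [6], Wang et al. [8], and our earlier work [7], which considers Amazon.com.

(1) http://www.imdb.com/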
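The rest of this paper is organized into Sections 2 through 6; Section 4 presents the analysis based on Repeated improvement.

2.

Methods based on Repeated improvement [6–9] include those of Lauw et al. [6], our earlier work [7], and Wang et al. [8]. Wang et al. use an agreement score, which is also discussed in relation to [7]. Apart from Repeated improvement, anomaly detection is studied in [10–12], and graph-based approaches in [13–15]: Noble and Cook [13], Chakrabarti [14], and Sun et al. [15]. Review spam detection is studied in [2–5].

3.

Let $V_R = \{p_1, p_2, \ldots, p_{|V_R|}\}$ be the set of reviewers and $V_O = \{q_1, q_2, \ldots, q_{|V_O|}\}$ the set of objects. The edge set $E$ consists of the pairs $(p, q)$ with $p \in V_R$ and $q \in V_O$ such that $p$ has reviewed $q$, and $e(p, q)$ denotes the review score that $p$ gave to $q$. Each reviewer $p \in V_R$ has an anomalous score $a_p$, and each object $q \in V_O$ has a summary $s_q$.

3.1 Given $V_R$, $V_O$, and $E$, the graph is

$$G = (V_R, V_O, E, e, \{a_{p_1}, a_{p_2}, \ldots, a_{p_{|V_R|}}\}, \{s_{q_1}, s_{q_2}, \ldots, s_{q_{|V_O|}}\}).$$

The values $a_p$ and $s_q$ are computed by Repeated improvement. As defined in Section 4.1, a review of $q$ by $p$ with score $e(p, q)$ has an anomalous score $a_{pq}$ determined by $e(p, q)$ and $s_q$.

Figure 1: Example graph with 4 reviewers and 4 objects.

3.1 Figure 1 shows a graph with 4 reviewers and 4 objects. Reviewer $p_1$ reviewed objects $q_1$ and $q_4$ with scores 3 and 1, that is, $e(p_1, q_1) = 3$ and $e(p_1, q_4) = 1$.

4. Repeated Improvement Analysis

Repeated improvement analysis (RIA) is based on Repeated improvement [1]. It alternates two updates: (1) for each review of $q$ by $p$, compute the anomalous score $a_{pq}$ and, from these, the anomalous score $a_p$ of each reviewer $p$; (2) from the review scores $e(p, q)$, compute the summary $s_q$ of each object $q$.

4.1

For a review of $q$ by $p$ with score $e(p, q)$, the anomalous score $a_{pq}$ is the deviation of the score from the summary $s_q$ of $q$:

$$a_{pq} = |e(p, q) - s_q|. \tag{1}$$

4.2

Each object $q$ has a credibility $c_q$. If $N_q = 1$, then $c_q = 1$; otherwise $c_q$ is determined by $\log N_q$ and $\sigma_q^2 + 1$. Here $N_q$ is the number of reviews of $q$ and $\sigma_q^2$ is the variance of the review scores of $q$:

$$\sigma_q^2 = \frac{1}{N_q - 1} \sum_{p:(p,q) \in E} \left( e(p, q) - \frac{\sum_{p':(p',q) \in E} e(p', q)}{N_q} \right)^2. \tag{2}$$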
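The per-review quantities of Sections 4.1 and 4.2 can be illustrated with a short Python sketch. It rests on assumptions that are not confirmed by the text: Equation (1) is taken to be the absolute difference $|e(p, q) - s_q|$, the variance of Equation (2) is taken to be the sample variance with denominator $N_q - 1$, and the variance of a single review is set to 0 because that denominator vanishes. The function names and the data are hypothetical, and the credibility $c_q$ itself is not computed here.

```python
from statistics import variance
from typing import Sequence


def review_anomalous_score(score: float, summary: float) -> float:
    """Anomalous score a_pq of one review, assumed to be |e(p, q) - s_q|."""
    return abs(score - summary)


def score_variance(scores: Sequence[float]) -> float:
    """Sample variance (denominator N_q - 1) of the review scores of one object.

    With a single review the denominator is zero, so 0.0 is returned
    (an assumption of this sketch, mirroring the special case N_q = 1).
    """
    if len(scores) < 2:
        return 0.0
    return variance(scores)


# Hypothetical object q1 with three review scores.
scores_q1 = [3.0, 1.0, 5.0]
# Use the plain mean as a stand-in for the summary s_q1.
summary_q1 = sum(scores_q1) / len(scores_q1)
anomalous_q1 = [review_anomalous_score(e, summary_q1) for e in scores_q1]
var_q1 = score_variance(scores_q1)
```

In this example the review that matches the summary has anomalous score 0, and the two reviews that deviate by 2 have anomalous score 2.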
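4.3

The anomalous score $a_p$ of reviewer $p$ is the average of the anomalous scores $a_{pq}$ of the reviews by $p$, weighted by the credibility $c_q$ of Section 4.2:

$$a_p = \frac{\sum_{q:(p,q) \in E} c_q \, a_{pq}}{\sum_{q:(p,q) \in E} c_q}. \tag{3}$$

The weight $w_\alpha(p)$ of reviewer $p$ is

$$w_\alpha(p) = \frac{1}{1 + \exp\left( \alpha \, \dfrac{a_p - \mu_a}{\sigma_a} \right)}, \tag{4}$$

where

$$\mu_a = \frac{\sum_{p \in V_R} a_p}{|V_R|}, \qquad \sigma_a^2 = \frac{\sum_{p \in V_R} (a_p - \mu_a)^2}{|V_R|}$$

are the mean and the variance of the anomalous scores, and $\alpha$ is a parameter.

4.4

The summary $s_q$ of object $q$ is the average of the review scores of $q$, weighted by the weights $w_\alpha(p)$ of Equation (4) in Section 4.3:

$$s_q = \sum_{p:(p,q) \in E} \frac{w_\alpha(p)}{\sum_{p':(p',q) \in E} w_\alpha(p')} \, e(p, q). \tag{5}$$

Algorithm 1 shows RIA. The input is the graph $G$ defined in Section 3. The credibility $c_q$ of Section 4.2 is computed once, and the anomalous score $a_p$ of each reviewer $p$ is initialized to $1/N_p$, where $N_p$ is the number of reviews by $p$. The summaries $s_q$ (Section 4.4) and the anomalous scores $a_p$ are then updated repeatedly until they converge.

Algorithm 1 Repeated Improvement Analysis
Require: $G = (V_R, V_O, E, e, \{a_{p_1}, a_{p_2}, \ldots, a_{p_{|V_R|}}\}, \{s_{q_1}, s_{q_2}, \ldots, s_{q_{|V_O|}}\})$
Require: $\alpha$
  /* initialization */
  for each $i$ in $1 \le i \le |V_R|$ do
    $a_i = 1/N_i$
  end for
  for each $j$ in $1 \le j \le |V_O|$ do
    compute $c_j$ by Equation (2)
  end for
  repeat
    /* update $s_j$ */
    compute $w_\alpha(p)$ by Equation (4)
    for each $j$ in $1 \le j \le |V_O|$ do
      compute $s_j$ by Equation (5)
    end for
    /* update $a_i$ */
    for each $i$ in $1 \le i \le |V_R|$ do
      compute $a_{ij}$ by Equation (1)
      compute $a_i$ by Equation (3)
    end for
  until $a_i$ and $s_j$ converge
  return $\{a_{p_1}, a_{p_2}, \ldots, a_{p_{|V_R|}}\}$, $\{s_{q_1}, s_{q_2}, \ldots, s_{q_{|V_O|}}\}$

Table 1: (contents not recoverable)

In Equation (1), the anomalous score $a_{pq}$ is determined by the review score $e(p, q)$ given by $p$ and the summary $s_q$ of $q$. More generally,

$$a_{pq} = \mathrm{distance}(e(p, q), s_q),$$

where $\mathrm{distance}$ can be any distance function, for example one based on the KL divergence.

5.

5.1

The experiments use the Amazon.com dataset of [3], which contains reviews dating back to 1996.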
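Returning to Algorithm 1 of Section 4, the iteration over Equations (1), (3), (4), and (5) can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than the authors' implementation: Equation (1) is taken to be $|e(p, q) - s_q|$, the credibility $c_q$ is passed in as given instead of being computed by Equation (2), convergence is tested with a hypothetical tolerance `tol` and iteration cap `max_iter`, and all weights are set to 1/2 when $\sigma_a = 0$ to avoid a division by zero. The function name `ria` and the example data are also hypothetical.

```python
import math


def ria(reviews, credibility, alpha, max_iter=100, tol=1e-9):
    """Sketch of Algorithm 1.

    reviews: dict mapping (reviewer, object) -> review score e(p, q)
    credibility: dict mapping object -> c_q (assumed to be precomputed)
    Returns (anomalous scores a_p, summaries s_q).
    """
    reviewers = sorted({p for p, _ in reviews})
    objects = sorted({q for _, q in reviews})
    n_reviews = {p: sum(1 for (r, _) in reviews if r == p) for p in reviewers}
    a = {p: 1.0 / n_reviews[p] for p in reviewers}  # initial a_p = 1 / N_p
    s = {q: 0.0 for q in objects}

    for _ in range(max_iter):
        # Equation (4): weight of each reviewer from the standardized a_p.
        mu = sum(a.values()) / len(a)
        sigma = math.sqrt(sum((v - mu) ** 2 for v in a.values()) / len(a))
        if sigma > 0:
            w = {p: 1.0 / (1.0 + math.exp(alpha * (a[p] - mu) / sigma))
                 for p in reviewers}
        else:
            w = {p: 0.5 for p in reviewers}  # assumption: equal weights

        # Equation (5): summary as the weighted average of review scores.
        new_s = {}
        for q in objects:
            pairs = [(w[p], e) for (p, o), e in reviews.items() if o == q]
            new_s[q] = sum(wp * e for wp, e in pairs) / sum(wp for wp, _ in pairs)

        # Equations (1) and (3): credibility-weighted average of |e - s_q|.
        new_a = {}
        for p in reviewers:
            pairs = [(credibility[o], abs(e - new_s[o]))
                     for (r, o), e in reviews.items() if r == p]
            new_a[p] = sum(c * d for c, d in pairs) / sum(c for c, _ in pairs)

        delta = max(max(abs(new_a[p] - a[p]) for p in reviewers),
                    max(abs(new_s[q] - s[q]) for q in objects))
        a, s = new_a, new_s
        if delta < tol:
            break
    return a, s


# Hypothetical data: p1 and p2 agree on both objects, p3 disagrees.
example_reviews = {
    ("p1", "q1"): 4.0, ("p2", "q1"): 4.0, ("p3", "q1"): 1.0,
    ("p1", "q2"): 2.0, ("p2", "q2"): 2.0, ("p3", "q2"): 5.0,
}
example_credibility = {"q1": 1.0, "q2": 1.0}
scores, summaries = ria(example_reviews, example_credibility, alpha=1.0)
```

In this example the dissenting reviewer p3 ends with a larger anomalous score than p1 and p2, so the summaries move from the plain mean (3 for both objects) toward the scores given by p1 and p2.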
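Figure 2: (a), (b). (caption not recoverable)

Figure 3: (caption not recoverable)

Figures 2a and 2b refer to the Amazon.com data, in which reviewers such as social spammers and crowd turfing workers are of interest.

The compared methods are RIA (Section 4), ONE (cf. [4]), and MRA, the Repeated improvement method of [7]. RIA is run with several values of the parameter $\alpha$, the first being $\alpha = .2$.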
Figure 4: (a) ONE, (b) MRA, (c) RIA (α = .2), (d) RIA (α = ), (e) RIA (α = ), (f) RIA (α = ).

Figure 4 compares MRA and RIA for different values of α.

Figure 5: (a) ONE, (b) MRA, (c) RIA (α = .2), (d) RIA (α = ), (e) RIA (α = ), (f) RIA (α = ).

Figure 4c shows RIA with α = .2.
Figures 4c, 4d, 4e, and 4f show how the results of RIA change with α, which enters through the weight of Equation (4). Panel (a) shows ONE, panel (b) shows MRA, and panels (c) to (f) show RIA, with α = .2 in panel (c).

6.

This paper presented Repeated improvement analysis (RIA), an analysis of reviewers and objects based on Repeated improvement.

[1] Easley, D., Kleinberg, J.: Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press (2010)
[2] Liu, J., Cao, Y., Lin, C., Huang, Y., Zhou, M.: Low-Quality Product Review Detection in Opinion Summarization. In: Proc. of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Prague, Czech Republic, Association for Computational Linguistics (2007) 334–342
[3] Jindal, N., Liu, B.: Opinion spam and analysis. In: Proc. of the 2008 International Conference on Web Search and Data Mining, Palo Alto, California, USA, ACM Press (February 2008) 219–230
[4] Lim, E.P., Nguyen, V.A., Jindal, N., Liu, B., Lauw, H.W.: Detecting Product Review Spammers using Rating Behaviors. In: Proc. of the 19th ACM International Conference on Information and Knowledge Management, Toronto, ON, Canada, ACM Press (October 2010) 939–948
[5] Mukherjee, A., Liu, B., Wang, J., Glance, N.S., Jindal, N.: Detecting group review spam. In: Proc. of the 20th International Conference on World Wide Web (Companion Volume). (2011) 93–94
[6] Lauw, H.W., Lim, E., Wang, K.: Summarizing review scores of unequal reviewers. In: Proceedings of the Seventh SIAM International Conference on Data Mining, April 26-28, 2007, Minneapolis, Minnesota, USA. (2007) 539–544
[7] Tawaramoto, K., Kawamoto, J., Asano, Y., Yoshikawa, M.: A Bipartite Graph Model and Mutually Reinforcing Analysis for Review Sites. In: Proc. of the 22nd International Conference on Database and Expert Systems Applications, Toulouse, France, Springer (2011) 341–348
[8] Wang, G., Xie, S., Liu, B., Yu, P.S.: Review Graph Based Online Store Review Spammer Detection. In: Proc. of the 11th IEEE International Conference on Data Mining, Vancouver, BC, Canada, IEEE Computer Society (December 2011) 1242–1247
[9] Easley, D., Kleinberg, J.: Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press
[10] Aggarwal, C.C., Yu, P.S.: Outlier detection for high dimensional data. In: Proc. of the 2001 ACM SIGMOD International Conference on Management of Data. (2001) 37–46
[11] Chandola, V., Banerjee, A., Kumar, V.: Anomaly Detection: A Survey. ACM Computing Surveys 41(3) (July 2009) 1–58
[12] Wang, X., Davidson, I.: Discovering Contexts and Contextual Outliers Using Random Walks in Graphs. In: Proc. of the Ninth IEEE International Conference on Data Mining, Miami, FL, USA, IEEE Computer Society (December 2009) 1034–1039
[13] Noble, C.C., Cook, D.J.: Graph-based anomaly detection. In: Proc. of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. (2003) 631–636
[14] Chakrabarti, D.: Autopart: Parameter-free graph partitioning and outlier detection. In: Proc. of the 8th European Conference on Principles and Practice of Knowledge Discovery in Databases. (2004) 112–124
[15] Sun, J., Qu, H., Chakrabarti, D., Faloutsos, C.: Neighborhood Formation and Anomaly Detection in Bipartite Graphs. In: Proc. of the Fifth IEEE International Conference on Data Mining, Houston, TX, USA, IEEE Computer Society (November 2005) 418–425