Mining Hot Event from Microblog with Hadoop. Journal of Chinese Computer Systems, Vol. 35, No. 4.


Journal of Chinese Computer Systems, Vol. 35, No. 4, April 2014

Mining Hot Event from Microblog with Hadoop

XIE Si-fa 1, LIN Chen 1,2, SU Xuan 3, JIANG Yi 1

1 School of Information Science and Technology, Xiamen University, Xiamen 361005, China
2 Shenzhen Research Institute of Xiamen University, Shenzhen 518000, China
3 Channal trans Network (Shanghai) Co., Shanghai 20000, China

E-mail: chenlin@xmu.edu.cn

CLC number: TP18. Document code: A. Article ID: 1000-1220(2014)04-0797-05

Received 2013-01-25; revised 2013-03-02. Grant numbers: 61102136, 61001013, 2011J05158, JCYJ20120618155655087.

Abstract: As a newly emerging social-networking service, microblog has a strong immediate communication function and can release hot issues of society rapidly by various methods. However, the huge mass of data released in a short time leads to the fragmentation of information to some extent. Moreover, the quick updating of information makes it difficult to retrieve essential issues. In this paper, we propose a distributed algorithm for mining hot spots from microblog data based on Hadoop, which is well suited to big data mining, and we detect hot issues from the extracted spots for users' searching convenience. Furthermore, we put forward a detection algorithm with linear time complexity that finds the time period of the burst of each hot issue. Experiments on Twitter and Sina Weibo show that our algorithm can extract hot issues from microblog effectively.

Key words: microblog; Hadoop; distributed; hot event

The Chinese body text of this page did not survive extraction. The fragments that remain refer to topic detection and tracking (TDT) [1-3], Allan [4], Agarwal, Twitter [10] and K-Means.

Fig. 1 Flow chart of hot event detecting

Notation recoverable from the text: MT is a microblog with time T and content C. For a word W, the word series WS is the frequency series Fs = (f1, f2, ..., fn), where fi is the frequency of W in time slot ti, stored as a time list TL. The burst series BS of W is Bs = (b1, b2, ..., bn), stored as a burst list BL, with bi = fi - μ - 2σ, where μ and σ are the mean and standard deviation of the frequencies over a window of w time slots [16]. Clustering uses WKSC [17] (Haar). Each candidate burst period carries four fields: st, et, Ls, Rs.

Algorithm 1. Word series WS (MapReduce)

Map:
  WL = IKAnalyzer(MT.c)
  for i = 1 to WL.Length do
    Init(TL)
    j = GetIndex(MT.t)
    TL[j] = 1
    return (WL[i], TL)
  end for
Reduce:
  Init(TL)
  for i = 1 to the count of value do
    TL += value_i
    return (key, TL)
  end for

Algorithm 2. Burst series BS (MapReduce)

Map:
  for i = 1 to TL.Length do
    for j = i + 1 to i + w do
      return (key_j, TL[j], TL[i])
    end for
  end for
Reduce:
  Init(BL)
  μ = (1/w) Σ_{i=1..w} value_i
  σ^2 = (1/w) Σ_{i=1..w} (value_i - μ)^2
  BL[j] = TL[j] - μ - 2σ
  return (key, BL)

Algorithm 3. Burst period detection

  Init(L)
  for i = 1 to BL.Length
    Cs' = Cs
    Cs = Cs + BL[i]
    if BL[i] < 0
      continue
    Temp = (i, Cs', Cs)
    while true do
      Merge = null
      for I = L[L.Length] to L[1] do
        if I.Ls < Temp.Ls
          Merge = I
          break
        end if
      end for
      if Merge == null || (Merge != null && Merge.Rs > Temp.Rs)
        L.Add(Temp)
        break
      else
        Temp.st = Merge.st
        Temp.Ls = Merge.Ls
        Delete(Merge)
      end if
    end while
  end for

Experimental setup (surviving fragments): Hadoop 1.0.1; Intel(R) Core(TM) i7 CPUs; a Twitter data set covering 23 January to 8 February 2011; a second data set covering August 2009 to May 2012; data sizes of 2G and 3G; capacity figures of 1T, 64G and 32G.

Table 1 Time performance of different processing nodes
Surviving timing pairs: 416s / 1322s; 405s / 1199s; 384s / 1069s; 3900s / 9700s. Row and column labels were lost in extraction.

Fig. 2 Japan's nuclear leak

Fig. 3 Shanghai's World Expo
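The Map and Reduce listings for the word series and the burst series can be sketched in plain Python, without Hadoop, to show the data flow. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the in-memory grouping that stands in for the shuffle phase, and the tokenized input are all hypothetical, and the window is assumed to be the w time slots preceding slot j, which is how the Map listing for the burst series reads.

```python
from collections import defaultdict
from statistics import mean, pstdev


def map_post(slot, words, n_slots):
    """Map step: emit (word, one-hot time list) for every word of one post."""
    for word in words:
        tl = [0] * n_slots
        tl[slot] = 1
        yield word, tl


def reduce_word(word, time_lists):
    """Reduce step: add up all time lists emitted for one word."""
    total = [0] * len(time_lists[0])
    for tl in time_lists:
        for j, v in enumerate(tl):
            total[j] += v
    return word, total


def word_series(posts, n_slots):
    """Stand-in for the shuffle phase: group map output by word, then reduce."""
    grouped = defaultdict(list)
    for slot, words in posts:
        for word, tl in map_post(slot, words, n_slots):
            grouped[word].append(tl)
    return dict(reduce_word(w, tls) for w, tls in grouped.items())


def burst_series(freq, w):
    """b_j = f_j - mu - 2*sigma, with mu and sigma taken over the w slots
    before j (assumption); slots with no history get a neutral score of 0."""
    scores = []
    for j in range(len(freq)):
        window = freq[max(0, j - w):j]
        if not window:
            scores.append(0.0)
            continue
        scores.append(freq[j] - mean(window) - 2 * pstdev(window))
    return scores


# Hypothetical input: (time slot index, already tokenized words) per post.
posts = [(0, ["quake", "tokyo"]), (1, ["quake"]),
         (3, ["quake"]), (3, ["quake"]), (3, ["quake", "tokyo"])]
series = word_series(posts, n_slots=4)
print(series["quake"])                     # [1, 1, 0, 3]
print(burst_series(series["quake"], w=3))
```

A positive burst score marks a slot where the word is used noticeably more often than in its recent past; in the paper the tokenization is done by IKAnalyzer, which the sketch replaces with pre-split word lists.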

Fig. 4 Bin Laden is shot dead

Fig. 5 Egypt riot

Fig. 6 Korea's world athletics championship

Table 2 Burst time of hot event
The cell values, a series of year-month and month-day dates in 2010 and 2011, cannot be assigned to rows and columns from the extracted text.

Closing section: the Chinese text was lost in extraction. The surviving terms are Hadoop, Twitter, bigram and trigram.
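The burst periods summarized in Table 2 come from the burst period detection listing, which keeps a list of candidate periods with left and right cumulative sums (Ls, Rs) and merges them as the series is scanned. That structure resembles the maximal scoring subsequence procedure of Ruzzo and Tompa; treating it as such is an assumption, as are the function name and the tuple layout below. The sketch uses `>=` where the listing shows `>`, and a plain right-to-left scan, so it is not strictly linear in the worst case.

```python
def burst_periods(scores):
    """Return maximal positive-sum segments as (start, end, total), end inclusive.

    Each candidate is [st, et, Ls, Rs]: start slot, end slot, cumulative sum
    before the segment, cumulative sum at its end.
    """
    segs = []
    cum = 0.0
    for i, s in enumerate(scores):
        prev = cum
        cum += s
        if s <= 0:
            continue  # only positive scores can start a candidate period
        temp = [i, i, prev, cum]
        while True:
            # Nearest earlier candidate whose left sum is smaller than ours.
            k = None
            for idx in range(len(segs) - 1, -1, -1):
                if segs[idx][2] < temp[2]:
                    k = idx
                    break
            if k is None or segs[k][3] >= temp[3]:
                segs.append(temp)
                break
            # Merge: extend temp back to the start of candidate k and drop
            # candidate k together with everything after it.
            temp = [segs[k][0], temp[1], segs[k][2], temp[3]]
            del segs[k:]
    return [(st, et, rs - ls) for st, et, ls, rs in segs]


# Hypothetical burst scores for one word, one value per time slot.
print(burst_periods([4, -5, 3, -3, 1, 2, -2, 2, -2, 1, 5]))
# [(0, 0, 4.0), (2, 2, 3.0), (4, 10, 7.0)]
```

Each returned tuple is one burst period: slots 4 to 10 form a single period because the dips inside it are outweighed by the surrounding positive scores.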

References

[1] Li Hong, Wei Jin-feng. Netnews bursty hot topic detection based on bursty feature [C]. Proceedings of International Conference on E-Business and E-Government, Washington DC, USA: IEEE, 2010: 1437-1440.
[2] Holz F, Teresniak S. Towards automatic detection and tracking of topic change [M]. Computational Linguistics and Intelligent Text Processing, Berlin, Germany: Springer-Verlag, 2010: 327-339.
[3] Jing Qiu, Liao Le-jian, Dong Xiu-jie. Topic detection and tracking for Chinese news web pages [C]. Proceedings of Seventh International Conference on Advanced Language Processing and Web Information Technology, Washington DC, USA: IEEE Computer Society, 2008: 114-120.
[4] Allan J, Papka R, Lavrenko V. On-line new event detection and tracking [C]. SIGIR '98: Proceedings of 21st ACM SIGIR International Conference on Research and Development in Information Retrieval, New York: ACM, 1998: 37-45.
[5] Wu Yong-hui, Wang Xiao-long, Ding Yu-xin, et al. Adaptive online web topic detection method for web news recommendation system [J]. Acta Electronica Sinica, 2010, 38(11): 2620-2624.
[6] Manoj K Agarwal, Krithi Ramamritham, Manish Bhide. Real time discovery of dense clusters in highly dynamic graphs: identifying real world events in highly dynamic environments [C]. Proceedings of the VLDB Endowment, Very Large Data Base Endowment Inc (VLDB), 2012, 5(10): 980-991.
[7] Lin Chen, Lin Chun, Li Jing-xuan, et al. Generating event storyline from microblogs [C]. Proceedings of the 21st ACM Conference on Information and Knowledge Management (CIKM), 2012: 175-184.
[8] Sasa Petrovic, Miles Osborne, Victor Lavrenko. Streaming first story detection with application to Twitter [C]. The 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL), 2010: 181-189.
[9] Efron M. Information search and retrieval in microblogs [J]. Journal of the American Society for Information Science and Technology, June 2011, 62(6): 996-1008.
[10] Mathioudakis M, Koudas N. TwitterMonitor: trend detection over the Twitter stream [C]. Proceedings of the 2010 International Conference on Management of Data (SIGMOD 2010), New York: ACM, 2010: 1155-1158.
[11] Takamura H, Yokono H, Okumura M. Summarizing a document stream [M]. Advances in Information Retrieval, Springer Berlin Heidelberg, 2011: 177-188.
[12] Sakaki T, Okazaki M, Matsuo Y. Earthquake shakes Twitter users: real-time event detection by social sensors [C]. Proceedings of the 19th International Conference on World Wide Web (WWW 2010), 2010: 851-860.
[13] Shamma D A, Kennedy L, Churchill E F. Peaks and persistence: modeling the shape of microblog conversations [C]. Proceedings of the ACM 2011 Conference on Computer Supported Cooperative Work (CSCW '11), 2011: 355-358.
[14] Li Jin, Zhang Hua, Wu Hao-xiong, et al. BTopicMiner: domain-specific topic mining system for Chinese microblog [J]. Journal of Computer Applications, 2012, 32(8): 2346-2349.
[15] Weng Jian-shu, Bu-Sung Lee. Event detection in Twitter [C]. Proceedings of the Fifth Annual Conference on Weblogs and Social Media (ICWSM 2011), 2011: 401-408.
[16] Yao Jun-jie, Cui Bin, Huang Yu-xin, et al. Bursty event detection from collaborative tags [C]. World Wide Web, 2012, 15: 171-195.
[17] Han Zhong-ming, Chen Ni, Le Jia-jin, et al. An efficient and effective clustering algorithm for time series of hot topic [J]. Chinese Journal of Computers, 2012, 35(11): 2337-2347.