Shortness Ambiguity TEAM Ungrammaticality

Σχετικά έγγραφα

Web DEIM Forum 2009 A7-1. Web. Web. Web. Web. 4 Wikipedia. Wikipedia. Web.


Twitter 6. DEIM Forum 2014 A Twitter,,, Wikipedia, Explicit Semantic Analysis,

Topic Structure Mining based on Wikipedia and Web Search



Web. Web p OutDegree(p) log 7 1/OutDegree(p) A New Difinition of Subjective Distance between Web Pages

Kenta OKU and Fumio HATTORI

Toward a SPARQL Query Execution Mechanism using Dynamic Mapping Adaptation -A Preliminary Report- Takuya Adachi 1 Naoki Fukuta 2.

HOSVD. Higher Order Data Classification Method with Autocorrelation Matrix Correcting on HOSVD. Junichi MORIGAKI and Kaoru KATAYAMA

MIDI [8] MIDI. [9] Hsu [1], [2] [10] Salamon [11] [5] Song [6] Sony, Minato, Tokyo , Japan a) b)

{takasu, Conditional Random Field

Ημερίδα διάχυσης αποτελεσμάτων έργου Ιωάννινα, 14/10/2015

A Method for Creating Shortcut Links by Considering Popularity of Contents in Structured P2P Networks

Indexing Methods for Encrypted Vector Databases

ΔΙΠΛΩΜΑΤΙΚΕΣ ΕΡΓΑΣΙΕΣ

User Behavior Analysis for a Large2scale Search Engine

Ερευνητική+Ομάδα+Τεχνολογιών+ Διαδικτύου+

Re-Pair n. Re-Pair. Re-Pair. Re-Pair. Re-Pair. (Re-Merge) Re-Merge. Sekine [4, 5, 8] (highly repetitive text) [2] Re-Pair. Blocked-Repair-VF [7]

DEIM Forum 2016 B5-2. Twitter. Twitter. Twitter.



Optimization, PSO) DE [1, 2, 3, 4] PSO [5, 6, 7, 8, 9, 10, 11] (P)

1 n-gram n-gram n-gram [11], [15] n-best [16] n-gram. n-gram. 1,a) Graham Neubig 1,b) Sakriani Sakti 1,c) 1,d) 1,e)


Automatic extraction of bibliography with machine learning

Quick algorithm f or computing core attribute

ΔΙΠΛΩΜΑΤΙΚΕΣ ΕΡΓΑΣΙΕΣ ΠΜΣ «ΠΛΗΡΟΦΟΡΙΚΗ & ΕΠΙΚΟΙΝΩΝΙΕΣ» OSWINDS RESEARCH GROUP

Anomaly Detection with Neighborhood Preservation Principle

Newman Modularity Newman [4], [5] Newman Q Q Q greedy algorithm[6] Newman Newman Q 1 Tabu Search[7] Newman Newman Newman Q Newman 1 2 Newman 3

Supplementary Materials for Evolutionary Multiobjective Optimization Based Multimodal Optimization: Fitness Landscape Approximation and Peak Detection

Buried Markov Model Pairwise

Topic Estimation for Microblogs Taking into Account the Relationships between Adjacent Tweets

ER-Tree (Extended R*-Tree)

Nov Journal of Zhengzhou University Engineering Science Vol. 36 No FCM. A doi /j. issn

[15], [16], [17] [6] [2] [5] Jiang [6] 2.1 [6], [10] Score(x, y) y ( 1) ( 1 ) b e ( 1 ) b e. O(n 2 ) Jiang [6] (word lattice reranking)

Ανάλυση σχημάτων βασισμένη σε μεθόδους αναζήτησης ομοιότητας υποακολουθιών (C589)

(Υπογραϕή) (Υπογραϕή) (Υπογραϕή)

Gemini, FastMap, Applications. Εαρινό Εξάμηνο Τμήμα Μηχανικών Η/Υ και Πληροϕορικής Πολυτεχνική Σχολή, Πανεπιστήμιο Πατρών

ΔΙΠΛΩΜΑΤΙΚΕΣ ΕΡΓΑΣΙΕΣ ΠΜΣ «ΠΛΗΡΟΦΟΡΙΚΗ & ΕΠΙΚΟΙΝΩΝΙΕς» OSWINDS RESEARCH GROUP

The Algorithm to Extract Characteristic Chord Progression Extended the Sequential Pattern Mining

An Automatic Modulation Classifier using a Frequency Discriminator for Intelligent Software Defined Radio

Online Social Networks: Posts that can save lives. Sotiria Giannitsari April 2016

2. N-gram IDF. DEIM Forum 2016 A1-1. N-gram IDF IDF. 5 N-gram. N-gram. N-gram. N-gram IDF.

Μοντέλα πρόβλεψης διάδοσης πληροφορίας στα κοινωνικά μέσα

From Secure e-computing to Trusted u-computing. Dimitris Gritzalis

ΕΘΝΙΚΟ ΜΕΤΣΟΒΙΟ ΠΟΛΥΤΕΧΝΕΙΟ

Area Location and Recognition of Video Text Based on Depth Learning Method

[4] 1.2 [5] Bayesian Approach min-max min-max [6] UCB(Upper Confidence Bound ) UCT [7] [1] ( ) Amazons[8] Lines of Action(LOA)[4] Winands [4] 1

Εφαρμογή Υπολογιστικών Τεχνικών στην Γεωργία

ΕΥΡΕΣΗ ΤΟΥ ΔΙΑΝΥΣΜΑΤΟΣ ΘΕΣΗΣ ΚΙΝΟΥΜΕΝΟΥ ΡΟΜΠΟΤ ΜΕ ΜΟΝΟΦΘΑΛΜΟ ΣΥΣΤΗΜΑ ΟΡΑΣΗΣ

Στοιχεία εισηγητή Ημερομηνία: 10/10/2017

Online Social Networks: Posts that can save lives. Dimitris Gritzalis, Sotiria Giannitsari, Dimitris Tsagkarakis, Despina Mentzelioti April 2016

Research on model of early2warning of enterprise crisis based on entropy

2016 IEEE/ACM International Conference on Mobile Software Engineering and Systems

Schedulability Analysis Algorithm for Timing Constraint Workflow Models

ΠΑΝΕΠΙΣΤΗΜΙΟ ΜΑΚΕΔΟΝΙΑΣ

SUPPLEMENTAL INFORMATION. Fully Automated Total Metals and Chromium Speciation Single Platform Introduction System for ICP-MS

[5] F 16.1% MFCC NMF D-CASE 17 [5] NMF NMF 3. [5] 1 NMF Deep Neural Network(DNN) FUSION 3.1 NMF NMF [12] S W H 1 Fig. 1 Our aoustic event detect

CONFIOUS: The Conference Nous Σύστημα Διαχείρισης Επιστημονικών & Ακαδημαϊκών Συνεδρίων. (

SocialDict. A reading support tool with prediction capability and its extension to readability measurement

Study of urban housing development projects: The general planning of Alexandria City

2002 Journal of Software

Towards a more Secure Cyberspace

(Statistical Machine Translation: SMT[1]) [2]

Text Mining using Linguistic Information

Studies on the Binding Mechanism of Several Antibiotics and Human Serum Albumin

Χρήση οντολογιών στη χαρτογράφηση γνώσης: Μελέτη περίπτωσης σε μία ακαδημαϊκή βιβλιοθήκη

Web 論 文. Performance Evaluation and Renewal of Department s Official Web Site. Akira TAKAHASHI and Kenji KAMIMURA

No. 7 Modular Machine Tool & Automatic Manufacturing Technique. Jul TH166 TG659 A

Secure Cyberspace: New Defense Capabilities

Πληροφοριακά Συστήματα

An Efficient Calculation of Set Expansion using Zero-Suppressed Binary Decision Diagrams

ΔΑΒΒΕΤΑΣ ΑΘΑΝΑΣΙΟΣ ΤΕΙ ΑΘΗΝΑΣ ΠΤΥΧΙΑΚΗ ΕΡΓΑΣΙΑ Χρήση του Twitter ως μέσο ενημέρωσης

JOURNAL OF APPLIED SCIENCES Electronics and Information Engineering. Cyclic MUSIC DOA TN (2012)

Motion analysis and simulation of a stratospheric airship

ELIXIR-GR / BiP! Finder

A research on the influence of dummy activity on float in an AOA network and its amendments

Determination of Topic Description Terms in Topic Model

Vol. 31,No JOURNAL OF CHINA UNIVERSITY OF SCIENCE AND TECHNOLOGY Feb

An Advanced Manipulation for Space Redundant Macro-Micro Manipulator System

Κατανεμημένα Συστήματα. Javascript LCR example

Fourier transform, STFT 5. Continuous wavelet transform, CWT STFT STFT STFT STFT [1] CWT CWT CWT STFT [2 5] CWT STFT STFT CWT CWT. Griffin [8] CWT CWT

Faruqui [7] WordNet [15] FrameNet [2] PPDB [8]

, Evaluation of a library against injection attacks

CIFOR Japan CIFOR ,**0 -,**0 1 CIFOR

Ανάκτηση Πληροφορίας. Διδάσκων: Φοίβος Μυλωνάς. Διάλεξη #03

HTML. Evaluating Effects of Similarities of HTML Structures in Splog Detection

DECO DECoration Ontology

ΒΙΟΓΡΑΦΙΚΟ ΣΗΜΕΙΩΜΑ ΠΡΟΣΩΠΙΚΑ ΣΤΟΙΧΕΙΑ ΣΠΟΥΔΕΣ

n 1 n 3 choice node (shelf) choice node (rough group) choice node (representative candidate)

1530 ( ) 2014,54(12),, E (, 1, X ) [4],,, α, T α, β,, T β, c, P(T β 1 T α,α, β,c) 1 1,,X X F, X E F X E X F X F E X E 1 [1-2] , 2 : X X 1 X 2 ;

substructure similarity search using features in graph databases

CorV CVAC. CorV TU317. 1

Ηλεκτρονικές Συναλλαγές & Σημασιολογικός Ιστός

Προσομοίωση BP με το Bizagi Modeler

Πανεπιστήμιο Κρήτης, Τμήμα Επιστήμης Υπολογιστών Άνοιξη HΥ463 - Συστήματα Ανάκτησης Πληροφοριών Information Retrieval (IR) Systems

The Application of Five Ne w Technologies in Intelligence Analysis

ΠΑΝΕΠΙΣΤΗΜΙΟ ΠΑΤΡΩΝ ΠΟΛΥΤΕΧΝΙΚΗ ΣΧΟΛΗ ΤΜΗΜΑ ΜΗΧΑΝΙΚΩΝ Η/Υ & ΠΛΗΡΟΦΟΡΙΚΗΣ. του Γεράσιμου Τουλιάτου ΑΜ: 697

Transcript:

DEIM Forum 2017 D8-4 Twitter 305 8573 1 1 1 305 8573 1 1 1 E-mail: s.nagaki@kde.cs.tsukuba.ac.jp, kitagawa@cs.tsukuba.ac.jp Twitter Twitter 1 Twitter 1. Twitter Wikipedia 2 Twitter 1 1 Twitter [1] Twitter Shortness 140 Ambiguity Ungrammaticality 1 1 https://twitter.com/ 1: TAGME+. Twitter Twitter TEAM (Temporal Entity Association Measure) TEAM, TEAM TAGME+ TAGME+ 1

2:. (1) (2) (3) 3. 1 3 Twitter TEAM TEAM Twitter TEAM TEAM AUC (Area Under the Curve) 60% 2. (1) (2) (3) 3 2 2. 1 1 [2] 2. 2 2. 1 2. 3 2. 2

3. TAGME Twitter TAGME [3] TAGME Wikipedia 1 Wikipedia TAGME 2010 Google Faculty Award 2 3. 1 1 2 Twitter Twitter. TAGME 2. (1) (2) (3) 3. 3. 1 Wikipedia 2 k 1, k 2 k 1 k 2 Wikipedia lp(k) = link(k) freq(k), (1) 2 https://tagme.d4science.org/tagme/tagme help.html 3 283 2016 11 8 Google Scholar k link(k) k Wikipedia freq(k) k Wikipedia k 1 k 2 lp(k 1) < lp(k 2) k 2 lp(k 1) > = lp(k 2) k 1 k 2 3. 2 3. 1 Wikipedia k K Wikipedia P ages(k) TAGME, e k P ages(k) Wikipedia Link-based Measure WLM [4] WLM Wikipedia TAGME [2] WLM Wikipedia a, b W LM(a, b) = 1 log max ( A, B ) log A B. (2) log All log min ( A, B ) A, B a, b All Wikipedia WLM k e k rel k (e k ) = p k P ages(k ) W LM(pk, ek) P r(p k k ), P ages(k ) k K\{k} P r(e k k) k e k 3 e k p k TAGME rel k (e k ) ϵ% P r(e k k) e k k 3. 3 (3) 3. 2 k, e k coherence(k, e k ) = 1 S 1 e k S\{e k} W LM(e k, p k ), (4) S 1 2 (lp(k) + coherence(k, e k)) > ρ, (5)

4. 4. 1 T TEAM (Temporal Entity Association Measure) 4. 2 TEAM TAGME TAGME+ 4. 1 TEAM Twitter TEAM (Temporal Entity Association Measure) TEAM. Temporal Entity Association Measure W a, b A, B W All p(a, b) W A B A B W p(a), p(b) W A, B A W, B W a, b W Temporal Entity Association Measure TEAM-jaccard TEAM-dice TEAM-npmi TEAM-gsd T EAM jac(a, b, W ) = T EAM dice (a, b, W ) = T EAM npmi (a, b, W ) = A B A B. (6) 2 A B A + B. (7) ln p(a,b) p(a)p(b) ln p(a, b). (8) log max ( A, B ) log A B T EAM gsd (a, b, W ) = 1. log All log min ( A, B ) (9) 9 0 T EAM gsd (a, b, W ) = 0 TEAM-jaccard 2 Jaccard 2 X, Y Jaccard Jac(X, Y ) = X Y X Y. (10) 10 0 < = Jac(X, Y ) < = 1 TEAMjacard 1 a, b W TEAM-dice 2 Dice 2 x, y Dice Dice(x, y) = 2 X Y X + Y. (11) 11 0 < = Dice(x, y) < = 1 TEAMjaccard TEAM-dice 1 a, b W Jaccard TEAM-pmi pointwise mutual information (PMI). PMI 2 X, Y x, y P MI(X = x, Y = y) = ln P (X = x, Y = y) P (X = x)p (Y = y). (12) PMI < P MI(X = x, Y = y) < TEAM-pmi PMI Normalized PMI [5] Normalized PMI NP MI(X = x, Y = y) = ln P (X=x,Y =y) P (X=x)P (Y =y) ln P (X = x, Y = y). (13) Jaccard Dice PMI [6] [7] [8] TEAM-gsd Google Similarity Distance (GSD) [9] GSD 2 Google 2 x, y GSD GSD(x, y) = 1 log max ( X, Y ) log X Y. (14) log All log min ( X, Y ) X, Y x, y Google All Google GSD < GSD(x, y) < = 1 14 0 T EAM gsd = 0 T EAM gsd 0 < = T EAM gsd < = 1. TEAM 5. 4. 2 TAGME+ 3. TAGME WLM TEAM TAGME TAGME+ TEAM WLM τ w τ w W TEAM WLM

Algorithm 1: TAGME+ Input: T L, w, D, ρ, λ, ϵ Output: EL 1 for tweet T from T S do 2 i id(t ), τ time(t ), t text(t ) 3 EW shavew indow(τ, w) 4 K extractkeywords(t ) 5 for k from K do 6 for e k from Candidates(k) do 7 rel ek calcrel(k, e k, K, EW, λ) 8 Rel ek Rel ek {(e k, rel ek )} 9 end 10 E k toprel(rel ek, ϵ) 11 e k arg max ek E k P r(e k k) 12 S.enqueue((k, e k )) 13 end 14 for (k, e k ) from S do 15 coh k,ek calccoherence(k, e k ) 16 score 1 2 (kp(k) + coh(k, e k)) 17 if score > ρ then 18 EL EL {(i, k, e k )} 19 EW updatew indow(i, τ, k, e k ) 20 end 21 end 22 end 23 return EL rel τ (a, b) = (1 λ) W LM(a, b)+λ T EAM(a, b, W ). (15) λ, WLM TEAM. TAGME WLM 3 4 WLM 15 TAGME+ 1 TAGME+ T S w ρ, λ, ϵ ID 3 TAGME+ 3 EW (1) ID, (2) (3)(2) 3 shavew indow(τ, w) EW τ w 3. 1 4 TAGME 1 [10] 1 Mihalcea [11] Keyphraseness 3 15 6 13 3. 3 5 Keyphraseness 5. 4. 1 TEAM 1 TEAM 5. 2 2 TEAM 5. 3 3 w 5. 4 5. 1 5. 2, 5. 3, 5. 4 5. 5 5. 1 WWW2016 #Microposts2016 4 NEEL(Named Entity recognition and Linking) Challenge 2011 2015 Twitter Firehose 100 3,164 6,025 9,289 9,289 2016 12 26 Twitter REST API 5 1,486 DBpedia 6 SPARQL1.1 7 DBpedia Wikipedia Wikipedia 2016 12 1 8 Twitter 2011 2015 2 2 tweets2011 tweets2015 (1) (2) (3) 1 4 http://microposts2016.seas.upenn.edu/index.html 5 https://dev.twitter.com/rest/public 6 http://wiki.dbpedia.org/ 7 https://www.w3.org/tr/sparql11-query/ 8 https://dumps.wikimedia.org/enwiki/

1: tweets2011 tweets2015 689 797 636 70 1.97 / 0.16 / / 9 / 5 / 0 / 0 / 17.42 13.03 2011 7 16 2015 12 15 8 15 12 16 TEAM-jaccard (a) tweets2011 TEAM-gsm TEAM-jaccard TEAM-gsm (a) tweets2011 (b) tweets2015 3: TEAM x y tweets2011 tweets2015 tweets2011 TEAM-gsm (b) tweets2015 4: TEAM x y 15 λ WLM TEAM TEAM TEAM TEAM TAGME-kp [10] ϵ TAGME [3] ϵ = 30(%) 5 ρ 0 1 0.5 PR 5. 2 TEAM TEAM 4. 1 TEAM-jaccard, TEAM-dice, TEAMnpmi, TEAM-gsm PR λ = 0.5, w = 6hours 3 x y tweets2011 tweets2015 tweets2015 1 tweets2011 TEAM-GSM a, b TEAM-jaccard, TEAM-dice, TEAM-npmi 0 TEAMgsm TEAM-jaccard TEAM-gsm 2 5. 3 TEAM TEAM 15 λ 0 1 0.1 PR λ = 0 w = 6hours 4 15 λ WLM TEAM x y tweets2011 λ = 0 PR-AUC 0.20 λ = 1 TEAM-jaccard PR-AUC 0.32 TEAM-gsm PR-AUC 0.28 PR-AUC 60% TEAM tweets2011 TEAM-jaccard, TEAM-gsm λ = 1.0 PR-AUC PR λ = 1.0 15 TEAM WLM TEAM λ

TEAM-jaccard TEAM-gsm (a) tweets2011 TEAM-jaccard TEAM-gsm (b) tweets2015 5: x y tweets2011 w = 1day ρ tweets2015 5. 2 5. 2 1 5. 4 w w w 0 1day 0 1month 2 w = 0 (None) WLM TEAM λ = 0.5 5 tweets2015 tweets2011 TEAM-jaccard TEAM-gsm w = 1day PR 15 TEAM w 1day Twitter 5. 5 tweets2011 TEAM 5. 3 TEAM λ w = 1day Twitter TEAM-jaccard TAGME+ TEAM-gsm TAGME+ TEAM-jaccard TAGME+ tweets2015 TEAM tweets2015 1 6. (1)140 (2) (3) Twitter Liu [12]

Shen [13] KAURI KAURI KAURI Twitter Fang [14] Twitter GPS 1.6% [15] Fang 25% Fang Hua [16] (1) (2) Hua Twitter Wikipedia Hua 7. TEAM TEAM TEAM Twitter 2. 1 k tweets2015 1 [1] G. Rizzo, M. van Erp, and R. Troncy. Benchmarking the extraction and disambiguation of named entities on the semantic web. In Proc. 9th International Conference on Language Resources and Evaluation, LREC 14, Reykjavik, Iceland, 2014. [2] W. Shen, J. Wang, and J. Han. Entity linking with a knowledge base: Issues, techniques, and solutions. IEEE Transactions on Knowledge and Data Engineering, Vol. 27, No. 2, pp. 443 460, 2015. [3] P. Ferragina and U. Scaiella. Fast and accurate annotation of short texts with wikipedia pages. IEEE Software, Vol. 1, No. 29, pp. 70 75, 2012. [4] I. Witten and D. Milne. An effective, low-cost measure of semantic relatedness obtained from wikipedia links. In Proc. AAAI Workshop on Wikipedia and Artificial Intelligence: an Evolving Synergy, AAAI Press, pp. 25 30, Chicago, USA, 2008. [5] G. Bouma. Normalized (pointwise) mutual information in collocation extraction. In Proc. the Biennial GSCL Conference, Vol. 156, 2009. [6] D. Bollegala, Y. Matsuo, and M. Ishizuka. Measuring semantic similarity between words using web search engines. In Proc. the 16th International Conference on World Wide Web, WWW 07, pp. 757 766, New York, USA, 2007. [7] D. Bollegala, Y. Matsuo, and M. Ishizuka. A web search engine-based approach to measure semantic similarity between words. IEEE Transactions on Knowledge and Data Engineering, Vol. 23, No. 7, pp. 977 990, 2011. [8] G. Lu, P. Huang, L. He, C. Cu, and X. Li. A new semantic similarity measuring method based on web search engines. W. Trans. on Comp., Vol. 9, No. 1, pp. 1 10, 2010. [9] R. L Cilibrasi and P. MB Vitanyi. The google similarity distance. IEEE Transactions on knowledge and data engineering, Vol. 19, No. 3, 2007. [10],,,. 7, 2015. [11] R. Mihalcea and A. Csomai. Wikify!: linking documents to encyclopedic knowledge. In Proc. the 16th ACM conference on information and knowledge management, pp. 233 242, 2007. [12] X. Liu, Y. Li, H. Wu, M. Zhou, F. Wei, and Y. Lu. Entity linking for tweets. In ACL (1), pp. 1304 1311, 2013. [13] W. Shen, J. Wang, P. Luo, and M. Wang. Linking named entities in tweets with knowledge base via user interest modeling. In Proc. the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 68 76, 2013. [14] Y. Fang and M. Chang. Entity linking on microblogs with spatial and temporal signals. Transactions of the Association for Computational Linguistics, Vol. 2, pp. 259 272, 2014. [15] K. Leetaru, S. Wang, G. Cao, A. Padmanabhan, and E. Shook. Mapping the global twitter heartbeat: The geography of twitter. First Monday, Vol. 18, No. 5, 2013. [16] W. Hua, K. Zheng, and X. Zhou. Microblog entity linking with social temporal context. In Proc. the 2015 ACM SIG- MOD International Conference on Management of Data, pp. 1761 1775, 2015. NICT