Σχετικά έγγραφα

Shortness Ambiguity TEAM Ungrammaticality

Web. Web p OutDegree(p) log 7 1/OutDegree(p) A New Difinition of Subjective Distance between Web Pages

Twitter 6. DEIM Forum 2014 A Twitter,,, Wikipedia, Explicit Semantic Analysis,

Web DEIM Forum 2009 A7-1. Web. Web. Web. Web. 4 Wikipedia. Wikipedia. Web.

Topic Structure Mining based on Wikipedia and Web Search


ΠΑΝΕΠΙΣΤΗΜΙΟ ΠΑΤΡΩΝ ΠΟΛΥΤΕΧΝΙΚΗ ΣΧΟΛΗ ΤΜΗΜΑ ΜΗΧΑΝΙΚΩΝ Η/Υ & ΠΛΗΡΟΦΟΡΙΚΗΣ. του Γεράσιμου Τουλιάτου ΑΜ: 697


User Behavior Analysis for a Large2scale Search Engine

MIDI [8] MIDI. [9] Hsu [1], [2] [10] Salamon [11] [5] Song [6] Sony, Minato, Tokyo , Japan a) b)

2. N-gram IDF. DEIM Forum 2016 A1-1. N-gram IDF IDF. 5 N-gram. N-gram. N-gram. N-gram IDF.

ΠΑΝΕΠΙΣΤΗΜΙΟ ΜΑΚΕΔΟΝΙΑΣ

Αναζήτηση στο ιαδίκτυο

{takasu, Conditional Random Field

HOSVD. Higher Order Data Classification Method with Autocorrelation Matrix Correcting on HOSVD. Junichi MORIGAKI and Kaoru KATAYAMA

Newman Modularity Newman [4], [5] Newman Q Q Q greedy algorithm[6] Newman Newman Q 1 Tabu Search[7] Newman Newman Newman Q Newman 1 2 Newman 3

MDSR. Proposition and Evaluation of MDSR Method for Core Analysis of Multiple Directed Graphs

HTML Utilizing Similarities of HTML Structures in Splog Detection by Machine Learning

Περιεχόμενα. Εξόρυξη γνώσης από δεδομένα στον Παγκόσμιο Ιστό

Vol. 31,No JOURNAL OF CHINA UNIVERSITY OF SCIENCE AND TECHNOLOGY Feb


Anomaly Detection with Neighborhood Preservation Principle

Semantic Drift in Espresso-style Bootstrapping: Graph-theoretic Analysis and Evaluation in Word Sense Disambiguation

EM Baum-Welch. Step by Step the Baum-Welch Algorithm and its Application 2. HMM Baum-Welch. Baum-Welch. Baum-Welch Baum-Welch.

A research on the influence of dummy activity on float in an AOA network and its amendments

Kenta OKU and Fumio HATTORI

Ερευνητική+Ομάδα+Τεχνολογιών+ Διαδικτύου+

An Effective and Efficient Algorithm for Text Categorization


2016 IEEE/ACM International Conference on Mobile Software Engineering and Systems

Γραµµική Αλγεβρα. Ενότητα 1 : Εισαγωγή στη Γραµµική Αλγεβρα. Ευστράτιος Γαλλόπουλος Τµήµα Μηχανικών Η/Υ & Πληροφορικής

DIRM : DIRM :A Model for Data Query Based on Dynamic Information Route Approach


Web Mining. Χριστίνα Αραβαντινού Ιούνιος 2014

Social Web: lesson #4

ΕΥΡΕΣΗ ΤΟΥ ΔΙΑΝΥΣΜΑΤΟΣ ΘΕΣΗΣ ΚΙΝΟΥΜΕΝΟΥ ΡΟΜΠΟΤ ΜΕ ΜΟΝΟΦΘΑΛΜΟ ΣΥΣΤΗΜΑ ΟΡΑΣΗΣ

Probabilistic Approach to Robust Optimization

Τ.Ε.Ι. Δυτικής Ελλάδας Τμήμα Διοίκησης Επιχειρήσεων Μεσολόγγι. 5 η Διάλεξη. Μάθημα: Τεχνολογίες Διαδικτύου

Automatic extraction of bibliography with machine learning

A Method for Creating Shortcut Links by Considering Popularity of Contents in Structured P2P Networks

Determination of Topic Description Terms in Topic Model

No. 7 Modular Machine Tool & Automatic Manufacturing Technique. Jul TH166 TG659 A

ELIXIR-GR / BiP! Finder

Reading Order Detection for Text Layout Excluded by Image

Wavelet based matrix compression for boundary integral equations on complex geometries

ER-Tree (Extended R*-Tree)

ΓΛΩΣΣΙΚΗ ΤΕΧΝΟΛΟΓΙΑ. Μάθημα 11 ο : Αυτόματη παραγωγή περιλήψεων. Γεώργιος Πετάσης. Ακαδημαϊκό Έτος:

Ανάκτηση Πληροφορίας

Schedulability Analysis Algorithm for Timing Constraint Workflow Models

Ανάλυση σχημάτων βασισμένη σε μεθόδους αναζήτησης ομοιότητας υποακολουθιών (C589)

SocialDict. A reading support tool with prediction capability and its extension to readability measurement

Ανάλυση Προτιμήσεων για τη Χρήση Συστήματος Κοινόχρηστων Ποδηλάτων στην Αθήνα

HTML. Evaluating Effects of Similarities of HTML Structures in Splog Detection

Analyzing Similarity of HTML Structures and Affiliate in Automatic Collection of Splogs

Web 論 文. Performance Evaluation and Renewal of Department s Official Web Site. Akira TAKAHASHI and Kenji KAMIMURA

Mean bond enthalpy Standard enthalpy of formation Bond N H N N N N H O O O

Exhaustive Topic Detection and Query Expansion Support Based on Substance-Oriented Term Clustering

Τ.Ε.Ι. Δυτικής Ελλάδας Τμήμα Διοίκησης Επιχειρήσεων Μεσολόγγι. 5 η Διάλεξη. Μάθημα: Τεχνολογίες Διαδικτύου

1 (forward modeling) 2 (data-driven modeling) e- Quest EnergyPlus DeST 1.1. {X t } ARMA. S.Sp. Pappas [4]

X g 1990 g PSRB

Bizagi Modeler: Συνοπτικός Οδηγός

Supporting Information

Research on model of early2warning of enterprise crisis based on entropy

Δημιουργία μιας επιτυχημένης παρουσίας στο διαδίκτυο

1 n-gram n-gram n-gram [11], [15] n-best [16] n-gram. n-gram. 1,a) Graham Neubig 1,b) Sakriani Sakti 1,c) 1,d) 1,e)

How to register an account with the Hellenic Community of Sheffield.

Study on the Strengthen Method of Masonry Structure by Steel Truss for Collapse Prevention

Indexing Methods for Encrypted Vector Databases

Research on Economics and Management

Πώς λειτουργεί το Google?

Evaluation of Methods to Extract Important Scenes for Automatic Digest Generation from a Presentation Video

: Monte Carlo EM 313, Louis (1982) EM, EM Newton-Raphson, /. EM, 2 Monte Carlo EM Newton-Raphson, Monte Carlo EM, Monte Carlo EM, /. 3, Monte Carlo EM

ΠΑΡΟΥΣΙΑΣΗ ΠΤΥΧΙΑΚΗΣ ΕΡΓΑΣΙΑΣ ΒΕΛΤΙΣΤΟΠΟΙΗΣΗ ΙΣΤΟΣΕΛΙΔΩΝ ΓΙΑ ΤΙΣ ΜΗΧΑΝΕΣ ΑΝΑΖΗΤΗΣΗΣ, ΠΟΙΟΤΙΚΗ ΕΡΕΥΝΑ ΣΕ ΕΙΔΙΚΟΥΣ SEO

Ψηφιακή ανάπτυξη. Course Unit #1 : Κατανοώντας τις βασικές σύγχρονες ψηφιακές αρχές Thematic Unit #1 : Τεχνολογίες Web και CMS

Toward a SPARQL Query Execution Mechanism using Dynamic Mapping Adaptation -A Preliminary Report- Takuya Adachi 1 Naoki Fukuta 2.

ADVANCED STRUCTURAL MECHANICS

High order interpolation function for surface contact problem

Elements of Information Theory

Development of Finer Spray Atomization for Fuel Injectors of Gasoline Engines

substructure similarity search using features in graph databases

1) Abstract (To be organized as: background, aim, workpackages, expected results) (300 words max) Το όριο λέξεων θα είναι ελαστικό.

Δικτυακά Πολυμέσα ΙΙ Διάλεξη #7 η : Μηχανές αναζήτησης: λειτουργία, αξιολόγηση. Γαβαλάς Δαμιανός

ΔΙΠΛΩΜΑΤΙΚΕΣ ΕΡΓΑΣΙΕΣ ΠΜΣ «ΠΛΗΡΟΦΟΡΙΚΗ & ΕΠΙΚΟΙΝΩΝΙΕς» OSWINDS RESEARCH GROUP

Supplementary Materials for Evolutionary Multiobjective Optimization Based Multimodal Optimization: Fitness Landscape Approximation and Peak Detection

Τ.Ε.Ι. ΔΥΤΙΚΗΣ ΜΑΚΕΔΟΝΙΑΣ ΠΑΡΑΡΤΗΜΑ ΚΑΣΤΟΡΙΑΣ ΤΜΗΜΑ ΔΗΜΟΣΙΩΝ ΣΧΕΣΕΩΝ & ΕΠΙΚΟΙΝΩΝΙΑΣ

Buried Markov Model Pairwise

Ανάκτηση Πληροφορίας. Διδάσκων: Φοίβος Μυλωνάς. Διάλεξη #03

Ένα µοντέλο Ισοδύναµης Χωρητικότητας για IEEE Ασύρµατα Δίκτυα. Εµµανουήλ Καφετζάκης

Real time mobile robot control with a multiresolution map representation

Feasible Regions Defined by Stability Constraints Based on the Argument Principle

AΡΙΣΤΟΤΕΛΕΙΟ ΠΑΝΕΠΙΣΤΗΜΙΟ ΘΕΣΣΑΛΟΝΙΚΗΣ ΠΟΛΥΤΕΧΝΙΚΗ ΣΧΟΛΗ ΤΜΗΜΑ ΠΟΛΙΤΙΚΩΝ ΜΗΧΑΝΙΚΩΝ

Εισαγωγή στην ανάλυση συνδέσμων

Ανάκτηση Πληροφορίας

Ανάκτηση Πληροφορίας

Gemini, FastMap, Applications. Εαρινό Εξάμηνο Τμήμα Μηχανικών Η/Υ και Πληροϕορικής Πολυτεχνική Σχολή, Πανεπιστήμιο Πατρών

Προσομοίωση BP με το Bizagi Modeler

[4] 1.2 [5] Bayesian Approach min-max min-max [6] UCB(Upper Confidence Bound ) UCT [7] [1] ( ) Amazons[8] Lines of Action(LOA)[4] Winands [4] 1

Yoshifumi Moriyama 1,a) Ichiro Iimura 2,b) Tomotsugu Ohno 1,c) Shigeru Nakayama 3,d)

OLS. University of New South Wales, Australia

Transcript:

DEIM Forum 2014 A8-1, 606 8501 E-mail: {tsukuda,ohshima,kato,tanaka}@dl.kuis.kyoto-u.ac.jp 1 2,, 1. Google 1 Yahoo 2 Bing 3 Web Web BM25 [1] HITS [2] PageRank [3] Web 1 [4] 1http://www.google.com 2http://www.yahoo.com 3http://www.bing.com 4 4 5 5 O = {t o1, t o2,, t on } O t a O t a t oi O t a

[5] [6] [7] [8] [9] [10] [11] 5 25 2. 3. 4. 5. 6. 7. 2. 2. 1 [12] [13] [14] [15]Noda [15] Wikipedia 4 (Wikipedia 1 2) 3 Wikipedia 2 1 Wikipedia Nadamoto [14] SNS 1 Wikipedia Wikipedia Liu [12] 2 Web 4http://ja.wikipedia.org/ Web Web Web Mejova [13] 1 2. 2 WebJaccard WebDiceWebOverlap NGDWWebPMI Web Bollegala [6] 2 [7] 1 Web 1 [8] Wikipedia [9] [10] [11]Strube [9] Wikipedia 2 2 Gabrilovich [10] Wikipedia Milne [11] 2 Wikipedia Wikipedia Wikipedia Wikipedia 2 Milne Strube Gabrilovich Gabrilovich 3. (t 1, t 2) t 1 t 2 c c T c = {t o1, t o2,, t on, } t a t oi T c t a t oi T c c T c t a t oi T c t a 1.

3. 1 1 3. 2 2 4. 3. 2 4. 1 4. 2 4. 3 4. 4 4. 1 3 4. 1. 1 WLM 1 Milne [11] t 1 t 2 Wikipedia t 1 t 2 t 1 t 2 TF-IDF s t s t ( ) W w(s t) = log if s B t, 0 otherwise. (1) B t B t t W Wikipedia t 1 t 2 1 s t 1 t 2 t 1 t 2 F t1 F t2 t F t1 F t2 t 1 t 2 sim forward (t 1, t 2) t 1 t 2 sim backward (t 1, t 2 ) = log (max ( B t 1, B t2 )) log ( B t1 B t2 ). log ( W ) log (min ( B t1, B t2 )) (2) 2 t 1 t 2 wlm(t 1, t 2 ) wlm(t 1, t 2) = 1 2 sim forward(t 1, t 2) + 1 2 sim backward(t 1, t 2). 4. 1. 2 WebPMI (3) 2 Web 2 WebPMI [6] [5] WebPMI t 1 t 2 0 if hit(t 1, t 2 ) < = c web pmi(t 1, t 2 ) = ( ) hit(t log 1 t 2 )/N 2 otherwise. (hit(t 1 )/N) (hit(t 2 )/N) hit(t) t Web Clue Web 09 5 Bollegala [6] c = 5 N Web Clue Web 09 5http://lemurproject.org/clueweb09/ (4)

N = 67, 337, 717 WebJaccard WebDiceWebOverlap NGD WebPMI [5], [6] 4. 1. 3 Noda 3 Web 2 [16] t 1 t 2 noda(t 1, t 2 ) = hit(t1 t2)2 hit(t 1 ) hit(t 2 ). (5) WebPMI 2 4. 2 3. 1 1 t oi t a rel pop(t oi, t a ) rel pop(t oi, t a ) = rel(t oi, t a ) pop(t oi ) (6) rel(t oi, t a) 4. 1 pop(t oi ) Wikipedia t oi t oi Wikipedia [17] 4. 3 4. 3. 1 t oi t oi [18] [19] ALAGIN 6 Wikipedia 20 245 t oi t oi 1 HITS [2] t oi t oi t oj coordinate(t oi, t oj ) 4. 3. 2 3. 2 2 6https://alaginrc.nict.go.jp/hyponymy/index.html biased PageRank [20] biased PageRank biased PageRank 1 2 1 2 2 2 Biased PageRank T c = {t o1, t o2,, t on } (i, j) coordinate(t oj, t oi ) n 1 (i, j) a i,j a i,j = coordinate(t oj, t oi ) t ok T c coordinate(t oj, t ok ). (7) t oi t a rel crd(t oi, t a) rel crd l+1 (t oi, t a ) =α 1< = j< = n + (1 α) a i,j rel crd l (t oi, t a ) rel(t oi, t a ) t oj T c rel(t oj, t a). 8 1 2 biased PageRank 12α 0 < = α < = 1 α α (8) 4. 4 t oi t a biased PageRank 4. 2 rel pop crd l+1 (t oi, t a ) =α 5. 1< = j< = n + (1 α) a i,j rel pop crd l (t oi, t a ) rel pop (t oi, t a ) t oj T c rel pop(t oj, t a). (9) 2 1

1 169 117 C 155 481 157 MVP 2 40 39 13 30 32 2 5. 1 5 5 4. 1. 1 WLM 4. 3. 1 Wikipedia 1 Wikipedia 2008 6 24 Wikipedia 1 W 1, 342, 098 5. 2 4. 7 Wikipedia 40 1 10 10 5 5 2 1 2 7http://www.lancers.jp/ 1 Web 1 1 20 20 5. 3 4. 1 3 WLM WebPMI Noda 4. 2 WLM+pop WebPMI+pop Noda+pop 4. 3 WLM+BPR WebPMI+BPR Noda+BPR 4. 4 WLM+pop+BPR WebPMI+pop+BPR Noda+pop+BPR Biased PageRank 89 0.1 0.9 0.1 5. 2 5. 4 5. 3 2

2 α ****** original 10%5%1% pop pop+bpr BPR pop+bpr 3 Biased PageRank original WLM WebPMI Noda Biased PageRank WLM WebPMI Noda 2 pop 2 BPR 3. 1 2 WLM Noda 2 pop+bpr 1 biased PageRank 6

3 Noda Noda+BPR α = 0.9 1 1 1 2 2 2 3 7 3 4 3 4 5 6 6 6-13 7-8 8 8 10 9 - - 10 13 17 0.457 0.698 4 WebPMI WebPMI+pop+BPR α = 0.9 1 9 1 2 8 3 3 4 4 4 2 2 5 5 5 6 6 10 7-7 8 5 6 9 3 11 10 1 9 0.400 0.714 3 3 3 Noda Noda+BPR α = 0.9 10 - Noda 10 Noda Noda+BPR 17 26 4 4 5 1 { } 55 { } 3 { } 136 { } 1 { } 28 { } 1 { } 190 { } WebPMI WebPMI+pop+BPR α = 0.9 10 5 5 Bollegala [6] Gabrilovich [10] 6. 6. 1 c m t a c m t a c m O m = {o 1, o 2,, o m} t a unexp(c, O m, t a) = (1 rel pop cred(o i, t a)) (10) o i O m m = 2 rel pop cred(o i, t a) 5. Noda+pop+BPR 0.5 5 5

6. 2 2 1 2 { } 2 { } 10 Wikipedia 2 Wikipedia 7. 1 2 5 25 2 242400132468000812J03993 [1] S. E. Robertson and S. Walker: Some simple effective approximations to the 2-poisson model for probabilistic weighted retrieval, Proc. of ACM SIGIR 1994, pp. 232 241 (1994). [2] J. M. Kleinberg: Authoritative sources in a hyperlinked environment, J. ACM, 46, pp. 604 632 (1999). [3] S. Brin and L. Page: The anatomy of a large-scale hypertextual web search engine, Proc. of WWW 2007, pp. 107 117 (1998). [4] K. Tsukuda, H. Ohshima, M. Yamamoto, H. Iwasaki and K. Tanaka: Discovering unexpected information on the basis of popularity/unpopularity analysis of coordinate objects and their relationships, Proc. of SAC 2013, pp. 878 885 (2013). [5] G. Lu, P. Huang, L. He, C. Cu and X. Li: A new semantic similarity measuring method based on web search engines, W. Trans. on Comp., 9, 1, pp. 1 10 (2010). [6] D. Bollegala, Y. Matsuo and M. Ishizuka: A web search engine-based approach to measure semantic similarity between words, IEEE Trans. on Knowl. and Data Eng., 23, 7, pp. 977 990 (2011). [7] M. Sahami and T. D. Heilman: A web-based kernel function for measuring the similarity of short text snippets, Proc. of WWW 2006, pp. 377 386 (2006). [8] H.-H. Chen, M.-S. Lin and Y.-C. Wei: Novel association measures using web search with double checking, Proc. of ACL-44, pp. 1009 1016 (2006). [9] M. Strube and S. P. Ponzetto: Wikirelate! computing semantic relatedness using wikipedia, Proc. of AAAI 2006, pp. 1419 1424 (2006). [10] E. Gabrilovich and S. Markovitch: Computing semantic relatedness using wikipedia-based explicit semantic analysis, Proc. of IJCAI 2007, pp. 1606 1611 (2007). [11] D. Milne and I. Witten: An effective, low-cost measure of semantic relatedness obtained from wikipedia links, Proc. of WIKIAI 2008, pp. 25 30 (2008). [12] B. Liu, Y. Ma and P. S. Yu: Discovering unexpected information from your competitors web sites, Proc. of ACM SIGKDD 2001, pp. 144 153 (2001). [13] Y. Mejova, I. Bordino, M. Lalmas and A. Gionis: Searching for interestingness in wikipedia and yahoo!: answers, Proc. of WWW 2013, pp. 145 146 (2013). [14] A. Nadamoto, E. Aramaki, T. Abekawa and Y. Murakami: Content hole search in community-type content, Proc. of WWW 2009, pp. 1223 1224 (2009). [15] Y. Noda, Y. Kiyota and H. Nakagawa: Discovering serendipitous information from wikipedia by using its network structure, Proc. of ICWSM 2010, pp. 299 302 (2010). [16],,,, web, DBSJ Letters, 5, 2, pp. 69 72 (2006). [17],,,, wikipedia, p. to appear (2013). [18],, Web,., 47, 19, pp. 98 112 (2006). [19],,, Web (WebDB Forum 2013) (2013). [20] T. H. Haveliwala: Topic-sensitive pagerank, Proc. of WWW 2002, pp. 517 526 (2002).