Εισαγωγή στην ανάλυση συνδέσμων

Σχετικά έγγραφα
The Simply Typed Lambda Calculus

Phys460.nb Solution for the t-dependent Schrodinger s equation How did we find the solution? (not required)

TMA4115 Matematikk 3

CHAPTER 48 APPLICATIONS OF MATRICES AND DETERMINANTS

3.4 SUM AND DIFFERENCE FORMULAS. NOTE: cos(α+β) cos α + cos β cos(α-β) cos α -cos β

ΠΑΝΕΠΙΣΤΗΜΙΟ ΠΑΤΡΩΝ ΠΟΛΥΤΕΧΝΙΚΗ ΣΧΟΛΗ ΤΜΗΜΑ ΜΗΧΑΝΙΚΩΝ Η/Υ & ΠΛΗΡΟΦΟΡΙΚΗΣ. του Γεράσιμου Τουλιάτου ΑΜ: 697

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 6/5/2006

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 19/5/2007

Areas and Lengths in Polar Coordinates

Section 8.3 Trigonometric Equations

2 Composition. Invertible Mappings

Math 6 SL Probability Distributions Practice Test Mark Scheme

Areas and Lengths in Polar Coordinates

Matrices and Determinants

Econ 2110: Fall 2008 Suggested Solutions to Problem Set 8 questions or comments to Dan Fetter 1

HOMEWORK 4 = G. In order to plot the stress versus the stretch we define a normalized stretch:

Other Test Constructions: Likelihood Ratio & Bayes Tests

ΚΥΠΡΙΑΚΟΣ ΣΥΝΔΕΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY 21 ος ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ Δεύτερος Γύρος - 30 Μαρτίου 2011

Homework 3 Solutions

5.4 The Poisson Distribution.

derivation of the Laplacian from rectangular to spherical coordinates

Approximation of distance between locations on earth given by latitude and longitude

EE512: Error Control Coding

Section 9.2 Polar Equations and Graphs

CHAPTER 25 SOLVING EQUATIONS BY ITERATIVE METHODS

Code Breaker. TEACHER s NOTES

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 24/3/2007

Instruction Execution Times

Solutions to Exercise Sheet 5

Block Ciphers Modes. Ramki Thurimella

Section 7.6 Double and Half Angle Formulas

Μηχανική Μάθηση Hypothesis Testing

Numerical Analysis FMN011

ST5224: Advanced Statistical Theory II

9.09. # 1. Area inside the oval limaçon r = cos θ. To graph, start with θ = 0 so r = 6. Compute dr

Fractional Colorings and Zykov Products of graphs

A Bonus-Malus System as a Markov Set-Chain. Małgorzata Niemiec Warsaw School of Economics Institute of Econometrics

6.1. Dirac Equation. Hamiltonian. Dirac Eq.

Συστήματα Διαχείρισης Βάσεων Δεδομένων

2. THEORY OF EQUATIONS. PREVIOUS EAMCET Bits.

ANSWERSHEET (TOPIC = DIFFERENTIAL CALCULUS) COLLECTION #2. h 0 h h 0 h h 0 ( ) g k = g 0 + g 1 + g g 2009 =?

Right Rear Door. Let's now finish the door hinge saga with the right rear door

PARTIAL NOTES for 6.1 Trigonometric Identities

Statistical Inference I Locally most powerful tests

Ordinal Arithmetic: Addition, Multiplication, Exponentiation and Limit

DESIGN OF MACHINERY SOLUTION MANUAL h in h 4 0.

Web Searching I: History and Basic Notions, Crawling II: Link Analysis Techniques III: Web Spam Page Identification

Main source: "Discrete-time systems and computer control" by Α. ΣΚΟΔΡΑΣ ΨΗΦΙΑΚΟΣ ΕΛΕΓΧΟΣ ΔΙΑΛΕΞΗ 4 ΔΙΑΦΑΝΕΙΑ 1

Απόκριση σε Μοναδιαία Ωστική Δύναμη (Unit Impulse) Απόκριση σε Δυνάμεις Αυθαίρετα Μεταβαλλόμενες με το Χρόνο. Απόστολος Σ.

Mean bond enthalpy Standard enthalpy of formation Bond N H N N N N H O O O

CE 530 Molecular Simulation

Εργαστήριο Ανάπτυξης Εφαρμογών Βάσεων Δεδομένων. Εξάμηνο 7 ο

Example Sheet 3 Solutions

Example of the Baum-Welch Algorithm

Homework 8 Model Solution Section

b. Use the parametrization from (a) to compute the area of S a as S a ds. Be sure to substitute for ds!

C.S. 430 Assignment 6, Sample Solutions

Πρόβλημα 1: Αναζήτηση Ελάχιστης/Μέγιστης Τιμής

Concrete Mathematics Exercises from 30 September 2016

Partial Differential Equations in Biology The boundary element method. March 26, 2013

Δημιουργία Λογαριασμού Διαχείρισης Business Telephony Create a Management Account for Business Telephony

The challenges of non-stable predicates

Physical DB Design. B-Trees Index files can become quite large for large main files Indices on index files are possible.

Πώς μπορεί κανείς να έχει έναν διερμηνέα κατά την επίσκεψή του στον Οικογενειακό του Γιατρό στο Ίσλινγκτον Getting an interpreter when you visit your

How to register an account with the Hellenic Community of Sheffield.

SCHOOL OF MATHEMATICAL SCIENCES G11LMA Linear Mathematics Examination Solutions

ΕΛΛΗΝΙΚΗ ΔΗΜΟΚΡΑΤΙΑ ΠΑΝΕΠΙΣΤΗΜΙΟ ΚΡΗΤΗΣ. Ψηφιακή Οικονομία. Διάλεξη 10η: Basics of Game Theory part 2 Mαρίνα Μπιτσάκη Τμήμα Επιστήμης Υπολογιστών

Inverse trigonometric functions & General Solution of Trigonometric Equations

ω ω ω ω ω ω+2 ω ω+2 + ω ω ω ω+2 + ω ω+1 ω ω+2 2 ω ω ω ω ω ω ω ω+1 ω ω2 ω ω2 + ω ω ω2 + ω ω ω ω2 + ω ω+1 ω ω2 + ω ω+1 + ω ω ω ω2 + ω

Capacitors - Capacitance, Charge and Potential Difference

Exercises 10. Find a fundamental matrix of the given system of equations. Also find the fundamental matrix Φ(t) satisfying Φ(0) = I. 1.

Models for Probabilistic Programs with an Adversary

Lecture 2: Dirac notation and a review of linear algebra Read Sakurai chapter 1, Baym chatper 3

Potential Dividers. 46 minutes. 46 marks. Page 1 of 11

Overview. Transition Semantics. Configurations and the transition relation. Executions and computation

Practice Exam 2. Conceptual Questions. 1. State a Basic identity and then verify it. (a) Identity: Solution: One identity is csc(θ) = 1

Calculating the propagation delay of coaxial cable

Fourier Series. MATH 211, Calculus II. J. Robert Buchanan. Spring Department of Mathematics

Galatia SIL Keyboard Information

EPL 603 TOPICS IN SOFTWARE ENGINEERING. Lab 5: Component Adaptation Environment (COPE)

[1] P Q. Fig. 3.1

Modbus basic setup notes for IO-Link AL1xxx Master Block

(C) 2010 Pearson Education, Inc. All rights reserved.

ΕΛΛΗΝΙΚΗ ΔΗΜΟΚΡΑΤΙΑ ΠΑΝΕΠΙΣΤΗΜΙΟ ΚΡΗΤΗΣ. Ψηφιακή Οικονομία. Διάλεξη 7η: Consumer Behavior Mαρίνα Μπιτσάκη Τμήμα Επιστήμης Υπολογιστών

( ) 2 and compare to M.

ES440/ES911: CFD. Chapter 5. Solution of Linear Equation Systems

Srednicki Chapter 55

Business English. Ενότητα # 9: Financial Planning. Ευαγγελία Κουτσογιάννη Τμήμα Διοίκησης Επιχειρήσεων

Solution Series 9. i=1 x i and i=1 x i.

4.6 Autoregressive Moving Average Model ARMA(1,1)

Η θέση ύπνου του βρέφους και η σχέση της με το Σύνδρομο του αιφνίδιου βρεφικού θανάτου. ΤΕΧΝΟΛΟΓΙΚΟ ΠΑΝΕΠΙΣΤΗΜΙΟ ΚΥΠΡΟΥ ΣΧΟΛΗ ΕΠΙΣΤΗΜΩΝ ΥΓΕΙΑΣ

Math221: HW# 1 solutions

(1) Describe the process by which mercury atoms become excited in a fluorescent tube (3)

6.3 Forecasting ARMA processes

Section 8.2 Graphs of Polar Equations

Αλγόριθμοι και πολυπλοκότητα NP-Completeness (2)

Uniform Convergence of Fourier Series Michael Taylor

Nowhere-zero flows Let be a digraph, Abelian group. A Γ-circulation in is a mapping : such that, where, and : tail in X, head in

Problem Set 3: Solutions

Transcript:

Εισαγωγή στην ανάλυση συνδέσμων Αποθήκες και Εξόρυξη Δεδομένων Διδάσκων: Μαρία Χαλκίδη

Why link analysis? Why link analysis? The web is not just a collection of documents its hyperlinks are important! A link from page A to page B may indicate: A is related to B, or A is recommending, citing or endorsing B Links are either referential click here and get back home, or Informational click here to get more detail Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 2

Citation Analysis The impact factor of a journal = A/B A is the number of current year citations to articles appearing in the journal during previous two years. B is the number of articles published in the journal during previous two years. Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 3

Co-Citation A and B are co-cited by C, implying that they are related or associated. The strength of co-citation between A and B is the number of times they are co-cited. Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 4

Clusters from Co-Citation Graph Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 5

What is a Markov Chain? A Markov chain has two components: A network structure much like a web site, where each node is called a state. A transition probability of traversing a link given that the chain is in a state. For each state the sum of outgoing probabilities is one. A sequence of steps through the chain is called a random walk. Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 6

Markov Chain Example Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 7

The Random Surfer Assume the web is a Markov chain. Surfers randomly click on links, where the probability of an outlink from page A is 1/m, where m is the number of outlinks from A. The surfer occasionally gets bored and is teleported to another web page, say B, where B is equally likely to be any page. Using the theory of Markov chains it can be shown that if the surfer follows links for long enough, the PageRank of a web page is the probability that the surfer will visit that page. Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 8

Dangling Pages B Problem: A and B have no outlinks. Solution: Assume A and B have links to all web pages with equal probability. Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 9

Rank Sink Problem: Pages in a loop accumulate rank but do not distribute it. Solution: Teleportation, i.e. with a certain probability the surfer can jump to any other web page to get out of the loop. Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 10

PageRank Overview Out-link from A Dangling link A B C D In-link from B Method to rank web pages giving to it a numeric value that represent their importance Based on the link structure of the web A page X has a high rank if: - It has many In-links or few but highly ranked - Has few Out-links Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 11

Google s PageRank Assumption the prestige of a page is proportional to the sum of the prestige scores of pages linking to it Rank of a web page depends on the rank of the web pages pointing to it P is a web page P i are the web pages that have a link to P O(P i ) is the number of outlinks from P i d is the teleportation probability N is the size of the web Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 12

Example Web Graph Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 13

Iteratively Computing PageRank Replace d/n in the def. of PR(P) by d, so PR will take values between 1 and N. d is normally set to 0.15, but for simplicity lets set it to 0.5 Set initial PR values to 1 Solve the following equations iteratively: Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 14

Personalised PageRank Change d/n with dv Instead of teleporting uniformly to any page we bias the jump prefer some pages over others. E.g. v has 1 for your home page and 0 otherwise. E.g. v prefers the topics you are interested in. Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 15

HITS Hubs and Authorities - Hyperlink-Induced Topic Search A on the left is an authority A on the right is a hub Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 17

Hubs and Authorities Key ideas Hubs and authorities are fans and centers in a bipartite core of a web graph A good hub page is one that points to many good authority pages A good authority page is one that is pointed to by many good hub pages Relationship between hubs and authorities helps the mining of authoritative web pages and automated discovery of high-quality web structures and resources Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 18

Communities on the Web A densely linked focused sub-graph of hubs and authorities is called a community. Over 100,000 emerging web communities have been discovered from a web crawl (a process called trawling). Alternatively, a community is a set of web pages W having at least as many links to pages in W as to pages outside W. Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 19

Pre-processing for HITS Collect the top t pages (say t = 200) based on the input query; call this the root set. Extend the root set into a base set as follows, for all pages p in the root set: add to the root set all pages that p points to, and add to the root set up-to q pages that point to p (say q = 50). Delete all links within the same web site in the base set resulting in a focused sub-graph. Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 20

Expanding the Root Set Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 21

HITS Algorithm Iterate until Convergence Authority weight Hub weight B is the base set q and p are web pages in B A(p) is the authority score for p H(p) is the hub score for p Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 22

HITS Algorithm output HITS outputs a short list of the pages with large hub weights pages with large authority weights for the given search topic Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 23

Applications of HITS Search engine querying (speed an issue) Finding web communities. Finding related pages. Populating categories in web directories. Citation analysis. Αποθήκες και Εξόρυξη Δεδομένων, Παν. Πειραιώς 24