HΥ463 - Συστήματα Ανάκτησης Πληροφοριών Information Retrieval (IR) Systems. Web Searching



Σχετικά έγγραφα
HΥ463 - Συστήματα Ανάκτησης Πληροφοριών Information Retrieval (IR) Systems. Web Searching ΙΙΙ

Oscillatory integrals

Solutions 3. February 2, Apply composite Simpson s rule with m = 1, 2, 4 panels to approximate the integrals:

ΤΜΗΜΑ ΗΛΕΚΤΡΟΛΟΓΩΝ ΜΗΧΑΝΙΚΩΝ ΚΑΙ ΜΗΧΑΝΙΚΩΝ ΥΠΟΛΟΓΙΣΤΩΝ

Σχολή Εφαρμοσμένων Μαθηματικών και Φυσικών Επιστημών. Εθνικό Μετσόβιο Πολυτεχνείο. Thales Workshop, 1-3 July 2015.

Solutions_3. 1 Exercise Exercise January 26, 2017

Optimal Placing of Crop Circles in a Rectangle

2 Composition. Invertible Mappings

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 19/5/2007

HΥ463 - Συστήματα Ανάκτησης Πληροφοριών Information Retrieval (IR) Systems. Web Searching

Web Searching ΙΙ Τεχνικές Ανάλυσης Συνδέσμων (Link Analysis Techniques)

Example 1: THE ELECTRIC DIPOLE

To find the relationships between the coefficients in the original equation and the roots, we have to use a different technique.

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 6/5/2006

Approximation of distance between locations on earth given by latitude and longitude

Analytical Expression for Hessian

Matrix Hartree-Fock Equations for a Closed Shell System

(a,b) Let s review the general definitions of trig functions first. (See back cover of your book) sin θ = b/r cos θ = a/r tan θ = b/a, a 0

Instruction Execution Times

Example Sheet 3 Solutions

CHAPTER (2) Electric Charges, Electric Charge Densities and Electric Field Intensity

The Simply Typed Lambda Calculus

Math221: HW# 1 solutions

Finite Field Problems: Solutions

Math 6 SL Probability Distributions Practice Test Mark Scheme

Section 9.2 Polar Equations and Graphs

derivation of the Laplacian from rectangular to spherical coordinates

Section 8.3 Trigonometric Equations

Physics 505 Fall 2005 Practice Midterm Solutions. The midterm will be a 120 minute open book, open notes exam. Do all three problems.

Section 7.6 Double and Half Angle Formulas

Assalamu `alaikum wr. wb.

3.4 SUM AND DIFFERENCE FORMULAS. NOTE: cos(α+β) cos α + cos β cos(α-β) cos α -cos β

Space Physics (I) [AP-3044] Lecture 1 by Ling-Hsiao Lyu Oct Lecture 1. Dipole Magnetic Field and Equations of Magnetic Field Lines

ω ω ω ω ω ω+2 ω ω+2 + ω ω ω ω+2 + ω ω+1 ω ω+2 2 ω ω ω ω ω ω ω ω+1 ω ω2 ω ω2 + ω ω ω2 + ω ω ω ω2 + ω ω+1 ω ω2 + ω ω+1 + ω ω ω ω2 + ω

Fractional Colorings and Zykov Products of graphs

Lecture 2: Dirac notation and a review of linear algebra Read Sakurai chapter 1, Baym chatper 3

ΑΡΙΣΤΟΤΕΛΕΙΟ ΠΑΝΕΠΙΣΤΗΜΙΟ ΘΕΣΣΑΛΟΝΙΚΗΣ

Inverse trigonometric functions & General Solution of Trigonometric Equations

Practice Exam 2. Conceptual Questions. 1. State a Basic identity and then verify it. (a) Identity: Solution: One identity is csc(θ) = 1

Chapter 3: Ordinal Numbers

Econ 2110: Fall 2008 Suggested Solutions to Problem Set 8 questions or comments to Dan Fetter 1

I Feel Pretty VOIX. MARIA et Trois Filles - N 12. BERNSTEIN Leonard Adaptation F. Pissaloux. ι œ. % α α α œ % α α α œ. œ œ œ. œ œ œ œ. œ œ. œ œ ƒ.

EE512: Error Control Coding

Block Ciphers Modes. Ramki Thurimella

Ψηφιακή ανάπτυξη. Course Unit #1 : Κατανοώντας τις βασικές σύγχρονες ψηφιακές αρχές Thematic Unit #1 : Τεχνολογίες Web και CMS

6.3 Forecasting ARMA processes

Strain gauge and rosettes

Reminders: linear functions

Πρόβλημα 1: Αναζήτηση Ελάχιστης/Μέγιστης Τιμής

C.S. 430 Assignment 6, Sample Solutions

Statistical Inference I Locally most powerful tests

The challenges of non-stable predicates

Fourier Series. MATH 211, Calculus II. J. Robert Buchanan. Spring Department of Mathematics

Areas and Lengths in Polar Coordinates

ΚΥΠΡΙΑΚΗ ΕΤΑΙΡΕΙΑ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ 24/3/2007

ST5224: Advanced Statistical Theory II

UNIVERSITY OF CALIFORNIA. EECS 150 Fall ) You are implementing an 4:1 Multiplexer that has the following specifications:

Dynamic types, Lambda calculus machines Section and Practice Problems Apr 21 22, 2016

Congruence Classes of Invertible Matrices of Order 3 over F 2

department listing department name αχχουντσ ϕανε βαλικτ δδσϕηασδδη σδηφγ ασκϕηλκ τεχηνιχαλ αλαν ϕουν διξ τεχηνιχαλ ϕοην µαριανι

Problem Set 3: Solutions

Phys460.nb Solution for the t-dependent Schrodinger s equation How did we find the solution? (not required)

Tutorial Note - Week 09 - Solution

Example of the Baum-Welch Algorithm

b. Use the parametrization from (a) to compute the area of S a as S a ds. Be sure to substitute for ds!

Ordinal Arithmetic: Addition, Multiplication, Exponentiation and Limit

Matrices and Determinants

PhysicsAndMathsTutor.com

HOMEWORK 4 = G. In order to plot the stress versus the stretch we define a normalized stretch:

CHAPTER-III HYPERBOLIC HSU-STRUCTURE METRIC MANIFOLD. Estelar

ΗΜΥ 210 ΣΧΕΔΙΑΣΜΟΣ ΨΗΦΙΑΚΩΝ ΣΥΣΤΗΜΑΤΩΝ. Χειµερινό Εξάµηνο ΔΙΑΛΕΞΗ 3: Αλγοριθµική Ελαχιστοποίηση (Quine-McCluskey, tabular method)

LESSON 14 (ΜΑΘΗΜΑ ΔΕΚΑΤΕΣΣΕΡΑ) REF : 202/057/34-ADV. 18 February 2014

If ABC is any oblique triangle with sides a, b, and c, the following equations are valid. 2bc. (a) a 2 b 2 c 2 2bc cos A or cos A b2 c 2 a 2.

w o = R 1 p. (1) R = p =. = 1

Edexcel FP3. Hyperbolic Functions. PhysicsAndMathsTutor.com

CHAPTER 25 SOLVING EQUATIONS BY ITERATIVE METHODS

ΠΑΝΕΠΙΣΤΗΜΙΟ ΠΑΤΡΩΝ ΠΟΛΥΤΕΧΝΙΚΗ ΣΧΟΛΗ ΤΜΗΜΑ ΜΗΧΑΝΙΚΩΝ Η/Υ & ΠΛΗΡΟΦΟΡΙΚΗΣ. του Γεράσιμου Τουλιάτου ΑΜ: 697

ΚΥΠΡΙΑΚΟΣ ΣΥΝΔΕΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ CYPRUS COMPUTER SOCIETY 21 ος ΠΑΓΚΥΠΡΙΟΣ ΜΑΘΗΤΙΚΟΣ ΔΙΑΓΩΝΙΣΜΟΣ ΠΛΗΡΟΦΟΡΙΚΗΣ Δεύτερος Γύρος - 30 Μαρτίου 2011

Αλγόριθμοι και πολυπλοκότητα NP-Completeness (2)

Areas and Lengths in Polar Coordinates

Numerical Analysis FMN011

Laplace s Equation in Spherical Polar Coördinates

ΕΙΣΑΓΩΓΗ ΣΤΟN ΠΡΟΓΡΑΜΜΑΤΙΣΜΟ ΠΑΝΕΠΙΣΤΗΜΙΟ ΠΑΤΡΩΝ ΠΟΛΥΤΕΧΝΙΚΗ ΣΧΟΛΗ ΤΜΗΜΑ ΜΗΧΑΝΙΚΩΝ Η/Υ ΚΑΙ ΠΛΗΡΟΦΟΡΙΚΗΣ

Notes on the Open Economy

6. MAXIMUM LIKELIHOOD ESTIMATION

Every set of first-order formulas is equivalent to an independent set

ΠΤΥΧΙΑΚΗ ΕΡΓΑΣΙΑ "ΠΟΛΥΚΡΙΤΗΡΙΑ ΣΥΣΤΗΜΑΤΑ ΛΗΨΗΣ ΑΠΟΦΑΣΕΩΝ. Η ΠΕΡΙΠΤΩΣΗ ΤΗΣ ΕΠΙΛΟΓΗΣ ΑΣΦΑΛΙΣΤΗΡΙΟΥ ΣΥΜΒΟΛΑΙΟΥ ΥΓΕΙΑΣ "

( ) 2 and compare to M.

On a four-dimensional hyperbolic manifold with finite volume

DESIGN OF MACHINERY SOLUTION MANUAL h in h 4 0.

«ΨΥΧΙΚΗ ΥΓΕΙΑ ΚΑΙ ΣΕΞΟΥΑΛΙΚΗ» ΠΑΝΕΥΡΩΠΑΪΚΗ ΕΡΕΥΝΑ ΤΗΣ GAMIAN- EUROPE

6.1. Dirac Equation. Hamiltonian. Dirac Eq.

Απόκριση σε Μοναδιαία Ωστική Δύναμη (Unit Impulse) Απόκριση σε Δυνάμεις Αυθαίρετα Μεταβαλλόμενες με το Χρόνο. Απόστολος Σ.

Physical DB Design. B-Trees Index files can become quite large for large main files Indices on index files are possible.

Συστήματα Διαχείρισης Βάσεων Δεδομένων

Lecture 2. Soundness and completeness of propositional logic

Démographie spatiale/spatial Demography

2. Μηχανικό Μαύρο Κουτί: κύλινδρος με μια μπάλα μέσα σε αυτόν.

ΤΕΧΝΟΛΟΓΙΚΟ ΠΑΝΕΠΙΣΤΗΜΙΟ ΚΥΠΡΟΥ ΤΜΗΜΑ ΝΟΣΗΛΕΥΤΙΚΗΣ

ΕΛΛΗΝΙΚΗ ΔΗΜΟΚΡΑΤΙΑ ΠΑΝΕΠΙΣΤΗΜΙΟ ΚΡΗΤΗΣ. Ψηφιακή Οικονομία. Διάλεξη 10η: Basics of Game Theory part 2 Mαρίνα Μπιτσάκη Τμήμα Επιστήμης Υπολογιστών

Transcript:

Πανεπιστήμιο Κρήτης, Τμήμα Επιστήμης Υπολογιστών Άνοιξη 6 HΥ63 - Συστήματα Ανάκτησης Πληροφοριών Infomtion Retievl (IR Systems Web Seching I: Histoy nd Bsic Notions, Cwling II: Link Anlysis Techniques III: Web Spm Pge Identifiction Γιάννης Τζίτζικας ιάλεξη : Ημερομηνία : 9 / 3 / 6 Bsed on Z. Gyongyi, H. Gci-Molin, J. Pedesen, Compting Web Spm with Tust Rnk, SIGMOD CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 Κίνητρο Web spm pges use vious techniques to chieve highe-thndeseved nkings in sech engine s esults. Humn expets cn identify spm, but it is too expensive to mnully evlute lge numbe of pges Ανάγκη για αυτόματες ή ημιαυτόματες τεχνικές διαχωρισμού των «καλών» σελίδων από τις «κακές» CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6

Web Spm Ορισμός: διασυνδεδεμένες σελίδες δημιουργημένες για παραπλάνηση των μηχανών αναζήτησης Παραδείγματα ponogphy site pge tht contins thousnds of keywods which e mde invisible (to humns by djusting ccodingly the colo scheme sech engine will include this pge in the esults of quey tht contins some of these keywods cetion of lge numbe of bogus web pges, ll pointing to single tget pge (tht pge will hve high in-degee sech engine will nk high this pge CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 3 Αντιμετώπιση Κλασσικός τρόπος αντιμετώπισης Sech engine compnies typiclly employ stff membes who specilize in the detection of web spm, constntly scnning the web looking fo offendes In cse spm pge is identified, the sech engine stops cwling nd indexing it. Vey expensive nd slow spm detection pocess CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6

Μια ημιαυτόματη προσέγγιση Selection of smll set of seed pges to be evluted by n expet Afte the mnul selection of the eputble seed pges, the link stuctue of the web is exploited to discove othe pges tht e likely to be good. Ζητήματα: How we should implement the seed selection? How we cn discove the good pges? CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 5 Appoximte isoltion of the good set 3 5 6 7 8 Empiicl obsevtion: Good pges seldom point to bd ones. CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 6

Appoximte isoltion of the good set Exceptions 3 5 6 7 8 the cetos of good pges cn sometimes by «ticked» nd dd links to bd pges. Exmples: Unmodeted messge bods whee spmmes post messges tht include links to thei spm pges Honey pots pges tht contin some useful esouce but hve hidden links to thei spm pges (the honey pot ttcts people to point to it CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 7 Assessing Tust: Ocle Function We fomlize the notion of humn checking pge by biny «ocle» function O, ove ll pges p in V. O( p if p is bd if p is good Ocle invoctions e expensive nd time consuming. We do not wnt to cll the ocle function fo ll pges. Ou objective is to be selective, i.e. to sk humn expet to evlute only some of the pges CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 8

Συνάρτηση Εμπιστοσύνης (Tust Function To evlute pges without elying on O, we will estimte the likehood tht given pge p is good. Tust function yields vlues between (bd nd (good Idelly, fo ny p, T(p should give us the pobbility tht p is good Idel Tust Popety (ITP T(p P[ O(p] difficult to chieve even if T is not vey ccute we could exploit it to ode pges by thei likehood of being good CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 9 Συνάρτηση Εμπιστοσύνης (Tust Function Desied Tust Popety (elxtion of ITP T(p < T(q P[ O(p] < P[ O(q] T(p T(q P[ O(p] P[ O(q] Theshold Tust Popety (nothe elxtion of ITP T(p > δ O(q CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6

Υπολογισμός Εμπιστοσύνης: The ignont tust function The ignont tust function T We cn select t ndom seed set S of L pges nd cll the ocle on its elements. Let S+ be the good pges nd S- the bd ones. Since the emining pges e not checked we cn mk them with /. This is the ignont tust function T T ( p O( p / if p S othewise CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 Διάδοση Εμπιστοσύνης (Tust Popgtion We cn exploit the empiicl obsevtion «Good pges seldom point to bd ones», nd ssign scoe to ll pges tht e echble fom pge in S+ in M o fewe steps. Tust Function T M : T M O( p ( p / if if p S p S nd q S othewise + : M q p q-m->p : thee is pth of mximum length M fom q to p The bigge M the futhe we e fom good pges, the less cetin we e tht pge is good CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6

Εξασθένηση Εμπιστοσύνης (Tust Attenution Tust dmpening β β β β 3 β T(β T( T(3β Tust dmpening ssign scoe β (< to pges echble t step ssign the scoe β*β to pges echble t step, nd so on pges with multiple inlinks: mximum scoe o vege scoe CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 3 Εξασθένηση Εμπιστοσύνης (Tust Attenution Tust splitting Tust splitting T( T( / / /3 /3 /3 motivtion: the ce with which people dd links to thei pges in often invesely popotionl to the numbe of links on the pge if pge p hs tust scoe T(p nd it points to out(p pges, ech of them will eceive scoe fction T(p/ out(p fom p the ctul scoe of pge will be the sum of the scoe fctions eceived though its inlinks We could combine tust dmpening nd splitting 3 T(35/6 5/ 5/ CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6

CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 5 Ο Αλγόριθμος TustRnk In TustRnk we will combine tust dmpening nd splitting: in ech itetion, the tust scoe of node is split mong its neighbos nd dmpened by fcto of b We will compute TustRnk scoes using bised PgeRnk lgoithm the ocle-povided scoes eplce the unifom distibution CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 6 Επανάληψη: PgeRnk M p q if q out M p q if q p T, ( ( /, (, ( Tnsition mtix T 3 / / T M Adjcency mtix M The PgeRnk scoe R(p of pge is defined s + ( ( ( ( ( p in q N q out q R p R The equivlent mtix eqution: N N R T R + (

CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 7 Επανάληψη: PgeRnk 3 + ( 3 / / 3 + + ( 3/ 3/ 3 + + + + / ( 3/ / ( / ( 3/ ( / ( 3 N N R T R + ( CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 8 Επανάληψη: Ο Αλγόριθμος PgeRnk function PgeRnk Input T: tnsition mtix, N: numbe of pges, b : decy fcto fo bised PgeRnk, M b : numbe of bised PgeRnk itetions output t* : PgeRnk scoes (3 d /Ν * N // initil scoe fo ll pges is /Ν (5 t* d fo i to M b do // evlutes PgeRnk scoes t* b T t* + ( - b d etun t*

Ο Αλγόριθμος TustRnk function TustRnk Input T: tnsition mtix, N: numbe of pges, L: limit of ocle invoctions, b : decy fcto fo bised PgeRnk, M b : numbe of bised PgeRnk itetions output t* : TustRnk scoes ( s SelectSeed ( // seed-desibility: etuns vecto. // E.g. s(p is the desibility fo pge p ( σ Rnk({,,N}, s // odes in decesing ode of s-vlue ll pges (3 d N // initil scoe fo ll pges is fo i to L do // invokes ocle function on the most desible pges if O(σ(i then d(σ(i ( d : d / d // nomlize sttic distibution scoe (to sum up to (5 t* d fo i to M b do // evlutes TustRnk scoes using bised PgeRnk t* b T t* + ( - b d // note tht d eplces the unifom distibution etun t* CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 9 Ο Αλγόριθμος TustRnk Remks: function TustRnk Input T: tnsition mtix, N: numbe of pges, L: limit of ocle invoctions, Step 5 implements pticul vesion of tust dmpening nd splitting: in b : decy fcto fo bised PgeRnk, M b : numbe of bised PgeRnk itetions ech itetion, the tust scoe of node is split mong its neighbos nd output t* : TustRnk scoes dmpened by fcto of b ( s SelectSeed ( // seed-desibility: etuns vecto. // E.g. s(p is the desibility fo pge p The good seed pges hve no longe scoe of, howeve they still hve ( σ Rnk({,,N}, s // odes in decesing ode of s-vlue ll pges the highest scoes (3 d N // initil scoe fo ll pges is fo i to L do // invokes ocle function on the most desible pges if O(σ(i then d(σ(i ( d : d / d // nomlize sttic distibution scoe (to sum up to (5 t* d fo i to M b do // evlutes TustRnk scoes using bised PgeRnk t* b T t* + ( - b d // note tht d eplces the unifom distibution etun t* CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6

Selecting Seeds ( s SelectSeed ( // seed-desibility: etuns vecto. // E.g. s(p is the desibility fo pge p ( σ Rnk({,,N}, s // odes in decesing ode of s-vlue ll pges Πιθανές Στρατηγικές α Rndom selection β High PgeRnk Επιλέγουμε τις σελίδες με υψηλό PgeRnk σκορ διότι αυτές οι σελίδες συχνά εμφανίζονται στην κορυφή των απαντήσεων γ Invese PgeRnk 3 5 Although 5 hs the highest PgeRnk it is not good seed CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 Selecting Seeds: (γ Invese PgeRnk επειδή η εμπιστοσύνη διαχέεται από τις καλές σελίδες, είναι λογικό να επιλέξουμε εκείνες τις σελίδες από τις οποίες μπορούμε να φτάσουμε σε πολλές άλλες άρα μια ιδέα είναι να επιλέξουμε τις σελίδες με πολλά outlinks Επιλογή των p, p, p5 3 5 6 γενίκευση: επιλέγουμε τις σελίδες που δείχνουν σε πολλές σελίδες οι οποίες με τη σειρά τους δείχνουν σε πολλές σελίδες, κ.ο.κ Επιλογή της p 7 8 Τρόπος: Αφού η σπουδαιότητα μιας σελίδας εξαρτάται από τα outlinks της (και όχι από τα inlinks της, μπορούμε να χρησιμοποιήσουμε την PgeRnk αντιστρέφοντας την φορά των ακμών 3 5 6 7 8 CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6

Expeimentl Evlution Expeimentl Evlution Expeiments on the complete set of pges cwled nd indexed by AltVist (Aug. 3 Reduce computtionl cost: wok t the level of web sites (insted of web pges gouping of the (billions of pges into 3 millions sites websitea points to websiteb if one o moe pges fom websitea point to one o moe pges of websiteb So t most link my stt fom website A nd point to website B Obsevtions /3 of the websites e unefeenced So TustRnk cnnot diffeentite between them becuse they ll hve in(p Howeve they e low scoed nywy (e.g. by PgeRnk so they do not ppe high in nswes CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6

Expeimentl Evlution: Seed Selection ( s SelectSeed ( ( σ Rnk({,,N}, s (3 d N ( fo i to L do if O(σ(i then d(σ(i Seed Set Selection Invese PgeRnk pplied on the gph of websites woked bette thn High PgeRnk (fo the seed selection pocess Pmetes: :.85, itetions: With itetions the eltive odeing stbilized Mnul inspection of the top 5 sites ( S 5 Fom these only 78 wee used s good seeds, i.e. S+ 78 sites CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 5 Expeimentl Evlution: Evlution Smple To test the effectiveness of TustRnk we need Refeence Collection (e.g. something like TREC A smple set X of sites ws selected nd evluted mnully, i.e. the ocle function ws invoked (i.e. peson inspected them nd decided whethe they e spm o not The Smple set X ws not selected t ndom. Recll tht we e minly inteetested in spm pges tht ppe high in nswes The following smple selection method ws followed: Genete list of sites in decesing ode of thei PgeRnk scoes Segment them into buckets so tht the sum of the scoes in ech bcket equls 5% of the totl PgeRnk scoe bcket 86, bcket 665,, bcket 5 millions pges select 5 sites t ndom fom ech bucket ( * 5 CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 6

Expeimentl Evlution: Evlution Smple The esults of the mnul evlution (ocle invoction of the pges in the smple set of sites CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 7 This collection (i.e. X ws used fo evluting TustRnk vesus PgeRnk

Evlution Results: Comping PgeRnk with TustRnk Good sites Reputblewhite, dvetisementgy, webogniztiondk gy CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 9 Evlution Results: Comping PgeRnk with TustRnk Bd sites TustRnk is esonble spm detection tool CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 3

Μέτρα Αξιολόγησης της Συνάρτησης Εμπιστοσύνης (Evlution Metics fo the Tust Function Assume smple set X of web pges fo which we cn invoke both T nd O Web X CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 3 Μέτρα Αξιολόγησης της Συνάρτησης Εμπιστοσύνης: Pecision nd Recll We cn define pecision nd ecll bsed on the theshold tust popety: pec( T, O { p X T ( p > δ nd O( p { q X T ( q > δ } } X ec( T, O { p X T ( p > δ nd O( p { q X O( q } } CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 3

Μέτρα Αξιολόγησης της Συνάρτησης Εμπιστοσύνης: Pecision & Recll: Πειραματική Αξιολόγηση δ: such tht to septe bckets CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 33 Μέτρα Αξιολόγησης της Συνάρτησης Εμπιστοσύνης: Piwise Odeedness We cn genete fom X set P of pis nd we cn compute the fction of the pis fo which T did not mke mistke. CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 3 X Web The following metic cn signl violtion of the odeed tust popety if T ( p T ( q nd O( p < O( q I( T, O, p, q if T ( p T ( q nd O( p > O( q othewise P ( p, q P I( T, O, p, q piod( T, O, P P Piod(T,O,P if T does not mke ny mistke Piod(T,O,P if T mkes lwys mistkes P X x X

Μέτρα Αξιολόγησης της Συνάρτησης Εμπιστοσύνης: Piwise Odeedness: Πειραματική Αξιολόγηση CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 35 Συμπέρασμα Πειραματικής Αξιολόγησης TustRnk cn effectively filte out spm fom significnt fction of the Web, bsed on good seed set of less thn sites CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 36

Refeences Z. Gyongyi, H. Gci-Molin, J. Pedesen, Compting Web Spm with Tust Rnk, SIGMOD CS63 - Infomtion Retievl Systems Ynnis Tzitziks, U. of Cete, Sping 6 37