GRAPH PARTITIONING and SPECTRAL CLUSTERING

GRAPH PARTITIONING and SPECTRAL CLUSTERING Εργαςτιριο Ηλεκτρονικισ (ELLAB) Τμιμα Φυςικισ, Παν. Πατρϊν Θεοχαράτοσ Χριςτοσ, htheohar@upatras.gr

Spectral Clustering? 2 Αλγόρικμοι που ομαδοποιοφν ςθμεία χρθςιμοποιϊντασ ιδιο-διανφςματα (eigenvectors) πινάκων που προζρχονται από τα δεδομζνα. Πραγματοποιείται με χριςθ γράφων (graphs) και υπολογιςμοφ του πίνακα γειτνίαςθσ (adjacency matrix) ςθμείων. Ο οριςμόσ ενόσ γράφου ιςοδυναμεί με τον κακοριςμό τθσ ςχζςθσ γειτνίαςθσ που το περιγράφει, θ οποία μπορεί να κωδικοποιθκεί υπό τθ μορφι πίνακα. Η φαςματικι προςζγγιςθ τθσ κεωρίασ γράφων, γνωςτι ωσ φαςματικι κεωρία γράφων(spectral graph theory), είναι θ μελζτθ τθσ ςχζςθσ που ςυνδζει ζνα γράφο με τισ ιδιοτιμζσ πινάκων ςυςχετιηόμενων με το γράφο. Παίρνουμε αναπαράςταςθ των δεδομζνων ςε ζνα χϊρο χαμθλισ διάςταςθσ (low-dimensional space), ο οποίοσ μπορεί να ομαδοποιθκεί. Υπάρχει ποικιλία μεκόδων που χρθςιμοποιεί με διαφορετικό τρόπο τα ιδιοδιανφςματα.

Περιεχόμενα 3 Στοιχεία τθσ Θεωρίασ Γράφων Ομοιότθτα δεδομζνων με χριςθ γράφων Overview of Graph Partitioning How do we define/find a good graph partition? Spectral Graph Theory Matrix representation Theoretical background Spectral Clustering Algorithms Bi-partitioning K-way partitioning Co-clustering Conclusion

Βαςικά ςτοιχεία Γράφων 4 Ζνασ γράφοσ (graph) G=(V,E) αποτελείται από ζνα ςφνολο κόμβων (nodes) V={V i } i=1:n και ζνα ςφνολο από ενϊςεισ (links) E={E ij } i j που ονομάηονται ακμζσ (edges). 6 4 5 7 V 1,2,3,4,5,7 Γιπλή Ακμή Απομονωμένος κόμβος 1 2 3 E 1,2, 1,5, 2,5, 3,4, 5,7 Βρόχος Πολλαπλές Ακμές Συνικωσ, από τθ γεωμετρικι αναπαράςταςθ ενόσ γράφου: εξαιροφνται οι απομονωμζνοι κόμβοι (isolated nodes) δεν ςυμπεριλαμβάνονται οι βρόχοι (loops) κατά τουσ οποίουσ μια ακμι τελειϊνει από εκεί από όπου ξεκίνθςε, και οι παράλλθλεσ ι πολλαπλζσ ακμζσ.

Βαςικά ςτοιχεία Γράφων 5 Συνεπϊσ, κάκε ακμι ενϊνει δφο κόμβουσ τουσ οποίουσ καλοφμε άκρα (endpoints). 6 4 5 7 1 2 Αν ορίςουμε τουσ κόμβουσ με αρικμοφσ 1,2,3,,m οι ακμζσ μποροφν να οριςκοφν από γράμματα e 1, e2,, e m ι να αναπαραςτακοφν ωσ διατεταγμζνα ηεφγθ των άκρων τουσ (π.χ. (1,2), (1,5)). Ο βακμόσ (degree) ενόσ κόμβου V i ιςοφται με τον αρικμό των ακμϊν που εμπίπτουν ςε αυτόν. Δφο κόμβοι που είναι ενωμζνοι με τθν ίδια ακμι, ονομάηονται γειτονικοί ι ςυνοριακοί (adjacent) κόμβοι. 3

Ανοιγμζνα Δζντρα (Spanning Trees) 6 4 1 6 7 5 2 Κύκλος 3 Αν δεν υπάρχει κάποιοσ περιοριςμόσ, θ οποιαδιποτε οδόσ διαμζςου του γράφου ονομάηεται διαδρομι (walk). Παράδειγμα, θ μετακίνθςθ 3 7 2 7 είναι απλά μία διαδρομι. Αν θ κάκε ακμι μπορεί να ςυναντθκεί μόνο μια φορά, θ διαδρομι διαμζςου του γράφου ονομάηεται ίχνοσ (trail). Παράδειγμα, θ μετακίνθςθ 5 6 4 5 2 είναι ζνα ίχνοσ. Ζνα μονοπάτι (path) είναι μια διαδρομι που δεν περιζχει επαναλαμβανόμενουσ κόμβουσ. Προφανϊσ, κάκε μονοπάτι είναι αυτομάτωσ και ζνα ίχνοσ. Παράδειγμα, ζνα τυπικό μονοπάτι αποτελεί θ μετακίνθςθ 3 7 2 5 6 4. Θα ονομάηουμε κλειςτό κάποιο μονοπάτι, ίχνοσ ι διαδρομι, αν καταλιγει ςτον κόμβο εκείνο από τον οποίο ξεκίνθςε. Αν ζνα μονοπάτι κεωροφμε ότι καταλιγει ςτο ςθμείο εκείνο που ξεκίνθςε, τότε ονομάηεται κφκλοσ (cycle). Παράδειγμα, ο κλειςτόσ βρόχοσ 5 6 4 5 αποτελεί ζναν κφκλο.

Ανοιγμζνα Δζντρα (Spanning Trees) 7 Επιπλζον, ςυνδεμζνοσ γράφοσ (connected graph), ονομάηεται ο γράφοσ ςτον οποίο κάκε ηεφγοσ κόμβων ζχει κάποια διαδρομι ανάμεςά τουσ. Όταν ςε κάκε ςυνδεμζνο γράφο ζχει αποδοκεί μια τιμι ωσ απόςταςθ για κάκε ακμι, τότε λζμε ότι ο γράφοσ είναι βεβαρθμζνοσ (weighted graph). Το μικοσ μια διαδρομισ ςε ζνα τζτοιο γράφο κα είναι τότε το άκροιςμα των αποςτάςεων όλων των ακμϊν που ςυμμετζχουν. 4 1 6 7 5 2 Κύκλος 3 Είναι προφανζσ ότι το ςυντομότερο μονοπάτι (shortest path) ελαχιςτοποιεί τθν ποςότθτα l, όπου i 1 : n 1 και j 1 : n. ij Τζλοσ, ζνασ γράφοσ G V', E' ονομάηεται υπογράφοσ (subgraph) του G V, E, αν V' V και E' E.

8 Ανοιγμζνο Δζντρο Ελαχίςτου Μικουσ (Minimal Spanning Tree, MST) Ζνα δζντρο (tree) είναι ζνα ειδικό είδοσ γράφου, το οποίο χαρακτθρίηεται από ζλλειψθ κφκλων και ςυνεκτικότθτα, όταν δθλαδι υπάρχουν ακμζσ οι οποίεσ ενϊνουν όλουσ τουσ κόμβουσ του γράφου. Ζνα ανοιγμζνο δζντρο (spanning tree) ενόσ βεβαρθμζνου γράφου, είναι ζνασ ςυνδεμζνοσ υπογράφοσ του G V, E που περιζχει όλουσ τουσ κόμβουσ του γράφου και κανζνα κφκλο. Αν αρχικά υπάρχουν Ν κόμβοι τότε ζνα ανοιγμζνο δζντρο περιζχει ακριβϊσ Ν-1 ακμζσ. Spanning Tree (μπλε) MST (κόκκινο) για Ν=10 κόμβουσ τθσ ίδιασ τοπολογίασ Το ανοιγμζνο δζντρο ελαχίςτου μικουσ (minimal spanning tree, MST), είναι ζνα ανοιγμζνο δζντρο το οποίο περιζχει επίςθσ Ν-1 ακμζσ, για τισ οποίεσ το άκροιςμα των αποςτάςεων τουσ είναι ελάχιςτο. Δθλαδι, αν κάκε ακμι ( i, j) ζχει μικοσ l ij, το MST είναι θ δενδρικι δομι εκείνθ του γράφου για τθν οποία ελαχιςτοποιείται θ ποςότθτα. l ij

Καταςκευι του MST 9 Αλγόριθμος Prim O V Αρχίηουμε με το node 1, ςτο set V (μπλε) Προςκζτουμε το node 6 ςτο V και το link 1 6 ςτο Ε Προςκζτουμε το node 3 ςτο V και το link 6 3 ςτο Ε Ομοίωσ τθ διαδικαςία 2 5 6 5 1 6 5 5 3 4 2 6 4 3 6 1 2

10 Ομοιότθτα δεδομζνων με γράφουσ Το MST «υποδεικνφει» τθ δομι των δεδομζνων Σφγκριςθ μζςω MST οδθγεί ςτθν ςφγκριςθ αντίςτοιχων κατανομϊν

11 Το Πολυδιάςτατο Wald-Wolfowitz test (WW-test) Δίνονται 2 ςφνολα τιμϊν {X i } i=1:m F x και {Y i } i=1:n F y, θ υπόκεςθ H o που πρζπει να ελεγχκεί είναι αν F x =F y. 1. Αρχικά, καταςκευάηεται το ολικό MST. 2. Διαγράφονται οι ακμζσ για τισ οποίεσ οι ςυνδεμζνοι κόμβοι προζρχονται από διαφορετικά δείγματα. 3. Υπολογίηεται θ ποςότθτα R, ωσ τα τελικά μθ- ενωμζνα υποδζντρα. R = 11 4. Απόρριψθ τθσ Ho για μικρζσ τιμζσ του R.

12 Το Πολυδιάςτατο Wald-Wolfowitz test (WW-test) Θεωρούμε δείγματα μεγέθους m και n αντίστοιχα, από κατανομές F x και F y και έστω N=m+n. C ο αριθμός του ζεύγους-ακμών που μοιράζονται ένα κοινό κόμβο. Υπό την υπόθεση H o, υπολογίζουμε τα E[R] και Var[R C] του R. E R 2mn N 2mn 2mn N C N 2 Var R C N N 1 4mn N N 1 N N 2 N 3 Η ποσότητα W προσεγγίζει την κανονική κατανομή: 1 2 W R E R Var R

13 Παράδειγμα: Τυχαία δειγματολθψία ςθμείων R=14, W=-0.7502 R=4, W=-4.4891

14 Graph partitioning and clustering Directed, complete, bipartite graphs Adjacency matrix Similarity graph Graph partitioning Clustering objectives

15 Επιςτροφή directed graph Αν G V, E είναι ζνασ κατευκυντικόσ γράφοσ (directed graph ι digraph), κάκε ακμι είναι ζνα διατεταγμζνο ηεφγοσ των κόμβων. Διαφζρει από ζνα ςυνθκιςμζνο γράφο (μθ-κατευκυντικό), αφοφ εδϊ πρόκειται για ζνα διατεταγμζνο ηεφγοσ κόμβων (edges). Επίςθσ, δεν επιτρζπονται οι βρόχοι (loops).

16 Επιςτροφή complete graph Ζνασ πλιρθσ γράφοσ (complete graph) είναι ζνα απλόσ μθ-κατευκυντικόσ γράφοσ του οποίου κάκε ηεφγοσ κόμβων ενϊνεται με μια μοναδικι ακμι. Ζνασ πλιρθσ γράφοσ με N κόμβουσ ζχει Ν(Ν-1)/2 ακμζσ (τιμζσ τριγωνικοφ πίνακα). Ζχει βακμό (degree) Ν-1. Ο ςυμπλθρωματικόσ γράφοσ ενόσ πλιρθ γράφου είναι ζνασ κενόσ γράφοσ (empty graph).

17 Επιςτροφή bipartite graph (1) Ζνασ διμερισ γράφοσ (bipartite graph ι bigraph) είναι ο γράφοσ του οποίου οι κόμβοι μποροφν να χωριςτοφν ςε δφο διαφορετικζσ ομάδεσ (ςφνολα) U και V, ζτςι ϊςτε κάκε ακμι να ςυνδζει ζναν κόμβο του U με ζναν του V. Δθλαδι, τα U και V είναι ανεξάρτθτα ςφνολα. Οι δφο ομάδεσ U και V μποροφν να κεωρθκοφν ωσ ζνασ χρωματιςμόσ του γραφιματοσ με δφο χρϊματα. Γράφουμε G=(U, V, E) για ζναν bipartite graph. Αν ζνασ bipartite graph είναι ςυνδεμζνοσ, ο διαχωριςμόσ του μπορεί να κακοριςκεί από τθν ιςοτιμία (parity) των αποςτάςεων από κάποιο τυχαία επιλεγμζνο κόμβο V. Μποροφμε να δοφμε αν ζνασ γράφοσ είναι bipartite χρθςιμοποιϊντασ τθν τεχνικι parity για να ανακζςουμε κόμβουσ ςε κάκε υποςφνολο U και V, χωριςτά ςε κάκε ενωμζνο ςτοιχείο του γράφου, και μετά να εξετάςουμε κάκε ακμι για να επαλθκεφςουμε ότι ζχει άκρα που ανικουν ςε διαφορετικά υποςφνολα.

Επιςτροφή bipartite graph (2) 18 Εφαρμογζσ bipartite graph: Χρθςιμοποιοφνται για τθ μοντελοποίθςθ προβλθμάτων ςφγκριςθσ. Παράδειγμα: πρόβλθμα ςφγκριςθσ εργαςιϊν. Ζςτω ότι ζχουμε ζνα ςφνολο P ανκρϊπων και ζνα ςφνολο J εργαςιϊν, αλλά δεν είναι όλοι κατάλλθλοι για όλεσ τισ δουλειζσ. Μποροφμε να μοντελοποιιςουμε το πρόβλθμα ωσ ζνα bipartite graph (P, J, E). Αν ζνασ άνκρωποσ Px είναι κατάλλθλοσ για μια εργαςία Jv, υπάρχει μια ακμι μεταξφ Px και Jv ςτον γράφο! Πίνακασ γειτνίαςθσ (adjacency matrix) ενόσ (bipartite) graph:

19 Επιςτροφή k-nearest neighbor graph Ζςτω δεδομζνα και pairwise similarities s ij Σφνδεςθ του κάκε ςθμείου με τουσ k nearest neighbors Weight the edges by the similarity score Ο k-nn γράφοσ ςυνδζει τα A και B αν Α Β ι Α Β.

20 Επιςτροφή ε-neighborhood graph Ζςτω δεδομζνα και pairwise distances d ij Σφνδεςθ κάκε ςθμείου με όλα τα υπόλοιπα που ζχουν απόςταςθ d ij μικρότερθ από ζνα threshold ε. Είτε χριςθ unweighted graph ι ςυμπλθρωματικά μετατροπι των distances ςε similarities και χριςθ των similarities ωσ weights.

Similarity graph (Ομοιότθτα γράφου) 21 V={x i } E={W ij } Σφνολο n κόμβων που αντιπροςωπεφουν δεδομζνα (ςθμεία) Σφνολο βεβαρθμζνων ακμϊν που κακορίηουν pair-wise ομοιότθτα μεταξφ ςθμείων 1 0.1 5 2 3 0.6 0.2 4 0.7 6 Μειϊνεται θ απόςταςθ Αυξάνεται θ ομοιότθτα Wij αντιπροςωπεφει ομοιότθτα μεταξφ κόμβων Wij=0 όπου δεν υπάρχει ομοιότθτα, Wii=0

Graph partitioning (Κατάτμθςθ γράφου) 22 Η ομαδοποίθςθ (Clustering) μπορεί να δειχκεί ωσ κατάτμθςθ ενόσ γράφου ομοιότθτασ (partitioning a similarity graph). Bi-partitioning task: Divide vertices into two disjoint groups (A,B) A 1 5 B 2 4 6 3 V=A U B Graph partition is NP hard

Good vs. Bad Clustering! 23 Where to cut??? Clusters are not always about Euclidean distance, and parametric models are not always appropriate

Clustering objectives 24 Traditional definition of a good clustering: 1. Points assigned to same cluster should be highly similar. 2. Points assigned to different clusters should be highly dissimilar. Apply these objectives to our graph representation 1 0.1 5 2 3 0.6 0.2 4 0.7 6 Minimize weight of between-group connections

25 Spectral clustering Graph cuts Spectral graph theory Spectral clustering

Spectral clustering (in one slide!) 26 Δίνονται: data points X 1,, X n, pairwise similarities w ij = s(x i,x j ) Καταςκευι similarity graph: nodes = data points, edges = similarities Clustering = find a cut through the graph Define a cut objective function Solve it ~ Spectral Clustering

Graph Cuts 27 Express partitioning objectives as a function of the edge cut of the partition. Cut: Set of edges with a node in a group We want to find the minimal cut between groups The groups that has the minimal cut would be the partition A 1 0.1 B 5 cut ( A, B) i A, j w ij B 2 3 0.6 0.2 4 0.7 6 cut(a,b) = 0.3

Graph Cut Criteria (1) 28 Criterion: Minimum-cut Minimise weight of connections between groups Degenerate case: min cut(a,b) Optimal cut Minimum cut Problem: Only considers external cluster connections Does not consider internal cluster density

Graph Cut Criteria (2) 29 Criterion: Normalized-cut (Shi & Malik, 97) Consider the connectivity between groups relative to the density of each group. cut( A,B ) cut( A,B ) min Ncut( A,B ) vol( A) vol( B ) Normalize the association between groups by volume. vol(a): The total weight of the edges originating from group A. Why use this criterion? Minimizing the normalized cut is equivalent to maximizing normalized association. Produces more balanced partitions.

Second Option 30 The previous criteria was on the weight This following criteria is on the size of the group A B N A B number _ of _ vertexes _ on _ A number _ of _ vertexes _ on _ B A B N 0 A B thats 1 A or B 0 1 N

Example: 2 Spirals 31 2 1,5 1 0,5 0-2 -1,5-1 -0,5 0 0,5 1 1,5 2-0,5 Dataset exhibits complex cluster shapes K-means performs very poorly in this space due bias toward dense spherical clusters. -1-1,5-2 0.6 0.4 0.2 In the embedded space given by two leading eigenvectors, clusters are trivial to separate. 0-0.709-0.7085-0.708-0.7075-0.707-0.7065-0.706-0.2-0.4-0.6 -

Spectral Graph Theory 32 Possible approach: Represent a similarity graph as a matrix Apply knowledge from Linear Algebra The eigenvalues and eigenvectors of a matrix provide global information about its structure. Spectral Graph Theory Analyze the spectrum of matrix representing a graph. w w x x 11 1n 1 1 w w x x n1 nn n n Spectrum : The eigenvectors of a graph, ordered by the magnitude (strength) of their corresponding eigenvalues. { 1, 2,..., n } λ

Matrix Representations (1) 33 Adjacency matrix (A) πίνακασ γειτνίαςησ n x n matrix A [ w ij ] : edge weight between node x i and x j 2 1 3 0.6 0.1 0.2 Important properties: Symmetric matrix Eigenvalues are real 0.7 Eigenvector could span orthogonal base 4 5 6 x 1 x 2 x 3 x 4 x 5 x 6 x 1 0 0.6 0 0.1 0 x 2 0 0 0 0 x 3 0.6 0 0.2 0 0 x 4 0 0 0.2 0 0.7 x 5 0.1 0 0 0 x 6 0 0 0 0.7 0

Matrix Representations (2) 34 2 Degree matrix (D) πίνακασ βαθμών n x n diagonal matrix D ( i, i) 1 3 0.6 j w ij 0.1 0.2 Important application: Normalise adjacency matrix : total weight of edges incident to vertex xi 5 6 4 x 1 x 2 x 3 x 4 x 5 x 6 x 1 1.5 0 0 0 0 0 x 2 0 1.6 0 0 0 0 0.7 x 3 0 0 1.6 0 0 0 x 4 0 0 0 1.7 0 0 x 5 0 0 0 0 1.7 0 x 6 0 0 0 0 0 1.5

Matrix Representations (3) 35 Laplacian matrix (L) Λαπλαςιανόσ πίνακασ n x n symmetric matrix L = D - A x 1 x 2 x 3 x 4 x 5 x 6 2 1 3 0.6 0.1 0.2 4 5 0.7 6 x 1 1.5 - -0.6 0-0.1 0 x 2-1.6-0 0 0 x 3-0.6-1.6-0.2 0 0 x 4 0 0-0.2 1.7 - -0.7 Important properties: Eigenvalues are non-negative real numbers Eigenvectors are real and orthogonal x 5-0.1 0 0-1.7 - x 6 0 0 0-0.7-1.5 Eigenvalues and eigenvectors provide an insight into the connectivity of the graph!

Another option Normalized Laplacian 36 Normalized Laplacian matrix (L) n x n symmetric matrix D -0.5 (D A) D -0.5 2 1 3 0.6 0.1 0.2 4 5 0.7 6 - - - - - - -0.12-0.12 - - - Important properties: Eigenvectors are real and normalized - - - - - Each Aij (which i,j is not equal) = A ij Dii

Ιδιότθτεσ Λαπλαςιανοφ πίνακα (1) 37 Ο L είναι πραγματικόσ ςυμμετρικόσ πίνακασ, για τον λόγο αυτό οι ιδιοτιμζσ του είναι πραγματικζσ και τα ιδιοδιανφςματά του ορκογϊνια. O L είναι κετικά θμιοριςμζνοσ πίνακασ, για τον λόγο αυτό όλεσ οι ιδιοτιμζσ του είναι μθ-αρνθτικζσ. Η δεφτερθ μικρότερθ ιδιοτιμι του Λαπλαςιανοφ πίνακα ενόσ γραφιματοσ ονομάηεται algebraic connectivity. Η ιδιοτιμι αυτι είναι μεγαλφτερθ του μθδενόσ αν και μόνο αν το είναι ςυνεκτικό. Το ιδιοδιάνυςμα που αντιςτοιχεί ςτθν αλγεβρικι ςυνδεςιμότθτα ονομάηεται διάνυςμα Fiedler (Fiedler Vector) και χρθςιμοποιείται κατά τθ φαςματικι διαίρεςθ γραφθμάτων. Εάν το G είναι ςυνεκτικό, ο πίνακασ L ζχει βακμό n-1, όπου είναι το πλικοσ των κόμβων του γραφιματοσ.

Ιδιότθτεσ Λαπλαςιανοφ πίνακα (2) 38

Find An Optimal Min-Cut (Hall 70, 39 Fiedler 73) Express a bi-partition (A,B) as a vector p i 1 if x A 1 if x B The Laplacian is semi positive The Rayleigh Theorem shows: i i The minimum value for f(p) is given by the 2nd smallest eigenvalue of the Laplacian L. The optimal solution for p is given by the corresponding eigenvector λ 2, referred as the Fiedler Vector. p T L p Laplacian matrix We can minimise the cut of the partition by finding a non-trivial vector p that minimises the function

Spectral clustering Algorithms 40 Three basic stages: 1. Pre-processing Construct a matrix representation of the dataset. 2. Decomposition Compute eigenvalues and eigenvectors of the matrix. Map each point to a lower-dimensional representation based on one or more eigenvectors. 3. Grouping Assign points to two or more clusters, based on the new representation.

Spectral Bi-partitioning Algorithm (1) 41 1. Pre-processing Build Laplacian matrix L of the graph 2. Decomposition Find eigenvalues X and eigenvectors Λ of the matrix L Map vertices to corresponding components of λ 2 x 1 0.2 x 2 x 3 0.2 0.2 x 4-0.4 x 5-0.7 x 6-0.7 Λ = 0.0 0.4 2.2 2.3 2.5 3.0 x 1 x 2 x 3 x 4 x 5 x 6 x 1 1.5 - -0.6 0-0.1 0 x 2-1.6-0 0 0 x 3-0.6-1.6-0.2 0 0 x 4 0 0-0.2 1.7 - -0.7 x 5-0.1 0 0-1.7 - x 6 0 0 0-0.7-1.5 X = 0.4 0.4 0.4 0.4 0.4 0.4 0.2 0.2 0.2-0.4-0.7-0.7 0.1 0.1-0.2 0.9-0.4-0.2 0.4-0. 0.0 0.2-0.5-0.2 0.4-0.2-0.4-0.6-0.9 0.3 0.6-0.6-0.2 0.9

Spectral Bi-partitioning Algorithm (2) 42 The matrix which represents the eigenvector of the laplacian matched to the corresponded eigenvalues with increasing order - - - - - - - - - 6 5 4 3 2 1 - - - - -

Spectral Bi-partitioning Algorithm (3) 43 3. Grouping Sort components of reduced 1- dimensional vector. Identify clusters by splitting the sorted vector in two. x 1 x 2 x 3 x 4 x 5 x 6 0.2 0.2 0.2-0.4-0.7-0.7 Split at 0 Cluster A: Positive points Cluster B: Negative points How to choose a splitting point? Naïve approaches: x 1 x 2 x 3 0.2 0.2 0.2 x 4 x 5 x 6-0.4-0.7-0.7 Split at 0, mean or median value More expensive approaches A B Attempt to minimize normalized cut criterion in 1-dimension

Example: 3 clusters 44 Lets assume the next data points

Example: 3 clusters 45 If we use the 2 nd eigenvector If we use the 3 rd eigenvector

Παραδείγματα (1) 46 Shi & Malik Normalized cuts

Παραδείγματα (2) 47 Ng, Jordan & Weiss

Παραδείγματα (3) 48

Παραδείγματα (4) 49

K-Way Spectral Clustering 50 How do we partition a graph into k clusters? Two basic approaches: 1. Recursive bi-partitioning (Hagen et al., 91) Recursively apply bi-partitioning algorithm in a hierarchical divisive manner. Disadvantages: Inefficient, unstable 2. Cluster multiple eigenvectors (Shi & Malik, 00) Build a reduced space from multiple eigenvectors. Commonly used in recent papers A preferable approach but its like to do PCA and then k-means

Recursive bi-partitioning (Hagen et al., 91) 51 Partition using only one eigenvector at a time Use procedure recursively Example: Image Segmentation Uses 2nd (smallest) eigenvector to define optimal cut Recursively generates two clusters with each cut

Why use Multiple Eigenvectors? 52 1. Approximates the optimal cut (Shi & Malik, 00) Can be used to approximate the optimal k-way normalised cut. 2. Emphasizes cohesive clusters (Brand & Huang, 02) Increases the unevenness in the distribution of the data. Associations between similar points are amplified, associations between dissimilar points are attenuated. The data begins to approximate a clustering. 3. Well-separated space Transforms data to a new embedded space, consisting of k orthogonal basis vectors. NB: Multiple eigenvectors prevent instability due to information loss.

K-Eigenvector Clustering 53 K-eigenvector Algorithm (Ng et al., 01) 1. Pre-processing Construct the scaled adjacency matrix 2. Decomposition Find the eigenvalues and eigenvectors of A'. Build embedded space from the eigenvectors corresponding to the k largest eigenvalues. 3. Grouping 1 / 2 1/ 2 Apply k-means to reduced n x k space to produce k clusters. A' D AD

Eigenvalue How to select k? 54 Eigengap: the difference between two consecutive eigenvalues. Most stable clustering is generally given by the value k that maximises the expression k k k 1 50 max k 2 1 45 40 35 30 λ 1 Choose k=2 25 20 15 λ 2 10 5 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 K

Conclusion 55 Clustering as a graph partitioning problem Quality of a partition can be determined using graph cut criteria. Identifying an optimal partition is NP-hard. Spectral clustering techniques Efficient approach to calculate near-optimal bi-partitions and k-way partitions. Based on well-known cut criteria and strong theoretical background.

56 Ευχαριςτϊ! Χρήστος Θεοχαράτος htheohar@upatras.gr