Dr. Demetrios D. Diamantidis
Assistant Professor, Section of Telecommunications and Space Sciences, Department of Electrical and Computer Engineering, Polytechnic School, Demokritos University of Thrace, Xanthi 6700, Greece
Email: diam@duth.gr   Phone: +30-54-7926   Fax: +30-54-76457
Director of the Computer & Network Administration Center of DUTHNet

May, 200   Demetrios D. Diamantidis - DUTH
Basic Definitions

Given a data set, that is, a vector or finite sequence of N real numbers

X^i = [x^i_1, x^i_2, \dots, x^i_N]^t

we define the following quantities. The mean value or average of the data values:

\mu_i = \frac{1}{N}\sum_{k=1}^{N} x^i_k = \frac{x^i_1 + x^i_2 + \dots + x^i_N}{N}
The mean locates the center of the data values. Note that, generally, this value may not itself be one of the given sequence values!

Now, to have a measure of how close each value of the data set is to the mean, we define the variance:

s_{ii} = \mathrm{Var}(X^i) = \frac{1}{N}\sum_{k=1}^{N}(x^i_k - \mu_i)^2 = \frac{(x^i_1 - \mu_i)^2 + \dots + (x^i_N - \mu_i)^2}{N}
The square root of this quantity is defined as the standard deviation of the data values, that is

\sigma_i = \sigma_{ii} = \sqrt{s_{ii}} = \sqrt{\mathrm{Var}(X^i)}

Note: we are using the superscript as an identity label of the given data set, not as a power.
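As a quick illustration, the three definitions so far fit in a few lines of Python (a minimal sketch; the function names and the sample list are ours, chosen to match the worked example later in the slides):

```python
import math

def mean(xs):
    # mu = (x_1 + ... + x_N) / N
    return sum(xs) / len(xs)

def variance(xs):
    # s = Var(X) = (1/N) * sum_k (x_k - mu)^2  (divide-by-N convention, as in the slides)
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def std(xs):
    # sigma = sqrt(Var(X))
    return math.sqrt(variance(xs))

X1 = [1, 2, 4, 3, 6]   # the example data set used later: mean 3.2, variance 2.96
```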
Now, if we have the data sets X^i and X^j, we ask whether there is some resemblance between them. To start with, we define the covariance of the data sequences as

s_{ij} = \mathrm{CoVar}(X^i, X^j) = \frac{1}{N}\sum_{k=1}^{N}(x^i_k - \mu_i)(x^j_k - \mu_j)
From this quantity we define another one, namely the correlation coefficient:

r_{ij} = \frac{s_{ij}}{\sigma_{ii}\,\sigma_{jj}} = \frac{s_{ij}}{\sqrt{s_{ii}\,s_{jj}}}

where we recognize \sigma_{ii} and \sigma_{jj} as the standard deviations of the data sets X^i and X^j respectively, and s_{ij} as their covariance.
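These two definitions can also be illustrated with a short Python sketch (the function names `covar` and `corrcoef` are ours; the second sample list is invented for illustration):

```python
import math

def covar(xs, ys):
    # s_ij = (1/N) * sum_k (x_k - mu_i) * (y_k - mu_j)
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def corrcoef(xs, ys):
    # r_ij = s_ij / sqrt(s_ii * s_jj); covar(x, x) is just the variance
    return covar(xs, ys) / math.sqrt(covar(xs, xs) * covar(ys, ys))

# A data set is fully correlated with itself (coefficient ~ 1) ...
r_self = corrcoef([1, 2, 4, 3, 6], [1, 2, 4, 3, 6])
# ... while loosely related values give a coefficient closer to 0.
r_other = corrcoef([1, 2, 4, 3, 6], [5, 1, 4, 2, 3])
```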
Now observe that, when the data values of the given data sequences are fully correlated, that is, the data sets are identical (we are actually dealing with the same data sequence), the correlation coefficient becomes 1. The less correlated the values of the given data sequences are, the closer the correlation coefficient approaches 0.
We can loosely say that correlation is the tendency of the corresponding values in both data sets to vary in the same way. That is, when some value in the first data set increases with respect to its previous one, the corresponding value in the second data set also increases, by almost the same amount, and vice versa.

Question: Is there a way of de-correlating correlated data sets?
PCA answers (solves) that question (problem).
Graphics: Two-Color Plot

Suppose we are given the data sets

X^1 = [1\ 2\ 4\ 3\ 6]^t, \quad X^2 = [2\ 3\ 4\ 4\ 5]^t

We construct five 2-D vectors from the values at corresponding positions:

V_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, V_2 = \begin{bmatrix} 2 \\ 3 \end{bmatrix}, V_3 = \begin{bmatrix} 4 \\ 4 \end{bmatrix}, V_4 = \begin{bmatrix} 3 \\ 4 \end{bmatrix}, V_5 = \begin{bmatrix} 6 \\ 5 \end{bmatrix}
Plotting them in 2-D space, we see that their points tend to lie along a line.

[Figure: scatter plot of V_1, ..., V_5 in the (x_1, x_2) plane]
Computing their statistics we have the following:

\mu_1 = 16/5 = 3.2, \quad \mu_2 = 18/5 = 3.6

s_{11} = \mathrm{Var}(X^1) = 2.96
s_{22} = \mathrm{Var}(X^2) = 1.04
s_{12} = s_{21} = \mathrm{CoVar}(X^1, X^2) = \mathrm{CoVar}(X^2, X^1) = 1.68
The square matrix of their variances and covariances is termed the covariance matrix of the given data sets; for X^1, X^2 it is

\Sigma_X = \begin{bmatrix} s_{11} & s_{12} \\ s_{21} & s_{22} \end{bmatrix} = \begin{bmatrix} 2.96 & 1.68 \\ 1.68 & 1.04 \end{bmatrix}
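This matrix is easy to check with NumPy; note `bias=True`, which selects the divide-by-N convention used in these slides (a quick verification sketch, assuming NumPy is available):

```python
import numpy as np

X = np.array([[1, 2, 4, 3, 6],    # data set X^1
              [2, 3, 4, 4, 5]])   # data set X^2

# np.cov treats each row as one variable; bias=True divides by N, not N-1.
Sigma_X = np.cov(X, bias=True)    # approx [[2.96, 1.68], [1.68, 1.04]]
```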
Generally, for M data sets X^1, X^2, ..., X^M with an equal number of values, we define their covariance matrix as

\Sigma_X = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1M} \\ s_{21} & s_{22} & \cdots & s_{2M} \\ \vdots & & & \vdots \\ s_{M1} & s_{M2} & \cdots & s_{MM} \end{bmatrix}, \quad \text{with } s_{ij} = s_{ji}, \; i, j = 1, \dots, M
The corresponding correlation matrix of our example data sets is

R_X = \begin{bmatrix} r_{11} & r_{12} \\ r_{21} & r_{22} \end{bmatrix} = \begin{bmatrix} 1 & 0.9575 \\ 0.9575 & 1 \end{bmatrix}

since r_{12} = 1.68/\sqrt{2.96 \cdot 1.04} = 0.9575. Observing the correlation matrix, we conclude that each data set is 100% correlated with itself and about 96% correlated with the other.
Now, is it possible, by an axis rotation, to compute new values for the given data sets in a new coordinate system, such that they exhibit less correlation? The new axes are said to point in the principal directions, and the produced data sets are termed principal components.
It is known from coordinate transformation theory that this is a linear transformation of the form

Y = G X

where G is the required transformation matrix, X holds the old data sets, and Y the new ones in the new axis system. We wish the Y data sets to exhibit no correlation, so their covariance matrix must be diagonal. Let's do some computations!
For the covariance matrices of the data sets Y and X we have \Sigma_Y = [s^Y_{ij}] and \Sigma_X = [s^X_{ij}], with

s^Y_{ij} = \mathrm{CoVar}(Y^i, Y^j) = \frac{1}{N}\sum_{k=1}^{N}(y^i_k - \mu^Y_i)(y^j_k - \mu^Y_j)

s^X_{ij} = \mathrm{CoVar}(X^i, X^j) = \frac{1}{N}\sum_{k=1}^{N}(x^i_k - \mu^X_i)(x^j_k - \mu^X_j)

Substituting Y^i = G X^i we have:
Noting that \mu_Y = G \mu_X, we have

\Sigma_Y = \frac{1}{N}\sum_k (Y_k - \mu_Y)(Y_k - \mu_Y)^t
         = \frac{1}{N}\sum_k (G X_k - G \mu_X)(G X_k - G \mu_X)^t
         = G \left[ \frac{1}{N}\sum_k (X_k - \mu_X)(X_k - \mu_X)^t \right] G^t
         = G \Sigma_X G^t
So we remark that

\Sigma_Y = G \Sigma_X G^t

with the demand that \Sigma_Y be diagonal. Recall now that the diagonal entries of \Sigma_Y are then the eigenvalues of \Sigma_X, and the requested transformation matrix G is the inverse of the matrix whose columns are the eigenvectors of \Sigma_X. At last, we can now compute the new principal-components data sets Y from the old data sets X.
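The whole construction condenses into a short NumPy sketch (our own helper function; `np.linalg.eigh` is the symmetric-matrix eigendecomposition, which returns orthonormal eigenvectors, so inverting the eigenvector matrix is just a transpose):

```python
import numpy as np

def pca_transform(X):
    """Return (G, Y): rows of G are eigenvectors of Cov(X), and Y = G X.

    Rows of X are the data sets. Eigenvalues are sorted in decreasing
    order, so the first row of Y is the first principal component.
    """
    Sigma_X = np.cov(X, bias=True)      # divide-by-N covariance matrix
    w, V = np.linalg.eigh(Sigma_X)      # ascending eigenvalues, orthonormal columns
    order = np.argsort(w)[::-1]         # indices for decreasing eigenvalue order
    G = V[:, order].T                   # rows = eigenvectors, so G^{-1} = G^t
    return G, G @ X
```

Applied to the example data sets, the covariance matrix of Y comes out diagonal with the eigenvalues of \Sigma_X on the diagonal.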
Image Processing Basics

Recall that an image (remote sensing scene) is a matrix (usually square) whose elements (picture elements, pixels) are non-negative integer values called gray levels or Digital Numbers (DN), in some range [0, 2^n - 1] referred to as the Spectral Resolution or Dynamic Range, expressed in bits (n bits).
Now, every image P = [p_{ij}], i, j = 1, ..., n, can be considered as a data set, so all the previous results apply to images as well: PCA deals with the decorrelation of images. Practically, it makes it possible to take different spectral images of the same surface area, which appear almost identical in their gray-level variation, and differentiate them, that is, to make features more distinctive from one scene to the other.
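To make this concrete, here is a small hypothetical sketch of how multi-band image data would be fed into the same machinery: each band is flattened into one row, so every band plays the role of one data set (the synthetic 100x100 bands below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)

# Three hypothetical spectral bands of the same scene: a shared underlying
# surface plus small band-specific noise, so the bands are strongly
# correlated, like near-identical spectral images.
surface = rng.integers(0, 256, size=(100, 100)).astype(float)
bands = np.stack([surface + rng.normal(0.0, 5.0, surface.shape)
                  for _ in range(3)])

X = bands.reshape(3, -1)                     # one row (data set) per band
Sigma_X = np.cov(X, bias=True)               # 3x3 band covariance matrix
w, V = np.linalg.eigh(Sigma_X)               # ascending eigenvalues
G = V[:, ::-1].T                             # rows = eigenvectors, PC1 first
Y = G @ (X - X.mean(axis=1, keepdims=True))  # decorrelated component bands
pc_images = Y.reshape(3, 100, 100)           # back to image form
```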
Arranging the eigenvalues in decreasing order, the PCA transformation produces data sets Y^i = G X^i that exhibit maximum variances in decreasing order. The Principal Components Transformation is also known as the Hotelling Transform or the Karhunen-Loève Transform.
Because our transformation must also be a zero-correlation transformation, the rotation matrix must be orthogonal, that is,

G^{-1} = G^t

So we require our eigenvectors to be orthonormal, that is, the column vectors must be normalized. This leads to a specific set of eigenvectors and hence to a specific PCA transformation matrix.
To conclude our presentation, let's compute the required transformation matrix for our example data sets, and the new data sets, using PCA. Recall that our example covariance matrix is

\Sigma_X = \begin{bmatrix} 2.96 & 1.68 \\ 1.68 & 1.04 \end{bmatrix}
Solve the characteristic equation to find the eigenvalues:

\varphi(\lambda) = \begin{vmatrix} 2.96-\lambda & 1.68 \\ 1.68 & 1.04-\lambda \end{vmatrix} = 0

(2.96-\lambda)(1.04-\lambda) - 1.68 \cdot 1.68 = 0

\lambda^2 - 4\lambda + 0.256 = 0

\lambda_1 = 3.9349, \quad \lambda_2 = 0.0651, \quad \lambda_1 + \lambda_2 = 4
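The eigenvalues can be double-checked numerically, both as roots of the characteristic polynomial and directly from the matrix (a quick NumPy verification of the hand computation):

```python
import numpy as np

Sigma_X = np.array([[2.96, 1.68],
                    [1.68, 1.04]])

# Roots of the characteristic polynomial lambda^2 - 4*lambda + 0.256 = 0 ...
roots = np.roots([1.0, -4.0, 0.256])
# ... must coincide with the eigenvalues of Sigma_X (returned in ascending order).
eigs = np.linalg.eigvalsh(Sigma_X)
```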
So, the covariance matrix in the Y space is

\Sigma_Y = \begin{bmatrix} 3.9349 & 0 \\ 0 & 0.0651 \end{bmatrix}

Let's now find the eigenvectors of \Sigma_X. We must solve the (homogeneous) equation \Sigma_X V_1 = \lambda_1 V_1:

\begin{bmatrix} 2.96 & 1.68 \\ 1.68 & 1.04 \end{bmatrix} \begin{bmatrix} v_{11} \\ v_{21} \end{bmatrix} = \lambda_1 \begin{bmatrix} v_{11} \\ v_{21} \end{bmatrix}, \quad \lambda_1 = 3.9349
2.96\, v_{11} + 1.68\, v_{21} = 3.9349\, v_{11}
1.68\, v_{11} + 1.04\, v_{21} = 3.9349\, v_{21}

-0.9749\, v_{11} + 1.6800\, v_{21} = 0
\phantom{-}1.6800\, v_{11} - 2.8949\, v_{21} = 0

v_{11} = 1.7232\, v_{21}, and with v_{11}^2 + v_{21}^2 = 1:

V_1 = \begin{bmatrix} 0.8649 \\ 0.5019 \end{bmatrix}
For the second eigenvalue we have \Sigma_X V_2 = \lambda_2 V_2:

\begin{bmatrix} 2.96 & 1.68 \\ 1.68 & 1.04 \end{bmatrix} \begin{bmatrix} v_{12} \\ v_{22} \end{bmatrix} = \lambda_2 \begin{bmatrix} v_{12} \\ v_{22} \end{bmatrix}, \quad \lambda_2 = 0.0651
2.96\, v_{12} + 1.68\, v_{22} = 0.0651\, v_{12}
1.68\, v_{12} + 1.04\, v_{22} = 0.0651\, v_{22}

2.8949\, v_{12} + 1.6800\, v_{22} = 0
1.6800\, v_{12} + 0.9749\, v_{22} = 0

v_{12} = -0.5803\, v_{22}, and with v_{12}^2 + v_{22}^2 = 1:

V_2 = \begin{bmatrix} -0.5019 \\ 0.8649 \end{bmatrix}
Therefore the required principal-components transformation matrix is

G^t = \begin{bmatrix} 0.8649 & -0.5019 \\ 0.5019 & 0.8649 \end{bmatrix}, \quad G = \begin{bmatrix} 0.8649 & 0.5019 \\ -0.5019 & 0.8649 \end{bmatrix}

and we transform the original data sets X^1 and X^2 to the new data sets Y^1 and Y^2 as follows.
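The same matrix can be obtained numerically (eigenvector signs are arbitrary, so we compare absolute values; a sketch assuming NumPy):

```python
import numpy as np

Sigma_X = np.array([[2.96, 1.68],
                    [1.68, 1.04]])

w, V = np.linalg.eigh(Sigma_X)   # ascending eigenvalues, orthonormal columns
G = V[:, ::-1].T                 # reorder to descending; rows = eigenvectors

# Orthogonality check: G^{-1} = G^t, so G G^t must be the identity.
identity_check = G @ G.T
```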
\begin{bmatrix} Y^1 \\ Y^2 \end{bmatrix} = \begin{bmatrix} 0.8649 & 0.5019 \\ -0.5019 & 0.8649 \end{bmatrix} \begin{bmatrix} X^1 \\ X^2 \end{bmatrix} =

= \begin{bmatrix} 0.8649 & 0.5019 \\ -0.5019 & 0.8649 \end{bmatrix} \begin{bmatrix} 1 & 2 & 4 & 3 & 6 \\ 2 & 3 & 4 & 4 & 5 \end{bmatrix} =

= \begin{bmatrix} 1.8688 & 3.2356 & 5.4673 & 4.6024 & 7.6991 \\ 1.2279 & 1.5909 & 1.4519 & 1.9539 & 1.3130 \end{bmatrix} \approx \begin{bmatrix} 2 & 3 & 5 & 5 & 8 \\ 1 & 2 & 1 & 2 & 1 \end{bmatrix}
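The matrix product is reproduced by a couple of NumPy lines (using the rounded 4-decimal G from the derivation, so the results agree with the hand computation to about three decimals):

```python
import numpy as np

G = np.array([[ 0.8649, 0.5019],
              [-0.5019, 0.8649]])   # hand-derived transformation matrix
X = np.array([[1, 2, 4, 3, 6],
              [2, 3, 4, 4, 5]])

Y = G @ X   # row 0: 1st principal component, row 1: 2nd principal component
```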
We see that the 1st principal component, that is, the data set Y^1, exhibits maximum variation, while the 2nd one exhibits much less variation, which is in fact almost negligible (its variance is \lambda_2 = 0.0651).

NOTE: Usually, after the PCT, we need to make an origin shift in order to eliminate negative pixel values in the resulting images. This shift has no influence on the covariance matrices.
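The shift-invariance claim is easy to demonstrate: adding a constant to each component moves its minimum to zero without touching the covariance matrix (the numbers below are illustrative only):

```python
import numpy as np

Y = np.array([[ 1.87, 3.24, 5.47, 4.60, 7.70],
              [-0.23, 0.59, 0.45, 0.95, 0.31]])   # pretend PC2 went negative

# Origin shift: per component, subtract the minimum so no value is negative.
# Var(X + c) = Var(X) for any constant c, so covariances are unchanged.
Y_shifted = Y - Y.min(axis=1, keepdims=True)
```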
THE END!
STATISTICS

Basic Definitions

Let X = \{x_1, x_2, \dots, x_N\} be a finite sequence of N real numbers. We define the following quantities for the given values X.

Mean value or average:

\mu = \frac{1}{N}\sum_{n=1}^{N} x_n = \frac{x_1 + x_2 + \dots + x_N}{N}

This value lies close to the most frequent value of the data. Note that the mean may or may not appear among the given values X.

Variance (also fluctuation or dispersion):

s = \mathrm{Var}(X) = \frac{1}{N}\sum_{n=1}^{N}(x_n - \mu)^2 = \frac{(x_1 - \mu)^2 + \dots + (x_N - \mu)^2}{N}

This value is always non-negative and gives a measure of how far, on average, the given values X lie from their mean.

Standard deviation:

\sigma = \sqrt{s} = \sqrt{\mathrm{Var}(X)}

This value gives the root-mean-square distance of the given values X from their mean.

Now let X = \{x_1, x_2, \dots, x_N\} and Y = \{y_1, y_2, \dots, y_N\} be two finite sequences of N real numbers. We define the following quantities for the given values of X and Y.

Covariance:

s_{XY} = \mathrm{CoVar}(X, Y) = \mathrm{CoVar}(Y, X) = \frac{1}{N}\sum_{n=1}^{N}(x_n - \mu_X)(y_n - \mu_Y) = \frac{(x_1 - \mu_X)(y_1 - \mu_Y) + \dots + (x_N - \mu_X)(y_N - \mu_Y)}{N}

This value gives a measure of the joint variation of the values of X and Y, that is, whether their corresponding values tend to increase or decrease together. If the two sequences coincide, that is X = Y, the covariance coincides with the variance.

Correlation coefficient:

r_{XY} = \frac{s_{XY}}{\sigma_X \sigma_Y} = \frac{s_{XY}}{\sqrt{s_X s_Y}}

This value also gives a measure of the correlation of the values of the two sequences. When the two sequences coincide, their correlation coefficient becomes 1, while the more they differ, the more it tends toward 0.
Notation

In order to handle more than two sequences of real numbers, all of equal length N, we introduce the notation

X^i = \{x^i_1, x^i_2, \dots, x^i_N\}, \quad i = 1, 2, \dots, K.

The superscript identifies one particular sequence among the K sequences; it is an identity index, not a power. With this notation the relations above become:

Mean value: \mu_i = \frac{1}{N}\sum_{n=1}^{N} x^i_n

Variance: s_i = \mathrm{Var}(X^i) = \frac{1}{N}\sum_{n=1}^{N}(x^i_n - \mu_i)^2 = s_{ii}

Standard deviation: \sigma_i = \sqrt{s_i} = \sqrt{\mathrm{Var}(X^i)} = \sqrt{s_{ii}} = \sigma_{ii}

Covariance: s_{ij} = \mathrm{CoVar}(X^i, X^j) = \mathrm{CoVar}(X^j, X^i) = \frac{1}{N}\sum_{n=1}^{N}(x^i_n - \mu_i)(x^j_n - \mu_j)

Correlation coefficient: r_{ij} = \frac{s_{ij}}{\sigma_{ii}\,\sigma_{jj}} = \frac{s_{ij}}{\sqrt{s_{ii}\,s_{jj}}}

For K sequences, the covariance matrix is defined as

\Sigma_X = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1K} \\ s_{21} & s_{22} & \cdots & s_{2K} \\ \vdots & & & \vdots \\ s_{K1} & s_{K2} & \cdots & s_{KK} \end{bmatrix}, \quad s_{ij} = s_{ji}

and the correlation matrix as

R_X = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1K} \\ r_{21} & r_{22} & \cdots & r_{2K} \\ \vdots & & & \vdots \\ r_{K1} & r_{K2} & \cdots & r_{KK} \end{bmatrix}, \quad r_{ij} = r_{ji}, \; r_{ii} = 1, \; 0 \le |r_{ij}| \le 1
MATRIX THEORY

Basic Definitions

Let x = [x_1\ x_2\ \dots\ x_K]^t be a column vector and let

G = \begin{bmatrix} g_{11} & g_{12} & \cdots & g_{1K} \\ g_{21} & g_{22} & \cdots & g_{2K} \\ \vdots & & & \vdots \\ g_{K1} & g_{K2} & \cdots & g_{KK} \end{bmatrix}

be a linear transformation. If there exists a real number \lambda such that G x = \lambda x, then the vectors x satisfying this relation are called eigenvectors (also characteristic, latent, or invariant vectors) and the corresponding values \lambda eigenvalues (characteristic roots, latent roots) of the transformation G. We must have:

G x = \lambda x \;\Rightarrow\; G x - \lambda x = 0 \;\Rightarrow\; (G - \lambda I)\, x = 0

This is a system of homogeneous equations; for it to have a solution other than the trivial one, the determinant of the system's coefficients must equal 0, that is:

\varphi(\lambda) = \begin{vmatrix} g_{11}-\lambda & g_{12} & \cdots & g_{1K} \\ g_{21} & g_{22}-\lambda & \cdots & g_{2K} \\ \vdots & & & \vdots \\ g_{K1} & g_{K2} & \cdots & g_{KK}-\lambda \end{vmatrix}, \quad \varphi(\lambda) = 0

This equation is of degree K and has K roots. \varphi(\lambda) is called the characteristic polynomial, and \varphi(\lambda) = 0 the characteristic equation of G. Solving this equation for \lambda gives the eigenvalues, from which the eigenvectors x of the transformation G are determined.

Two matrices X and Y are called similar if there exists a matrix P such that Y = P^{-1} X P.

A square matrix X is called orthogonal if X X^t = I, that is, if X^{-1} = X^t.

A square matrix Y = [y_{ij}], i, j = 1, 2, \dots, K, is called diagonal if y_{ij} = 0 for i \ne j. The eigenvalues of a diagonal matrix are its diagonal elements.
Every matrix X similar to a diagonal matrix Y has the same eigenvalues as Y, and the column vectors of P are its eigenvectors, which are linearly independent.
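These facts can be verified numerically for any symmetric matrix (a small NumPy sketch; the example matrix is arbitrary, chosen only for illustration):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])          # an arbitrary symmetric matrix

lam, P = np.linalg.eigh(A)          # eigenvalues and orthonormal eigenvectors

# P is orthogonal (P^{-1} = P^t), and P^{-1} A P is the similar diagonal
# matrix whose diagonal elements are the eigenvalues of A.
D = P.T @ A @ P
```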
Numerical Example

X = [ 7 4 4 4 3 3 2 4 ]
Y = [ 3 3 0 0 3 3 2 2 5 ]

[Figure: two-color plot of X, Y]

\mu_X = 8.30, \quad s_X = 22.800
\mu_Y = 5.20, \quad s_Y = 25.9600

[Figure: data plot of X, Y]

s_{XY} = s_{YX} = 6.7400
\Sigma_{XY} = \begin{bmatrix} 22.800 & 6.7400 \\ 6.7400 & 25.9600 \end{bmatrix}, \quad R_{XY} = \begin{bmatrix} 1.0000 & 0.2770 \\ 0.2770 & 1.0000 \end{bmatrix}

P_{\Sigma_{XY}} = \begin{bmatrix} 0.7834 & -0.6215 \\ 0.6215 & 0.7834 \end{bmatrix}, \quad P^{-1}_{\Sigma_{XY}} = \begin{bmatrix} -0.7834 & 0.6215 \\ -0.6215 & -0.7834 \end{bmatrix}

D = P^{-1}_{\Sigma_{XY}} \Sigma_{XY} P_{\Sigma_{XY}} = \begin{bmatrix} 7.4634 & 0 \\ 0 & 3.3066 \end{bmatrix}

X' = DX, \quad Y' = DY

X' = [ 4 9 0 3 2 8 2 ]
Y' = [ 3 7 7 9 0 5 8 3 9 20 ]

[Figure: two-color plot of X', Y']
[Figure: data plot of X', Y']