
Multivariate Models, PCA, ICA and ICS
Klaus Nordhausen (1), Hannu Oja (1), Esa Ollila (2), David E. Tyler (3)
(1) Tampere School of Public Health, University of Tampere
(2) Department of Mathematical Science, University of Oulu
(3) Department of Statistics, Rutgers - The State University of New Jersey

Outline
Preliminaries
Multivariate Models
Whitening
PCA
ICA
ICS
R

Important Matrices and Decompositions
A p × p permutation matrix P is obtained by permuting the rows and/or columns of the identity matrix I_p.
A sign-change matrix J is a diagonal matrix with diagonal elements ±1.
A p × p matrix O is an orthogonal matrix if O^T O = O O^T = I_p. P and J are orthogonal matrices.
D denotes a diagonal matrix.
Let A be a p × p matrix. The singular value decomposition (SVD) of A is A = O_1 D O_2^T with orthogonal matrices O_1 and O_2.
Let A be a symmetric p × p matrix. The eigenvalue decomposition of A is A = O D O^T.

Affine transformation
Let x be a p-vector. An affine transformation then has the form y = Ax + b, where A is a full rank p × p matrix with SVD A = O_1 D O_2^T and b is a p-vector. This transformation can be interpreted as a change of coordinate system: the original system of x is first rotated / reflected using O_2^T, then componentwise rescaled using D, rotated / reflected again using O_1, and finally the origin of the system is shifted by b.
In statistics, methods and estimates that do not depend on the underlying coordinate system are preferred. A method, for example a hypothesis test, is called affine invariant if it does not depend on the coordinate system, i.e. the p-value does not change when the coordinate system is changed. Estimates are called affine equivariant if they follow a change of the coordinate system in an appropriate way.
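To make this geometric reading concrete, here is a minimal R check (not from the slides; A, x and b are randomly generated for illustration) that applying A agrees with the rotate-rescale-rotate-shift decomposition obtained from svd():

# Check that y = Ax + b equals the SVD reading: rotate with O_2^T,
# rescale with D, rotate with O_1, then shift by b.
set.seed(1)
A <- matrix(rnorm(9), 3, 3)        # a full rank 3 x 3 matrix (a.s.)
b <- rnorm(3); x <- rnorm(3)
s <- svd(A)                        # A = U diag(d) V^T
y1 <- A %*% x + b
y2 <- s$u %*% (s$d * (t(s$v) %*% x)) + b
all.equal(c(y1), c(y2))            # TRUE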

Location and Scatter Functionals
Let x be a p-variate random vector with cdf F. A p-vector-valued functional T = T(F) = T(x) is called a location functional if it is affine equivariant in the sense that T(Ax + b) = A T(x) + b for all full-rank p × p matrices A and all p-vectors b.
A p × p matrix-valued functional S = S(F) = S(x) is a scatter functional if it is affine equivariant in the sense that S(Ax + b) = A S(x) A^T for all full-rank p × p matrices A and all p-vectors b.
The corresponding sample versions are called location statistic and scatter statistic.

Special Scatter Functionals
A symmetrized version of a scatter functional S can be obtained by applying it to the difference of two independent copies x_1 and x_2 of x: S_sym(x) := S(x_1 - x_2).
A shape matrix is only affine equivariant in the weaker sense that S(Ax + b) ∝ A S(x) A^T.
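A direct, if naive, way to symmetrize a scatter statistic in R is to apply it to all pairwise differences of the observations. This is my own illustration and forms all O(n^2) pairs explicitly, so it is a sketch rather than an efficient implementation:

# Symmetrized scatter: apply the scatter statistic S to the pairwise
# differences x_i - x_j, mimicking S_sym(x) = S(x_1 - x_2).
sym_scatter <- function(X, S = cov) {
  X <- as.matrix(X)
  ij <- which(upper.tri(diag(nrow(X))), arr.ind = TRUE)  # all pairs i < j
  S(X[ij[, 1], ] - X[ij[, 2], ])
}
# Example: for S = cov the result is close to 2 * cov(X).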

Independence property
A scatter functional S has the independence property if S(x) is a diagonal matrix for all x with independent margins. Note that in general scatter functionals do not have the independence property: only the covariance matrix, the matrix of fourth moments and symmetrized scatter matrices have it. If, however, x has independent components of which at least p - 1 are symmetric, all scatter matrices will be diagonal.

M-estimates of Location and Scatter
There are many techniques to construct location and scatter estimates. In this talk we will mainly use M-estimates. M-estimates of location (T) and scatter (S) satisfy the two implicit equations
T = T(x) = (E(w_1(r)))^{-1} E(w_1(r) x)
and
S = S(x) = E(w_2(r) (x - T)(x - T)^T)
for some suitably chosen weight functions w_1(r) and w_2(r). The scalar r is the Mahalanobis distance between x and T, that is, r = ||x - T||_S with r^2 = (x - T)^T S^{-1} (x - T).
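These implicit equations are typically solved by a fixed-point iteration that alternates between updating T and S. The following sketch is my own illustration, not the slides' implementation; it uses the weights of the t-distribution maximum likelihood estimator, w_1(r) = w_2(r) = (p + nu) / (nu + r^2):

m_est <- function(X, nu = 3, tol = 1e-6, maxit = 100) {
  X <- as.matrix(X)
  p <- ncol(X)
  T <- colMeans(X); S <- cov(X)                 # starting values
  for (i in 1:maxit) {
    D  <- sweep(X, 2, T)                        # x - T
    r2 <- rowSums((D %*% solve(S)) * D)         # squared Mahalanobis distances
    w  <- (p + nu) / (nu + r2)                  # t M-estimation weights
    T_new <- colSums(w * X) / sum(w)
    S_new <- crossprod(sqrt(w) * sweep(X, 2, T_new)) / nrow(X)
    if (max(abs(T_new - T), abs(S_new - S)) < tol) break
    T <- T_new; S <- S_new
  }
  list(location = T, scatter = S)
}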

Examples of Scatter Matrices
The most common location and scatter statistics are the vector of means E(x) and the regular covariance matrix COV(x). Using those two to compute a so-called one-step M-estimator yields as a scatter matrix the matrix of fourth moments
COV_4(x) = (1 / (p + 2)) E(r^2 (x - E(x))(x - E(x))^T),
where r is the Mahalanobis distance based on E(x) and COV(x).
Another well-known M-estimator is Tyler's shape matrix, defined implicitly by
S_Tyl(x) = p E( (x - T(x))(x - T(x))^T / ||x - T(x)||^2_{S_Tyl} ).
The symmetrized version of Tyler's shape matrix is known as Dümbgen's shape matrix.
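For reference, COV_4 written out in R. The ICS package ships a cov4() function; this hand-rolled version simply mirrors the formula above and may differ from the package in the exact finite-sample normalization:

cov4_by_hand <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X); p <- ncol(X)
  D  <- sweep(X, 2, colMeans(X))               # x - E(x)
  r2 <- rowSums((D %*% solve(cov(X))) * D)     # squared Mahalanobis distances
  crossprod(sqrt(r2) * D) / ((p + 2) * n)      # (1/(p+2)) E(r^2 (x-E)(x-E)^T)
}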

Location - Scatter Model
All models discussed in this talk can be derived from the location-scatter model
x = Ω ɛ + µ,
where x = (x_1, ..., x_p)^T is a p-variate random vector, ɛ = (ɛ_1, ..., ɛ_p)^T is a p-variate random vector standardized in a way explained later, Ω is a full rank p × p mixing matrix and µ is a p-variate location vector. The quantity Σ = Ω Ω^T is the scatter matrix parameter.
For further analysis the assumptions imposed on ɛ are crucial. Note, however, that the standardized vector ɛ is not actually observed; only x is directly measurable. The vector ɛ is more a mental construction than something with a physical meaning. Yet in some cases ɛ can have an interpretation, and it might even be the goal of the analysis to recover it when only x is observed. Either way, one of the first challenges in practical data analysis is to evaluate which assumptions on ɛ can be justified for the data at hand.

Symmetric Models
A1: Multivariate normal model. ɛ has a standard multivariate normal distribution N(0, I_p).
A2: Elliptic model. ɛ has a spherical distribution around the origin, i.e. O ɛ ~ ɛ for all orthogonal p × p matrices O.
A3: Exchangeable sign-symmetric model. ɛ is symmetric around the origin in the sense that P J ɛ ~ ɛ for all permutation matrices P and sign-change matrices J.
A4: Sign-symmetric model. ɛ is symmetric around the origin in the sense that J ɛ ~ ɛ for all sign-change matrices J.
A5: Centrally symmetric model. ɛ is symmetric around the origin in the sense that -ɛ ~ ɛ.
(Here ~ denotes equality in distribution.)

Asymmetric Models
B1: Finite mixtures of elliptical distributions with proportional scatter matrices. For a fixed k, ɛ = Σ_{i=1}^k p_i (ɛ_i + µ_i), where p = (p_1, ..., p_k) is Multin(1, π) distributed with 0 ≤ π_i ≤ 1 and Σ_{i=1}^k π_i = 1, and the ɛ_i are all independent and follow A2.
B2: Skew-elliptical model. ɛ = sgn(ɛ_{p+1} - α - β ɛ*_p) ɛ*, where (ɛ*^T, ɛ_{p+1})^T satisfies A2 but with dimension p + 1, and α, β ∈ R are constants.
B3: Independent component model. The components of ɛ are independent with E(ɛ_i) = 0 and Var(ɛ_i) = 1.

Relationships Between the Models
The symmetric models A1-A5 are ranked from the strongest symmetry assumption to the weakest one: A1 ⊂ A2 ⊂ A3 ⊂ A4 ⊂ A5.
The symmetric models can be seen as border cases of the asymmetric models, and we have the relationships A1 = A2 ∩ B3, A2 ⊂ B1 and A2 ⊂ B2.

Uniqueness of Model Parameters
The location parameter µ is well-defined in models A1-A5; it is the center of symmetry. The scatter parameter Σ is only well-defined in models A1 and B3, where it is the covariance matrix. The mixing matrix Ω is not well-defined in any of the models. Further assumptions are needed in most models in order to have well-defined parameters Ω and Σ.
Relation between location and scatter functionals in the different models: in A1-A5 all location statistics estimate the same population quantity, the center of symmetry. In A1-A3 all scatter statistics measure the same population quantity, the shape of the data, i.e. S(x) ∝ Σ.

Data Transformations
In practical data analysis, data are often transformed to different coordinates. The transformation can be model based or not, and may be motivated by different reasons. Transformations we will have a closer look at are: whitening, principal components, independent components, invariant components. Other transformations are for example: canonical correlations, factor analysis. Most transformations are usually based on E(x) and COV(x).

Whitening
The whitening transformation centers the variables and jointly uncorrelates and scales them to have unit variances. The transformation is
y = COV^{-1/2}(x) (x - E(x)),
so that E(y) = 0 and COV(y) = I_p.
Properties:
In A1 it makes the marginal variables independent.
In A2, y will be spherical.
It does usually not recover ɛ.
The resulting coordinate system is not affine invariant; it only holds that
COV^{-1/2}(Ax + b) (Ax + b - E(Ax + b)) = O COV^{-1/2}(x) (x - E(x))
for some orthogonal matrix O.
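A minimal whitening sketch in R (my own illustration; COV^{-1/2} is obtained from the eigendecomposition of the covariance matrix):

whiten <- function(X) {
  X  <- as.matrix(X)
  ev <- eigen(cov(X), symmetric = TRUE)
  S_inv_sqrt <- ev$vectors %*% diag(1 / sqrt(ev$values)) %*% t(ev$vectors)
  sweep(X, 2, colMeans(X)) %*% S_inv_sqrt      # y = COV^{-1/2}(x)(x - E(x))
}
# Afterwards colMeans(whiten(X)) is ~ 0 and cov(whiten(X)) is ~ I_p.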

Principal Components
Principal component analysis (PCA) also uncorrelates the marginals. It does not, however, scale them to unit variances, but chooses the new variables successively as the linear combinations of the original variables that have maximal variance subject to being orthogonal to the previous ones. The transformation is based on the eigenvalue decomposition of the regular covariance matrix, COV(x) = O D O^T. The principal components are y = O^T x, with COV(y) = D.
Variations of PCA:
Centered PCA: the variables are first centered, i.e. x → x - E(x).
Scaled PCA: each variable is scaled to have unit variance, i.e. x_i → x_i / σ_i, where σ_i^2 is the variance of x_i.
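Centered PCA written out in R, as a sketch of what princomp() (used later in the talk) computes:

pca_by_hand <- function(X) {
  Xc <- sweep(as.matrix(X), 2, colMeans(X))    # center the variables
  ev <- eigen(cov(Xc), symmetric = TRUE)       # COV(x) = O D O^T
  list(scores = Xc %*% ev$vectors,             # y = O^T x, row-wise
       variances = ev$values)                  # diagonal of D
}
# The scores agree with princomp(X)$scores up to column signs.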

Properties and Purpose of PCA
PCA has the following properties: it is not affine invariant; it is not model based and only assumes the existence of the first two moments; it has many nice geometrical properties, especially in A1.
Main purposes: dimension reduction; use in connection with other multivariate techniques, for example in regression to avoid multicollinearity problems, or in clustering; outlier detection.

Comparison Whitening vs. PCA
In the following example, 70 observations are sampled from a bivariate normal distribution. [Figure: scatter plots of the original data, the whitened data and the principal components.]

Robust Whitening and Robust PCA
The mean vector and the covariance matrix are not very robust, and therefore the transformations based on them can be dominated by only a few influential observations. A common way to robustify classical methods is to replace the mean vector and the covariance matrix by more robust location and scatter functionals. This, of course, yields the same method only if the functionals estimate the same population quantity, which basically means it is possible only in models A1-A3.
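As an illustration of such a plug-in robustification (my own sketch, using MASS::cov.rob as one possible robust location-scatter pair; any affine equivariant pair would do):

library(MASS)  # for cov.rob
robust_whiten <- function(X) {
  X   <- as.matrix(X)
  est <- cov.rob(X)                            # robust location and scatter
  ev  <- eigen(est$cov, symmetric = TRUE)
  S_inv_sqrt <- ev$vectors %*% diag(1 / sqrt(ev$values)) %*% t(ev$vectors)
  sweep(X, 2, est$center) %*% S_inv_sqrt
}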

Independent Component Analysis
Independent component analysis (ICA) can be seen as a model-based transformation. Let us assume for simplicity that µ = 0. Then the IC model B3 reduces to x = Ω ɛ, where ɛ has independent components. The model is, however, ill-defined, since also
x = (Ω P D J)(J D^{-1} P^T) ɛ = Ω* ɛ*,
where ɛ* has independent components. Therefore the components can only be identified up to rescaling, permutation and sign changes. To fix the scale, a common assumption is E(ɛ) = 0 and COV(ɛ) = I_p.

Estimation of the Mixing Matrix
In ICA the main goal is to estimate the mixing matrix Ω or, equivalently, an unmixing matrix Λ. This is possible up to sign and permutation if at most one component is Gaussian. In some cases the underlying independent components z = Λx, recovered up to sign and order, have a physical meaning. Application areas of ICA are: signal processing, medical image analysis, dimension reduction.

Common Structure of Most ICA Algorithms
Most ICA algorithms consist of the following two steps.
Step 1: Whitening. x is whitened to x* so that E(x*) = 0 and COV(x*) = I_p. Then x* = O z* with an orthogonal matrix O and independent components z* which have E(z*) = 0 and COV(z*) = I_p.
Step 2: Rotation. Find an orthogonal matrix O whose columns maximize (minimize) a chosen criterion g(O^T x*). Common criteria are measures of marginal non-Gaussianity (negentropy, moments) and likelihood functions.
Well-known algorithms are FastICA, FOBI, JADE, PearsonICA, ...
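As a concrete instance of this two-step scheme, here is a compact FOBI sketch (my own illustration; the ICS package discussed later provides the full machinery). Step 1 whitens with the mean vector and covariance matrix; step 2 rotates with the eigenvectors of the matrix of fourth moments of the whitened data:

fobi <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X); p <- ncol(X)
  # Step 1: whitening with E(x) and COV(x)
  ev <- eigen(cov(X), symmetric = TRUE)
  W  <- ev$vectors %*% diag(1 / sqrt(ev$values)) %*% t(ev$vectors)
  Y  <- sweep(X, 2, colMeans(X)) %*% W
  # Step 2: rotation with the eigenvectors of COV_4 of the whitened data
  r2 <- rowSums(Y^2)                           # squared norms = Mahalanobis
  C4 <- crossprod(sqrt(r2) * Y) / ((p + 2) * n)
  Y %*% eigen(C4, symmetric = TRUE)$vectors    # estimated components
}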

Picture example for ICA

Multiple functionals for transformations
So far the transformations involved mainly one location and/or one scatter functional. Recently it was suggested to use two location functionals and/or two scatter functionals for this purpose. It is useful to recall in this context that skewness can be interpreted as the difference of two different location functionals, and kurtosis as the ratio of two different scatter functionals. For example, the classical univariate measures:
β_1(x) = E((x - E(x))^3) / var(x)^(3/2) = (E_3(x) - E(x)) / var(x)^(1/2), where E_3(x) = E( ((x - E(x))^2 / var(x)) x ),
β_2(x) = E((x - E(x))^4) / var(x)^2 = var_4(x) / var(x), where var_4(x) = E( ((x - E(x))^2 / var(x)) (x - E(x))^2 ).
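A quick numeric check of the first identity in R (my own example, using an exponential sample whose skewness is 2):

set.seed(1)
x <- rexp(1e5)                                    # skewness of Exp(1) is 2
b1_classic <- mean((x - mean(x))^3) / var(x)^1.5
E3 <- mean(((x - mean(x))^2 / var(x)) * x)        # the location functional E_3
b1_twoloc  <- (E3 - mean(x)) / sqrt(var(x))       # difference of two locations
c(b1_classic, b1_twoloc)                          # both approximately 2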

Two Scatter Matrices for ICS
Two different scatter matrices S_1 = S_1(x) and S_2 = S_2(x) can be used to find an invariant coordinate system (ICS) as follows. Starting with S_1 and S_2, define a p × p transformation matrix B = B(x) and a diagonal matrix D = D(x) by
S_1^{-1} S_2 B^T = B^T D,
that is, B gives the eigenvectors of S_1^{-1} S_2. The following result can then be shown to hold: the transformation x → z = B(x) x yields an invariant coordinate system in the sense that
B(Ax)(Ax) = J B(x) x
for some p × p sign-change matrix J.

Interpretation of the ICS Transformation
This transformation can be seen as a double decorrelation: first x is whitened using S_1, and then a PCA with respect to S_2 is performed on the whitened data. Basically this can also be seen as a way to compare two scatter matrices: if the PCA step still finds an interesting direction after the whitening step, then the two scatter matrices measure different population quantities. The diagonal elements of D can be seen as ratios of two different scatter functionals; hence the values can be seen as kurtosis measures, and the invariant coordinates are therefore ordered according to their kurtosis measures with respect to S_1 and S_2.
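The double decorrelation written out in R, as a sketch under the definitions above (in practice the ics() function of the ICS package does this):

ics_by_hand <- function(X, S1, S2) {
  # S1, S2: two scatter matrix estimates computed from the same data X.
  # Whiten with S1, then rotate with a PCA with respect to S2.
  e1 <- eigen(S1, symmetric = TRUE)
  S1_inv_sqrt <- e1$vectors %*% diag(1 / sqrt(e1$values)) %*% t(e1$vectors)
  e2 <- eigen(S1_inv_sqrt %*% S2 %*% S1_inv_sqrt, symmetric = TRUE)
  B  <- t(e2$vectors) %*% S1_inv_sqrt   # solves S1^{-1} S2 B^T = B^T D
  list(B = B, D = e2$values, Z = as.matrix(X) %*% t(B))
}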

An extension of ICS
The signs of the column vectors of B, and hence of the invariant coordinates z, are not fixed. One option is to use two location-scatter pairs (T_1, S_1) and (T_2, S_2) and fix the signs as follows. The invariant coordinate system is then constructed by:
centering x using T_1, i.e. x → x - T_1(x);
estimating B and the invariant coordinates z;
fixing the signs of z so that T_2(z) ≥ 0.
In that case T_2(z) can be seen as the difference of two location measures and therefore as a measure of skewness. If D has distinct diagonal entries and T_2(z) > 0, the transformation is well-defined. This version of ICS is denoted from now on as centered ICS.

Applications of ICS
ICS is useful in the following contexts:
Descriptive statistics and model selection (centered ICS).
Outlier detection and clustering.
Dimension reduction.
ICA: if S_1 and S_2 have the independence property and the IC model holds, then B is an estimate of the unmixing matrix.
The transformation-retransformation framework of multivariate nonparametric methods.
Some of these applications will now be discussed in more detail.

On the choice of S_1 and S_2
As shown previously, there are many possibilities for S_1 and S_2 to choose from. Since the scatter matrices all have different properties, different choices can yield different invariant coordinate systems. Unfortunately, there are so far no theoretical results about the optimal choice; this is still an open research question. Some comments are, however, already possible:
For two given scatter matrices, the order has no effect.
Depending on the application in mind, the scatter matrices should fulfill some further conditions.
Practice has shown so far that for most data sets different choices yield only minor differences.

Descriptive Statistics based on Centered ICS
Using the two pairs (T_1, S_1) and (T_2, S_2), a data set can be described via:
Location: T_1(x)
Scatter: S_1(x)
Measure of skewness: T_2(z)
Measure of kurtosis: S_2(z)
The last two measures can then be used to construct tests for multinormality or ellipticity (see for example Kankainen et al. 2007).

Moment Based Descriptive Statistics using ICS
An interesting choice for model selection is T_1 = E(x), S_1 = COV(x), T_2(x) = (1/p) E(||x - E(x)||^2_COV x) (the vector of third moments), and S_2 = COV_4(x) (the matrix of fourth moments). Denote by D(k) the set of all positive definite diagonal matrices with at most k distinct diagonal entries. The values of the skewness s and the kurtosis D in the eight models are then:

Model   Skewness s     Kurtosis D
A1      0              I_p
A2      0              D(1)
A3      0              D(1)
A4      0              D(p)
A5      0              D(p)
B1      0              D(k)
B2      e_p or e_1     D(2)
B3      0              D(p)
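The skewness vector T_2 above, written out in R (my own sketch; the mean3() function used later in the ICS examples computes this quantity, possibly up to the exact finite-sample normalization):

mean3_by_hand <- function(X) {
  X <- as.matrix(X)
  p <- ncol(X)
  D  <- sweep(X, 2, colMeans(X))
  r2 <- rowSums((D %*% solve(cov(X))) * D)   # ||x - E(x)||^2_COV
  colSums(r2 * X) / (p * nrow(X))            # (1/p) E(||x - E(x)||^2_COV x)
}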

Finding outliers, structure and clusters
In general, the fact that the coordinates are ordered according to their kurtosis is a useful feature: one can assume that interesting directions are the ones with extreme measures of kurtosis. Outliers, for example, usually show up in the first coordinate. If the original data come from an elliptical distribution (A2), S_1 and S_2 measure the same population quantity, and the diagonal values of D should then all be the same. For non-elliptical distributions, however, they measure different quantities, and the coordinates can therefore reveal "hidden" structures. If x comes from a mixture of elliptical distributions (B1), ICS estimates, heuristically speaking, Fisher's linear discriminant subspace (without knowing the class labels!).

Finding the outliers
The modified wood gravity data is a classical data set for outlier detection methods. It has 6 measurements on 20 observations and contains four outliers. Here S_1 is an M-estimator based on t_1 and S_2 one based on t_2. [Figure: scatterplot matrices of the original data (x1-x5, y) and of the invariant coordinates (IC.1-IC.6).]

Finding the structure I
The following data set is simulated and has 400 observations on 6 variables. It looks like an elliptical data set. [Figure: scatterplot matrix of the original variables V1-V6.]

Finding the structure II
[Figure: scatterplot matrix of the principal components Comp.1-Comp.6 (PCA).]

Finding the structure III
[Figure: scatterplot matrix of the invariant coordinates IC.1-IC.6 (ICS).]

Clustering and dimension reduction
To illustrate clustering via ICS we use the Iris data set. The different species are colored differently, and we can see that the last component alone is enough for the clustering. [Figure: scatterplot matrix of the invariant coordinates IC.1-IC.4, colored by species.]

ICS and ICA
When the IC model holds, the ICS transformation recovers the independent components up to scale, sign and permutation if
S_1 and S_2 have the independence property, and
the diagonal elements of D are all distinct.
In the case S_1 = COV and S_2 = COV_4 this corresponds to the FOBI algorithm, and the condition that D has distinct elements means that the independent components must all have different kurtosis values.

Implementation in R
The two-scatter transformation for ICS / ICA is implemented in R in the package ICS. The main function ics can take either the names of two functions for S_1 and S_2 or two scatter matrices computed in advance. The package then offers several functions to work with an ics object, as well as several scatter matrices and two tests of multinormality. Basically everything shown in this talk can easily be done in R using base R methods and the package ICS.

Australian Athletes Data, a More Detailed Example Using R
We now have a closer look at the Australian athletes data set, using PCA and ICS on the four variables: body mass index (BMI), body fat (Bfat), sum of skin folds (ssf) and lean body mass (LBM).
> library(DAAG)
> library(ICS)
> data(ais)
> data.ana <- data.frame(BMI = ais$bmi, Bfat = ais$pcBfat,
+                        ssf = ais$ssf, LBM = ais$lbm)
> pairs(data.ana)

Australian Athletes Data, a More Detailed Example Using R
> pairs(data.ana)
[Figure: scatterplot matrix of BMI, Bfat, ssf and LBM.]

Australian Athletes Data II
Performing the PCA:
> pca.as <- princomp(data.ana, scores = TRUE)
> pairs(pca.as$scores)
[Figure: scatterplot matrix of the principal component scores Comp.1-Comp.4.]

Australian Athletes Data III
> pairs(pca.as$scores, col = ais$sex)
[Figure: principal component scores, colored by sex.]

Australian Athletes Data IV
Descriptive statistics using ICS:
> LOCATION <- colMeans(data.ana)
> LOCATION
     BMI     Bfat      ssf      LBM
22.95589 13.50743 69.02178 64.87371
>
> SCATTER <- cov(data.ana)
> SCATTER
           BMI       Bfat        ssf       LBM
BMI   8.202111   3.324882   29.94890  26.72126
Bfat  3.324882  38.313946  194.11891 -29.27451
ssf  29.948901 194.118912 1060.50092 -88.42529
LBM  26.721255 -29.274514  -88.42529 170.83006

Australian Athletes Data V
> data.ana.centered <- sweep(data.ana, 2, LOCATION, "-")
> ics.as <- ics(data.ana.centered, stdKurt = FALSE,
+               stdB = "Z")
> summary(ics.as)
ICS based on two scatter matrices
S1: cov
S2: cov4

The generalized kurtosis measures of the components are:
[1] 1.7154 1.2788 0.9353 0.7659

The Unmixing matrix is:
       [,1]    [,2]    [,3]    [,4]
[1,] 0.1777 -0.4586  0.0975 -0.0527
[2,] 0.5371  0.1698 -0.0552 -0.0418
[3,] 0.3740 -0.3108  0.0256 -0.1471
[4,] 0.1142  0.4964 -0.0775 -0.0147

Australian Athletes Data VI
> Z <- sweep(ics.components(ics.as), 2,
+            sign(mean3(ics.components(ics.as))), "*")
>
> SKEWNESS <- mean3(Z)
> SKEWNESS
      IC.1       IC.2       IC.3       IC.4
0.48211688 0.09281212 0.14727342 0.16866117
> KURTOSIS <- ics.as@gKurt
> KURTOSIS
[1] 1.7154201 1.2787838 0.9352998 0.7658729

Australian Athletes Data VII
> pairs(Z)
[Figure: scatterplot matrix of the invariant coordinates IC.1-IC.4.]

Australian Athletes Data VIII
> pairs(Z, col = ais$sex)
[Figure: invariant coordinates, colored by sex.]

Australian Athletes Data IX
Analyzing only the females:
> data.ana.females <- data.ana[ais$sex == "f", ]
> pca.as.f <- princomp(data.ana.females, scores = TRUE)
> pairs(pca.as.f$scores)
[Figure: scatterplot matrix of the principal component scores for the female athletes.]

Australian Athletes Data X
Analyzing only the females:
> LOCATION <- colMeans(data.ana.females)
> LOCATION
    BMI    Bfat     ssf     LBM
21.9892 17.8491 86.9730 54.8949
>
> SCATTER <- cov(data.ana.females)
> SCATTER
           BMI       Bfat        ssf      LBM
BMI   6.969747   9.508302   60.63405 13.66024
Bfat  9.508302  29.734018  178.96012 15.33174
ssf  60.634049 178.960117 1145.86058 95.24906
LBM  13.660242  15.331741   95.24906 47.91675

Australian Athletes Data XI
> data.ana.f.centered <- sweep(data.ana.females, 2,
+                              LOCATION, "-")
> ics.as.f <- ics(data.ana.f.centered, stdKurt = FALSE,
+                 stdB = "Z")
> summary(ics.as.f)
ICS based on two scatter matrices
S1: cov
S2: cov4

The generalized kurtosis measures of the components are:
[1] 1.3583 1.1320 1.0322 0.8330

The Unmixing matrix is:
        [,1]    [,2]    [,3]    [,4]
[1,]  0.2996 -0.3870  0.0640 -0.0034
[2,] -0.2892 -0.0861  0.0496 -0.0496
[3,]  0.5923  0.1401 -0.0378 -0.2158
[4,]  0.0414  0.6212 -0.0863  0.0232

Australian Athletes Data XII
> Z.f <- sweep(ics.components(ics.as.f), 2,
+              sign(mean3(ics.components(ics.as.f))), "*")
>
> SKEWNESS <- mean3(Z.f)
> SKEWNESS
      IC.1       IC.2       IC.3       IC.4
0.32308843 0.16111629 0.13685658 0.06543187
> KURTOSIS <- ics.as.f@gKurt
> KURTOSIS
[1] 1.3583203 1.1320473 1.0322094 0.8329946

Australian Athletes Data XIII
> pairs(Z.f)
[Figure: scatterplot matrix of the invariant coordinates for the female athletes.]

Key references I
Tyler, D. E., Critchley, F., Dümbgen, L. and Oja, H. (2008). Exploring multivariate data via multiple scatter matrices. Journal of the Royal Statistical Society, Series B, accepted.
Oja, H., Sirkiä, S. and Eriksson, J. (2006). Scatter matrices and independent component analysis. Austrian Journal of Statistics, 35, 175-189.
Nordhausen, K., Oja, H. and Tyler, D. E. (2008). Two scatter matrices for multivariate data analysis: The package ICS. Journal of Statistical Software, accepted.
Nordhausen, K., Oja, H. and Ollila, E. (2008). Multivariate models and their first four moments. Submitted.

Key references II
Nordhausen, K., Oja, H. and Ollila, E. (2008). Robust independent component analysis based on two scatter matrices. Austrian Journal of Statistics, 37, 91-100.
Nordhausen, K., Oja, H. and Tyler, D. E. (2006). On the efficiency of invariant multivariate sign and rank tests. In Liski, E. P., Isotalo, J., Niemelä, J., Puntanen, S. and Styan, G. P. H. (editors), Festschrift for Tarmo Pukkila on his 60th Birthday, 217-231, University of Tampere, Tampere.
Kankainen, A., Taskinen, S. and Oja, H. (2007). Tests of multinormality based on location vectors and scatter matrices. Statistical Methods & Applications, 16, 357-379.