Teach/Me Data Analysis

You are working with the text-only light edition of "H.Lohninger: Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999. ISBN 3-540-14743-8".

See also: matrix algebra, covariance, correlation

Scatter, Covariance, and Correlation Matrix

These three types of matrices often form the basis of a multivariate method. The correlation and the covariance matrix are also often used for a first inspection of the relationships among the variables of a multivariate data set. It is therefore crucial to understand the principles behind them and the pitfalls that arise when a data set does not behave as expected.

How are these matrices related to each other?

Basically, all three matrices are calculated by the same procedure: the matrix product A^TA, where A is the data matrix with one object per row and one variable per column. They differ only in how the data are scaled before the matrix multiplication is carried out:

scatter matrix: no scaling
covariance matrix: the mean of each variable is subtracted before multiplication
correlation matrix: each variable is standardized (mean subtracted, then divided by the standard deviation) before multiplication
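
The relationship between the three matrices can be sketched in a few lines of Python. This is only a minimal illustration, not code from the book. The data values, the helper names (gram, center, standardize, scale) and the division by n - 1 are assumptions made for this example. The division by n - 1 is the usual sample normalization that turns the product of the centered (or standardized) data into a covariance (or correlation) matrix.

```python
import math

def gram(a):
    """Return A^T A for a data matrix given as a list of rows."""
    cols = list(zip(*a))
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]

def center(a):
    """Subtract the column mean from every variable."""
    n = len(a)
    means = [sum(col) / n for col in zip(*a)]
    return [[x - m for x, m in zip(row, means)] for row in a]

def standardize(a):
    """Center each variable, then divide by its sample standard deviation."""
    n = len(a)
    c = center(a)
    sds = [math.sqrt(sum(x * x for x in col) / (n - 1)) for col in zip(*c)]
    return [[x / s for x, s in zip(row, sds)] for row in c]

def scale(m, factor):
    """Multiply every element of a matrix by a constant."""
    return [[x * factor for x in row] for row in m]

# Hypothetical data: 4 objects (rows), 2 variables (columns)
A = [[1.0, 2.0], [2.0, 4.0], [3.0, 7.0], [4.0, 8.0]]
n = len(A)

scatter = gram(A)                                       # no scaling
covariance = scale(gram(center(A)), 1 / (n - 1))        # means removed
correlation = scale(gram(standardize(A)), 1 / (n - 1))  # standardized

for name, m in [("scatter", scatter), ("covariance", covariance),
                ("correlation", correlation)]:
    print(name, m)
```

The same function gram is called in all three cases; only the preprocessing of A changes. With an array library the product would typically be written as a single matrix multiplication.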

What is the effect of a single outlier on these matrices?

Suppose you have a data matrix in which one object is an outlier compared to the rest of the data. This single outlier can completely corrupt the matrices (especially the correlation matrix) and produce a spurious correlation, which can mislead an unprepared analyst. You can observe this effect yourself by adding one extreme object to an otherwise uncorrelated data set.
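
A small numerical experiment makes the effect visible. The sketch below is not the interactive example of the original edition. The data values and the helper function pearson are assumptions chosen for illustration. Five objects are constructed so that the two variables are uncorrelated, then a single extreme object is appended.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: two variables with no linear relationship
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 3.0, 1.0, 2.0]

r_clean = pearson(x, y)                      # essentially zero
r_outlier = pearson(x + [50.0], y + [50.0])  # one added outlier at (50, 50)

print("without outlier:", round(r_clean, 3))
print("with outlier:   ", round(r_outlier, 3))
```

For this data the correlation rises from essentially zero to above 0.99, although five of the six objects show no relationship at all. The outlier dominates both the means and the sums of squares, so the remaining objects contribute almost nothing to the result.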

Be extremely careful when selecting variables by looking at the correlation table. A high correlation value may be due to a single outlier in the data matrix.

Last Update: 2006-Jan-17