Teach/Me Data Analysis

You are working with the text-only light edition of "H. Lohninger: Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999. ISBN 3-540-14743-8".

See also: eigenvectors, PC loadings and scores, applications of PCA, factor analysis, Exercise - Detection of mixtures of two different wines by PCA, Exercise - Classification of unknown wine samples by PCA

Principal Component Analysis

The problem with multivariate data is that data with more than two variables cannot be displayed directly on two-dimensional paper or computer screens. To view such data, we have to project them onto a plane. The projected image depends on the direction of the projection; in other words, it changes if the data points are rotated in the n-dimensional space. One might now ask how to find a rotation of the data (or, equivalently, of the axes) that displays a maximum of information in the projected image.
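To make the dependence on direction concrete, here is a minimal sketch in plain Python. It projects a handful of two-dimensional points onto a line and measures how the spread of the projected points changes as the line is rotated. The data values, the 5-degree scan, and the names `points` and `projected_variance` are illustrative assumptions, not part of the original text.

```python
import math

# Hypothetical 2-D data: points scattered roughly along the line y = x.
points = [(-2.0, -1.8), (-1.0, -1.2), (0.0, 0.1), (1.0, 0.9), (2.0, 2.0)]

def projected_variance(points, angle):
    """Sample variance of the points projected onto the direction `angle` (radians)."""
    ux, uy = math.cos(angle), math.sin(angle)      # unit vector of the projection axis
    proj = [x * ux + y * uy for x, y in points]    # coordinate of each point on that axis
    mean = sum(proj) / len(proj)
    return sum((p - mean) ** 2 for p in proj) / (len(proj) - 1)

# Scan projection directions from 0 to 175 degrees in 5-degree steps.
variances = {deg: projected_variance(points, math.radians(deg))
             for deg in range(0, 180, 5)}

best_deg = max(variances, key=variances.get)    # direction showing the most spread
worst_deg = min(variances, key=variances.get)   # direction showing the least spread
print(best_deg, worst_deg)
```

For these points the scan selects 45 degrees as the direction of largest spread and 135 degrees as the direction of smallest spread: the same data look very different depending on the axis onto which they are projected.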

If we assume that an axis is informative only to the extent that the data vary along it, we have to find the directions of maximum variation (variance) in the data. In addition, these new axes should be orthogonal to each other. To find them, we first determine the direction of maximum variation and take it as the first axis. We then take a second axis that is normal to the first and rotate it around the first axis until the variation along this second axis is a maximum. Then we add a third axis, again orthogonal to the other two and in the direction of the maximum remaining variation, and so on. This procedure is repeated until all dimensions have been "used up".
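The step-by-step search described above can be sketched in plain Python. This sketch uses power iteration with deflation, which is one common way to find "the direction of maximum remaining variation"; it is not necessarily the procedure the author had in mind. The sample values and all function names are hypothetical, and the sketch assumes that the variances along the axes are distinct and non-zero.

```python
import math

# Hypothetical three-dimensional samples (one row per observation).
samples = [
    (2.5, 2.4, 0.5), (0.5, 0.7, 1.9), (2.2, 2.9, 0.8), (1.9, 2.2, 1.1),
    (3.1, 3.0, 0.3), (2.3, 2.7, 1.0), (2.0, 1.6, 1.2), (1.0, 1.1, 1.6),
]

def covariance_matrix(rows):
    """Sample covariance matrix of the rows (variables in columns)."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centered = [[r[j] - means[j] for j in range(dim)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
            for i in range(dim)]

def leading_direction(cov, iterations=500):
    """Power iteration: unit vector of maximum variance and the variance along it."""
    dim = len(cov)
    # Arbitrary start vector (1, 2, 3, ...), normalized to unit length.
    v = [float(i + 1) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variation left in any direction
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(dim) for j in range(dim))
    return v, variance

def sequential_axes(rows):
    """Find the axes one at a time, each along the maximum remaining variation."""
    cov = covariance_matrix(rows)
    dim = len(cov)
    axes = []
    for _ in range(dim):
        v, variance = leading_direction(cov)
        axes.append((v, variance))
        # Deflation: subtract the variation already explained by this axis, so the
        # next search only sees what remains in the orthogonal directions.
        cov = [[cov[i][j] - variance * v[i] * v[j] for j in range(dim)]
               for i in range(dim)]
    return axes

axes = sequential_axes(samples)
for direction, variance in axes:
    print([round(x, 3) for x in direction], round(variance, 4))
```

Each pass returns one new axis together with the variance along it. The variances come out in decreasing order, and their sum equals the total variance of the original variables, which is what "using up" all dimensions amounts to.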

The process described above is generally called principal component analysis (PCA). It results in a rotation of the coordinate system such that the variation along the new axes is a maximum. This somewhat simplified picture can be condensed mathematically into a so-called eigenvalue problem: the eigenvectors of the covariance matrix of the data constitute the principal components, and each corresponding eigenvalue equals the variance of the data along that component, which indicates how much "information" the component contains.
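Written out, the eigenvalue problem looks as follows. The notation is an assumption made for this sketch and does not come from the original text: X is the mean-centered data matrix with n rows (observations) and p columns (variables), C is its covariance matrix, and v_k and λ_k are the k-th eigenvector and eigenvalue.

```latex
% Covariance matrix of the mean-centered data
C = \frac{1}{n-1} X^{\mathsf{T}} X

% Eigenvalue problem: the eigenvectors v_k are the principal components
C v_k = \lambda_k v_k, \qquad k = 1, \dots, p, \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0

% The principal components are orthonormal
v_j^{\mathsf{T}} v_k = \delta_{jk}

% The variance of the data along component k equals its eigenvalue
\operatorname{Var}(X v_k) = v_k^{\mathsf{T}} C v_k = \lambda_k

% Fraction of the total variance carried by component k
\frac{\lambda_k}{\sum_{j=1}^{p} \lambda_j}
```

The last expression is the usual way to quantify how much "information" a component contains: a component whose eigenvalue is a small fraction of the sum contributes little to the overall variation of the data.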

The accompanying figure (not included in this text-only edition) shows a three-dimensional data set and the corresponding principal components. Note that the principal components are orthogonal to each other, and that the correlation between the scores on any two principal components is zero.
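Both properties can be checked numerically. The sketch below uses two variables instead of three for brevity, because the eigenvalues of a symmetric 2x2 matrix have a simple closed form. The data values and variable names are hypothetical, and the eigenvector formula used here assumes that the covariance between the two variables is not zero.

```python
import math

# Hypothetical two-dimensional samples.
samples = [(2.5, 2.4), (0.5, 0.7), (2.2, 2.9), (1.9, 2.2),
           (3.1, 3.0), (2.3, 2.7), (2.0, 1.6), (1.0, 1.1)]

n = len(samples)
mean_x = sum(x for x, _ in samples) / n
mean_y = sum(y for _, y in samples) / n
centered = [(x - mean_x, y - mean_y) for x, y in samples]

# Covariance matrix [[a, b], [b, c]] of the centered data.
a = sum(x * x for x, _ in centered) / (n - 1)
b = sum(x * y for x, y in centered) / (n - 1)
c = sum(y * y for _, y in centered) / (n - 1)

# Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
half_trace = (a + c) / 2
radius = math.hypot((a - c) / 2, b)
eigenvalues = (half_trace + radius, half_trace - radius)

def unit_eigenvector(lam):
    """Unit eigenvector for eigenvalue `lam`; (b, lam - a) is valid when b != 0."""
    length = math.hypot(b, lam - a)
    return (b / length, (lam - a) / length)

pc1, pc2 = (unit_eigenvector(lam) for lam in eigenvalues)

# Scores: coordinates of the centered data along each principal component.
scores1 = [x * pc1[0] + y * pc1[1] for x, y in centered]
scores2 = [x * pc2[0] + y * pc2[1] for x, y in centered]

axis_dot = pc1[0] * pc2[0] + pc1[1] * pc2[1]                        # orthogonality
score_cov = sum(s * t for s, t in zip(scores1, scores2)) / (n - 1)  # covariance of scores
score_var1 = sum(s * s for s in scores1) / (n - 1)                  # variance along PC 1

print(axis_dot, score_cov, score_var1, eigenvalues[0])
```

Up to rounding error, `axis_dot` and `score_cov` are zero, and `score_var1` matches the first eigenvalue: the axes are orthogonal, the scores are uncorrelated, and the eigenvalue is the variance along the component.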

Last Update: 2006-Jan-17