You are working with the text-only light edition of "H.Lohninger: Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999. ISBN 3-540-14743-8".
See also: classification and discrimination, multiple linear regression - introduction
Linear Discriminant Analysis (LDA) is a method to discriminate between
two or more groups of samples. In order to develop a classifier based on
LDA, you have to perform the following steps:
1. definition of groups
2. definition of discriminating function
3. estimation of discriminating function
4. test of discriminating function
5. application
Definition of groups:
The groups to be discriminated can be defined either naturally by the
problem under investigation, or by some preceding analysis, such as a cluster analysis. The number of groups is not restricted to two, although
discrimination between two groups is the most common case. Note that
the number of groups must not exceed the number of variables describing
the data set. Another prerequisite is that the groups have the same covariance
structure (i.e. they must be comparable).
Definition of discriminating function:
In principle, any mathematical function may be used as a discriminating function. In the case of LDA, a linear function of the form
y = a_{0} + a_{1}x_{1} + a_{2}x_{2} + ... + a_{n}x_{n}
is used, with x_{i} being the variables describing the data set. The parameters a_{i} have to be determined in such a way that the discrimination between the groups is best. Note that this linear discriminating function is formally equivalent to multiple linear regression (MLR). In fact, for two groups one can directly use MLR if the response variable y is replaced by the weighted class numbers c_{1} and c_{2}:
c_{1} = n_{2}/(n_{1}+n_{2}) and c_{2} = - n_{1}/(n_{1}+n_{2})
where n_{1} and n_{2} are the numbers of samples in groups 1 and 2.
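The regression approach for two groups can be tried out with a few lines of NumPy. The following is only a minimal sketch under stated assumptions: the data are synthetic, n_{1} and n_{2} are taken to be the group sizes, and using zero as the cutoff for the fitted value y is a convention chosen for this illustration, not something prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic groups with the same covariance structure (assumed data).
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
g1 = rng.multivariate_normal([0.0, 0.0], cov, size=40)
g2 = rng.multivariate_normal([2.0, 1.0], cov, size=60)
n1, n2 = len(g1), len(g2)

# Weighted class numbers used as the response variable y.
c1 = n2 / (n1 + n2)
c2 = -n1 / (n1 + n2)

X = np.vstack([g1, g2])
y = np.concatenate([np.full(n1, c1), np.full(n2, c2)])

# Ordinary least squares fit of y = a0 + a1*x1 + a2*x2.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
a0, a = coef[0], coef[1:]

# Assumed decision rule: fitted y > 0 means group 1, otherwise group 2.
scores = A @ coef
predicted = np.where(scores > 0, 1, 2)
actual = np.concatenate([np.full(n1, 1), np.full(n2, 2)])
accuracy = np.mean(predicted == actual)

print("coefficients:", a0, a)
print("training accuracy:", accuracy)
```

With this coding the response values average to zero over the whole data set, which is why zero is a natural (though not the only possible) cutoff.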
To get a better understanding of how the discriminating
function works, note that there is only one
direction of the discriminating line which yields the best separation results.
The determination of the coefficients of the discriminating function is
quite simple. In principle, the discriminating function is formed in such
a way that the separation (=distance) between the groups is maximized
relative to the spread within the groups, i.e. the groups are pushed as far
apart as possible while each group is kept as compact as possible.
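For two groups, one common way to realize this criterion is Fisher's solution, in which the coefficient vector is obtained from the pooled within-group scatter matrix and the difference of the group means. The sketch below assumes this formulation and uses synthetic data; the function names fisher_direction and separation_ratio are hypothetical and introduced only for this illustration.

```python
import numpy as np

def fisher_direction(g1, g2):
    """Return the direction w that maximizes between-group separation
    relative to within-group scatter for two groups (rows = samples)."""
    m1, m2 = g1.mean(axis=0), g2.mean(axis=0)
    # Pooled within-group scatter matrix.
    s_within = (g1 - m1).T @ (g1 - m1) + (g2 - m2).T @ (g2 - m2)
    # Solve S_w w = (m1 - m2) instead of inverting S_w explicitly.
    return np.linalg.solve(s_within, m1 - m2)

def separation_ratio(w, g1, g2):
    """Squared distance between projected group means divided by the
    summed within-group scatter of the projected samples."""
    p1, p2 = g1 @ w, g2 @ w
    between = (p1.mean() - p2.mean()) ** 2
    within = ((p1 - p1.mean()) ** 2).sum() + ((p2 - p2.mean()) ** 2).sum()
    return between / within

# Synthetic, strongly correlated data (assumed for illustration).
rng = np.random.default_rng(1)
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
g1 = rng.multivariate_normal([0.0, 0.0], cov, size=50)
g2 = rng.multivariate_normal([1.5, 0.0], cov, size=50)

w = fisher_direction(g1, g2)
naive = g1.mean(axis=0) - g2.mean(axis=0)  # direction joining the means

print("Fisher direction ratio:", separation_ratio(w, g1, g2))
print("mean-difference ratio: ", separation_ratio(naive, g1, g2))
```

Comparing the two printed ratios shows that simply connecting the group means is generally not the best direction when the variables are correlated.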
Test of the discriminating function:
Once the parameters of the discriminating function have been estimated, it has to be tested
either by using an independent set of test data, or by performing cross-validation.
In both cases, the results obtained for the test data should be comparable to those obtained for the
training data.
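A test by cross-validation could look like the following sketch. It assumes a two-group classifier estimated by Fisher's solution with the cutoff placed midway between the projected group means, leave-one-out as the cross-validation scheme, and synthetic data; the helper names fit_lda, predict and leave_one_out_accuracy are hypothetical.

```python
import numpy as np

def fit_lda(g1, g2):
    """Estimate direction w and cutoff for a two-group linear discriminant."""
    m1, m2 = g1.mean(axis=0), g2.mean(axis=0)
    s_within = (g1 - m1).T @ (g1 - m1) + (g2 - m2).T @ (g2 - m2)
    w = np.linalg.solve(s_within, m1 - m2)
    cutoff = 0.5 * (m1 + m2) @ w  # midpoint of the projected group means
    return w, cutoff

def predict(X, w, cutoff):
    """Return 1 for samples on the group-1 side of the cutoff, else 2."""
    return np.where(X @ w > cutoff, 1, 2)

def leave_one_out_accuracy(X, labels):
    """Refit without each sample in turn and classify the left-out sample."""
    hits = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i
        Xk, lk = X[keep], labels[keep]
        w, cutoff = fit_lda(Xk[lk == 1], Xk[lk == 2])
        hits += int(predict(X[i:i + 1], w, cutoff)[0] == labels[i])
    return hits / len(X)

# Synthetic data with equal covariance in both groups (assumed).
rng = np.random.default_rng(2)
cov = np.array([[1.0, 0.5], [0.5, 1.0]])
X = np.vstack([rng.multivariate_normal([0.0, 0.0], cov, size=40),
               rng.multivariate_normal([2.5, 1.0], cov, size=40)])
labels = np.repeat([1, 2], 40)

w, cutoff = fit_lda(X[labels == 1], X[labels == 2])
train_acc = np.mean(predict(X, w, cutoff) == labels)
cv_acc = leave_one_out_accuracy(X, labels)

print("training accuracy:        ", train_acc)
print("cross-validated accuracy: ", cv_acc)
```

If the cross-validated accuracy is much lower than the training accuracy, the discriminating function is probably fitted too closely to the training samples.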
Application:
Discriminant analysis can be used to perform either analysis or classification:
Last Update: 2006-Jan-17