You are working with the text-only light edition of "H. Lohninger: Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999. ISBN 3-540-14743-8".
Cross-Validation
See also: PRESS, validation of models
While there are several different flavors of cross-validation, the fundamental idea stays the same: the data available for modelling is split into two mutually exclusive sets, a larger one (the 'training' set) and a smaller one (the 'test' set). The training set is used to set up the model, while the test set is used to validate it, i.e. the model is applied to the test set and its results are compared to the expected values (as given in the test set). This process is then repeated with different subsets until each object of the data set has been used exactly once in a test set.
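The splitting scheme described above can be sketched in a few lines of Python. This is only a minimal illustration, not the procedure of any particular software package: the function name `cross_validation_splits` and its parameters are hypothetical, and the assignment of objects to test sets (every `n_folds`-th object) is just one of many possible choices.

```python
def cross_validation_splits(n_objects, n_folds):
    """Yield (training_indices, test_indices) pairs.

    Hypothetical helper: over all repetitions, each object index
    appears in exactly one test set.
    """
    if not 2 <= n_folds <= n_objects:
        raise ValueError("n_folds must be between 2 and n_objects")
    indices = list(range(n_objects))
    for k in range(n_folds):
        # Test set: every n_folds-th object, starting at object k.
        test = indices[k::n_folds]
        test_set = set(test)
        # Training set: all remaining objects (mutually exclusive with test).
        train = [i for i in indices if i not in test_set]
        yield train, test


# Usage: 6 objects, 3 repetitions with 2 test objects each.
for train, test in cross_validation_splits(6, 3):
    print("training:", train, "test:", test)
```

In each repetition the model would be set up with the objects listed in `train` and then applied to the objects in `test`.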
The size of the test set for each repetition of the procedure can be adjusted to the user's needs; it mainly depends on the size of the entire data set and on the time and effort one is willing to spend on the cross-validation. There are two conceivable extreme cases: (1) splitting the data set into two equal halves, and (2) selecting only a single object for the test set. The latter approach is also called full cross-validation (or leave-one-out cross-validation), and is in general the preferable approach.
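The trade-off between test set size and effort can be made concrete by counting how often the model has to be set up. The following sketch assumes that every object is tested exactly once; the helper `n_repetitions` is a hypothetical name introduced only for this illustration.

```python
import math


def n_repetitions(n_objects, test_size):
    """Number of times the model must be set up so that every object
    is used exactly once in a test set (hypothetical helper)."""
    if not 1 <= test_size <= n_objects // 2:
        raise ValueError("test_size must be between 1 and n_objects // 2")
    return math.ceil(n_objects / test_size)


n = 100
print(n_repetitions(n, n // 2))  # two equal halves: 2 repetitions
print(n_repetitions(n, 1))       # full cross-validation: 100 repetitions
```

For a data set of 100 objects, full cross-validation thus requires fifty times as many model fits as the split into two halves, which is why the available computing time influences the choice.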
In order to measure the performance of the model, one should calculate the PRESS value.
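As a rough illustration, the PRESS value can be understood as the sum of the squared prediction errors collected over all test objects. The sketch below combines this with full cross-validation for a simple straight-line model. The model type, the function names `fit_line` and `press_leave_one_out`, and the data are assumptions made for this example only; any other model could take the place of the line fit.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b).

    Requires at least two distinct x values.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def press_leave_one_out(x, y):
    """Sum of squared prediction errors, where each object is predicted
    by a model that was set up without that object."""
    press = 0.0
    for i in range(len(x)):
        # Training set: all objects except object i.
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_line(x_train, y_train)
        # Test set: the single left-out object i.
        residual = y[i] - (a + b * x[i])
        press += residual ** 2
    return press


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
print(press_leave_one_out(x, y))
```

A smaller PRESS value indicates better predictions for objects that were not used to set up the model; comparing PRESS values of competing models on the same data is the typical use.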
Last Update: 2006-Jan-17