You are working with the text-only light edition of "H.Lohninger: Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999. ISBN 3-540-14743-8".

Modeling

In many cases, we already suspect some relationships among the data while acquiring them. However, in order to make more precise statements, draw conclusions, or make predictions from the measured data, we have to set up a model which represents the nature of the underlying relationship.

Models can either be based on theoretical laws or principles (such as the relationship between a measured spectrum and the concentration of the analyte) or be empirical, without any explicitly known relationship (such as the toxicity of chemical substances in relation to their chemical structure). The variables which form the basis (input) of the model are called predictor variables; the variable which is to be estimated by the model is called the response variable.

Another aspect by which models can be distinguished is their (non)linearity. Depending on the circumstances, we may either try to linearize a non-linear relationship or apply a non-linear model directly. Using non-linear models generally requires much more caution than using linear models, since non-linear models are much more likely to adapt to noise in the data (overfitting).
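
The linearization mentioned above can be sketched in Python as follows. This is only a minimal illustration, assuming a hypothetical exponential relationship y = a * exp(b * x) and made-up, noise-free data; the helper fit_line and the values a = 2.0 and b = 0.5 are assumptions for this example and do not come from the text. Taking the logarithm turns the model into ln(y) = ln(a) + b * x, which can be fitted as a straight line.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical noise-free data generated from y = a * exp(b * x)
# with a = 2.0 and b = 0.5 (values chosen only for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# Linearize: ln(y) = ln(a) + b * x, then fit a straight line.
log_ys = [math.log(y) for y in ys]
intercept, slope = fit_line(xs, log_ys)

# Transform the fitted parameters back to the original model.
a_hat = math.exp(intercept)
b_hat = slope
print(a_hat, b_hat)
```

With noisy data, the log transform also changes the error structure, so the estimates from the linearized fit will generally differ from those of a direct non-linear fit.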

A third aspect is the type of the dependent (response) variable, which may be either qualitative or quantitative. Qualitative variables result in classification models; quantitative variables result in calibration models. Several terms have developed historically to describe particular aspects of a model:
 
additive model: Additive models are linear models. The predictor variables show an additive effect on the response variable.
biased model: Biased models are based on estimators whose expected value differs from the corresponding true value (biased estimators).
causal model: A model with a causal relationship between the predictor and response variables.
deterministic model: A deterministic model does not contain any random parts (cf. stochastic model).
linear model: Linear models are models which are linear in their parameters. Linear models need not describe straight-line or planar relationships.
non-linear model: A non-linear model is non-linear in the parameters to be estimated (see also the discussion about linearity for more details).
parsimonious model: A parsimonious model is a model with as few parameters as possible for a given quality of the model.
soft model: A soft model relies on intermediary (latent) variables, which are often formed by eigenanalysis of the data. A predictor variable of a soft model cannot be assigned to a single measured variable, but is rather a combination of several measured variables.
stochastic model: A stochastic model contains random elements (in contrast to the deterministic model).
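
The glossary entry for linear models can be illustrated with a short Python sketch. It assumes a hypothetical quadratic relationship y = b0 + b1 * x + b2 * x^2 with made-up, noise-free data; the helpers solve and fit_linear_model are names introduced only for this example. The fitted curve is a parabola, yet the model is linear in its parameters b0, b1, and b2, so it can be estimated by ordinary least squares via the normal equations.

```python
def solve(a, b):
    """Solve the square system a * x = b by Gaussian elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest pivot candidate into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_linear_model(design, ys):
    """Least-squares estimate from the normal equations
    (X^T X) beta = X^T y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, ys)) for i in range(p)]
    return solve(xtx, xty)


# Hypothetical noise-free data from y = 1 + 2*x + 3*x^2.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [1.0 + 2.0 * x + 3.0 * x ** 2 for x in xs]

# Each row of the design matrix holds the terms 1, x, x^2.
# The terms are non-linear in x, but the model is linear in beta.
design = [[1.0, x, x ** 2] for x in xs]
beta = fit_linear_model(design, ys)
print(beta)
```

A model such as y = a * exp(b * x), by contrast, is non-linear in the parameter b and cannot be written as a design matrix multiplied by a parameter vector.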

Methods for modeling cover a wide range. The following is a short list of the more important ones:

linear regression
LISREL (linear structural relationship)
MARS (multivariate adaptive regression splines)
multiple regression
neural networks
partial least squares (PLS)
principal component regression (PCR)
ridge regression
SIMCA (soft independent modeling of class analogy)

Last Update: 2006-Jan-17