You are working with the text-only light edition of "H.Lohninger: Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999. ISBN 3-540-14743-8".
Establishing ARIMA models
The process of finding appropriate ARIMA models
has been studied intensively. As a result, detailed guidelines exist. The
method described in [Box and Jenkins, 1970] is referred to as the "Box-Jenkins"
method and proceeds in the three phases below.
1. Model Selection
In the model selection phase, a single model is chosen.
This requires determining the values p, d, and q of an ARIMA[p,d,q]-model.
In this phase, it is important to collect as much relevant information
on the time series as possible. The first steps involve the filtering of
trends and the removal of seasonal effects. The correlation function can be inspected
to reveal the best choice of d. Heuristics exist as to which filter to
use, depending on the shape of the correlation function: for instance,
whether it is descending, alternately positive and negative, peaked,
or periodic. Further heuristics guide the selection
of the appropriate model for time series with p <= 2 and q <= 2. Surprisingly,
the majority of time series can be modeled very well with such simple models.
The auto-correlation function (ACF) and
the partial auto-correlation function (PACF) can be used for determining
p and q of the ARIMA[p,d,q]-models. Both are calculated for a limited number
of time lags t, e.g. 20, together with confidence intervals
(e.g. 95% intervals). The time lags t at which the function values
lie outside the confidence intervals are candidates for p and q.
Such lags in the ACF indicate
that an MA[t] model should be used, and such lags
in the PACF indicate that an AR[t]
model may be applicable.
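The inspection of ACF and PACF described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the simulated AR[1] test series, and the interval of plus or minus 1.96/sqrt(N) (the usual large-sample approximation for a 95% interval around zero) are not taken from the text. The PACF is computed with the Durbin-Levinson recursion, which is one of several possible estimators.

```python
import math
import random

def acf(x, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

def pacf(x, max_lag):
    """Sample partial autocorrelation via the Durbin-Levinson recursion."""
    r = acf(x, max_lag)
    values = [1.0]
    prev = []  # coefficients of the AR fit of order k-1
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        values.append(phi_kk)
    return values

def lags_outside_interval(values, n, z=1.96):
    """Lags >= 1 whose value lies outside the approximate interval +-z/sqrt(n)."""
    bound = z / math.sqrt(n)
    return [k for k in range(1, len(values)) if abs(values[k]) > bound]

# Hypothetical test data (assumption): AR[1] process x(t) = 0.7 x(t-1) + noise.
rng = random.Random(0)
series = [0.0]
for _ in range(499):
    series.append(0.7 * series[-1] + rng.gauss(0.0, 1.0))

acf_lags = lags_outside_interval(acf(series, 20), len(series))
pacf_lags = lags_outside_interval(pacf(series, 20), len(series))
print("ACF lags outside the interval: ", acf_lags)
print("PACF lags outside the interval:", pacf_lags)
```

For data of this kind one would expect the PACF to stand out mainly at lag 1, while the ACF decays gradually over several lags. With 20 lags and a 95% interval, an occasional isolated lag outside the interval can occur by chance and should not be over-interpreted.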
2. Parameter Estimation
Before the time series value x(t) can be estimated with
an ARIMA[p,d,q]-model, p, d, and q have to be selected. The number
of differencing steps d determines how often the original time series
is differenced before the respective formula is applied. This procedure
is required for filtering trends.
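The d-fold filtering step amounts to repeatedly replacing each value by its difference from the preceding value. A minimal sketch follows; the function name `difference` and the two toy series are illustrative assumptions.

```python
def difference(x, d=1):
    """Apply d differencing steps: each step replaces x(t) by x(t) - x(t-1)."""
    result = list(x)
    for _ in range(d):
        result = [result[t] - result[t - 1] for t in range(1, len(result))]
    return result

linear = [3, 5, 7, 9, 11]        # linear trend
quadratic = [0, 1, 4, 9, 16]     # quadratic trend
print(difference(linear, 1))     # [2, 2, 2, 2]
print(difference(quadratic, 2))  # [2, 2, 2]
```

One step removes a linear trend and two steps remove a quadratic one. Each step shortens the series by one value.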
When p, d, and q of an ARIMA model are given,
the parameters a_i and b_j
can be estimated. This is done by minimizing (some function of) the error,
i.e. the distance between the original time
series and the time series produced by the model. When d is used, i.e.
d > 0, the errors for the d-th difference of the time series are taken.
The "least squares approach" is the most common technique. It minimizes
the sum of the squared errors.
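As an illustration of the least squares idea, the sketch below estimates only the autoregressive coefficients of a pure AR[p] model x(t) = a_1 x(t-1) + ... + a_p x(t-p) + e(t) by solving the normal equations. Several things here are assumptions and not part of the text: the series is taken to have zero mean (no constant term), the moving-average parameters are left out because they require an iterative, nonlinear fit, and the names `fit_ar`, `solve`, and `sum_squared_error` are made up for this example. For d > 0 the same fit would be applied to the differenced series.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_ar(x, p):
    """Least squares estimate of a_1..a_p for a zero-mean AR[p] model."""
    rows = [[x[t - i] for i in range(1, p + 1)] for t in range(p, len(x))]
    targets = [x[t] for t in range(p, len(x))]
    # Normal equations: (X^T X) a = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    return solve(xtx, xty)

def sum_squared_error(x, a):
    """Sum of squared one-step errors of the AR model with coefficients a."""
    p = len(a)
    return sum((x[t] - sum(a[i] * x[t - 1 - i] for i in range(p))) ** 2
               for t in range(p, len(x)))

# Hypothetical data (assumption): AR[2] process with a_1 = 0.6, a_2 = -0.3.
rng = random.Random(1)
series = [0.0, 0.0]
for _ in range(2000):
    series.append(0.6 * series[-1] - 0.3 * series[-2] + rng.gauss(0.0, 1.0))

a_hat = fit_ar(series, 2)
print("estimated coefficients:", a_hat)
print("sum of squared errors: ", sum_squared_error(series, a_hat))
```

The estimates should come out close to the coefficients used for the simulation, and no other coefficient pair gives a smaller sum of squared errors on this series.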
The least squares criterion is often used as the default, but depending
on the overall task, other performance measures may be more reasonable
for measuring the quality of the model.
3. Performance Checking
To check the performance, it is important to use
independent test sets consisting of time series which have not yet been
involved in the modeling process. The error on these independent test sets
is compared to that obtained with other models. Usually, the error is a
value obtained by applying some function to the differences between the
observed and the forecast values.
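Two common choices for such an error function are the mean squared error and the mean absolute error. The sketch below is an assumption-laden illustration: the helper names are made up, and holding out the most recent values of one series as the test set is just one way of obtaining data that were not involved in the modeling.

```python
def split_series(x, n_test):
    """Chronological split: the last n_test values form the test set."""
    return x[:-n_test], x[-n_test:]

def mean_squared_error(observed, forecast):
    """Average of the squared differences; penalizes large misses strongly."""
    return sum((o - f) ** 2 for o, f in zip(observed, forecast)) / len(observed)

def mean_absolute_error(observed, forecast):
    """Average of the absolute differences; less sensitive to single outliers."""
    return sum(abs(o - f) for o, f in zip(observed, forecast)) / len(observed)

train, test = split_series([1, 2, 3, 4, 5], 2)
print(train, test)  # [1, 2, 3] [4, 5]

observed = [2.0, 4.0, 6.0]
forecast = [1.0, 4.0, 8.0]
print(mean_squared_error(observed, forecast))   # (1 + 0 + 4) / 3
print(mean_absolute_error(observed, forecast))  # (1 + 0 + 2) / 3 = 1.0
```

Whichever measure is chosen, the same measure has to be applied to all models that are being compared.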
Box and Jenkins advise taking a look at the autocorrelation
functions of the time series and of the errors. If the latter contains
any suspicious peaks, the model does not exploit all the available information.
Moreover, it is reasonable to evaluate the performance of ARIMA models
of higher order: ARIMA[p+1,d,q] and ARIMA[p,d,q+1]. This shows whether
models of higher order improve the forecasts. If a model does not provide
better forecasts, the model of lower order is preferred, because it has
fewer parameters. In order to avoid under- and overdifferencing, the
models with higher and lower d (ARIMA[p,d-1,q] and ARIMA[p,d+1,q]) should
also be tested. Finally, more complex models may be checked.
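Both checks, the autocorrelation of the errors and the comparison with a model of the next higher order, can be illustrated on simulated data. The sketch below is restricted to pure AR models with zero mean so that the least squares fits have closed forms. The AR[2] test process, the function names, the chronological train/test split, and the interval of plus or minus 1.96/sqrt(N) are all assumptions made for this example.

```python
import math
import random

def fit_ar1(x):
    """Least squares AR[1] coefficient (closed form)."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return [num / den]

def fit_ar2(x):
    """Least squares AR[2] coefficients from the 2x2 normal equations."""
    ts = range(2, len(x))
    s11 = sum(x[t - 1] ** 2 for t in ts)
    s22 = sum(x[t - 2] ** 2 for t in ts)
    s12 = sum(x[t - 1] * x[t - 2] for t in ts)
    y1 = sum(x[t] * x[t - 1] for t in ts)
    y2 = sum(x[t] * x[t - 2] for t in ts)
    det = s11 * s22 - s12 * s12
    return [(y1 * s22 - y2 * s12) / det, (s11 * y2 - s12 * y1) / det]

def one_step_errors(x, a, start):
    """Errors x(t) - forecast(t) for t >= start, forecasting from observed values."""
    return [x[t] - sum(a[i] * x[t - 1 - i] for i in range(len(a)))
            for t in range(start, len(x))]

def mse(errors):
    return sum(e * e for e in errors) / len(errors)

def error_acf_peaks(errors, max_lag=20, z=1.96):
    """Lags at which the autocorrelation of the errors leaves +-z/sqrt(n)."""
    n = len(errors)
    mean = sum(errors) / n
    dev = [e - mean for e in errors]
    c0 = sum(d * d for d in dev)
    bound = z / math.sqrt(n)
    peaks = []
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        if abs(r_k) > bound:
            peaks.append(k)
    return peaks

# Hypothetical data (assumption): AR[2] process with a_1 = 0.3, a_2 = 0.5.
rng = random.Random(2)
series = [0.0, 0.0]
for _ in range(3000):
    series.append(0.3 * series[-1] + 0.5 * series[-2] + rng.gauss(0.0, 1.0))
n_train = 2000
train = series[:n_train]

low, high = fit_ar1(train), fit_ar2(train)          # order p and order p+1
mse_low = mse(one_step_errors(series, low, n_train))    # error on the test part
mse_high = mse(one_step_errors(series, high, n_train))
peaks_low = error_acf_peaks(one_step_errors(train, low, 2))
peaks_high = error_acf_peaks(one_step_errors(train, high, 2))

print("test MSE, AR[1]:", mse_low, " AR[2]:", mse_high)
print("error ACF peaks, AR[1]:", peaks_low)
print("error ACF peaks, AR[2]:", peaks_high)
```

Because the data were generated by an AR[2] process, the AR[1] model leaves structure in its errors, which shows up as peaks at low lags, and the AR[2] model gives the smaller test error. Had the higher order brought no improvement, the lower order would be kept.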
Last Update: 2004-Jul-03