Teach/Me Data Analysis

You are working with the text-only light edition of "H.Lohninger: Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999. ISBN 3-540-14743-8".

See also: distributions, Normal Distribution

Central Limit Theorem

Generally speaking, central limit theorems are a set of weak-convergence results in probability theory. Intuitively, they all express the fact that a suitably normalized sum of many independent, identically distributed random variables tends to be distributed according to a particular "attractor distribution". The most important and famous result is simply called the Central Limit Theorem, which states that if the independent, identically distributed variables have a finite variance, then the distribution of their (standardized) sum approaches a normal distribution as the number of variables grows. Since many real processes yield distributions with finite variance, this helps explain the ubiquity of the normal distribution.
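For reference, the classical (Lindeberg-Lévy) form of the theorem can be written as follows. The notation (X_i, mu, sigma, S_n and the sample mean X-bar_n) is introduced here for illustration and is not taken from the text above.

```latex
\begin{aligned}
&X_1, X_2, \ldots \text{ i.i.d.}, \quad \mathbb{E}[X_i] = \mu, \quad 0 < \operatorname{Var}(X_i) = \sigma^2 < \infty \\
&S_n = \sum_{i=1}^{n} X_i, \qquad \frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1) \quad \text{as } n \to \infty \\
&\bar{X}_n = \frac{S_n}{n} \approx \mathcal{N}\!\left(\mu, \frac{\sigma^2}{n}\right) \quad \text{for large } n
\end{aligned}
```

The last line is the informal reading used in practice: for large n, sample means are approximately normal with mean mu and standard deviation sigma/sqrt(n).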

An example should clarify this: take a population with an unknown probability density function (e.g. a bimodal distribution). Now select a random sample of n observations and calculate their mean. Repeat this procedure many times and plot a histogram of the means. You will see that the resulting histogram resembles a normal distribution. Now change the underlying distribution and repeat the whole experiment: again you will see approximately normally distributed means.
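The experiment can be reproduced with a short Python sketch. The bimodal population used here is an assumption made for illustration (an equal mixture of two normal distributions centred at -2 and +2), and the helper names `bimodal_draw`, `sample_means` and `text_histogram` are hypothetical; any other population with finite variance can be swapped in.

```python
import random
import statistics


def bimodal_draw(rng):
    """One draw from a hypothetical bimodal population: an equal mixture of
    two normal distributions centred at -2 and +2, each with sd 0.5."""
    center = -2.0 if rng.random() < 0.5 else 2.0
    return rng.gauss(center, 0.5)


def sample_means(n, repeats, seed=1):
    """Draw `repeats` random samples of size n and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(bimodal_draw(rng) for _ in range(n))
            for _ in range(repeats)]


def text_histogram(values, bins=15, width=50):
    """Print a crude text histogram so the shape can be inspected by eye."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # Clamp the maximum value into the last bin.
        counts[min(int((v - lo) / step), bins - 1)] += 1
    peak = max(counts)
    for i, count in enumerate(counts):
        print(f"{lo + i * step:7.2f} | {'#' * round(width * count / peak)}")


means = sample_means(n=30, repeats=5000)
text_histogram(means)
```

With n = 30 the printed histogram should look roughly bell-shaped around 0, even though single draws from this population cluster near -2 and +2.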

This experiment illustrates the consequences of the central limit theorem, which is considered to be one of the most important results in statistical theory:

If you take a random sample of n observations from any population with finite variance, then, if n is sufficiently large, the distribution of the sample means will be approximately normal, with a mean equal to the mean of the population and a standard deviation equal to the standard deviation of the population divided by the square root of n ($\sigma/\sqrt{n}$, where $\sigma$ is the standard deviation of the population).
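The statement about the mean and spread of the sample means can be checked numerically. This sketch assumes an exponential population with rate 1 (its mean and its standard deviation are both exactly 1, so sigma = 1); `spread_of_means` is a hypothetical helper. For each n, the printed standard deviation of the means should be close to sigma/sqrt(n).

```python
import math
import random
import statistics


def spread_of_means(n, repeats=20000, seed=7):
    """Return (mean, standard deviation) of `repeats` sample means of size n."""
    rng = random.Random(seed)
    # Assumed population: exponential with rate 1 (mean 1, sd 1).
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(repeats)]
    return statistics.fmean(means), statistics.stdev(means)


sigma = 1.0
for n in (4, 16, 64):
    m, s = spread_of_means(n)
    print(f"n={n:3d}  mean of means={m:.3f}  sd of means={s:.4f}  "
          f"sigma/sqrt(n)={sigma / math.sqrt(n):.4f}")
```

Quadrupling n should roughly halve the spread of the means, as the 1/sqrt(n) factor predicts.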

The minimum sample size needed to obtain approximately normally distributed means depends on the distribution function of the population. In general, n has to be larger for highly skewed distributions. For n greater than 30, the distribution of the sample means is approximately normal for most populations.
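The effect of skewness on the required sample size can be explored with a sketch like the following. It assumes an exponential population (skewness 2) as the "highly skewed" case and measures the skewness of the sample means with the simple moment estimator; the function names are hypothetical. For this population the theoretical skewness of the mean of n draws is 2/sqrt(n), so it shrinks gradually rather than vanishing at any fixed n.

```python
import random
import statistics


def skewness(values):
    """Moment estimator of skewness: mean of cubed standardized deviations."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values, m)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


def skewness_of_means(n, repeats=20000, seed=3):
    """Skewness of `repeats` sample means, each computed from n draws."""
    rng = random.Random(seed)
    # Assumed highly skewed population: exponential with rate 1.
    means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n))
             for _ in range(repeats)]
    return skewness(means)


for n in (1, 5, 30, 100):
    # Theory for this population: skewness of the mean = 2 / sqrt(n).
    print(f"n={n:3d}  skewness of means = {skewness_of_means(n):.2f}")
```

Even at n = 30 some skewness remains (theoretically about 0.37 for this population), which is consistent with the rule of thumb covering most, not all, distributions.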

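Hint: A common trick for numerically creating an approximately normally distributed random variable is to draw 16 numbers from a uniform distribution that is symmetric about zero and divide their sum by 4 (equivalently, multiply their mean by 4). Since the sum of 16 independent draws has 16 times the variance of a single draw, dividing by 4 yields a variable with the same variance as the uniform distribution, and by the central limit theorem its distribution is approximately normal.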
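A minimal sketch of the 16-number trick follows, reading the scaling step as dividing the sum of the 16 draws by 4 (equivalently, multiplying their mean by 4) so that the result keeps the variance of the uniform distribution. The interval [-sqrt(3), +sqrt(3)] is an assumption chosen so that this variance is 1; the function name `approx_standard_normal` is hypothetical.

```python
import math
import random
import statistics

# Assumed interval: a uniform on [-a, a] has variance a**2 / 3,
# so a = sqrt(3) gives unit variance.
HALF_WIDTH = math.sqrt(3.0)


def approx_standard_normal(rng):
    """Sum 16 symmetric uniform draws and divide by 4 = sqrt(16).

    The sum has variance 16, so dividing by 4 restores unit variance;
    by the central limit theorem the result is approximately N(0, 1).
    """
    total = sum(rng.uniform(-HALF_WIDTH, HALF_WIDTH) for _ in range(16))
    return total / 4.0


rng = random.Random(42)
draws = [approx_standard_normal(rng) for _ in range(50000)]
print(f"mean = {statistics.fmean(draws):.3f}, sd = {statistics.stdev(draws):.3f}")
```

Note that the generated values are bounded (by 4 * sqrt(3), about 6.93 in absolute value), so the far tails of a true normal distribution are not reproduced; for most simulation purposes this is negligible.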

Last Update: 2005-Aug-29