Teach/Me Data Analysis

You are working with the text-only light edition of "H.Lohninger: Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999. ISBN 3-540-14743-8".

See also: samples and population, experimental design, random sampling

Representative Samples

Drawing representative samples can be quite demanding. An experimenter should always ask whether the samples drawn are representative of the population of interest. A few examples illustrate the problem:

If you have a truckload of iron ore and you are the chemist responsible for checking the quality of the ore, you have to make sure that the samples drawn from the truckload are representative of the whole amount of ore.
Suppose you are a researcher who is interested in getting information on car usage in several countries, worldwide. While drawing a random sample of the French population by selecting people at random from telephone books might work (virtually all inhabitants of France have a telephone), this is certainly not the case when you select your sample the same way for Bangladesh.
The Literary Digest poll for the 1936 presidential election in the United States predicted that Landon would defeat Roosevelt; in fact, Roosevelt won. The vote in that election split along economic lines, with wealthier people favoring Landon and poorer people favoring Roosevelt. The samples for the investigation were taken from telephone books, which resulted in a non-representative sample (in 1936, telephone subscribers tended to be wealthier than the general population, and thus the sampling procedure oversampled Landon voters and undersampled Roosevelt voters).
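The effect described in the last example can be reproduced with a small simulation. The following Python sketch is only an illustration: the function name simulate_poll, the candidate label "A", and all percentages (share of wealthier voters, voting preferences, telephone ownership) are invented assumptions and are not historical data. It compares a sample drawn only from telephone owners with a sample drawn at random from all voters.

```python
import random

def simulate_poll(n_voters=100_000, sample_size=2_000, seed=1):
    """Compare a telephone-book sample with a simple random sample.

    All probabilities below are made-up assumptions for illustration.
    """
    rng = random.Random(seed)
    voters = []  # list of (votes_for_a, has_phone) pairs
    for _ in range(n_voters):
        wealthy = rng.random() < 0.30                           # assumed share of wealthier voters
        votes_for_a = rng.random() < (0.70 if wealthy else 0.35)  # assumed preferences
        has_phone = rng.random() < (0.80 if wealthy else 0.20)    # assumed phone ownership
        voters.append((votes_for_a, has_phone))

    # True share of votes for candidate "A" in the whole population.
    true_share = sum(vote for vote, _ in voters) / n_voters

    # Biased sample: only voters who own a telephone can be selected.
    phone_owners = [vote for vote, has_phone in voters if has_phone]
    phone_sample = rng.sample(phone_owners, sample_size)
    phone_share = sum(phone_sample) / sample_size

    # Unbiased sample: every voter has the same chance of being selected.
    random_sample = rng.sample(voters, sample_size)
    random_share = sum(vote for vote, _ in random_sample) / sample_size

    return true_share, phone_share, random_share

true_share, phone_share, random_share = simulate_poll()
print(f"true share for A:        {true_share:.3f}")
print(f"telephone-book estimate: {phone_share:.3f}")
print(f"random-sample estimate:  {random_share:.3f}")
```

Under these assumptions the telephone-book estimate overstates the support for candidate "A", while the random-sample estimate stays close to the true share. Enlarging the telephone sample does not remove the error, because the error comes from who can be selected and not from how many are selected.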

One prerequisite for a representative sample is that the objects are selected at random. The following example illustrates this:

A gardener changed her method of cultivating tulips. To find out whether the new method was successful, she wanted to perform some statistical tests. As the population of tulips (= all available tulips) comprised approx. 4000 flowers, she decided to draw a sample of 100 flowers and to use it to estimate the average length of the tulips grown with the new method.

How could she select 100 out of about 4000 flowers without distorting the measurement by subjective influences? Note: sampling by personal "standards" almost always introduces errors for psychological reasons. Maybe she was convinced of the new method, or rejected it for some reason. Even if she tried to be objective, she could hardly rule out that she biased the selection unconsciously.

A common method for creating representative samples is to use random numbers for the selection of individual test objects:

Assign consecutive numbers to all objects.
Generate as many random numbers as the size of the sample requires, each within the range of the assigned numbers. If a number occurs twice, draw a replacement. If you don't have a reliable random number generator, use random numbers from a table.
Pick the objects with the corresponding numbers.
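
The three steps above can be sketched in a few lines of Python. This is a minimal sketch, assuming that the objects are numbered from 1 to the population size and that the pseudo-random generator of the Python standard library is reliable enough for the purpose; the function name draw_sample and the seed value are chosen here for illustration only.

```python
import random

def draw_sample(population_size, sample_size, seed=None):
    """Return the numbers of the objects to be picked, in ascending order.

    Objects are assumed to be numbered 1..population_size.
    """
    rng = random.Random(seed)
    # random.sample draws without replacement, so no number occurs twice.
    numbers = rng.sample(range(1, population_size + 1), sample_size)
    return sorted(numbers)

# Tulip example: about 4000 numbered flowers, 100 of them to be measured.
selected = draw_sample(4000, 100, seed=42)
print(selected[:10])  # the first few object numbers to pick
```

A fixed seed makes the selection reproducible, which helps when the sampling has to be documented. Omitting the seed yields a different selection on each run.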

Last Update: 2004-Jul-03