You are working with the text-only light edition of "H.Lohninger: Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999. ISBN 3-540-14743-8".

List of Data Sets

The following table lists all data sets supplied with Teach/Me - Data Analysis. Most of them contain real data obtained from various sources (see the reference section at the end of this page). A few are simulated data sets generated with a background story in mind. The file names of the simulated data sets are displayed in brown color.
Filename Description Ref.
ALCOHOL Subset of the data set WINE containing only the alcohol content of two brands. [7]
BENZ Spectroscopic data (NMR) on various brands of gasoline, and the relative octane number. [10]
BODYFAT Percentage of body fat, age, weight, height, and ten body circumference measurements (e.g., abdomen) are recorded for 252 men.  Body fat, a measure of health, is estimated through an underwater weighing technique.  Fitting body fat to the other measurements using multiple regression provides a convenient way of estimating body fat for men using only a scale and a measuring tape. [1]
BOILPTS Boiling points and topological descriptors of 185 chemical substances. [3,13]
CANCER Number of intestine cancer cases in West Germany in the period between 1955 and 1995. [24]
CIGART Artificial data set for classification, created by INSPECT. The data points are arranged in such a way that only non-linear methods are able to classify the data correctly. [2]
COINS Weight of 114 coins (Austrian 1 Schilling pieces) of different ages. [5]
ETHANOL NOx concentration in the exhaust gases of an experimental ethanol engine. [25]
EXMPL-A Artificial data set which shows a few simple relationships among variables. -
FISH1SPECIES Subset of the data set FISHCATCH showing the relationship between length and weight of fish. [22]
FISHCATCH Body measurements of different species of perch. [22]
FLURIEDW This data set comprises geometric measures of 100 authentic and 100 counterfeit bank notes. [12]
FREEFALL Simulated data to show variability in data. A steel ball is released from different heights; for each height the experiment is repeated 100 times. -
HENRYSEM Henry's constant of chemical substances together with molecular descriptors. The physical data has been obtained from [17]; the molecular descriptors have been calculated using TOPIX [18]. [17,18]
HUMIDIT2 Average relative humidity (%) at 264 places in the USA. The data set contains the data for June and September, morning and afternoon each. In addition, the annual averages are given in the last two columns. [8]
IRIS Three types of iris plants. The plants are described by four variables. [14]
METHANE This data set contains the concentration of atmospheric methane measured monthly during the period from September 1980 to September 1988. [15]
MINWATER Chemical analysis of different brands of mineral water. [20]
MOTE9603 Climate data obtained from Mote weather station, Florida, USA. The data set contains measurements of 9 meteorological variables over a period of ten days in March 1996. [4]
MOTETIDES Water level at the Mote weather station, Florida, USA, during July 1998. Data was obtained every 15 minutes. [4]
MULTIEST Artificial data used in an interactive example on multidimensional models. -
POLYFIT Artificial data showing a polynomial relationship of the third order. -
PRECIPITATION Normal monthly precipitation (inches) in the period 1961-90. [8]
RABBITS Fluctuations of a rabbit population. [21]
REACTTEST The reaction times to visual stimuli were recorded for 9 persons. The experiment was repeated on two different days; one series was obtained before a two-hour lecture, the other series after a two-hour lecture. [9]
STRONTIUM Simulated data to demonstrate the two-sample t-test. -
SUNSPOTS Average monthly sunspot areas between 1874 and 1998. [19]
TERPBIC Data set containing two classes of chemical substances described by two spectral parameters. This data set cannot be treated by linear methods. [11]
TRAIN Simulated data to show a skewed distribution. -
TWOCLASS Artificial data set containing two classes of observations. -
WINE Chemical analysis of three kinds of Italian red wines (Barolo, Grignolino, Barbera). [7]
WINEGER Chemical analysis of various kinds of German wines. [23]
WORLDPOP Demographic, sociological, and economic data on the world's nations (1988). [6]
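
The BODYFAT entry above mentions fitting body fat to the other measurements by multiple regression. As a rough illustration of that idea only, the following Python sketch fits a linear model with two predictors by ordinary least squares, solving the normal equations with a small Gaussian elimination. Everything in it is an assumption made for the example: the function names, the two predictors (weight and abdomen circumference), and all numbers are invented and are not taken from the BODYFAT file, and nothing here is meant to describe how Teach/Me itself performs the fit.

```python
# Minimal sketch: ordinary least-squares multiple regression in pure Python.
# All values are made up for illustration; they are NOT taken from BODYFAT.

def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy [a | b] so the inputs stay untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_multiple_regression(predictors, response):
    """Return [intercept, b1, b2, ...] minimizing the sum of squared residuals."""
    rows = [[1.0] + list(p) for p in predictors]  # leading 1.0 models the intercept
    k = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefficients, predictor_row):
    """Evaluate the fitted linear model for one set of predictor values."""
    return coefficients[0] + sum(b * x for b, x in zip(coefficients[1:], predictor_row))


# Hypothetical (weight in kg, abdomen circumference in cm) pairs.
predictors = [(70.0, 85.0), (82.0, 95.0), (65.0, 80.0),
              (90.0, 104.0), (77.0, 88.0), (95.0, 100.0)]
# Response generated from an invented exact linear rule, so the fit
# should recover that rule up to rounding error.
response = [-30.0 + 0.1 * w + 0.5 * a for w, a in predictors]

coefficients = fit_multiple_regression(predictors, response)
estimate = predict(coefficients, (80.0, 90.0))
print(coefficients, estimate)
```

Because the invented response follows an exact linear rule, the fit returns that rule's coefficients up to rounding error; with real measurements the residuals would be non-zero, and a numerically more robust solver (for example a QR-based least-squares routine) would usually be preferable to the normal equations.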


References to the sources of the data sets:
[1] K. Penrose, A. Nelson, A.G. Fisher
Generalized Body Composition Prediction Equation for Men Using Simple Measurement Techniques
Medicine and Science in Sports and Exercise 17(2) (1985) 189
Data set by courtesy of Garth Fisher
[2] H. Lohninger
INSPECT - A program system for scientific and engineering data analysis.
Springer, Berlin, Heidelberg, New York 1996
[3] H. Lohninger
Evaluation of Neural Networks Based on Radial Basis Functions and Their Application to the Prediction of Boiling Points from Structural Parameters.
J. Chem. Inf. Comput. Sci. 33 (1993) 736-744 
[4] Mote Weather Station, Florida, USA
Data by courtesy of Don Hayward
Mote Marine Laboratory
1600 Ken Thompson Parkway
Sarasota, FL 34236, USA
[5] Coins have been collected and weighed by H. Lohninger and A. Satzinger, Vienna University of Technology, Vienna, Austria
[6] This data set has been compiled from a variety of public sources, including the United Nations, the World Bank, and the CIA Factbook.
[7] M. Forina, E. Tiscornia
Ann. Chim. 72 (1982) 143
Data set courtesy of M. Forina, Università di Genova, Italy
[8] The data has been published by the National Climatic Data Center on its Web site.
[9] H. Lohninger
Reaction measurements to visual stimuli.
Vienna University of Technology, 1998
[10] R. Meusinger, R. Moros: 
Application of Genetic Algorithms and Neural Networks in Analysis of Multicomponent Mixtures by NMR-Spectroscopy, in J. Gasteiger (Ed.) "Software Development in Chemistry, 10", Gesellschaft Deutscher Chemiker, Frankfurt 1996, p. 209 
Data set courtesy of R. Meusinger.
[11] H. Lohninger
Data has been computed from mass spectral data by means of MSLIB
[12] B. Flury, H. Riedwyl
Angewandte multivariate Statistik
G. Fischer-Verlag, Stuttgart 1983
Data set by courtesy of H. Riedwyl, Bern, Switzerland
[13] A.T. Balaban, L.B. Kier, N. Joshi
Correlations between chemical structure and normal boiling points of acyclic ethers, peroxides, acetals and their sulfur analogues
J. Chem. Inf. Comput. Sci. 32 (1992) 237-244
[14] R.A. Fisher
The use of multiple measurements in taxonomic problems
Annals of Eugenics 7 (1936), Part II, 179-188
[15] M.A.K. Khalil, R.A. Rasmussen
Atmospheric Methane: Recent Global Trends
Environ. Sci. Technol. 1990, 24, 549-553
[16] H. Lohninger
Estimation of Soil Partition Coefficients of Pesticides from their Chemical Structure
Chemosphere 29 (1994) 1611
[17] J. Hine, P.K. Mookerjee
The intrinsic hydrophilic character of organic compounds. Correlations in terms of structural contributions
J. Org. Chem. 40 (1975) 292-298
[18] D. Svozil, H. Lohninger
TOPIX - A program to calculate topological indices.
[19] Royal Greenwich Observatory/USAF/NOAA
NASA/Marshall Space Flight Center
Data set by courtesy of David H. Hathaway
[20] H. Lohninger
Data on different brands of mineral water has been collected by the author from the labels of the water bottles.
[21] Rabbits 
[22] P. Brofeldt
Bidrag till kännedom om fiskbeståndet i våra sjöar. Längelmävesi.
in T.H. Järvi: Finlands Fiskerier, Band 4, Meddelanden utgivna av fiskeriföreningen i Finland.
Helsingfors 1917
[23] G. Thiel, K. Danzer
Direct analysis of mineral components in wine by inductively coupled plasma optical emission spectrometry (ICP-OES).
Fresenius J. Anal. Chem. 357 (1997) 553-557
Data set courtesy of Klaus Danzer, Friedrich-Schiller Universität Jena, Germany
[24] N. Becker, J. Wahrendorf
Atlas of cancer mortality in the Federal Republic of Germany.
Springer Berlin Heidelberg 1998
[25] N.D. Brinkman
Ethanol Fuel - A single-cylinder engine study of efficiency and exhaust emissions
SAE Transactions 90 (1981), No. 810345, 1410-1424.


Last Update: 2005-Jan-25