You are working with the text-only light edition of "H.Lohninger: Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999. ISBN 3-540-14743-8".

Exercise
Estimation of Henry's Constant from Chemical Structure

The data set HENRYSEM contains various descriptors and properties of 157 substances, among them the natural logarithm of Henry's constant, the boiling point, and the melting point. The following variables are available:

ln(H)           natural logarithm of Henry's constant
melt.p.         melting point (deg. Celsius)
boil.p.         boiling point (deg. Celsius)
DENS20          density at 20 deg. Celsius
nD20            refractive index at 20 deg. Celsius
Hv(LB)          enthalpy of evaporation
compact         topological index indicating the compactness of a molecule
rad             topological radius
dia             topological diameter
nvz             number of branches in the molecule
Randic          Randic index
RdOz            modified Randic index
NMethyl         number of methyl groups in molecule
TJ              topological index J (defined by Balaban)
C               number of carbon atoms
H               number of hydrogen atoms
O               number of oxygen atoms
N               number of nitrogen atoms
SumH            number of hetero (non-H, non-C) atoms in molecule
MWgt            molecular weight
LOIX            topological index reflecting electronegativities


Use these data to model Henry's constant from the molecular descriptors. Try to compare several methods, e.g. multiple linear regression (MLR) in combination with forward selection of variables, principal component regression (PCR), and artificial neural networks (ANN; here RBF networks). Which of the models is "best"?
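
One possible way to set up such a comparison outside the course software is sketched below in Python with NumPy. This is only an illustration under stated assumptions: the HENRYSEM data are not loaded, so a synthetic matrix of the same size (157 rows) stands in for the descriptors, and all function names (fit_linear, cv_rmse, forward_selection, make_pcr, make_rbf) are hypothetical helpers written for this sketch. The models are compared by their cross-validated root mean squared error, which is one reasonable reading of "best", not the only one.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xn: np.column_stack([np.ones(len(Xn)), Xn]) @ coef

def cv_rmse(make_model, X, y, k=5, seed=0):
    """k-fold cross-validated RMSE; make_model(Xtr, ytr) returns a predict function."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_err = 0.0
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        predict = make_model(X[train], y[train])
        sq_err += np.sum((predict(X[fold]) - y[fold]) ** 2)
    return float(np.sqrt(sq_err / len(y)))

def forward_selection(X, y, max_vars=5):
    """Greedily add the variable that lowers the CV error most; stop when none helps."""
    selected, best = [], np.inf
    while len(selected) < max_vars:
        scores = {j: cv_rmse(fit_linear, X[:, selected + [j]], y)
                  for j in range(X.shape[1]) if j not in selected}
        j, score = min(scores.items(), key=lambda kv: kv[1])
        if score >= best:
            break
        selected.append(j)
        best = score
    return selected, best

def autoscale(X):
    """Return mean and standard deviation per column (constant columns get sd 1)."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0
    return mu, sd

def make_pcr(n_comp):
    """PCR: autoscale, project onto the first n_comp principal components, regress."""
    def fit(X, y):
        mu, sd = autoscale(X)
        _, _, Vt = np.linalg.svd((X - mu) / sd, full_matrices=False)
        P = Vt[:n_comp].T                      # loadings of the retained components
        lin = fit_linear(((X - mu) / sd) @ P, y)
        return lambda Xn: lin(((Xn - mu) / sd) @ P)
    return fit

def make_rbf(n_centers, seed=0):
    """RBF network: Gaussian hidden units centred on random training points,
    linear output layer fitted by least squares."""
    def fit(X, y):
        mu, sd = autoscale(X)
        Z = (X - mu) / sd
        rng = np.random.default_rng(seed)
        centers = Z[rng.choice(len(Z), size=min(n_centers, len(Z)), replace=False)]
        dist = np.sqrt(((centers[:, None, :] - centers[None, :, :]) ** 2).sum(-1))
        width = np.median(dist[dist > 0])      # heuristic kernel width (assumption)
        def hidden(Zn):
            d2 = ((Zn[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            return np.exp(-d2 / (2.0 * width ** 2))
        lin = fit_linear(hidden(Z), y)
        return lambda Xn: lin(hidden((Xn - mu) / sd))
    return fit

# Synthetic stand-in for the real data (NOT the HENRYSEM values):
# 157 "substances", 8 "descriptors", target depends on columns 0 and 3.
rng = np.random.default_rng(42)
X = rng.normal(size=(157, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.normal(size=157)

selected, mlr_rmse = forward_selection(X, y)
pcr_scores = {n: cv_rmse(make_pcr(n), X, y) for n in range(1, X.shape[1] + 1)}
rbf_rmse = cv_rmse(make_rbf(20), X, y)

print("MLR, forward selection:", selected, mlr_rmse)
print("PCR, best number of components:", min(pcr_scores, key=pcr_scores.get))
print("RBF network, 20 hidden units:", rbf_rmse)
```

With the real data, X would hold the descriptor columns and y the ln(H) column. Because the forward selection itself uses the cross-validated error, its final score is somewhat optimistic; an independent test set would give a fairer comparison between the three models.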

How well can the boiling points and the melting points be modeled using this data set?

Can you explain the difference between the results for boiling points and those for melting points?

Last Update: 2005-Jul-16