Teach/Me Data Analysis

You are working with the text-only light edition of "H. Lohninger: Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999. ISBN 3-540-14743-8".

See also: MLR, PCR, ANN, variable selection

Exercise
Estimation of Henry's Constant from Chemical Structure

The data set HENRYSEM contains various descriptors and properties of 157 substances, among them the logarithm of Henry's constant, the boiling point, and the melting point. The following variables are available:

ln(H)      logarithm of Henry's constant
melt.p.    melting point (deg. Celsius)
boil.p.    boiling point (deg. Celsius)
DENS20     density at 20 deg. Celsius
nD20       refractive index at 20 deg. Celsius
Hv(LB)     enthalpy of evaporation
compact    topological index indicating the compactness of a molecule
rad        topological radius
dia        topological diameter
nvz        number of branches in the molecule
Randic     Randic index
RdOz       modified Randic index
NMethyl    number of methyl groups in molecule
TJ         topological index J (defined by Balaban)
C          number of carbon atoms
H          number of hydrogen atoms
O          number of oxygen atoms
N          number of nitrogen atoms
SumH       number of hetero (non-H, non-C) atoms in molecule
MWgt       molecular weight
LOIX       topological index reflecting electronegativities

Use this data to model Henry's constant from the molecular descriptors. Compare several methods, e.g. MLR (in combination with forward selection of variables), PCR, and ANN (RBF networks). Which of the models is "best"?
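One possible way to set up such a comparison outside the program is sketched below in Python with NumPy. It is only an illustration under several assumptions: the HENRYSEM data is not included here, so a synthetic matrix with 157 rows stands in for it; the function names (fit_ols, cv_rmse, forward_selection, pcr) are hypothetical helpers, not part of Teach/Me; and 5-fold cross-validated RMSE is an assumed criterion for "best", chosen for this sketch rather than prescribed by the exercise. The ANN (RBF network) model is left out to keep the example short.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with intercept; returns coefficients (intercept first)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]

def cv_rmse(fit_predict, X, y, k=5, seed=0):
    """k-fold cross-validated RMSE for a fit_predict(X_train, y_train, X_test) callable."""
    idx = np.random.default_rng(seed).permutation(len(y))
    sq_err = 0.0
    for test in np.array_split(idx, k):
        train = np.setdiff1d(idx, test)
        pred = fit_predict(X[train], y[train], X[test])
        sq_err += np.sum((y[test] - pred) ** 2)
    return float(np.sqrt(sq_err / len(y)))

def mlr(cols):
    """MLR restricted to the descriptor columns in cols."""
    def fit_predict(X_tr, y_tr, X_te):
        return predict_ols(fit_ols(X_tr[:, cols], y_tr), X_te[:, cols])
    return fit_predict

def forward_selection(X, y, max_vars=5):
    """Add one descriptor at a time while the cross-validated RMSE keeps improving."""
    selected, best = [], np.inf
    while len(selected) < max_vars:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        scores = {j: cv_rmse(mlr(selected + [j]), X, y) for j in candidates}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break
        selected.append(j_best)
        best = scores[j_best]
    return selected, best

def pcr(n_comp):
    """Principal component regression on the first n_comp components."""
    def fit_predict(X_tr, y_tr, X_te):
        mu, sd = X_tr.mean(axis=0), X_tr.std(axis=0)
        sd[sd == 0] = 1.0                      # guard against constant columns
        Z_tr, Z_te = (X_tr - mu) / sd, (X_te - mu) / sd
        _, _, Vt = np.linalg.svd(Z_tr, full_matrices=False)
        P = Vt[:n_comp].T                      # loadings of the leading components
        return predict_ols(fit_ols(Z_tr @ P, y_tr), Z_te @ P)
    return fit_predict

# Synthetic stand-in for HENRYSEM (assumption): 157 samples, 8 descriptors,
# with the target depending on columns 0 and 3 only.
rng = np.random.default_rng(42)
X = rng.normal(size=(157, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.normal(scale=0.3, size=157)

selected, rmse_mlr = forward_selection(X, y)
rmse_pcr = {k: cv_rmse(pcr(k), X, y) for k in range(1, X.shape[1] + 1)}

print("MLR, forward selection: columns", selected, "CV-RMSE", round(rmse_mlr, 3))
for k, r in rmse_pcr.items():
    print("PCR with", k, "components: CV-RMSE", round(r, 3))
```

With the real data, X would hold the descriptor columns and y the ln(H) column. Note that the same folds are used both to select variables and to report the error, so the RMSE of the selected MLR model tends to be optimistic; a separate test set gives a fairer comparison.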

What about modeling the boiling points and the melting points by using this data set?

Do you have an explanation for the difference between the results for boiling points and those for melting points?

Last Update: 2005-Jul-16
