You are working with the text-only light edition of "H.Lohninger: Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999. ISBN 3-540-14743-8".

Agglomerative Clustering

Agglomerative clustering is based on the following principle: find the two objects that are closest to each other, merge them into a single new cluster, and repeat this process until all objects and clusters have been merged into a single cluster. During merging, the distance at which each pair is merged must be recorded so that a dendrogram can be constructed. The type of clustering is controlled by the parameters of the Lance-Williams equation, which gives the distance between a newly merged cluster and every other object or cluster:

dqi' = s dpi + t dqi + u dpq + v |dpi - dqi|

with

s, t, u, and v being the system parameters,
dpi, dqi, and dpq being the distances between the clusters (or objects) p, q, and i before merging, and
dqi' being the new distance between the new cluster q (formed by merging p and q) and any other object or cluster i. dqi' replaces dqi during the merging process.
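
The update rule can be illustrated with a short Python sketch. The function name lance_williams_update, its argument names, and the example distances are chosen here for illustration only and are not part of the original text. The last term uses the absolute difference of the two old distances dpi and dqi. With s = t = 0.5, u = 0, and v = -0.5 the result is the smaller of the two old distances, and with v = 0.5 it is the larger one, because 0.5a + 0.5b - 0.5|a-b| = min(a,b) and 0.5a + 0.5b + 0.5|a-b| = max(a,b).

```python
def lance_williams_update(d_pi, d_qi, d_pq, s, t, u, v):
    """Distance between the cluster merged from p and q and another cluster i."""
    return s * d_pi + t * d_qi + u * d_pq + v * abs(d_pi - d_qi)


# Hypothetical distances, chosen only for illustration.
d_pi, d_qi, d_pq = 4.0, 6.0, 1.0

single = lance_williams_update(d_pi, d_qi, d_pq, 0.5, 0.5, 0.0, -0.5)
complete = lance_williams_update(d_pi, d_qi, d_pq, 0.5, 0.5, 0.0, 0.5)
average = lance_williams_update(d_pi, d_qi, d_pq, 0.5, 0.5, 0.0, 0.0)

print(single, complete, average)  # 4.0 6.0 5.0
```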


Listed below are the parameters of the most commonly used clustering techniques.
 

type of clustering   s                t                u             v      comment
single linkage       0.5              0.5              0             -0.5   contracting
complete linkage     0.5              0.5              0             0.5    dilating
average linkage      0.5              0.5              0             0      compromise
median               0.5              0.5              -0.25         0      not monotonic
centroid             np/n             nq/n             -np nq/n^2    0      not monotonic
Ward                 (np+ni)/(n+ni)   (nq+ni)/(n+ni)   -ni/(n+ni)    0      "best" approach
flexible strategy    a                a                1-2a          0      parameter a determines behavior

n  ... number of objects in the merged cluster (n = np + nq)
np ... number of objects in cluster p
nq ... number of objects in cluster q
ni ... number of objects in cluster i
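
The complete merging procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the names PARAMS, agglomerate, and D and the distance matrix itself are hypothetical, the merged cluster is stored under the index q, and only the three methods with constant parameters are included. Centroid and Ward clustering would additionally require keeping track of the cluster sizes np, nq, and ni.

```python
# (s, t, u, v) for the methods whose parameters do not depend on cluster sizes
PARAMS = {
    "single":   (0.5, 0.5, 0.0, -0.5),
    "complete": (0.5, 0.5, 0.0,  0.5),
    "average":  (0.5, 0.5, 0.0,  0.0),
}


def agglomerate(dist, method="single"):
    """Return the merge history as a list of (p, q, distance) tuples.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    The cluster formed from p and q is stored under the index q.
    """
    s, t, u, v = PARAMS[method]
    d = [list(row) for row in dist]      # work on a copy
    active = list(range(len(d)))         # clusters not yet merged away
    merges = []
    while len(active) > 1:
        # find the closest pair of active clusters
        d_pq, p, q = min(
            (d[a][b], a, b) for a in active for b in active if a < b
        )
        merges.append((p, q, d_pq))      # recorded for the dendrogram
        # Lance-Williams update of the distances to all other clusters
        for i in active:
            if i != p and i != q:
                new = (s * d[p][i] + t * d[q][i] + u * d_pq
                       + v * abs(d[p][i] - d[q][i]))
                d[q][i] = d[i][q] = new
        active.remove(p)                 # p is now part of q
    return merges


# Hypothetical distance matrix for four objects
D = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 3.0, 6.0],
    [4.0, 3.0, 0.0, 2.0],
    [5.0, 6.0, 2.0, 0.0],
]

for name in PARAMS:
    print(name, agglomerate(D, name))
```

For this matrix all three methods merge the same pairs in the same order, but the final merge takes place at a distance of 3.0 for single linkage, 4.5 for average linkage, and 6.0 for complete linkage, which matches the "contracting", "compromise", and "dilating" comments in the table.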


Last Update: 2006-Jan-17