You are working with the text-only light edition of "H.Lohninger: Teach/Me Data Analysis, Springer-Verlag, Berlin-New York-Tokyo, 1999. ISBN 3-540-14743-8". Click here for further information.

Minimal Spanning Tree

All agglomerative clustering algorithms based on the Lance-Williams equation have the drawback that the full distance matrix has to be calculated during the cluster analysis. This can take considerable computing power and requires high amounts of memory. A remedy to this can be found in a graph-theoretical approach, which is called the minimal spanning tree. Unfortunately, the minimal spanning tree algorithm can be applied only to single linkage clustering. However, in the case of large data sets, this algorithm can be orders of magnitude faster.

A minimal spanning tree is a graph which meets the following requirements:

If edges reflect the distance between objects, then clusters can be detected by removing all edges which have a distance greater than dmax.

There is a simple algorithm described by Prim how to determine the minimal spanning tree. The resulting minimal spanning tree is equal to the results of a single linkage cluster analysis:

1. select an arbitrary object as the starting vertex of a partial graph M
2. for all objects which are not yet part of the graph M, find the nearest neighbor to any vertex of graph M
3. join this nearest neighbor to M
4. repeat with step 2 until all vertices are part of graph M

Last Update: 2006-Jšn-17