Publication date: 07/08/2024

Statistical Details for Distance Methods

This section provides the formulas used in calculating distances based on the Method that you select in the Hierarchical Cluster launch window. For a description of the methods, see Method.

The formulas use the following notation, where lowercase symbols generally pertain to observations and uppercase symbols to clusters:

$n$ is the number of observations

$v$ is the number of variables

$x_i$ is the $i$th observation

$C_K$ is the $K$th cluster, a subset of $\{1, 2, \ldots, n\}$

$N_K$ is the number of observations in $C_K$

$\bar{x}$ is the sample mean vector

$\bar{x}_K$ is the mean vector for cluster $C_K$

$\|x\|$ is the square root of the sum of the squares of the elements of $x$ (the Euclidean length of the vector $x$)

$d(x_i, x_j)$ is $\|x_i - x_j\|^2$
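To make the notation concrete, here is a minimal Python sketch. It is not JMP code, and the function names are hypothetical. It assumes each observation is a tuple of v numbers, the data set X is a list of n such tuples, and a cluster is a list of row indices. Python indexes from 0, so cluster members are 0-based rather than drawn from {1, 2, ..., n}. The function d takes d(x_i, x_j) to be the squared Euclidean distance.

```python
import math

# Hypothetical helpers illustrating the notation; observations are tuples of v numbers.

def norm(x):
    """||x||: square root of the sum of the squares of the elements of x."""
    return math.sqrt(sum(e * e for e in x))

def d(xi, xj):
    """d(xi, xj) = ||xi - xj||^2, the squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(xi, xj))

def mean_vector(X, cluster=None):
    """Sample mean vector of X, or the mean vector of the rows listed in `cluster` (0-based)."""
    rows = X if cluster is None else [X[i] for i in cluster]
    return tuple(sum(col) / len(rows) for col in zip(*rows))

# n = 4 observations on v = 2 variables
X = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0), (6.0, 4.0)]
print(mean_vector(X))          # (2.0, 2.0), the sample mean vector
print(mean_vector(X, [0, 1]))  # (1.0, 0.0), the mean vector of cluster {x_1, x_2}
print(norm((3.0, 4.0)))        # 5.0
print(d(X[0], X[3]))           # 52.0
```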

Average Linkage

The distance $D_{KL}$ between clusters $C_K$ and $C_L$ for the average linkage cluster method is:

$$D_{KL} = \sum_{i \in C_K} \sum_{j \in C_L} \frac{d(x_i, x_j)}{N_K N_L}$$
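As an illustration, the average linkage distance is the mean of d(x_i, x_j) over all N_K × N_L pairs that take one observation from each cluster. The sketch below is a hypothetical Python helper (not JMP code) that assumes clusters are lists of 0-based row indices into X and that d is the squared Euclidean distance.

```python
# Hypothetical sketch of the average linkage distance between two clusters.
# Clusters CK and CL are lists of 0-based row indices into X.

def d(xi, xj):
    """Squared Euclidean distance ||xi - xj||^2."""
    return sum((a - b) ** 2 for a, b in zip(xi, xj))

def average_linkage(X, CK, CL):
    """Sum of d(xi, xj) over i in CK and j in CL, divided by N_K * N_L."""
    total = sum(d(X[i], X[j]) for i in CK for j in CL)
    return total / (len(CK) * len(CL))

X = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0), (6.0, 4.0)]
print(average_linkage(X, [0, 1], [2, 3]))  # (16 + 52 + 20 + 32) / 4 = 30.0
```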

Centroid Method

The distance $D_{KL}$ between clusters $C_K$ and $C_L$ for the centroid method of clustering is:

$$D_{KL} = \|\bar{x}_K - \bar{x}_L\|^2$$
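The centroid distance depends only on the two cluster mean vectors: it is the squared Euclidean distance between them. A minimal sketch follows, assuming (as an illustration only) that clusters are lists of 0-based row indices; the helper names are hypothetical.

```python
# Hypothetical sketch of the centroid distance between two clusters.

def d(xi, xj):
    """Squared Euclidean distance ||xi - xj||^2."""
    return sum((a - b) ** 2 for a, b in zip(xi, xj))

def mean_vector(X, cluster):
    """Mean vector of the rows of X listed in `cluster` (0-based indices)."""
    rows = [X[i] for i in cluster]
    return tuple(sum(col) / len(rows) for col in zip(*rows))

def centroid_distance(X, CK, CL):
    """||xbar_K - xbar_L||^2: squared distance between the two cluster mean vectors."""
    return d(mean_vector(X, CK), mean_vector(X, CL))

X = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0), (6.0, 4.0)]
print(centroid_distance(X, [0, 1], [2, 3]))  # ||(1, 0) - (3, 4)||^2 = 20.0
```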

Ward’s

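The distance $D_{KL}$ between clusters $C_K$ and $C_L$ for Ward’s method is:

$$D_{KL} = \frac{\|\bar{x}_K - \bar{x}_L\|^2}{\frac{1}{N_K} + \frac{1}{N_L}}$$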
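A standard identity explains this formula: Ward’s distance equals the increase in the total within-cluster sum of squares that results from merging the two clusters. The hypothetical Python sketch below (not JMP code) computes the distance directly and checks it against that increase. It assumes clusters are lists of 0-based row indices and uses unequal cluster sizes so that the 1/N_K + 1/N_L weighting matters.

```python
import math

# Hypothetical sketch of Ward's distance, checked against the increase in
# within-cluster sum of squares caused by merging the two clusters.

def d(xi, xj):
    """Squared Euclidean distance ||xi - xj||^2."""
    return sum((a - b) ** 2 for a, b in zip(xi, xj))

def mean_vector(X, cluster):
    """Mean vector of the rows of X listed in `cluster` (0-based indices)."""
    rows = [X[i] for i in cluster]
    return tuple(sum(col) / len(rows) for col in zip(*rows))

def ward_distance(X, CK, CL):
    """||xbar_K - xbar_L||^2 divided by (1/N_K + 1/N_L)."""
    return d(mean_vector(X, CK), mean_vector(X, CL)) / (1 / len(CK) + 1 / len(CL))

def within_ss(X, cluster):
    """Sum of squared distances from each member of the cluster to the cluster mean."""
    m = mean_vector(X, cluster)
    return sum(d(X[i], m) for i in cluster)

X = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0), (6.0, 4.0)]
CK, CL = [0], [1, 2, 3]
increase = within_ss(X, CK + CL) - within_ss(X, CK) - within_ss(X, CL)
print(math.isclose(ward_distance(X, CK, CL), increase))  # True (both equal 32/3)
```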

Single Linkage

The distance $D_{KL}$ between clusters $C_K$ and $C_L$ for the single linkage cluster method is:

$$D_{KL} = \min_{i \in C_K} \min_{j \in C_L} d(x_i, x_j)$$
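Single linkage uses the closest pair of observations, one from each cluster. The nested minimum is the same as a single minimum over all cross-cluster pairs, as this hypothetical Python sketch shows (clusters are assumed to be lists of 0-based row indices, and d is the squared Euclidean distance).

```python
# Hypothetical sketch of the single linkage distance between two clusters.

def d(xi, xj):
    """Squared Euclidean distance ||xi - xj||^2."""
    return sum((a - b) ** 2 for a, b in zip(xi, xj))

def single_linkage(X, CK, CL):
    """Smallest d(xi, xj) over all pairs with i in CK and j in CL."""
    return min(d(X[i], X[j]) for i in CK for j in CL)

X = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0), (6.0, 4.0)]
print(single_linkage(X, [0, 1], [2, 3]))  # min(16, 52, 20, 32) = 16.0
```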

Complete Linkage

The distance $D_{KL}$ between clusters $C_K$ and $C_L$ for the complete linkage cluster method is:

$$D_{KL} = \max_{i \in C_K} \max_{j \in C_L} d(x_i, x_j)$$
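Complete linkage is the mirror image of single linkage: it uses the farthest pair of observations, one from each cluster. A minimal hypothetical sketch, under the same assumptions (0-based row indices, squared Euclidean d):

```python
# Hypothetical sketch of the complete linkage distance between two clusters.

def d(xi, xj):
    """Squared Euclidean distance ||xi - xj||^2."""
    return sum((a - b) ** 2 for a, b in zip(xi, xj))

def complete_linkage(X, CK, CL):
    """Largest d(xi, xj) over all pairs with i in CK and j in CL."""
    return max(d(X[i], X[j]) for i in CK for j in CL)

X = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0), (6.0, 4.0)]
print(complete_linkage(X, [0, 1], [2, 3]))  # max(16, 52, 20, 32) = 52.0
```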
