Overview of the K Means Cluster Platform

K Means Cluster is one of four platforms that JMP provides for clustering observations. For a comparison of all four methods, see Overview of Platforms for Clustering Observations.

The K Means Cluster platform forms a specified number of clusters using an iterative fitting process. The k-means algorithm first selects a set of k points called cluster seeds as an initial guess for the means of the clusters. Each observation is assigned to the nearest cluster seed to form a set of temporary clusters. The seeds are then replaced by the cluster means, the points are reassigned, and the process continues until no further changes occur in the clusters.
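The iterative process described above can be sketched in a few lines of Python. This is a minimal illustration, not the platform's implementation: the function name k_means, the use of squared Euclidean distance for "nearest", and the choice to pass the initial seeds in as an argument are all assumptions made for the example. How the platform selects its seeds is not covered here.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, seeds, max_iter=100):
    """Hypothetical helper: iterate assignment and mean updates until stable.

    points: list of numeric tuples (the observations)
    seeds:  list of k numeric tuples used as the initial guess for the means
    Returns (means, labels), where labels[i] is the cluster index of points[i].
    """
    k = len(seeds)
    means = [list(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assign each observation to the nearest current mean (or seed).
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, means[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no observation changed cluster, so the process stops
        labels = new_labels
        # Replace each seed by the mean of the observations assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, labels) if a == j]
            if members:  # an empty cluster keeps its previous mean
                means[j] = [sum(col) / len(members) for col in zip(*members)]
    return means, labels


# Illustrative data: two well-separated groups of three observations.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
means, labels = k_means(data, seeds=[data[0], data[3]])
print(labels)
print(means)
```

With one seed taken from each group, the assignments stop changing after the first update of the means, which is the stopping condition described above.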

The k-means algorithm is a special case of the EM algorithm, where E stands for Expectation and M stands for Maximization. In the k-means algorithm, the assignment of points to the closest clusters represents the Expectation step, and the calculation of the temporary cluster means represents the Maximization step.
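The two alternating steps can also be written as updates that each reduce, or leave unchanged, a single within-cluster sum of squares. The notation below is an assumption introduced for illustration and does not come from the platform documentation: x_i is observation i, \mu_j is the current mean of cluster j, c(i) is the cluster assigned to observation i, n_j is the number of observations in cluster j, and distance is taken to be Euclidean.

```latex
\begin{align}
  W(c, \mu) &= \sum_{i=1}^{n} \lVert x_i - \mu_{c(i)} \rVert^2
    && \text{(within-cluster sum of squares)} \\
  c(i) &\leftarrow \arg\min_{j \in \{1, \dots, k\}} \lVert x_i - \mu_j \rVert^2
    && \text{(assign each point to the closest cluster)} \\
  \mu_j &\leftarrow \frac{1}{n_j} \sum_{i \,:\, c(i) = j} x_i
    && \text{(recompute each cluster mean)}
\end{align}
```

Neither update can increase W, and there are only finitely many possible assignments, which is why the process reaches a point where no further changes occur.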

K-Means clustering supports only numeric columns. It ignores modeling types (nominal and ordinal) and treats all numeric columns as continuous.

You must specify the number of clusters, k, or a range of values for k, in advance. However, you can compare the results of different values of k to select an optimal number of clusters for your data.
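As a rough illustration of comparing several values of k, the sketch below fits each k in a small range and records the within-cluster sum of squares for each fit. Everything here is an assumption made for the example: the helper names fit, spaced_seeds, and within_ss, the rule of taking evenly spaced observations as seeds, and the use of within-cluster sum of squares as the comparison measure. The platform may report a different criterion for choosing k.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def spaced_seeds(points, k):
    """Hypothetical seed rule: take k evenly spaced observations."""
    return [points[i * len(points) // k] for i in range(k)]


def fit(points, k, max_iter=100):
    """Compact k-means sketch; returns (means, labels)."""
    means = [list(s) for s in spaced_seeds(points, k)]
    labels = None
    for _ in range(max_iter):
        new = [min(range(k), key=lambda j: sq_dist(p, means[j])) for p in points]
        if new == labels:
            break  # assignments are stable
        labels = new
        for j in range(k):
            members = [p for p, a in zip(points, labels) if a == j]
            if members:  # an empty cluster keeps its previous mean
                means[j] = [sum(c) / len(members) for c in zip(*members)]
    return means, labels


def within_ss(points, means, labels):
    """Sum of squared distances from each point to its cluster mean."""
    return sum(sq_dist(p, means[a]) for p, a in zip(points, labels))


# Illustrative data: two well-separated groups of three observations.
data = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]

results = {}
for k in range(1, 5):  # the range of k values to compare
    means, labels = fit(data, k)
    results[k] = within_ss(data, means, labels)
    print(k, round(results[k], 4))
```

For this data the measure drops sharply from k = 1 to k = 2 and changes much less afterward, which is the kind of pattern one might look for when choosing k. Because the measure tends to shrink as k grows, it is usually read for where the improvement levels off rather than for its minimum.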

For background on K-Means clustering, see the FASTCLUS Procedure chapter in the SAS/STAT 14.3 User’s Guide (SAS Institute Inc. 2017c) and Hastie et al. (2009).