Example of K Means Clustering

In this example, you use the K Means Cluster platform to cluster observations in a cytometry analysis.

1. Select Help > Sample Data Folder and open Cytometry.jmp

2. Select Analyze > Clustering > K Means Cluster.

3. Select CD3, CD8, CD4, and MCB and click Y, Columns.

4. Click OK.

5. Enter 3 next to Number of Clusters.

6. Enter 15 next to Range of Clusters (Optional).

Because the Range of Clusters is set to 15, the platform provides fits for 3 to 15 clusters. You can then determine your preferred number of clusters.

7. Click Go.

Figure 14.2 Cluster Comparison Report

Cluster Comparison Report

The Cluster Comparison report appears at the top of the report window. The best fit is determined by the highest CCC value. In this case, the best fit occurs when you fit 11 clusters.

8. Scroll to the K Means NCluster=11 report.

Figure 14.3 K Means NCluster=11 Report

K Means NCluster=11 Report

The Cluster Summary report shows the number of observations in each of the eleven clusters. The Cluster Means report shows the means of the four marker readings for each cluster.

9. Click the K Means NCluster=11 red triangle and select Parallel Coord Plots.

Figure 14.4 Parallel Coordinate Plots for Cytometry Data

Parallel Coordinate Plots for Cytometry Data

The Parallel Coordinate Plots display the structure of the observations in each cluster. Use these plots to see how the clusters differ. Clusters 4, 6, 7, 8, and 9 tend to have comparatively low CD8 values and high CD4 values. Cluster 1, on the other hand, has higher CD8 values and lower CD4 values.

10. Click the K Means NCluster=11 red triangle and select Biplot.

11. (Optional) Click the K Means NCluster=11 red triangle and deselect Biplot Options > Show Biplot Rays.

Figure 14.5 Biplot for Cytometry Data

Biplot for Cytometry Data

A legend that identifies the colors of the clusters is shown to the right of the plot. The clusters that appear to be most separated from the others based on their first two principal components are clusters 3, 10, and 11. This is supported by their parallel coordinate plots in Figure 14.4, which differ from the plots for the other clusters. Use the list below the plot to see the biplot for other combinations of principal components.

Want more information? Have questions? Get answers in the JMP User Community (community.jmp.com).