Publication date: 07/08/2024

Hierarchical Cluster Platform Options

The Hierarchical Clustering red triangle menu contains the following options:

Color Clusters

Colors the dendrogram labels and their associated join bars according to cluster membership. Also assigns the corresponding colors to the rows of the data table. The colors update if you change the number of clusters. If you deselect this option, the colors are no longer updated based on the number of clusters.

Mark Clusters

Assigns markers to the rows of the data table corresponding to the cluster to which the row belongs. The markers update if you change the number of clusters. If you deselect this option, the markers are no longer updated based on the number of clusters.

Number of Clusters

Specifies the number of row clusters and positions the dendrogram slider to that number.
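As an illustration of what choosing a number of clusters means, the following Python sketch cuts a join history into a requested number of clusters by applying only the earliest joins. It is not JMP code. The function name cut_into_clusters and the join format (pairs of observation indices, one from each of the two clusters merged at that step) are assumptions made for this example.

```python
def cut_into_clusters(n_obs, joins, n_clusters):
    """Return a cluster label (1, 2, ...) for each observation.

    joins is a hypothetical join history: a list of (i, j) pairs in join
    order, where i and j are observation indices belonging to the two
    clusters merged at that step.
    """
    parent = list(range(n_obs))

    def find(x):
        # Follow parent links to the cluster representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Going from n_obs singletons to n_clusters takes n_obs - n_clusters joins.
    for i, j in joins[: n_obs - n_clusters]:
        parent[find(j)] = find(i)

    # Number the clusters in order of first appearance.
    labels_by_root = {}
    return [
        labels_by_root.setdefault(find(obs), len(labels_by_root) + 1)
        for obs in range(n_obs)
    ]


history = [(0, 1), (2, 3), (0, 2), (0, 4)]
two_clusters = cut_into_clusters(5, history, 2)
three_clusters = cut_into_clusters(5, history, 3)
```

Asking for more clusters applies fewer joins, which corresponds to moving the dendrogram slider toward the leaves.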

Cluster Criterion

(Not available when Data is distance matrix is selected as the Data Format.) Shows or hides the Cubic Clustering Criterion (CCC) table. The CCC is shown for cluster counts from 1 through approximately one-tenth of the number of observations in the data table. The CCC can be used to estimate the number of clusters in certain scenarios. In general, larger values of the CCC indicate a better fit. However, a solution with fewer clusters can be more interpretable. For more specific guidelines on how to interpret the CCC, see SAS Institute Inc. (1983). This criterion can be used with any distance-based clustering algorithm.

Caution: If clusters are elongated or irregularly shaped, the Cubic Clustering Criterion should not be used as a clustering criterion.

Show Dendrogram

Shows or hides the Dendrogram report.

Dendrogram Scale

Contains the following options for scaling the dendrogram:

Distance Scale

Shows the horizontal distances between any two join points as the distances between the two clusters joined at that point, based on the distance method specified on the launch window. The distance scale is the same scale as used in the Distance Graph and is the default scale for the dendrogram.

Even Spacing

Shows the horizontal distances between any two join points as equal.

Geometric Spacing

Increases the horizontal distances between join points as the number of clusters increases. This option is useful when there are many objects and you want the smaller clusters to be more visible than the larger clusters.

Distance Graph

Shows or hides the distance plot beneath the dendrogram.

Show NCluster Handle

Shows or hides the handles on the dendrogram used to manually change the number of clusters.

Zoom to Selected Rows

Selects and enlarges a particular cluster after you select the cluster in the dendrogram. Alternatively, you can double-click the cluster to zoom in on it. Use Release Zoom to return to the original view.

Release Zoom

Returns the dendrogram to the original view after zooming.

Pivot on Selected Cluster

Reverses the order of the two sub-clusters of the currently selected cluster.
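The effect of a pivot on the leaf order can be sketched in Python. This is an illustration only, not JMP code. The nested-tuple tree representation and the names leaf_order and pivot are assumptions made for this example.

```python
def leaf_order(tree):
    """List the leaves of a nested-tuple tree from first to last."""
    if not isinstance(tree, tuple):
        return [tree]
    left, right = tree
    return leaf_order(left) + leaf_order(right)


def pivot(tree):
    """Swap the two sub-clusters of the given cluster."""
    left, right = tree
    return (right, left)


# Hypothetical dendrogram: each tuple is a join of two sub-clusters.
tree = (("A", "B"), ("C", ("D", "E")))
left, right = tree

# Pivot only the selected cluster (here, the right-hand branch).
pivoted = (left, pivot(right))

before = leaf_order(tree)
after = leaf_order(pivoted)
```

Only the display order of the leaves changes. The cluster memberships and join distances are the same before and after the pivot.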

Positioning

Provides options for changing the positions of labels and other parts of the dendrogram.

Color Map

Enables you to add a color map, or heat map, showing each Y, Column variable colored by value. Several color theme choices are available in a submenu. To remove a color map, select Color Map > None.

More Color Map Columns

(Available only when Data as usual is selected as the Data Format.) Adds a color map for specified columns.

Legend

Shows or hides a legend for the colors used in color maps. There is a separate legend for each of the specified columns. This option is available only if a color map is enabled.

Note: If there are more than 400 columns, a single legend is shown with a standardized score for the colors used in the color maps.

Two Way Clustering

(Available only when Data as usual or Data as summarized is selected as the Data Format.) Clusters by both the specified columns and the rows. A color map is added to the dendrogram with a dendrogram for the Y variables at its base. Typically, for two-way clustering, your variables are measured on the same scale and you do not standardize the data.

Column Clustering

(Available only when Two Way Clustering is used.) Provides options for clustering the columns in two-way clustering.

Number of Column Clusters

Specifies the number of column clusters.

Column Cluster Criterion

Shows or hides the Cubic Clustering Criterion (CCC) table for the entire range of number of column clusters. The CCC is used to estimate the number of clusters. It can be used with any distance-based clustering algorithm. Larger values of the CCC indicate better fit in terms of number of clusters. See SAS Institute Inc. (1983).

Save Column Clusters

Saves a new data table that contains cluster membership information for the columns.

Save Clusters

Saves a new data table that contains cluster membership information. If Add Spatial Measures is selected on the launch window, the cluster numbers are also saved to the Hough Data Table.

Save Cluster Means

Creates a new data table that contains the number of rows and the means of each column in each cluster.
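The contents of such a table can be sketched in Python as follows. This is an illustration under stated assumptions, not JMP code: the function name cluster_means, the list-of-lists data layout, and the dictionary output format are hypothetical.

```python
from collections import defaultdict


def cluster_means(rows, labels):
    """Return the row count and per-column means for each cluster."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)

    summary = {}
    for label, members in groups.items():
        count = len(members)
        # zip(*members) iterates over the columns of this cluster's rows.
        means = [sum(column) / count for column in zip(*members)]
        summary[label] = {"count": count, "means": means}
    return summary


rows = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]
labels = [1, 1, 2]
summary = cluster_means(rows, labels)
```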

Save Other

Shows a submenu of additional save options.

Save Formula for Closest Cluster

Creates a data table column that contains a formula for the closest cluster. This option calculates the squared Euclidean distance to each cluster’s centroid and selects the cluster that is closest. Note that this formula does not always reproduce the cluster assignment given by Hierarchical Clustering since the clusters are determined differently. However, the cluster assignment is very similar. (Not available when Data as summarized, Data is distance matrix, or Data is stacked is selected.)
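The closest-cluster rule described above can be sketched in Python. This is a minimal illustration, not the formula that JMP saves. The names squared_euclidean and closest_cluster and the dictionary of centroids are assumptions made for this example.

```python
def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_cluster(row, centroids):
    """Return the label of the centroid nearest to the row."""
    return min(centroids, key=lambda label: squared_euclidean(row, centroids[label]))


# Hypothetical cluster centroids, keyed by cluster number.
centroids = {1: [0.0, 0.0], 2: [10.0, 10.0]}
assignment = closest_cluster([1.0, 2.0], centroids)
```

Because this rule looks only at centroids, a row near the boundary between two clusters can be assigned to a different cluster than the one it joined during the hierarchical algorithm.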

Save Cluster History

Creates a new data table that contains the information in the Clustering History report.

Save Display Order

Creates a data table column that contains the order in which the row appears in the dendrogram.

Save Distance Matrix

Creates a new data table that contains the distances between the observations.
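The layout of a distance matrix can be sketched in Python. This example assumes plain Euclidean distance on unstandardized data, which is an assumption for illustration only. The distances that JMP saves depend on the options chosen in the launch window. The function name distance_matrix is hypothetical.

```python
import math


def distance_matrix(rows):
    """Return a square matrix of pairwise Euclidean distances."""
    return [[math.dist(a, b) for b in rows] for a in rows]


matrix = distance_matrix([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
```

The matrix is symmetric with zeros on the diagonal, so entry [i][j] equals entry [j][i].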

Save Constellation Coordinates

Saves the coordinates of the constellation plot to the data table. (Not available when Data as summarized, Data is distance matrix, or Data is stacked is selected.)

Save Cluster Hierarchy

Creates a data table that contains the information needed to write a script for a custom dendrogram. For each cluster join, there are three rows: the first for the joiner, the second for the leader, and the third for the result, giving the cluster centers, size, and other information.

Save Cluster Tree

Creates a new data table that contains information needed to compare cluster trees between JMP and SAS. For each cluster join, there is one row for each new cluster, with the cluster’s size and other information.

Clustering History

Shows or hides the Clustering History report. See Clustering History.

Cluster Summary

(Not available when Data is distance matrix is selected.) Shows or hides a report that contains the following information:

Cluster Means

A table that gives, for each cluster, the number of observations (or Object IDs, if the data are stacked) and means for each variable.

Cluster Standard Deviations

A table that gives, for each cluster, the number of observations (or Object IDs, if the data are stacked) and standard deviations for each variable.

Cluster Means Plot

Either a parallel plot or a two-dimensional heat map of the cluster means.

The plot is a parallel plot unless Data is stacked is selected and there are two Attribute ID variables. For the parallel plot, the axis for each variable is scaled.

If Columns is selected for the Standardize By option, the axis for each variable ranges from two standard deviations below the mean to two standard deviations above the mean, where the standard deviation and mean are computed for the raw data. If a cluster mean falls beyond this range, the axis is extended to include it.

If anything other than Columns is selected for the Standardize By option, there is a common vertical axis whose scaling is displayed. (The scaling is equivalent to the Scale Uniformly option in Graph Builder.)

When Data is stacked is selected and there are two Attribute ID variables, two-dimensional plots of the mean of the Y variable at each location are shown for each cluster. These plots are colored using a Blue to Gray to Red color gradient.
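For the parallel plot case in which Columns is selected for the Standardize By option, the axis range rule can be sketched in Python. This is an illustration, not JMP code. The function name parallel_axis_range is hypothetical, and the use of the sample standard deviation (statistics.stdev) is an assumption, because the text does not say which form of the standard deviation is used.

```python
from statistics import mean, stdev


def parallel_axis_range(raw_values, cluster_means):
    """Return (low, high) axis limits for one variable.

    Starts from mean +/- 2 standard deviations of the raw data, then
    widens the range if any cluster mean falls outside it.
    """
    center = mean(raw_values)
    spread = stdev(raw_values)  # assumption: sample standard deviation
    low = min(center - 2 * spread, min(cluster_means))
    high = max(center + 2 * spread, max(cluster_means))
    return low, high


# The cluster mean 7.0 lies above mean + 2 SD, so the axis is extended.
low, high = parallel_axis_range([1, 2, 3, 4, 5], [2.0, 7.0])
```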

Column Summary

For each variable, gives the RSquare value that represents the proportion of variation explained by the clusters. This number is the RSquare value for a regression of the variable on the clusters. The option also gives a bar chart of RSquare values.
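For a regression of a variable on cluster membership treated as a categorical predictor, the fitted value for each row is its cluster mean, so RSquare equals one minus the ratio of the within-cluster sum of squares to the total sum of squares. The following Python sketch computes that quantity. It is an illustration, not JMP code, and the function name rsquare_by_cluster is hypothetical.

```python
def rsquare_by_cluster(values, labels):
    """Proportion of variation in values explained by cluster membership."""
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("RSquare is undefined for a constant variable")

    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)

    # Within-cluster sum of squares around each cluster mean.
    ss_within = 0.0
    for members in groups.values():
        cluster_mean = sum(members) / len(members)
        ss_within += sum((v - cluster_mean) ** 2 for v in members)

    return 1.0 - ss_within / ss_total


r2 = rsquare_by_cluster([1, 2, 3, 11, 12, 13], [1, 1, 1, 2, 2, 2])
```

A value near 1 indicates that the variable differs strongly between clusters. A value near 0 indicates that the clusters explain little of the variation in that variable.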

Late Join Outliers

Shows or hides a table that contains observations that were clustered very late in the algorithm. The observations in this table were still clusters of one when the algorithm was 80% complete. Because each of these observations remained its own cluster until late in the algorithm, these observations are potential outliers in the data set.
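The late-join idea can be sketched in Python. This is an illustration, not JMP code. It assumes that "80% complete" means that 80% of the joins have been performed, which the text does not state explicitly. The function name late_join_outliers and the join history format are hypothetical.

```python
def late_join_outliers(n_obs, joins, threshold=0.8):
    """Return observations that were still singletons at the threshold.

    joins is a hypothetical join history: a list of (a, b) pairs in join
    order, where a and b are lists of the observation indices in the two
    clusters merged at that step.
    """
    first_join = {}
    for step, (a, b) in enumerate(joins, start=1):
        for obs in list(a) + list(b):
            # Record only the first step at which each observation joins.
            first_join.setdefault(obs, step)

    # Assumption: completion is measured as a fraction of the joins.
    cutoff = threshold * len(joins)
    never = len(joins) + 1
    return sorted(
        obs for obs in range(n_obs) if first_join.get(obs, never) > cutoff
    )


history = [
    ([0], [1]),
    ([2], [3]),
    ([0, 1], [2, 3]),
    ([0, 1, 2, 3], [4]),
    ([0, 1, 2, 3, 4], [5]),
]
outliers = late_join_outliers(6, history)
```

In this example, observation 5 is the only one that has not joined a cluster after four of the five joins, so it is the only one flagged.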

Constellation Plot

Shows or hides an alternative way to present the information in the hierarchical clustering dendrogram. Each observation (row) is represented by an endpoint and each cluster join is represented by a new point. The lines that are drawn represent cluster membership. The lengths of the lines represent the distance between clusters. Longer lines represent greater distances between clusters.

You can hover over the lines in the constellation plot to see their length. However, the length values are meaningful only with respect to each other. The axis scaling, orientation of points, and angles of the lines are arbitrary. They are determined such that the ends of the nodes are spaced out and the plot does not appear cluttered, which is important with larger data sets.

To turn off the labels on the endpoints, right-click inside the Constellation Plot and deselect Show Labels.

Scatterplot Matrix

(Available only when Data as usual is selected as the Data Format.) Creates a scatterplot matrix using all the variables.

Parallel Coord Plots

(Available only when Data as usual is selected as the Data Format.) Creates a parallel coordinate plot for each cluster. The axes are scaled as described for the Cluster Means Plot. See Cluster Means Plot.

Cluster Treatment Comparisons

(Available only if you hold Shift and click the Hierarchical Clustering red triangle.) Select a response column and a two-level treatment column. Creates a Hierarchically Clustered Differences report.

See Local Data Filters in JMP Reports, Redo Menus in JMP Reports, Group Platform, and Save Script Menus in JMP Reports in Using JMP for more information about the following options:

Local Data Filter

Shows or hides the local data filter that enables you to filter the data used in a specific report.

Redo

Contains options that enable you to repeat or relaunch the analysis. In platforms that support the feature, the Automatic Recalc option immediately reflects the changes that you make to the data table in the corresponding report window.

Platform Preferences

Contains options that enable you to view the current platform preferences or update the platform preferences to match the settings in the current JMP report.

Save Script

Contains options that enable you to save a script that reproduces the report to several destinations.

Save By-Group Script

Contains options that enable you to save a script that reproduces the platform report for all levels of a By variable to several destinations. Available only when a By variable is specified in the launch window.

Note: Additional options for this platform are available through scripting. Open the Scripting Index under the Help menu. In the Scripting Index, you can also find examples for scripting the options that are described in this section.
