In the K Means Cluster platform, each K Means NCluster report gives the following summary statistics for each cluster:
• The Cluster Summary report gives the number of clusters and the number of observations in each cluster, as well as the number of iterations required.
• The Cluster Means report gives means for the observations in each cluster for each variable.
• The Cluster Standard Deviations report gives standard deviations for the observations in each cluster for each variable.
Each K Means NCluster report contains the following red triangle menu options:
Biplot
Shows a plot of the points and clusters in the first two principal components of the data, along with a legend identifying the cluster colors. Circles are drawn around the cluster centers, and the size of each circle is proportional to the number of observations in the cluster. The shaded area is the density contour around the cluster mean. By default, this area indicates where 90% of the observations in that cluster would fall (Mardia et al. 1980). Use the list below the plot to change the plot axes to other principal components. Alternatively, use the arrow button to cycle through all possible axis combinations. An option to save the cluster colors to the data table is also located below the plot. See Save Colors to Table. The eigenvalues are shown in decreasing order.
Note: If Columns Scaled Individually is checked in the launch window, the biplot uses a correlation matrix. If Columns Scaled Individually is not checked, the biplot uses a covariance matrix.
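The projection behind the biplot can be sketched with NumPy. This is a minimal sketch, not JMP's implementation: it assumes the plotted points are the centered data projected onto the eigenvectors of the correlation matrix (columns scaled individually) or the covariance matrix (otherwise), and that the ray directions are the eigenvector loadings. The function name `biplot_coordinates` and its `scale_columns` parameter are hypothetical.

```python
import numpy as np

def biplot_coordinates(X, scale_columns=True, n_components=2):
    """Hypothetical sketch: principal component scores and ray directions.

    scale_columns=True mimics "Columns Scaled Individually" (correlation
    matrix); scale_columns=False uses the covariance matrix.
    """
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)                  # center each column
    if scale_columns:
        Z = Z / X.std(axis=0, ddof=1)       # standardize, so cov(Z) is the correlation matrix
    C = np.cov(Z, rowvar=False)             # covariance of the (possibly scaled) data
    eigvals, eigvecs = np.linalg.eigh(C)    # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # reorder to decreasing eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs[:, :n_components]  # point coordinates in the principal component plane
    rays = eigvecs[:, :n_components]        # one row (ray direction) per variable
    return eigvals, scores, rays

rng = np.random.default_rng(0)
eigvals, scores, rays = biplot_coordinates(rng.normal(size=(100, 4)))
```

Choosing a different pair of columns from `eigvecs` corresponds to picking other principal components as the plot axes.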
Biplot Options
Contains the following options for controlling the appearance of the Biplot:
Show Biplot Rays
Shows the biplot rays. The labeled rays show the directions of the covariates in the subspace defined by the principal components. They represent the degree of association of each variable with each principal component.
Biplot Ray Position
Enables you to specify the position and radius scaling of the biplot rays. By default, the rays emanate from the point (0,0). In the plot, you can drag the rays or use this option to specify coordinates. You can also use the radius scaling option to make the rays more visible.
Biplot Contour Density
Enables you to specify the confidence level for the density contours. The default confidence level is 90%.
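A density contour at a given confidence level can be sketched as a normal-theory ellipse. This sketch assumes, without confirmation from this documentation, that each cluster's contour is the ellipse $(x-\bar{x})^\top S^{-1}(x-\bar{x}) = q$, where $S$ is the cluster's 2×2 covariance in the plotted components and $q$ is the chi-square quantile with 2 degrees of freedom. For 2 degrees of freedom that quantile has the closed form $q = -2\ln(1-p)$, so the default 90% level gives $q \approx 4.605$. The function name `density_ellipse` is hypothetical.

```python
import math
import numpy as np

def density_ellipse(points, level=0.90, n=100):
    """Hypothetical sketch of a normal-theory density contour for one cluster."""
    P = np.asarray(points, dtype=float)
    mean = P.mean(axis=0)
    S = np.cov(P, rowvar=False)                # 2x2 covariance of the cluster's points
    q = -2.0 * math.log(1.0 - level)           # chi-square (2 df) quantile, exact for 2 df
    L = np.linalg.cholesky(S)                  # S = L @ L.T
    t = np.linspace(0.0, 2.0 * math.pi, n)
    circle = np.vstack([np.cos(t), np.sin(t)])  # unit circle, shape (2, n)
    # Map the unit circle onto the ellipse; returns boundary points of shape (n, 2)
    return (mean[:, None] + math.sqrt(q) * (L @ circle)).T
```

Raising `level` increases `q` and therefore enlarges the ellipse, which matches the effect of choosing a higher confidence level.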
Mark Clusters
Assigns markers that identify the clusters to the rows of the data table.
Biplot 3D
Shows a three-dimensional biplot of the data. Available only when there are three or more variables.
Parallel Coord Plots
Creates a parallel coordinate plot for each cluster. The plot report has options for showing and hiding the data and means. See “Parallel Plots” in Essential Graphing.
Scatterplot Matrix
Shows or hides a scatterplot matrix using all of the Y variables. Each scatterplot contains density ellipses based on the current number of clusters.
SOM Heat Map
(Available only for Self Organizing Maps.) Shows or hides a heat map of the SOM cluster means, colored by one of the Y variables used in the clustering. To change the Y variable, use the menu next to “Select column to color heat map.”
Note: The clusters on the heat map are organized in a top-down, right-to-left layout. This means that the first cluster is in the top right corner and the last cluster is in the bottom left corner.
Save Colors to Table
Assigns colors that identify the clusters to the rows of the data table. If there is a Biplot in the report window, the colors saved to the data table match the colors of the clusters in the Biplot. If the colors are changed in the Biplot and the Save Colors to Table option is selected again, the colors in the table update to match those in the Biplot.
Note: When any of the Save options is selected, each saved column contains a Notes column property that specifies the number of clusters for that column’s data. This enables you to save columns from more than one cluster fit and use the column property to identify which fit each saved column came from.
Save Clusters
Saves the following two columns to the data table:
– The Cluster column contains the number of the cluster to which the given row is assigned.
– (Not available for Self Organizing Maps.) The Distance column contains the squared Euclidean distance between the given observation and its cluster mean. For each variable, the difference between the observation’s value and the cluster mean on that variable is divided by the overall standard deviation for the variable. These scaled differences are squared and summed across the variables.
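The Distance computation described above can be sketched in a few lines. This is a minimal sketch, assuming the overall standard deviation is the sample standard deviation (n − 1 denominator) of each variable across all rows; the function name `scaled_cluster_distance` is hypothetical, and cluster labels here are 0-based indices rather than the cluster numbers in the Cluster column.

```python
import numpy as np

def scaled_cluster_distance(X, labels, centers):
    """Hypothetical sketch of the Distance column: scaled squared distance to own cluster mean."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)                     # overall SD of each variable (assumed sample SD)
    diff = (X - np.asarray(centers)[labels]) / sd  # scaled differences from the assigned cluster mean
    return (diff ** 2).sum(axis=1)                 # squared and summed across variables

X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 4.0], [2.0, 4.0]])
labels = np.array([0, 0, 1, 1])                    # 0-based cluster index per row
centers = np.array([[1.0, 0.0], [1.0, 4.0]])
print(scaled_cluster_distance(X, labels, centers))  # [0.75 0.75 0.75 0.75]
```

Dividing by the overall standard deviation keeps variables measured on large scales from dominating the distance.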
Save Cluster Distance
(Not available for Self Organizing Maps.) Saves a Distance column to the data table. This column is the same as the Distance column obtained from the Save Clusters option.
Save SOM Grid
(Available only for Self Organizing Maps.) Saves new columns to the data table. The new columns contain the SOM grid row and column numbers for the most likely cluster for each observation.
Save Cluster Formula
Saves a formula column called Cluster Formula to the data table. This is the formula that identifies cluster membership for each row.
Save Distance Formula
(Not available for Self Organizing Maps.) Saves a formula column called Distance Formula to the data table. This is the formula that calculates the distance to the assigned cluster.
Save K Cluster Distances
(Not available for Self Organizing Maps.) Saves k columns containing the squared Euclidean distances to each cluster center.
Save K Distance Formulas
(Not available for Self Organizing Maps.) Saves k columns containing the formulas for the squared Euclidean distances to each cluster center.
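The k distance columns, and a nearest-center assignment derived from them, can be sketched as follows. Two assumptions here are not stated in this documentation: that these distances use the same per-variable scaling as the Distance column, and that cluster membership is the center with the smallest distance. The functions `k_cluster_distances` and `nearest_cluster` and the `scale` parameter are hypothetical.

```python
import numpy as np

def k_cluster_distances(X, centers, scale=True):
    """Hypothetical sketch: one squared-distance column per cluster center."""
    X = np.asarray(X, dtype=float)
    C = np.asarray(centers, dtype=float)
    # Assumed scaling: divide by each variable's overall sample SD, as for the Distance column
    sd = X.std(axis=0, ddof=1) if scale else np.ones(X.shape[1])
    diff = (X[:, None, :] - C[None, :, :]) / sd  # shape (n_rows, k, n_vars)
    return (diff ** 2).sum(axis=2)               # shape (n_rows, k)

def nearest_cluster(X, centers, scale=True):
    """Assign each row to its closest center, numbered from 1 like a Cluster column."""
    return k_cluster_distances(X, centers, scale).argmin(axis=1) + 1
```

Column j of the returned matrix plays the role of the j-th saved distance column; taking the row-wise minimum reproduces a nearest-center membership rule.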
Publish Cluster Formulas
Publishes to the Formula Depot the same scoring code used in the Save Cluster Formula option.
Simulate Clusters
Creates a new data table containing simulated cluster observations on the Y variables, using the cluster means and standard deviations.
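A simulation from cluster means and standard deviations can be sketched as below. This sketch assumes independent normal draws for each Y variable within a cluster; the documentation does not state the distribution or how many rows are generated per cluster, so the `counts` argument and the function name `simulate_clusters` are hypothetical.

```python
import numpy as np

def simulate_clusters(means, sds, counts, seed=None):
    """Hypothetical sketch: draw normal observations for each cluster."""
    rng = np.random.default_rng(seed)
    rows, labels = [], []
    for k, (mu, sd, n) in enumerate(zip(means, sds, counts), start=1):
        # Independent normal draws per variable, using this cluster's means and SDs
        rows.append(rng.normal(loc=mu, scale=sd, size=(n, len(mu))))
        labels.extend([k] * n)  # record the cluster number (1-based) for each simulated row
    return np.vstack(rows), np.array(labels)

Y, cluster = simulate_clusters(
    means=[[0.0, 10.0], [5.0, -5.0]],
    sds=[[1.0, 2.0], [0.5, 0.5]],
    counts=[300, 200],
    seed=42,
)
```

Because the draws are independent per variable, any correlation between Y variables within a cluster is not reproduced by this sketch.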
Remove
Removes the clustering report.