Launch the Principal Components platform by selecting Analyze > Multivariate Methods > Principal Components. Principal component analysis is also available through the Multivariate and Scatterplot 3D platforms.
The example described in Example of Principal Component Analysis uses all of the continuous variables from the Solubility.jmp sample data table.
Figure 4.3 Principal Components Launch Window
For more information about the options in the Select Columns red triangle menu, see Column Filter Menu in Using JMP.
Y, Columns
The variables to analyze for components.
Z, Supplementary Variable
The supplementary variables to be displayed. Supplementary variables are not included in the calculation of the principal components, so including them does not affect the results. Continuous supplementary variables can be projected onto the loading plot and used to enhance interpretation.
Weight
Identifies one column whose numeric values assign a weight to each row in the analysis.
Note: The Weight role is ignored for the Wide and Sparse variance estimation methods.
Freq
Identifies one column whose numeric values assign a frequency to each row in the analysis.
Note: The Freq role is ignored for the Wide and Sparse variance estimation methods.
By
Creates a Principal Component report for each value specified by the By column so that you can perform separate analyses for each group.
Standardize
Specifies whether each column is centered and standardized. This determines what matrix is used to calculate the principal components.
Standardized
Centers and standardizes each column individually. The principal components are calculated based on the correlation matrix.
Unscaled
Centers each column individually. The principal components are calculated based on the covariance matrix.
Unscaled and Uncentered
The principal components are calculated based on the unscaled and uncentered matrix.
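The three Standardize options differ only in which matrix is decomposed. The following is a minimal NumPy sketch of that idea, not JMP's implementation. The function name `principal_components` is hypothetical, and the divisor used for the uncentered matrix is an assumption (it rescales the eigenvalues but does not change the eigenvectors).

```python
import numpy as np

def principal_components(X, standardize="Standardized"):
    """Return eigenvalues (descending) and matching eigenvectors.

    Hypothetical helper: the `standardize` values mirror the launch
    window options, but the computation is a plain NumPy illustration.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if standardize == "Standardized":
        M = np.corrcoef(X, rowvar=False)   # centered and scaled: correlation matrix
    elif standardize == "Unscaled":
        M = np.cov(X, rowvar=False)        # centered only: covariance matrix
    elif standardize == "Unscaled and Uncentered":
        M = X.T @ X / n                    # raw cross-products; divisor n is an assumption
    else:
        raise ValueError(f"unknown option: {standardize!r}")
    eigvals, eigvecs = np.linalg.eigh(M)   # M is symmetric, so eigh applies
    order = np.argsort(eigvals)[::-1]      # largest variance first
    return eigvals[order], eigvecs[:, order]
```

With the Standardized option, the eigenvalues sum to the number of columns, because every column contributes unit variance. With the Unscaled option, they sum to the total variance, so columns measured on larger scales dominate the first components.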
Method Family
Specifies the type of data.
Default
If the number of columns is less than 500 or less than the number of rows, Narrow Data is specified as the Method Family. If the number of columns is greater than 500 and greater than the number of rows, a wide estimation method is recommended by a JMP alert window. Click Wide Method (fast) to use a wide data estimation method or click Default Method (slow) to use a narrow data estimation method.
Narrow Data
Uses the covariance matrix, correlation matrix, or unscaled and uncentered matrix to obtain the principal components.
Wide Data
Uses singular value decomposition to obtain the principal components.
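The two method families reach the same components by different routes. As an illustration only (assuming centered, unscaled data and a generic NumPy computation rather than JMP's code), the eigenvalues of the covariance matrix equal the squared singular values of the centered data divided by n - 1:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))          # made-up data: 20 rows, 5 columns
Xc = X - X.mean(axis=0)               # center each column

# Narrow-style route: eigenvalues of the covariance matrix.
cov_eigvals = np.sort(np.linalg.eigvalsh(np.cov(Xc, rowvar=False)))[::-1]

# Wide-style route: singular values of the centered data matrix.
s = np.linalg.svd(Xc, compute_uv=False)
svd_eigvals = s**2 / (X.shape[0] - 1)
```

The SVD route never forms the columns-by-columns covariance matrix, which is what makes it attractive when the number of columns is very large.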
Variance Estimation
(Available only when Narrow Data is specified as the Method Family.) Specifies the method for calculating the correlations. Several of these methods address the treatment of missing data.
Default
The Default option uses the Row-wise, Pair-wise, or REML method, depending on the data. A JMP Alert also recommends switching to the Wide method when appropriate.
• Row-wise estimation is used for data tables with no missing values.
• Pair-wise estimation is used for data tables with missing values and either more than 10 columns, more than 5,000 rows, or more columns than rows.
• REML estimation is used otherwise.
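The Default selection rule above can be written as a small decision function. This is a sketch of the rule as stated in the list, with a hypothetical function name. It does not model the JMP Alert that recommends the Wide method.

```python
def default_variance_estimation(n_rows, n_cols, has_missing):
    """Return the method the Default option would pick, per the rule above."""
    if not has_missing:
        return "Row-wise"
    if n_cols > 10 or n_rows > 5000 or n_cols > n_rows:
        return "Pair-wise"
    return "REML"
```

For example, a table with 100 rows, 5 columns, and some missing cells falls through to REML, while the same table with 12 columns uses Pair-wise.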
REML
Restricted maximum likelihood (REML) estimation uses all of the data, even if missing values are present. Due to a bias-correction factor, this method is slow if the dataset is large and there are many missing values. Therefore, REML is most useful for smaller datasets. If there are no missing cells in the data, then the REML and ML estimates are equivalent and equal to the sample covariance matrix. If there are missing cells, REML’s variance and covariance estimates are less biased than the estimates from ML estimation. For statistical details, see REML.
ML
Maximum likelihood (ML) estimation uses all of the data, even if missing values are present. Because the estimates from ML are generated quickly, this method is most useful for large data tables with missing data.
Robust
Robust estimation uses all of the data, even if missing values are present. This method down-weights extreme values and is therefore useful for data tables that might have outliers. For statistical details, see Robust.
Row-wise
Row-wise estimation calculates the Pearson correlation for each pair of columns. For statistical details, see Statistical Details for the Pearson Product-Moment Correlation. Row-wise estimation does not use observations with missing values. This method is useful for excluding any observations that have missing data.
Pair-wise
Pair-wise estimation uses all of the data, even if missing values are present. This variance estimation method calculates Pearson correlations for each pair of columns using all observations with nonmissing values for those two columns. For statistical details, see Statistical Details for the Pearson Product-Moment Correlation. Pair-wise estimation is most useful when a data table has missing values and either more columns than rows, more than 10 columns, or more than 5,000 rows.
• If you select REML, ML, or Robust and your data table contains more columns than rows and has missing values, JMP switches the Variance Estimation to Pair-wise.
• If you select Robust and your data table contains more columns than rows and does not have missing values, JMP switches the Variance Estimation to Row-wise.
• If your data table has more than 500 columns and more columns than rows, JMP switches the Variance Estimation to Wide no matter which method was originally selected.
Note: A wide estimation method is recommended by a JMP Alert window for data tables with more than 500 columns and more columns than rows. This is because computation time can be considerable when you use the other methods with a large number of columns. Click Wide Method (fast) to switch to a wide estimation method or click Default Method (slow) to use the method you originally selected.
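The difference between Row-wise and Pair-wise estimation lies in which observations contribute to each correlation. The sketch below illustrates the two definitions given above using NumPy. The function names are hypothetical, missing values are assumed to be coded as NaN, and the code is not JMP's implementation.

```python
import numpy as np

def rowwise_corr(X):
    """Pearson correlations using only rows with no missing values."""
    X = np.asarray(X, dtype=float)
    complete = ~np.isnan(X).any(axis=1)
    return np.corrcoef(X[complete], rowvar=False)

def pairwise_corr(X):
    """Pearson correlations using, for each pair of columns,
    every row where both of those columns are nonmissing."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    R = np.eye(p)
    for i in range(p):
        for j in range(i + 1, p):
            ok = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            R[i, j] = R[j, i] = np.corrcoef(X[ok, i], X[ok, j])[0, 1]
    return R
```

When no values are missing, the two functions return the same matrix. When values are missing, Pair-wise keeps more data per correlation, but each entry can be based on a different set of rows.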
Number of Components
(Available only when Wide Data is specified as the Method Family.) Specifies the number of components to be estimated. Typically, the Number of Components is much smaller than the dimension of your data.
Specified
Estimates the specified number of components using the Truncated SVD estimation method. Truncated SVD estimation uses all of the data, even if missing values are present. This estimation method uses an algorithm based on the partial singular value decomposition, which computes only the specified number of leading singular values and singular vectors. The algorithm avoids calculating both the covariance matrix and unnecessary principal components, and is therefore computationally efficient. It is useful when your data are sparse, meaning they contain many zeros, or when there are a large number of columns in the data. For statistical details, see Truncated SVD.
Note: Prior to JMP 17, this was known as the Sparse estimation method.
All
Estimates all of the components using the Full SVD estimation method. Full SVD estimation does not use observations with missing values, so rows that contain missing cells are excluded. This estimation method uses an algorithm based on the full singular value decomposition. The algorithm avoids calculating the covariance matrix and is therefore computationally efficient. It is useful when you have a very large number of columns in your data. For statistical details, see Full SVD.
Note: Prior to JMP 17, this was known as the Wide estimation method.
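The relationship between the Specified and All options can be illustrated with a short NumPy sketch. It assumes centered, unscaled data with no missing values, and the function name `svd_components` is hypothetical. For simplicity it computes the full decomposition and then keeps the leading components, so it shows what a truncated result contains, not the efficiency of a true partial SVD algorithm.

```python
import numpy as np

def svd_components(X, n_components=None):
    """Return (scores, singular values, loadings as rows) from an SVD.

    n_components=None keeps every component; an integer keeps only the
    leading ones. Illustration only; not JMP's algorithm.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    if n_components is not None:
        U, s, Vt = U[:, :n_components], s[:n_components], Vt[:n_components]
    scores = U * s                               # same as Xc @ Vt.T
    return scores, s, Vt
```

For a table with 10 rows and 200 columns, there are at most 10 components, and none of the steps requires a 200 by 200 covariance matrix.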
Missing Value Imputation
(Available only when Wide Data is specified as the Method Family.) Imputes missing values through matrix completion.
Special Methods
(Available only when Wide Data is specified as the Method Family and there is a specified number of components to estimate.) Provides additional methods for computing a specified number of components.
Fast Approximate
Estimates the specified number of components using Randomized Singular Value Decomposition. See Randomized SVD.
Robust PCA
Estimates the specified number of components using a sequence of singular value decompositions and thresholding steps to decompose the data matrix. This method is also used in the Explore Outliers platform. For more information on the Robust PCA method, see Robust PCA Outliers in Predictive and Specialized Modeling.
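To give a sense of how a randomized singular value decomposition, as named under Fast Approximate, can estimate a few components quickly, the following is a generic textbook-style sketch based on random projection. The function name and the `oversample`, `n_iter`, and `seed` parameters are assumptions made for illustration, and JMP's implementation might differ in its details.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, n_iter=2, seed=0):
    """Approximate the k leading singular triplets of A.

    Generic random-projection sketch; parameters are illustrative.
    """
    rng = np.random.default_rng(seed)
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    ell = min(k + oversample, m, n)        # size of the random subspace
    Y = A @ rng.normal(size=(n, ell))      # sample the column space of A
    for _ in range(n_iter):                # power iterations sharpen the estimate
        Q, _ = np.linalg.qr(Y)
        Z, _ = np.linalg.qr(A.T @ Q)
        Y = A @ Z
    Q, _ = np.linalg.qr(Y)                 # orthonormal basis for the sampled space
    B = Q.T @ A                            # small ell-by-n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]
```

The expensive decomposition is applied only to the small matrix `B`, which is why this family of methods scales well to very wide tables. The result is an approximation whose accuracy depends on how quickly the singular values decay.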
In the Principal Components platform, the way in which missing data are handled depends on the variance estimation method. You can also estimate missing values outside of the platform in the following ways:
• Use the Impute Missing Data option in the Multivariate platform (Analyze > Multivariate Methods > Multivariate). See Impute Missing Data.
• Use the Multivariate Normal Imputation or Multivariate SVD Imputation utilities found in Analyze > Screening > Explore Missing Values. See Explore Missing Values in Predictive and Specialized Modeling.