Publication date: 07/08/2024

The Explore Missing Values Report

The Explore Missing Values report contains a Commands section and a Missing Columns report. Once you select an imputation method, it also contains an Imputation report. The Commands section includes several options for additional reports and imputation methods.

Commands

Missing Value Report

Shows the Missing Columns report, which lists the name of each column and the number of missing values in that column. The Missing Columns report also contains the following options:

Show only columns with missing

Removes columns from the list that do not have missing values.

Close

Closes the Missing Columns report.

Select Rows

Selects the rows in the data table that contain missing values for the column(s) that you select in the Missing Columns report.

Exclude Rows

Applies the excluded row state for rows in the data table that contain missing values for the column(s) that you select in the Missing Columns report.

Color Cells

Colors the cells in the data table that contain missing values for the column(s) that you select in the Missing Columns report.

Color Rows

Colors the rows in the data table that contain missing values for the column(s) that you select in the Missing Columns report.

To remove the Missing Columns report, click the Close button.
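The bookkeeping behind the Missing Columns report can be illustrated outside of JMP. The following is a minimal Python sketch, not JMP's implementation: the toy table, the column names, and the three helper functions are all hypothetical, and it assumes that a row is selected when it is missing a value in any of the selected columns.

```python
# Hypothetical toy table: each column is a list; None marks a missing value.
table = {
    "age":    [34, None, 51, 29, None],
    "weight": [70.5, 82.0, None, 64.2, None],
    "height": [172, 180, 165, 158, 175],
}

def missing_counts(table):
    """Return {column name: number of missing values}."""
    return {name: sum(v is None for v in values) for name, values in table.items()}

def columns_with_missing(table):
    """Analogue of 'Show only columns with missing': keep columns with at least one missing value."""
    return {name: n for name, n in missing_counts(table).items() if n > 0}

def rows_with_missing(table, selected):
    """Analogue of 'Select Rows': indices of rows missing a value in ANY selected column (assumption)."""
    n_rows = len(next(iter(table.values())))
    return [i for i in range(n_rows) if any(table[c][i] is None for c in selected)]

print(missing_counts(table))
print(columns_with_missing(table))
print(rows_with_missing(table, ["age"]))
```

The same row indices could then drive an exclude or color action, which is what the Exclude Rows, Color Cells, and Color Rows options do in the data table.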

Missing Value Clustering

Provides a hierarchical clustering analysis of the missing data. The report includes a plot and two dendrograms. The rows of the plot are defined by the missing data patterns; there is a row for each pattern. The columns correspond to the variables. Each red cell indicates a group of missing values for the column listed beneath the plot. Hover over a cell to see the list of values represented. Click in the plot to select missing data pattern rows. Vertical bars are displayed to indicate the selected patterns.

The dendrogram to the right of the plot shows clusters of missing data pattern rows. These are the rows that you would obtain by using Tables > Missing Data Pattern.

The dendrogram beneath the plot shows clusters of variables.

Use this report to determine whether certain groups of columns tend to have similar patterns of missing values. To remove the Missing Value Clustering report, click the Close button.
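To make the idea of a missing data pattern concrete, here is a small Python sketch under stated assumptions. The data, the function names, and the use of Hamming distance as a dissimilarity between patterns are illustrative choices; the distance and linkage that JMP uses for the dendrograms are not described here.

```python
from collections import Counter

# Hypothetical data: one tuple per row, None marks a missing value.
rows = [
    (1.2, None, 3.0),
    (0.7, 2.1, 3.3),
    (None, None, 2.9),
    (1.5, None, 3.1),
    (0.9, 1.8, None),
]

def missing_patterns(rows):
    """Count rows per missingness pattern; 1 marks a missing cell, 0 a present one."""
    return Counter(tuple(int(v is None) for v in row) for row in rows)

def pattern_distance(p, q):
    """Hamming distance: number of columns in which two patterns differ (illustrative choice)."""
    return sum(a != b for a, b in zip(p, q))

patterns = missing_patterns(rows)
for pattern, count in sorted(patterns.items()):
    print(pattern, count)
```

Each distinct pattern corresponds to one row of the plot, and a pairwise dissimilarity such as `pattern_distance` is the kind of input that a hierarchical clustering routine needs.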

Missing Value Snapshot

Shows a cell plot for the missing values. The columns represent the variables. Black cells indicate a missing value. This plot is especially useful in understanding missingness for longitudinal data, where subjects can withdraw from a study before the end of the data collection period. To remove the Missing Value Snapshot report, click the Close button.

Multivariate Normal Imputation

(Available only when variables have a Continuous modeling type.) Imputes missing values using least squares predictions from the nonmissing columns. Use the shrinkage option to improve estimation of the covariance matrix.

Caution: Avoid this method when there are hundreds of columns.
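The idea of least squares prediction from the nonmissing columns can be sketched with NumPy. This is a simplified illustration, not JMP's algorithm: the function name `mvn_impute`, the `shrink` parameter, the estimation of the mean and covariance from complete rows only, and the shrinkage toward the diagonal are all assumptions made for the example.

```python
import numpy as np

def mvn_impute(X, shrink=0.0):
    """Replace NaNs with conditional-mean (least squares) predictions.

    Assumptions: mean and covariance come from complete rows only, and
    `shrink` in [0, 1] blends the covariance toward its diagonal.
    """
    X = np.asarray(X, dtype=float)
    complete = X[~np.isnan(X).any(axis=1)]
    mu = complete.mean(axis=0)
    S = np.cov(complete, rowvar=False)
    S = (1.0 - shrink) * S + shrink * np.diag(np.diag(S))

    out = X.copy()
    for i, row in enumerate(X):
        m = np.isnan(row)              # missing columns in this row
        if not m.any():
            continue
        o = ~m                         # observed columns in this row
        if not o.any():
            out[i] = mu                # nothing observed: fall back to the means
            continue
        # E[x_m | x_o] = mu_m + S_mo S_oo^{-1} (x_o - mu_o)
        coef = np.linalg.solve(S[np.ix_(o, o)], row[o] - mu[o])
        out[i, m] = mu[m] + S[np.ix_(m, o)] @ coef
    return out

# Hypothetical data in which the second column equals 2 * first + 1.
X = np.array([
    [1.0, 3.0],
    [2.0, 5.0],
    [3.0, 7.0],
    [4.0, 9.0],
    [5.0, np.nan],
    [np.nan, 7.0],
])
imputed = mvn_impute(X)
shrunk = mvn_impute(X, shrink=1.0)
```

Because the complete rows lie exactly on a line, `imputed` recovers 11 for the missing second value and 3 for the missing first value. With `shrink=1.0` the off-diagonal covariances vanish, so every missing value falls back to its column mean. Each row requires solving a system in its observed columns, which gives some intuition for why cost grows quickly as the number of columns increases.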

Multivariate SVD Imputation

(Available only when variables have a Continuous modeling type.) Imputes missing values quickly for large problems using an iterated low-rank SVD matrix completion method. When you click Multivariate SVD Imputation, the Imputation Method window shows the recommended settings, which can be adjusted.

Number of Singular Vectors

Number of singular vectors that are computed and used in the imputation.

Note: Do not specify too many singular vectors. If you do, the SVD and the imputations do not change from iteration to iteration.

Maximum Iterations

The number of iterations used in imputing the missing values.

Show Iteration Log

Opens a Details report that shows the number of iterations and gives details about the criteria.

For large problems, a progress bar shows how many dimensions the SVD has completed. You can stop the imputation and use that number of dimensions at any time.
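A generic version of iterated low-rank SVD matrix completion can be sketched as follows. This is an illustration of the technique, not JMP's code: the function name `svd_impute`, the column-mean starting values, and the `tol` stopping rule are assumptions, while `n_vectors` and `max_iter` are stand-ins for the Number of Singular Vectors and Maximum Iterations settings.

```python
import numpy as np

def svd_impute(X, n_vectors=1, max_iter=100, tol=1e-8):
    """Fill NaNs by repeatedly fitting a rank-`n_vectors` SVD approximation."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)      # start from column means (assumption)
    for iteration in range(1, max_iter + 1):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :n_vectors] * s[:n_vectors]) @ Vt[:n_vectors]
        # Largest change among the imputed cells in this iteration.
        change = np.max(np.abs(low_rank[missing] - filled[missing])) if missing.any() else 0.0
        filled[missing] = low_rank[missing]       # observed cells are never overwritten
        if change < tol:
            break
    return filled, iteration

# Hypothetical rank-1 matrix with one cell removed (true value 1.0).
A = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
X = A.copy()
X[0, 0] = np.nan

rank1, n_rank1 = svd_impute(X, n_vectors=1)
full, n_full = svd_impute(X, n_vectors=3)
```

With one singular vector, the imputed cell moves from the column mean toward the value implied by the rank-1 structure. With three singular vectors (full rank for this 4 x 3 matrix), the approximation reproduces the filled matrix exactly, so the imputed cell stays at its starting value and the loop stops after one iteration. That is the behavior the note about specifying too many singular vectors warns against.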

Multivariate RPCA Imputation

(Available only when variables have a Continuous modeling type.) Imputes missing values using robust principal components analysis (RPCA), a low-rank matrix factorization that is robust to outliers.

Tip: This method is useful for wide problems.

Automated Data Imputation

Imputes missing values using a low-rank matrix approximation method. This method automatically selects the best dimension for the low-rank approximation based on the data. Before selecting this method, you can specify options for saving the imputed values and other advanced controls.

Create New Data Table

Creates a new data table that has the same dimensions as the original data table. In the new data table, the columns selected in the launch window contain the imputed values.

Save Scoring Formula to Current Data Table

Saves a column group named Imputed_ to the current data table. The group contains the imputed columns specified in the launch window. A hidden column, ADI Impute Column, is also added to the current data table. It contains the imputed vectors and the scoring formula used in the data imputation. The column formulas automatically update if any additional rows are added to the data table, enabling missing data imputation for streaming data. This is the default option.

Impute Values in Place

Imputes the missing values in the current data table. The imputed values are displayed in light blue.

Include informative missing columns

(Available only when Save Scoring Formula to Current Data Table is selected.) Adds an indicator column to the Imputed_ column group for each imputed column specified in the launch window. Each indicator column specifies whether each row is missing or not missing for the corresponding imputed Y column.

Dimension Upper Bound

Determines the maximum rank allowed in the low-rank approximation. The default value is determined by the dimension of the matrix formed by the chosen columns.

Maximum Iterations

Determines the number of values that are iterated over to determine the tuning parameter for the imputation model. The default is 10.

Proportion of Observations to Induce as Missing

Determines the proportion of induced missing (IM) values that are added to the training and validation sets. The default proportion for each set is 0.2.

Proportion of Rows to Use for Validation

Determines the proportion of rows to use in the training and validation sets. The default proportion for the validation set is 0.3.

Set Random Seed

Determines the random seed for ADI. Use this option to obtain reproducible results.
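The roles of the induced missing proportion and the random seed can be illustrated with a short Python sketch. It shows only the general idea of hiding known values so that imputations can be scored against them; the function name `induce_missing` and the data are hypothetical, and the actual ADI procedure is not reproduced here.

```python
import random

def induce_missing(values, proportion=0.2, seed=None):
    """Hide a random share of the observed cells; return (masked copy, hidden indices)."""
    rng = random.Random(seed)                       # a fixed seed gives a reproducible mask
    observed = [i for i, v in enumerate(values) if v is not None]
    n_hide = round(proportion * len(observed))
    hidden = sorted(rng.sample(observed, n_hide))
    masked = list(values)
    for i in hidden:
        masked[i] = None
    return masked, hidden

# Hypothetical column with two values that are already missing.
values = [4.1, None, 3.8, 5.0, 4.4, None, 4.9, 3.6, 4.2, 4.7, 5.1, 3.9]

masked_a, hidden_a = induce_missing(values, proportion=0.2, seed=123)
masked_b, hidden_b = induce_missing(values, proportion=0.2, seed=123)
print(hidden_a == hidden_b)
```

Because the original values at the hidden positions are known, imputed values at those positions can be compared with them to evaluate a candidate setting. Repeating the call with the same seed hides the same cells, which is why setting a seed makes results reproducible.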

Tip: To run a missing value command across all levels of a By variable, press Ctrl and click the desired command button.

Imputation Report

If you select one of the imputation methods, an Imputation report is added to the Explore Missing Values report window. The Imputation report summarizes the results of the selected imputation method. The following results are included, depending on the selected method:

The number of missing values that were replaced.

The selected imputation method and any details specific to the selected method.

The number of rows and columns that were affected.

(Only for Multivariate Normal Imputation.) The number of different missing value patterns that were found.

The color of the imputed values in the data table.

(Only for Multivariate RPCA Imputation.) A Details report shows the number of iterations, the rank of the matrix, the value of the convergence criterion, and the value of the largest absolute scaled residual.

Once the imputation is complete, the cells corresponding to imputed values in the data table are colored. If the Missing Columns report is open, it is updated to show no missing values.

Click Undo to undo the imputation and replace the imputed data with missing values.
