Launch the Make Validation Column platform by selecting Analyze > Predictive Modeling > Make Validation Column.
Figure 12.3 Make Validation Column Launch Window
For more information about the options in the Select Columns red triangle menu, see Column Filter Menu in Using JMP.
The Make Validation Column launch window provides the following options:
Stratification Columns
Assigns one or more stratification columns.
Grouping Columns
Assigns one or more categorical grouping columns.
Cutpoint Column
Assigns a numeric cutpoint column.
Cutpoint Batch ID
When a cutpoint column is assigned, you can also assign a column for cutpoint batch IDs. This enables you to determine cutpoint values within each level of the Cutpoint Batch ID column.
Method
Provides three methods for validation.
Make Validation Column
Creates a validation column based on the specified stratification, grouping, and cutpoint columns. The validation column method, as determined by the specified stratification, grouping, and cutpoint columns, is described below the box. After a method is selected and you click OK, you specify the allocations for each set in the Make Validation Column report. See Specify Rates or Relative Rates and Set Cutpoints. There are five methods for constructing the holdback sets. All of these methods, except for Cutpoint Validation, are also used to create the folds for K-Fold Validation. See Make K-Fold Validation Column.
Random Validation Column
The default method if there are no column assignments in the launch window. This method partitions the data into sets or folds based on the allocations entered in the Make Validation Column report.
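To illustrate the idea behind a random validation column outside of JMP, here is a minimal Python sketch. The set labels, 60/20/20 allocation rates, random seed, and column name are assumptions for illustration, not JMP defaults.

import numpy as np
import pandas as pd

def random_validation_column(n_rows, rates=(0.6, 0.2, 0.2), seed=1234):
    # Randomly assign each row to Training, Validation, or Test
    # according to the requested allocation rates (illustrative only).
    rng = np.random.default_rng(seed)
    labels = np.array(["Training", "Validation", "Test"])
    return pd.Series(rng.choice(labels, size=n_rows, p=rates), name="Validation")

# Example: 100 rows split roughly 60/20/20
print(random_validation_column(100).value_counts())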
Stratified Validation Column
The selected method if one or more stratification columns are assigned. This method partitions the data into balanced sets based on the levels of the specified stratification columns. As in the Random Validation Column method, rows are randomly assigned to the holdback sets or folds based on the allocations entered in the Make Validation Column report. However, this is done at each level or combination of levels of the stratifying columns. Use this method when you want a balanced representation of the levels of a column in each of the training, validation, and test sets or in each of the folds in K-Fold crossvalidation.
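The stratified method can be sketched in a similar way: the same random split is performed separately within each level of the stratification column, so every level is represented in every set. The function and column names below are illustrative assumptions.

import numpy as np
import pandas as pd

def stratified_validation_column(strat, rates=(0.6, 0.2, 0.2), seed=1234):
    # Assign Training/Validation/Test within each level of a single
    # stratification column so that every level appears in each set.
    rng = np.random.default_rng(seed)
    labels = np.array(["Training", "Validation", "Test"])
    out = pd.Series(index=strat.index, dtype=object, name="Validation")
    for level, idx in strat.groupby(strat).groups.items():
        out.loc[idx] = rng.choice(labels, size=len(idx), p=rates)
    return out

# Example: cross-tabulate the assignment against a two-level column
strat = pd.Series(["A"] * 70 + ["B"] * 30)
print(pd.crosstab(strat, stratified_validation_column(strat)))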
Grouped Validation Column
The selected method if one or more grouping columns are specified. This method partitions the data into sets in such a way that entire levels of a specified column or combinations of levels of two or more columns are placed in the same set or fold. Because of this, the sizes of the resulting sets vary slightly from the sizes that you specified. Use this option when splitting levels across holdback sets or folds is not desirable.
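For the grouped method, the random draw is made once per group rather than once per row, so all rows in a group receive the same assignment. A minimal sketch, with assumed group labels and rates:

import numpy as np
import pandas as pd

def grouped_validation_column(group, rates=(0.6, 0.2, 0.2), seed=1234):
    # Assign whole groups to Training/Validation/Test so that no group
    # is split across holdback sets; the resulting set sizes therefore
    # only approximate the requested rates.
    rng = np.random.default_rng(seed)
    labels = np.array(["Training", "Validation", "Test"])
    levels = group.unique()
    picks = rng.choice(labels, size=len(levels), p=rates)
    return group.map(dict(zip(levels, picks))).rename("Validation")

# Example: five subjects with several rows each
group = pd.Series(["S1", "S1", "S2", "S2", "S3", "S4", "S4", "S5"])
print(grouped_validation_column(group))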
Stratify by Group Validation Column
The selected method if both stratification and grouping columns are specified. This method partitions the data to balance the levels of the stratification columns across the sets while requiring that the specified groups stay together in the same holdback sets or folds. As in the Grouped Validation Column method, groups are defined by the levels of a specified column or by combinations of levels of two or more columns. The sizes of the resulting sets vary slightly from the sizes that you specified.
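The stratify-by-group method combines the two previous sketches: whole groups are assigned to sets, but the group-level draws are made separately within each level of the stratification column. This sketch assumes, for simplicity, that each group falls entirely within one stratification level.

import numpy as np
import pandas as pd

def stratify_by_group_column(strat, group, rates=(0.6, 0.2, 0.2), seed=1234):
    # Keep groups intact while balancing the stratification levels:
    # within each stratification level, whole groups are randomly
    # assigned to Training/Validation/Test.
    rng = np.random.default_rng(seed)
    labels = np.array(["Training", "Validation", "Test"])
    # One stratification level per group (simplifying assumption)
    group_level = pd.DataFrame({"strat": strat, "group": group}).drop_duplicates("group")
    assignment = {}
    for _, sub in group_level.groupby("strat"):
        picks = rng.choice(labels, size=len(sub), p=rates)
        assignment.update(dict(zip(sub["group"], picks)))
    return group.map(assignment).rename("Validation")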
Cutpoint Validation Column
The selected method if a cutpoint column is specified. This method partitions the data into sets based on the time series cutpoints. Use this option when you want to assign your data to holdback sets based on time periods. The training set consists of rows between the first cutpoint and the second cutpoint. The validation set consists of rows between the second and third cutpoints. The test set consists of the remaining rows. These sets are chosen based on options in the Set Cutpoints report.
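Because the sets follow a time ordering, the cutpoint assignment amounts to comparing each row's cutpoint value against the chosen cutpoints. A sketch with arbitrary example cutpoints; in this sketch, rows before the first cutpoint are left unassigned.

import pandas as pd

def cutpoint_validation_column(time, cut1, cut2, cut3):
    # Training between the first and second cutpoints, Validation
    # between the second and third, Test for the later rows.
    out = pd.Series(pd.NA, index=time.index, dtype=object, name="Validation")
    out[(time >= cut1) & (time < cut2)] = "Training"
    out[(time >= cut2) & (time < cut3)] = "Validation"
    out[time >= cut3] = "Test"
    return out

# Example with arbitrary cutpoints at 2015, 2019, and 2021
time = pd.Series([2013, 2016, 2018, 2019, 2020, 2021, 2022])
print(cutpoint_validation_column(time, 2015, 2019, 2021))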
Make Autovalidation Table
Creates a new data table that contains the rows of the original data table concatenated with a duplicate copy of those rows. The new data table, which can be used for crossvalidation, has four additional columns:
Valid Set
Assigns a value of 0 to the original data and a value of 1 to the duplicated data. The values in this column designate the training and validation sets. Use this column in the Validation role in the launch window of an analysis.
Valid ID
Assigns the row number of the original observation. This allows matching of training and validation set rows for each original observation.
Valid Weight
Assigns the autovalidation weight, to be used in the Freq role in the launch window of an analysis. For each value of Valid ID, the same uniform random number is generated for the training observation and the validation observation. For the training set, Valid Weight is calculated as follows:
Valid Weight = -log(1 - Valid Uniform)
For the validation set, Valid Weight is calculated as follows:
Valid Weight = -log(Valid Uniform)
The Valid Weight column is constructed so that the training data weights are negatively correlated with the validation data weights. This ensures that differences in the fit to the validation data yield an effective crossvalidation of the fitting method.
Null Factor
Assigns the same normal random number for each value of Valid ID.
Tip: Use Make Autovalidation Table for small data tables, where using a subset as the training data could cause estimation problems.
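To make the table construction and the Valid Weight formulas above concrete, here is a short Python sketch. It is not JMP's implementation; the column names simply follow the description in this section, and the seed is arbitrary.

import numpy as np
import pandas as pd

def make_autovalidation_table(df, seed=1234):
    # Stack a copy of the original rows onto the original table and
    # add the four autovalidation columns described above.
    rng = np.random.default_rng(seed)
    n = len(df)
    u = rng.uniform(size=n)   # one uniform draw per original row
    z = rng.normal(size=n)    # one normal draw per original row (Null Factor)
    training = df.copy()
    training["Valid Set"] = 0
    training["Valid ID"] = np.arange(1, n + 1)
    training["Valid Weight"] = -np.log(1 - u)   # training weight
    training["Null Factor"] = z
    validation = df.copy()
    validation["Valid Set"] = 1
    validation["Valid ID"] = np.arange(1, n + 1)
    validation["Valid Weight"] = -np.log(u)     # negatively correlated with training weight
    validation["Null Factor"] = z
    return pd.concat([training, validation], ignore_index=True)

# Example on a tiny table
print(make_autovalidation_table(pd.DataFrame({"y": [1.2, 3.4, 5.6]})))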
Make K-Fold Validation Column
Creates a validation column with four or more categories, based on the specified stratification and grouping columns. Each category represents a fold to use in K-Fold crossvalidation. The Y column is used to order the rows and each row is then sequentially assigned to a fold. The validation column method determined by the specified stratification and grouping columns is described below the box. These are the same methods described in Make Validation Column. After a method is selected and you click OK, you specify the number of folds, K, in the Make Validation Column report. See Set Number of Folds.
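A plain (unstratified, ungrouped) K-fold assignment can be sketched as follows. This simplified sketch assigns folds by random shuffling rather than by ordering on a response column as described above; the default of five folds, the seed, and the column name are assumptions for illustration.

import numpy as np
import pandas as pd

def kfold_validation_column(n_rows, k=5, seed=1234):
    # Assign each row to one of K folds of as-equal-as-possible size
    # by creating balanced fold numbers and shuffling them.
    rng = np.random.default_rng(seed)
    folds = np.arange(n_rows) % k + 1   # fold numbers 1..K
    rng.shuffle(folds)
    return pd.Series(folds, name="Validation K-Fold")

# Example: 10 rows dealt into 5 folds of 2 rows each
print(kfold_validation_column(10, k=5).value_counts().sort_index())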
A missing value in a stratification, grouping, or cutpoint column results in a missing value in the validation column for that row.