Use the Make Validation Column platform to create a stratified validation column that balances the levels of a particular variable across the training and validation sets.
1. Select Help > Sample Data Folder and open Lipid Data.jmp.
2. Select Analyze > Distribution.
3. Select Gender and click Y, Columns.
4. Click OK.
Figure 12.1 Distribution of Gender in Lipid Data.jmp
Figure 12.1 illustrates the distribution of Gender in the data set. Notice that males and females are not equally represented. Because there are fewer females in the data, you want each gender to appear in the same proportion in the validation set as in the training set.
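Outside of JMP, the same check on level balance can be sketched in a few lines of Python. This is only an illustration: the `level_proportions` helper and the ten-row `gender` list are made up for the example and are not taken from Lipid Data.jmp.

```python
from collections import Counter

def level_proportions(values):
    """Return the share of rows that fall in each level of a categorical column."""
    counts = Counter(values)
    total = sum(counts.values())
    return {level: n / total for level, n in counts.items()}

# Hypothetical column with unequal levels (not the actual Lipid Data values).
gender = ["M"] * 6 + ["F"] * 4
proportions = level_proportions(gender)
print(proportions)
```

When the levels are unequal like this, a purely random split can leave the smaller level underrepresented in one of the sets, which is the situation stratification is meant to avoid.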
5. Select Analyze > Predictive Modeling > Make Validation Column.
6. Select Gender and click Stratification Columns.
7. Click OK.
The Make Validation Column report appears with a description of the validation method that you selected. The report also contains options to change the rates, change the column type, or set a random seed.
8. (Optional) Type 1234 next to Random Seed in the Options section of the report.
9. Click Go.
A Validation column is added to the data table. You can explore how Gender is distributed across the validation and training sets by creating a mosaic plot.
10. Select Analyze > Fit Y by X.
11. Assign Validation to Y, Response, and Gender to X, Factor.
12. Click OK.
Figure 12.2 Distribution of Gender across Validation and Training Sets
Figure 12.2 illustrates how each level of Gender is divided between the validation and training sets. Note that about 75% of both females and males are in the training set and about 25% of both females and males are in the validation set.
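The idea behind a stratified validation column can be sketched in Python as follows. This is a generic illustration of stratified random assignment, not a description of how JMP implements the platform. The function name, the 0.25 default rate (chosen to match the roughly 25% seen above), and the sample data are assumptions made for the example, and the seed 1234 does not reproduce JMP's random assignment.

```python
import random
from collections import Counter, defaultdict

def stratified_validation_column(strata, validation_rate=0.25, seed=1234):
    """Label each row Training or Validation, splitting each stratum separately."""
    rng = random.Random(seed)

    # Group row indices by the level of the stratification column.
    by_level = defaultdict(list)
    for i, level in enumerate(strata):
        by_level[level].append(i)

    labels = ["Training"] * len(strata)
    for indices in by_level.values():
        # Shuffle within the level, then send the first share to validation.
        rng.shuffle(indices)
        n_validation = round(len(indices) * validation_rate)
        for i in indices[:n_validation]:
            labels[i] = "Validation"
    return labels

# Hypothetical data: 60 males and 40 females (not the Lipid Data values).
gender = ["M"] * 60 + ["F"] * 40
validation = stratified_validation_column(gender)

# Cross-tabulate Gender by Validation, the counts a mosaic plot would display.
table = Counter(zip(gender, validation))
for (level, label), n in sorted(table.items()):
    print(level, label, n)
```

Because the split is made within each level rather than over the whole table, both levels end up with the same training and validation shares, apart from rounding when a level's row count is not a multiple of four.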