The Lipid Data.jmp data table contains blood measurements, physical measurements, and questionnaire data from 95 subjects at a California hospital. You want to create a validation column to use in future analyses.
1. Select Help > Sample Data Library and open Lipid Data.jmp.
2. Select Analyze > Distribution.
3. Select Gender and click Y, Columns.
4. Click OK.
Figure 11.1 Distribution of Gender in Lipid Data.jmp
Figure 11.1 illustrates the distribution of Gender in the data set. Notice that males and females are not represented in equal proportions. Because there are fewer females in the data, you want to make sure that the genders are balanced across the training and validation sets.
5. Select Analyze > Predictive Modeling > Make Validation Column.
6. Select Gender and click Stratification Columns.
7. Click OK.
The Make Validation Column report appears with a description of the validation method that you selected. The report also contains options to change the allocation rates and the column type, and to set a random seed.
8. (Optional) Type 1234 next to Random Seed in the Options section of the report.
9. Click Go.
A Validation column is added to the data table. You can use a Mosaic Plot to explore how the genders are distributed across the training and validation sets.
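The stratified assignment performed in steps 5 through 9 can be sketched outside JMP as follows. This is a minimal Python illustration under stated assumptions, not JMP's actual algorithm: the function name `make_validation_column`, the 0.75 training rate, the rounding rule, and the example counts are hypothetical. Within each stratum (here, each gender), rows are shuffled with a seeded random generator and the first 75% are labeled Training, so each gender is split in about the same proportion. Reusing the same seed reproduces the same assignment, which is the purpose of the optional Random Seed in step 8.

```python
import random
from collections import defaultdict


def make_validation_column(strata, training_rate=0.75, seed=None):
    """Hypothetical sketch: label each row 'Training' or 'Validation',
    splitting every stratum separately so each keeps about the same rate."""
    rng = random.Random(seed)  # a fixed seed makes the assignment reproducible

    # Group row indices by stratum value (for example "F" and "M").
    groups = defaultdict(list)
    for row, value in enumerate(strata):
        groups[value].append(row)

    labels = [None] * len(strata)
    for rows in groups.values():
        rng.shuffle(rows)  # random order within this stratum only
        n_train = round(training_rate * len(rows))
        for position, row in enumerate(rows):
            labels[row] = "Training" if position < n_train else "Validation"
    return labels


if __name__ == "__main__":
    # Hypothetical data: 20 females and 60 males, not the Lipid Data counts.
    gender = ["F"] * 20 + ["M"] * 60
    validation = make_validation_column(gender, training_rate=0.75, seed=1234)
    for g in ("F", "M"):
        n_train = sum(1 for s, v in zip(gender, validation)
                      if s == g and v == "Training")
        print(g, n_train, "of", gender.count(g), "in Training")
```

With these hypothetical counts, the script prints `F 15 of 20 in Training` and `M 45 of 60 in Training`. The counts do not depend on the seed; only which rows land in each set does.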
10. Select Analyze > Fit Y by X.
11. Assign Validation to the Y, Response role and Gender to the X, Factor role.
12. Click OK.
Figure 11.2 Distribution of Gender across Validation and Training Sets
Figure 11.2 illustrates the distribution of Gender across the training and validation sets. About 75% of the females and about 75% of the males are in the training set, and about 25% of each gender are in the validation set.
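The check in Figure 11.2 amounts to computing, for each gender, the percentage of its rows that fall in each set. The following minimal Python sketch assumes that the Gender and Validation columns are available as parallel lists; the function name `percent_by_group` and the eight-row example are hypothetical and are not taken from Lipid Data.jmp.

```python
from collections import Counter, defaultdict


def percent_by_group(groups, sets):
    """Return {group: {set_label: percent of that group's rows}}."""
    totals = Counter(groups)  # number of rows per group (e.g. per gender)
    result = defaultdict(dict)
    for (group, label), n in Counter(zip(groups, sets)).items():
        result[group][label] = 100.0 * n / totals[group]
    return dict(result)


if __name__ == "__main__":
    # Hypothetical eight-row example, not the Lipid Data values.
    gender = ["F", "F", "F", "F", "M", "M", "M", "M"]
    validation = ["Training", "Training", "Training", "Validation",
                  "Training", "Validation", "Training", "Training"]
    print(percent_by_group(gender, validation))
```

For this example, each gender shows 75.0 for Training and 25.0 for Validation, which is the balanced pattern that stratification aims for.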