In this example, you compare two cutpoint validation columns. The first is created using only a cutpoint column and the second is created using both a cutpoint column and a batch ID column. The data table contains weekly weather measurements collected over one year from 16 weather stations across the United States. Not every weather station has a temperature measurement for every week of the year. For each weather station, you want to use the first 60% of observations for training, the next 25% of observations for validation, and the final 15% of observations for testing.
1. Select Help > Sample Data Folder and open Functional Data/Weekly Weather Data.jmp.
2. Select Analyze > Predictive Modeling > Make Validation Column.
3. Select Week of Year and click Cutpoint Column.
4. Click OK.
5. In the list next to Determine cutpoints using, select Proportions.
6. In the boxes next to Training Set, Validation Set, and Test Set, enter 0.60, 0.25, and 0.15, respectively.
7. In the box next to New Column Name, type Cutpoint Validation.
8. Click Go.
A validation column called Cutpoint Validation is added to the data table.
9. Select Analyze > Tabulate.
10. Click ID and drag it to the Drop zone for rows.
11. Click Cutpoint Validation and drag it on top of N.
12. Click Row% and drag it on top of the cells.
Figure 12.4 Cutpoint Validation Column Proportions
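Figure 12.4 shows that not all of the weather stations have the specified proportions for the training, validation, and test sets. The cutpoints are determined from Week of Year across the whole data table, so stations that are missing weeks end up with different proportions. Use a Cutpoint Batch ID column to obtain proportions that are closer to the specified values for each weather station.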
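The effect seen in Figure 12.4 can be reproduced outside JMP with a small sketch. The data below are hypothetical (three made-up stations, two of them missing weeks), and the rule used to pick cutpoints (the value at the 60% and 85% positions of the sorted cutpoint column) is an assumption made for illustration; it is not necessarily the rule that Make Validation Column applies.

```python
from collections import Counter, defaultdict

# Hypothetical data: (station, week) pairs; stations B and C are missing weeks.
rows = (
    [("A", w) for w in range(1, 21)]    # complete record, weeks 1-20
    + [("B", w) for w in range(9, 21)]  # missing the early weeks
    + [("C", w) for w in range(1, 15)]  # missing the late weeks
)

def cutpoints(values, train=0.60, valid=0.25):
    """Return the largest value assigned to training and to validation."""
    ordered = sorted(values)
    n = len(ordered)
    train_cut = ordered[max(round(n * train), 1) - 1]
    valid_cut = ordered[max(round(n * (train + valid)), 1) - 1]
    return train_cut, valid_cut

def label(week, train_cut, valid_cut):
    """Assign a week to a set by comparing it with the two cutpoints."""
    if week <= train_cut:
        return "Training"
    if week <= valid_cut:
        return "Validation"
    return "Test"

# A single pair of cutpoints is computed from every row in the table.
train_cut, valid_cut = cutpoints([week for _, week in rows])

counts = defaultdict(Counter)
for station, week in rows:
    counts[station][label(week, train_cut, valid_cut)] += 1

# Row proportions per station, analogous to the Row % cells in Tabulate.
global_props = {
    station: {
        name: round(tally[name] / sum(tally.values()), 3)
        for name in ("Training", "Validation", "Test")
    }
    for station, tally in counts.items()
}
for station, props in global_props.items():
    print(station, props)
```

In this sketch only the station with a complete record ends up with the requested 60/25/15 split; the station missing late weeks has no test rows at all. The steps below rebuild the column in JMP with a batch ID.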
1. Select Analyze > Predictive Modeling > Make Validation Column.
2. Select Week of Year and click Cutpoint Column.
3. Select ID and click Cutpoint Batch ID.
4. Click OK.
5. In the list next to Determine cutpoints using, select Proportions.
6. In the boxes next to Training Set, Validation Set, and Test Set, enter 0.60, 0.25, and 0.15, respectively.
7. In the box next to New Column Name, type Cutpoint Batch Validation.
8. Click Go.
A validation column called Cutpoint Batch Validation is added to the data table.
9. Select Analyze > Tabulate.
10. Click ID and drag it to the Drop zone for rows.
11. Click Cutpoint Batch Validation and drag it on top of N.
12. Click Row% and drag it on top of the cells.
Figure 12.5 Cutpoint Validation Column with Batch ID Proportions
Figure 12.5 shows that using a Cutpoint Batch ID column ensures that each weather station has proportions for the training, validation, and test sets that are much closer to the specified values.
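The idea behind a batch ID can be sketched as computing the cutpoints separately within each batch. The example below uses hypothetical data (three made-up stations, two with missing weeks) and an assumed cutpoint rule (the value at the 60% and 85% positions of each station's sorted weeks); it illustrates the concept and is not a description of JMP's internal algorithm.

```python
from collections import Counter

# Hypothetical data: station -> weeks that have a measurement.
weeks_by_station = {
    "A": list(range(1, 21)),  # complete record, weeks 1-20
    "B": list(range(9, 21)),  # missing the early weeks
    "C": list(range(1, 15)),  # missing the late weeks
}

def split_batch(weeks, train=0.60, valid=0.25):
    """Label one station's weeks using cutpoints from that station only."""
    ordered = sorted(weeks)
    n = len(ordered)
    train_cut = ordered[max(round(n * train), 1) - 1]
    valid_cut = ordered[max(round(n * (train + valid)), 1) - 1]
    labels = {}
    for week in ordered:
        if week <= train_cut:
            labels[week] = "Training"
        elif week <= valid_cut:
            labels[week] = "Validation"
        else:
            labels[week] = "Test"
    return labels

# Each station (batch) gets its own cutpoints, so missing weeks in one
# station do not shift the split for the others.
batch_props = {}
for station, weeks in weeks_by_station.items():
    tally = Counter(split_batch(weeks).values())
    batch_props[station] = {
        name: round(tally[name] / len(weeks), 3)
        for name in ("Training", "Validation", "Test")
    }
    print(station, batch_props[station])
```

With per-station cutpoints, every station in the sketch lands within a few percentage points of 60/25/15. The proportions are not exact for stations B and C because a whole number of rows cannot always be divided in exactly those ratios.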