This example uses weekly weather data collected over one year from 16 weather stations across the United States. Run the Weather Station Locations script in the data table to view a map of the locations. Not every weather station has a temperature measurement for every week of the year. You want to create a validation column for this data table based on the dates of data collection. For each weather station, you want to use the first 60% of observations for training, the next 25% of observations for validation, and the final 15% of observations for testing. This example shows why a Cutpoint Batch ID column is important when the groups in the data, here the weather stations, do not all have the same set of observations.
1. Select Help > Sample Data Library and open Functional Data/Weekly Weather Data.jmp.
2. Select Analyze > Predictive Modeling > Make Validation Column.
3. Select Week of Year and click Cutpoint Column.
4. Click OK.
5. In the list next to Determine cutpoints using, select Proportions.
6. In the boxes next to Training Set, Validation Set, and Test Set, enter 0.60, 0.25, and 0.15, respectively.
7. In the box next to New Column Name, type Cutpoint Validation.
8. Click Go.
A validation column called Cutpoint Validation is added to the data table.
9. Select Analyze > Tabulate.
10. Click ID and drag it to the Drop zone for rows.
11. Click Cutpoint Validation and drag it on top of N.
12. Click Row% and drag it on top of the cells.
Figure 11.4 Cutpoint Validation Column Proportions
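Figure 11.4 shows that not all of the weather stations have proportions that match the specified values for the training, validation, and test sets. Because no Cutpoint Batch ID column is specified, the cutpoints are determined from Week of Year across the whole data table, so stations that are missing weekly measurements can end up with different proportions. Use a Cutpoint Batch ID column to obtain proportions that are closer to the specified values for each weather station.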
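The following Python sketch illustrates why a single set of cutpoints can give uneven proportions per station. It is not JMP's implementation. It assumes that the cutpoints are the values of the cutpoint column at the 60% and 85% positions of the sorted rows, and it uses hypothetical toy data with two stations (A and B), where station B is missing its first 10 weeks. The names global_cutpoints and label are made up for this illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (station, week) pairs.
# Station "A" reports weeks 1-20; station "B" is missing weeks 1-10.
rows = [("A", w) for w in range(1, 21)] + [("B", w) for w in range(11, 21)]

def global_cutpoints(weeks, train_pct=60, valid_pct=25):
    """Return (c1, c2) computed from all rows together (assumed rule).

    Weeks <= c1 are training, weeks <= c2 are validation, the rest are test.
    Integer percentages avoid floating-point rounding surprises.
    """
    ordered = sorted(weeks)
    n = len(ordered)
    k1 = max(1, n * train_pct // 100)
    k2 = max(k1, n * (train_pct + valid_pct) // 100)
    return ordered[k1 - 1], ordered[k2 - 1]

def label(week, c1, c2):
    """Assign a row to a set based on the two cutpoints."""
    if week <= c1:
        return "Training"
    if week <= c2:
        return "Validation"
    return "Test"

# One pair of cutpoints for the whole table, ignoring the station.
c1, c2 = global_cutpoints([week for _, week in rows])

# Count rows per set within each station, similar to the Tabulate report.
counts = defaultdict(Counter)
for station, week in rows:
    counts[station][label(week, c1, c2)] += 1

for station in sorted(counts):
    total = sum(counts[station].values())
    shares = {name: counts[station][name] / total
              for name in ("Training", "Validation", "Test")}
    print(station, shares)
```

In this toy data, the overall split is close to 60/25/15, but station A ends up with 70% of its rows in training and station B with only 40%, because both stations share the same cutpoint weeks.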
1. Select Analyze > Predictive Modeling > Make Validation Column.
2. Select Week of Year and click Cutpoint Column.
3. Select ID and click Cutpoint Batch ID.
4. Click OK.
5. In the list next to Determine cutpoints using, select Proportions.
6. In the boxes next to Training Set, Validation Set, and Test Set, enter 0.60, 0.25, and 0.15, respectively.
7. In the box next to New Column Name, type Cutpoint Batch Validation.
8. Click Go.
A validation column called Cutpoint Batch Validation is added to the data table.
9. Select Analyze > Tabulate.
10. Click ID and drag it to the Drop zone for rows.
11. Click Cutpoint Batch Validation and drag it on top of N.
12. Click Row% and drag it on top of the cells.
Figure 11.5 Cutpoint Validation Column with Batch ID Proportions
Figure 11.5 shows that using a Cutpoint Batch ID column ensures that each weather station has proportions for the training, validation, and test sets that are much closer to the specified values.
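The effect of a batch ID can be sketched in Python as computing the cutpoints separately within each station. This is an illustration under stated assumptions, not JMP's implementation: the cutpoints are assumed to be the values at the 60% and 85% positions of each station's sorted weeks, the toy data (stations A and B, with B missing its first 10 weeks) is hypothetical, and the names cutpoints and label are made up for this sketch.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (station, week) pairs.
# Station "A" reports weeks 1-20; station "B" is missing weeks 1-10.
rows = [("A", w) for w in range(1, 21)] + [("B", w) for w in range(11, 21)]

def cutpoints(weeks, train_pct=60, valid_pct=25):
    """Return (c1, c2) for one group of weeks (assumed rule).

    Weeks <= c1 are training, weeks <= c2 are validation, the rest are test.
    """
    ordered = sorted(weeks)
    n = len(ordered)
    k1 = max(1, n * train_pct // 100)
    k2 = max(k1, n * (train_pct + valid_pct) // 100)
    return ordered[k1 - 1], ordered[k2 - 1]

def label(week, c1, c2):
    """Assign a row to a set based on the two cutpoints."""
    if week <= c1:
        return "Training"
    if week <= c2:
        return "Validation"
    return "Test"

# Group the weeks by station, which plays the role of the batch ID.
weeks_by_station = defaultdict(list)
for station, week in rows:
    weeks_by_station[station].append(week)

# Each station gets its own pair of cutpoints.
station_cuts = {s: cutpoints(w) for s, w in weeks_by_station.items()}

# Count rows per set within each station, similar to the Tabulate report.
counts = defaultdict(Counter)
for station, week in rows:
    c1, c2 = station_cuts[station]
    counts[station][label(week, c1, c2)] += 1

for station in sorted(counts):
    print(station, station_cuts[station], dict(counts[station]))
```

In this toy data, station A splits 12/5/3 (exactly 60/25/15) and station B splits 6/2/2. Station B cannot match the targets exactly because it has only 10 rows, which is consistent with the proportions being close to, rather than equal to, the specified values.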