Validation is the process of using part of a data set to estimate model parameters, and using the other part to assess the predictive ability of the model.
• The training set is used to estimate model parameters.
• The validation set is used during model fitting to assess, or validate, the predictive ability of the model.
• The test set is a final, independent assessment of the model’s predictive ability. The test set is available only when using a validation column.
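The three roles can be illustrated with a minimal Python sketch. This is not how JMP fits models. It assumes a one-predictor least-squares line, and the data values and the names fit_line and rmse are hypothetical, chosen only to show which set is used at which step.

```python
# Illustrative only: a one-predictor least-squares fit, not JMP's fitting code.

def fit_line(points):
    """Estimate slope and intercept from (x, y) pairs by ordinary least squares."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(model, points):
    """Root mean squared prediction error of the fitted line on a set of points."""
    slope, intercept = model
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in points]
    return (sum(squared) / len(points)) ** 0.5


# Hypothetical data, already divided into the three sets.
training = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
validation = [(4.0, 9.5), (5.0, 10.5)]
test = [(6.0, 13.0), (7.0, 16.0)]

model = fit_line(training)                 # parameters come from the training set only
validation_rmse = rmse(model, validation)  # assessment used during model fitting
test_rmse = rmse(model, test)              # final, independent assessment
```

The test set plays no part in estimating the parameters or in the assessment made during fitting, which is what makes it an independent check.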
The training, validation, and test sets are created as subsets of the original data. To create them, specify a validation column in the Fit Model launch window.
The number of distinct values in the validation column determines how the data is split:
• If the column has two distinct values, then training and validation sets are created.
• If the column has three distinct values, then training, validation, and test sets are created.
• If the column has more than three distinct values, or only one, then no validation is performed.
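The rules above can be sketched in Python as follows. This is a hedged illustration, not JMP code. The function name split_by_validation_column and the sample rows are hypothetical. The mapping of sorted values to sets (smallest value to training, next to validation, largest to test) is an assumption, because this section does not state which value corresponds to which set.

```python
def split_by_validation_column(rows, column):
    """Split rows into sets based on the distinct values of a validation column.

    Returns a dict of set name -> rows, or None when no validation is performed.
    ASSUMPTION: values are assigned in sorted order (smallest = training).
    """
    values = sorted({row[column] for row in rows})
    if len(values) == 2:
        names = ["training", "validation"]
    elif len(values) == 3:
        names = ["training", "validation", "test"]
    else:
        # One distinct value, or more than three: no validation is performed.
        return None

    role_of = dict(zip(values, names))
    sets = {name: [] for name in names}
    for row in rows:
        sets[role_of[row[column]]].append(row)
    return sets


# Hypothetical rows with a column named "Validation".
rows = [
    {"id": 1, "Validation": 0},
    {"id": 2, "Validation": 0},
    {"id": 3, "Validation": 1},
    {"id": 4, "Validation": 2},
]
sets = split_by_validation_column(rows, "Validation")
```

With the sample rows, the column has three distinct values, so sets contains a training, a validation, and a test entry. If every row had the same value, the function would return None.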
When a validation column is used, the Fit Details report gives model fit statistics for the training, validation, and test sets. A separate ROC curve, lift curve, and confusion matrix are also provided for each of the training, validation, and test sets.
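As a rough illustration of what a per-set confusion matrix summarizes, the following Python sketch counts actual and predicted response pairs separately for each set. It is not JMP's implementation. The response levels, the predictions, and the names confusion_matrix and misclassification_rate are hypothetical.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual, predicted) pair of levels occurs."""
    return Counter(zip(actual, predicted))


def misclassification_rate(actual, predicted):
    """Proportion of rows whose predicted level differs from the actual level."""
    wrong = sum(1 for a, p in zip(actual, predicted) if a != p)
    return wrong / len(actual)


# Hypothetical (actual, predicted) responses for each set.
results = {
    "training": (["yes", "no", "yes", "no"], ["yes", "no", "no", "no"]),
    "validation": (["yes", "no"], ["yes", "yes"]),
    "test": (["no", "yes"], ["no", "yes"]),
}

# One confusion matrix and one misclassification rate per set.
matrices = {name: confusion_matrix(a, p) for name, (a, p) in results.items()}
rates = {name: misclassification_rate(a, p) for name, (a, p) in results.items()}
```

Keeping the counts separate by set makes it possible to compare performance on the training data with performance on data that was not used to estimate the parameters.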
For more information about how a validation column is used in JMP modeling platforms, see “Validation in JMP Modeling” in Predictive and Specialized Modeling.