Publication date: 07/08/2024

Validation in Logistic Regression Models

Validation is the process of using one part of a data set to estimate model parameters and the remaining part to assess the model’s predictive ability.

The training set is used to estimate model parameters.

The validation set is used during model fitting to assess, or validate, the predictive ability of the model.

The test set is a final, independent assessment of the model’s predictive ability. The test set is available only when using a validation column.

The training, validation, and test sets are subsets of the original data. You create them by specifying a validation column in the Fit Model launch window.

The number of distinct values in the validation column determines how the data is split and which sets are created:

If the column has two distinct values, then training and validation sets are created.

If the column has three distinct values, then training, validation, and test sets are created.

If the column has more than three distinct values, or only one, then no validation is performed.
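The rules above can be sketched outside of JMP in a few lines of Python. This is an illustrative sketch only, not JMP's implementation. The function name `split_by_validation_column` and the column name `"Validation"` are hypothetical, and the mapping of sorted values to sets (smallest value to training, next to validation, largest to test) is an assumption that this page does not state.

```python
def split_by_validation_column(rows, column="Validation"):
    """Partition rows into sets based on the distinct values of a validation column.

    rows   : list of dicts, one per observation
    column : name of the validation column (hypothetical default)

    Returns a dict mapping set names to lists of rows, or None when
    no validation would be performed.
    """
    # Distinct values, sorted. ASSUMPTION: the smallest value marks the
    # training set, the next the validation set, the largest the test set.
    levels = sorted({row[column] for row in rows})

    if len(levels) == 2:
        names = ["training", "validation"]
    elif len(levels) == 3:
        names = ["training", "validation", "test"]
    else:
        # Only one distinct value, or more than three: no validation.
        return None

    role = dict(zip(levels, names))
    sets = {name: [] for name in names}
    for row in rows:
        sets[role[row[column]]].append(row)
    return sets


# Example: a column with three distinct values yields three sets.
data = [
    {"id": 1, "Validation": 0},
    {"id": 2, "Validation": 0},
    {"id": 3, "Validation": 1},
    {"id": 4, "Validation": 2},
]
parts = split_by_validation_column(data)
print({name: [r["id"] for r in rows] for name, rows in parts.items()})
```

Returning `None` in the last branch mirrors the rule that a column with one distinct value, or more than three, results in no validation.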

When a validation column is used, model fit statistics are given for the training, validation, and test sets in the Fit Details report. There is also a separate ROC curve, lift curve, and confusion matrix for each of the Training, Validation, and Test sets.
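To illustrate what a per-set confusion matrix means, the following Python sketch tallies one confusion matrix for each set from predicted probabilities. It is a minimal illustration, not how JMP computes its reports. The function names, the record layout `(set_name, actual, predicted_prob)`, and the 0.5 classification threshold are all assumptions made for this example.

```python
def confusion_matrix(actual, predicted_prob, threshold=0.5):
    """Count true/false positives and negatives for a binary response.

    ASSUMPTION: an observation is classified as 1 when its predicted
    probability is at or above the threshold.
    """
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for y, p in zip(actual, predicted_prob):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            counts["tp"] += 1
        elif predicted == 1 and y == 0:
            counts["fp"] += 1
        elif predicted == 0 and y == 0:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


def misclassification_rate(counts):
    """Fraction of observations that were classified incorrectly."""
    total = sum(counts.values())
    return (counts["fp"] + counts["fn"]) / total if total else float("nan")


def confusion_by_set(records, threshold=0.5):
    """Build one confusion matrix per set.

    records: iterable of (set_name, actual, predicted_prob) tuples.
    """
    grouped = {}
    for set_name, y, p in records:
        ys, ps = grouped.setdefault(set_name, ([], []))
        ys.append(y)
        ps.append(p)
    return {
        name: confusion_matrix(ys, ps, threshold)
        for name, (ys, ps) in grouped.items()
    }


# Made-up records, for illustration only.
records = [
    ("Training", 1, 0.9),
    ("Training", 0, 0.2),
    ("Training", 0, 0.7),
    ("Validation", 1, 0.4),
    ("Validation", 0, 0.1),
]
for name, counts in confusion_by_set(records).items():
    print(name, counts, misclassification_rate(counts))
```

Comparing the misclassification rate on the training set with the rates on the validation and test sets is one way to see whether a model predicts as well on data that were not used to estimate its parameters.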

For more information about how a validation column is used in JMP modeling platforms, see Validation in JMP Modeling in Predictive and Specialized Modeling.
