• The training set is the part that is used to estimate model parameters.
• The validation set is the part that assesses or validates the predictive ability of the model.
• The test set is a final, independent assessment of the model’s predictive ability. The test set is available only when using a validation column. See Launch the Partition Platform.
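The roles of the three sets can be illustrated with a short sketch. The example below is not JMP code; it assumes Python with NumPy and scikit-learn and uses synthetic data, first carving off a test set and then splitting the remainder into training and validation sets.

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Hypothetical data: 100 rows, 5 predictors, one continuous response.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = rng.normal(size=100)

    # Carve off the test set first, then split the remainder into training and validation.
    X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

    print(len(X_train), len(X_val), len(X_test))  # 60 20 20 rows: 60% training, 20% validation, 20% test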
When a validation method is used, the Go button appears. The Go button performs repeated splitting automatically, so that you do not have to click the Split button over and over. When you click Go, splitting continues until the validation R-Square is better than what the next 10 splits would obtain. This rule can result in complex trees that are not very interpretable but have good predictive power.
Using the Go button turns on the Split History command. If using the Go button results in a tree with more than 40 nodes, the Show Tree command is turned off.
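The stopping rule behind the Go button can be sketched outside of JMP as a look-ahead on the validation R-Square. The code below is only an approximation of that idea, assuming Python with scikit-learn; the DecisionTreeRegressor model, the tree sizes, and the helper name go_style_stop are illustrative and not part of the platform.

    from sklearn.tree import DecisionTreeRegressor

    def go_style_stop(X_train, y_train, X_val, y_val, look_ahead=10, max_splits=60):
        """Grow trees with more and more splits; stop once the best validation
        R-Square is not beaten by any of the next `look_ahead` splits."""
        r2 = []
        for n_splits in range(1, max_splits + 1):
            # A tree with n_splits splits has n_splits + 1 leaves.
            tree = DecisionTreeRegressor(max_leaf_nodes=n_splits + 1, random_state=0)
            tree.fit(X_train, y_train)
            r2.append(tree.score(X_val, y_val))  # score() returns R-Square
        best = 0
        for k in range(len(r2)):
            if r2[k] > r2[best]:
                best = k
            if k - best >= look_ahead:           # look_ahead further splits brought no improvement
                break
        return best + 1, r2[best]                # chosen number of splits and its validation R-Square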
For more information about using row states and how to exclude rows, see Hide and Exclude Rows in the Using JMP book.
Randomly divides the original data into the training and validation data sets. The Validation Portion on the platform launch window specifies the proportion of the original data to use as the validation data set (holdback). See Launch the Partition Platform for details about the Validation Portion.
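A random holdback of this kind amounts to flagging a random subset of rows. The sketch below assumes Python with NumPy; the 0.3 portion and the row count are arbitrary, and the code only illustrates the idea of a validation portion, not how JMP assigns the rows.

    import numpy as np

    rng = np.random.default_rng(1)
    n_rows = 200
    validation_portion = 0.3                     # plays the role of the Validation Portion

    # Flag roughly 30% of the rows as the validation (holdback) set; the rest are training rows.
    is_validation = rng.random(n_rows) < validation_portion
    training_rows = np.flatnonzero(~is_validation)
    validation_rows = np.flatnonzero(is_validation)
    print(len(training_rows), len(validation_rows))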
Randomly divides the original data into K subsets. In turn, each of the K sets is used to validate the model fit on the rest of the data, fitting a total of K models. The final model is selected based on the crossvalidation R-Square, where a stopping rule is imposed to avoid overfitting the model. This method is useful for small data sets, because it makes efficient use of limited amounts of data. See K-Fold Crossvalidation.
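As an illustration of the K-fold idea (not of JMP's implementation), the following sketch assumes Python with scikit-learn: each of K = 5 folds is held out in turn, a model is fit on the remaining folds, and the held-out fold supplies a validation R-Square that is averaged at the end.

    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(2)
    X = rng.normal(size=(60, 4))                 # a small hypothetical data set
    y = X[:, 0] + rng.normal(scale=0.5, size=60)

    kfold = KFold(n_splits=5, shuffle=True, random_state=2)
    fold_r2 = []
    for train_idx, val_idx in kfold.split(X):
        # Fit on the other K - 1 subsets, validate on the held-out subset.
        model = DecisionTreeRegressor(max_depth=3, random_state=2)
        model.fit(X[train_idx], y[train_idx])
        fold_r2.append(model.score(X[val_idx], y[val_idx]))

    print(np.mean(fold_r2))                      # crossvalidation R-Square averaged over the K folds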