Platforms That Support Validation

This appendix lists the types of cross-validation available in each platform. The types are defined as follows:

Use Excluded Rows as Validation Holdback

Uses the excluded rows in the data table as a validation holdback set.

Note: For platforms that support using excluded rows as a validation holdback set, the excluded rows are used only when there is no validation column or validation proportion specified in the launch window.
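The precedence rule in the note above can be sketched in Python. This is a minimal illustration, not JMP's implementation. The function name `split_by_excluded_rows` and its boolean flags are hypothetical stand-ins for the launch-window settings, and rows are identified by zero-based indices.

```python
from typing import Optional


def split_by_excluded_rows(
    n_rows: int,
    excluded_rows: set,
    has_validation_column: bool = False,
    has_validation_proportion: bool = False,
) -> Optional[tuple]:
    """Return (training_rows, validation_rows), or None if another method applies.

    Hypothetical sketch: the flags model whether a validation column or a
    validation proportion was specified in the launch window.
    """
    # Excluded rows serve as the holdback only when neither a validation
    # column nor a validation proportion has been specified.
    if has_validation_column or has_validation_proportion:
        return None  # the other validation method takes precedence
    training = [i for i in range(n_rows) if i not in excluded_rows]
    validation = sorted(r for r in excluded_rows if 0 <= r < n_rows)
    return training, validation
```

For example, `split_by_excluded_rows(5, {1, 3})` returns `([0, 2, 4], [1, 3])`, while passing `has_validation_column=True` returns `None`.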

Random Validation Holdback

Randomly divides the original data into training and validation sets, and optionally a test set. You can specify the proportion of the original data to use in each set.
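A random holdback can be sketched as shuffling the row indices and slicing off the requested proportions. This is an illustration under assumptions: `random_holdback` is a hypothetical name, and rounding each proportion to a whole number of rows is one reasonable choice, not necessarily how any JMP platform draws its sample.

```python
import random


def random_holdback(n_rows, validation_prop, test_prop=0.0, seed=None):
    """Randomly split row indices into training, validation, and test lists."""
    if validation_prop < 0 or test_prop < 0 or validation_prop + test_prop >= 1:
        raise ValueError("proportions must be non-negative and leave rows for training")
    rng = random.Random(seed)  # seeded generator for reproducible splits
    rows = list(range(n_rows))
    rng.shuffle(rows)
    # Convert proportions to row counts (assumption: simple rounding).
    n_val = round(n_rows * validation_prop)
    n_test = round(n_rows * test_prop)
    validation = sorted(rows[:n_val])
    test = sorted(rows[n_val:n_val + n_test])
    training = sorted(rows[n_val + n_test:])  # everything left over
    return training, validation, test
```

Because the three lists are slices of one shuffled sequence, every row lands in exactly one set, and reusing the same `seed` reproduces the same split.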

K-Fold Cross-Validation

Divides the original data into K subsets. In turn, each of the K sets is used to validate the model fit on the rest of the data, fitting a total of K models. The model giving the best validation statistic is chosen as the final model.

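Note: Where K-fold cross-validation is specified depends on the platform: in the Model Launch control panel for some, in the launch window for others, and through a validation column for still others. See the following table.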
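The K-fold procedure described above can be sketched as follows. Several details are assumptions for illustration only: the model is a simple least-squares line (`fit_line`), the validation statistic is RMSE on the held-out fold (smaller is better), and folds are formed by striding through shuffled row indices. Each platform uses its own models and statistics. Only the overall loop (K fits, each validated on a different subset, with the best validation statistic selecting the final model) follows the text.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x (a stand-in model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(model, xs, ys):
    """Root mean squared error of the fitted line on the given rows."""
    intercept, slope = model
    sq = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(sq) / len(sq))


def k_fold_select(xs, ys, k, seed=None):
    """Fit k models, each validated on a different fold; keep the best one."""
    order = list(range(len(xs)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]  # k disjoint, near-equal subsets
    results = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]  # the rest of the data
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        score = rmse(model, [xs[i] for i in fold], [ys[i] for i in fold])
        results.append((score, model))
    # The smallest validation RMSE identifies the final model.
    best_score, best_model = min(results, key=lambda r: r[0])
    return best_model, best_score, [score for score, _ in results]
```

For data that lie exactly on a line, such as `ys = [2 + 3 * x for x in range(20)]`, every fold recovers an intercept near 2 and a slope near 3, and all K validation scores are near zero.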

Validation Role Column

Uses the column’s values to divide the data into parts. The column is assigned using the Validation role on the platform’s launch window.
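One way to picture how a validation column divides the data is to sort its distinct values and assign roles in order. This sketch assumes the common convention that the smallest value marks training rows, the next marks validation rows, and a third (if present) marks test rows. The function `roles_from_column` is hypothetical, and columns with more than three levels are rejected because their handling varies by platform.

```python
ROLE_NAMES = ("training", "validation", "test")


def roles_from_column(values):
    """Map each row's validation-column value to a role by level order."""
    levels = sorted(set(values))  # distinct values, smallest first
    if len(levels) > 3:
        raise ValueError("more than three levels: handling is platform-specific")
    role_of = {level: ROLE_NAMES[i] for i, level in enumerate(levels)}
    return [role_of[v] for v in values]
```

For example, a column with values `[0, 1, 0, 2, 1]` assigns rows to training, validation, training, test, and validation.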

Note: Platforms differ in how they treat a validation column with more than three levels. See the notes in the following table.

Platform	Use Excluded Rows as Validation Holdback	Random Validation Holdback	K-Fold Cross-Validation	Validation Role Column
Fit Model > Fit Least Squares	No	No	No	Yes (for model evaluation only)¹
Fit Model > Forward Stepwise Regression	No	No	Yes (for continuous response models only)	Yes
Fit Model > Logistic Regression	No	No	No	Yes (for model evaluation only)¹
Fit Model > Generalized Regression	No	Yes	Yes	Yes
Fit Model > Partial Least Squares	No	Yes	Yes	Yes
Partition	Yes	Yes	Yes	Yes²
Bootstrap Forest	Yes	Yes	No	Yes²
Boosted Tree	Yes	Yes	No	Yes²
K Nearest Neighbors	Yes	Yes	No	Yes²
Naive Bayes	Yes	Yes	No	Yes²
Neural	Yes	Yes	Yes (through model launch or validation column with more than 3 levels)	Yes
Support Vector Machines	No	Yes	Yes (through model launch)	Yes
Functional Data Explorer	No	No	No	Yes (must be structured as a “Grouped Random” validation column)³
Discriminant	Optional	No	No	Yes²
Partial Least Squares	No	Yes	Yes (through model launch or validation column with more than 3 levels)	Yes
Uplift	No	Yes	No	Yes²

¹ If there are more than three levels, the validation column is ignored.

² If there are more than three levels, the platform uses only rows with the three smallest values.

³ If there are more than two levels, the smallest value defines the training set and all other values define the validation set.
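The three footnote rules can be expressed as a single illustrative function. The rule labels `"ignore"`, `"three_smallest"`, and `"smallest_vs_rest"` are hypothetical names for footnotes ¹, ², and ³. The ascending training/validation/test assignment for three or fewer levels is an assumption, and rows that a rule leaves unused are marked `None`.

```python
RULES = {"ignore", "three_smallest", "smallest_vs_rest"}


def roles_by_footnote(values, rule):
    """Assign roles from a validation column under one footnote rule.

    Returns a list of role names (None marks an unused row), or None when
    the whole column is ignored.
    """
    if rule not in RULES:
        raise ValueError(f"unknown rule: {rule!r}")
    levels = sorted(set(values))
    if rule == "smallest_vs_rest":
        # Footnote 3: smallest value is training; all other values are validation.
        return ["training" if v == levels[0] else "validation" for v in values]
    if len(levels) > 3:
        if rule == "ignore":
            return None  # Footnote 1: the validation column is ignored.
        levels = levels[:3]  # Footnote 2: keep only the three smallest values.
    names = ("training", "validation", "test")  # assumed ascending order
    role_of = {level: names[i] for i, level in enumerate(levels)}
    return [role_of.get(v) for v in values]  # None for rows outside kept levels
```

With values `[0, 1, 2, 3, 1]`, the `"ignore"` rule returns `None`, the `"three_smallest"` rule leaves the row with value 3 unused, and the `"smallest_vs_rest"` rule puts every row except the first into the validation set.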

Want more information? Have questions? Get answers in the JMP User Community (community.jmp.com).