This appendix lists the types of cross validation available in each platform. The types of cross validation are defined as follows:
Use Excluded Rows as Validation Holdback
Uses the excluded rows in the data table as a validation holdback set.
Note: For platforms that support using excluded rows as a validation holdback set, the excluded rows are used only when there is no validation column or validation proportion specified in the launch window.
Random Validation Holdback
Randomly divides the original data into the training and validation sets. A test set can also be included. You can specify the proportions of the original data to use in each set.
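As an illustration only, a random holdback can be sketched in a few lines of Python. This is not the platform's implementation. The function name `random_holdback`, its parameters, and the rounding rule for set sizes are all assumptions made for the example.

```python
import random


def random_holdback(n_rows, train=0.6, validation=0.2, test=0.2, seed=0):
    """Randomly assign row indices to training, validation, and test sets.

    Hypothetical helper: the proportions are fractions of the original data,
    and the test set receives whatever rows remain after rounding.
    """
    if abs(train + validation + test - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # seeded so the split is repeatable
    n_train = round(train * n_rows)
    n_valid = round(validation * n_rows)
    return {
        "training": sorted(indices[:n_train]),
        "validation": sorted(indices[n_train:n_train + n_valid]),
        "test": sorted(indices[n_train + n_valid:]),
    }


split = random_holdback(10, train=0.6, validation=0.2, test=0.2)
```

Setting `test=0.0` and adjusting the other two proportions gives a split with only training and validation sets.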
K-Fold Cross-Validation
Divides the original data into K subsets. In turn, each of the K sets is used to validate the model fit on the rest of the data, fitting a total of K models. The model giving the best validation statistic is chosen as the final model.
Note: How you request K-fold cross-validation depends on the platform. Some platforms specify it in the model launch controls, others in the launch window, and still others through a validation column.
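The K-fold procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's algorithm: the model is a simple straight-line fit, the validation statistic is RMSE on the held-out fold (smaller is better), and the names `fit_line`, `rmse`, and `k_fold` are hypothetical.

```python
import random


def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rmse(model, xs, ys):
    """Root mean squared error of a fitted line on the given rows."""
    intercept, slope = model
    errors = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return (sum(errors) / len(errors)) ** 0.5


def k_fold(xs, ys, k=5, seed=0):
    """Fit K models, each validated on one fold, and return the best one."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # K disjoint subsets of the rows
    results = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        score = rmse(model, [xs[i] for i in fold], [ys[i] for i in fold])
        results.append((score, model))
    best_score, best_model = min(results, key=lambda r: r[0])
    return best_model, best_score, results


# Made-up data: a line with a small deterministic perturbation.
xs = [float(i) for i in range(20)]
ys = [1.0 + 2.0 * x + 0.1 * ((i * 7) % 5 - 2) for i, x in enumerate(xs)]
best_model, best_score, results = k_fold(xs, ys, k=5)
```

Each row lands in exactly one fold, so every row is used for validation once and for training K − 1 times.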
Validation Role Column
Uses the column’s values to divide the data into parts. The column is assigned using the Validation role on the platform’s launch window.
Note: Platforms differ in how they treat a validation column with more than three levels. See the footnotes to the following table.
| Platform | Use Excluded Rows as Validation Holdback | Random Validation Holdback | K-Fold Cross-Validation | Validation Role Column |
|---|---|---|---|---|
| Fit Model > Fit Least Squares | No | No | No | Yes (for model evaluation only)¹ |
| Fit Model > Forward Stepwise Regression | No | No | Yes (for continuous response models only) | Yes |
| Fit Model > Logistic Regression | No | No | No | Yes (for model evaluation only)¹ |
| Fit Model > Generalized Regression | No | Yes | Yes | Yes |
| Fit Model > Partial Least Squares | No | Yes | Yes | Yes |
| Partition | Yes | Yes | Yes | Yes² |
| Bootstrap Forest | Yes | Yes | No | Yes² |
| Boosted Tree | Yes | Yes | No | Yes² |
| K Nearest Neighbors | Yes | Yes | No | Yes² |
| Naive Bayes | Yes | Yes | No | Yes² |
| Neural | Yes | Yes | Yes (through model launch or validation column with more than 3 levels) | Yes |
| Support Vector Machines | No | Yes | Yes (through model launch) | Yes |
| Functional Data Explorer | No | No | No | Yes (must be structured as a “Grouped Random” validation column)³ |
| Discriminant | Optional | No | No | Yes² |
| Partial Least Squares | No | Yes | Yes (through model launch or validation column with more than 3 levels) | Yes |
| Uplift | No | Yes | No | Yes² |
1 If there are more than three levels, the validation column is ignored.
2 If there are more than three levels, the platform uses only rows with the three smallest values.
3 If there are more than two levels, the smallest value defines the training set and all other values define the validation set.
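The three footnote rules can be compared side by side with a small Python sketch. It is illustrative only. The function `assign_sets` and the rule names `"ignore"`, `"three_smallest"`, and `"grouped"` are hypothetical labels for footnotes 1, 2, and 3. The sketch also assumes that, when a column has three or fewer levels, the smallest value marks training rows, the next validation rows, and the largest test rows.

```python
def assign_sets(values, rule):
    """Map validation-column values to set names under one footnote rule.

    Returns None when the column is ignored. A row maps to None when the
    platform would not use that row.
    """
    if rule not in ("ignore", "three_smallest", "grouped"):
        raise ValueError(f"unknown rule: {rule}")
    levels = sorted(set(values))
    if rule == "ignore" and len(levels) > 3:
        return None  # footnote 1: the validation column is ignored
    if rule == "grouped":
        # footnote 3: smallest value is training, every other value validation
        return ["training" if v == levels[0] else "validation" for v in values]
    # footnote 2 (and the assumed base case): only the three smallest levels count
    lookup = dict(zip(levels[:3], ["training", "validation", "test"]))
    return [lookup.get(v) for v in values]


column = [0, 1, 2, 3]
ignored = assign_sets(column, "ignore")
three_smallest = assign_sets(column, "three_smallest")
grouped = assign_sets(column, "grouped")
```

With the four-level column above, the first rule discards the column, the second leaves the rows with value 3 unused, and the third places every row with a value other than 0 in the validation set.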