Publication date: 07/08/2024

K-Fold Cross Validation in Stepwise Regression

K-fold cross validation randomly divides the data into k subsets. In turn, each of the k sets is used as a validation set while the remaining data are used as a training set to fit the model. In total, k models are fit and k validation statistics are obtained. The model giving the best validation statistic is chosen as the final model. This method is useful for small data sets, because it makes efficient use of limited amounts of data.
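JMP performs this partitioning and fitting internally, but the general procedure can be sketched in Python. The sketch below is a minimal illustration, not JMP's implementation: it uses ordinary least squares via NumPy, the function names are hypothetical, and it assumes the total sum of squares in each held-out fold is computed around that fold's own mean.

```python
import numpy as np

def k_fold_splits(n, k, seed=0):
    """Randomly partition row indices 0..n-1 into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

def k_fold_validate(X, y, k=5):
    """Fit a least-squares model k times, each time holding out one fold.

    Returns the error sum of squares (SSE) and total sum of squares (SST)
    for each held-out fold. Assumption: SST is taken around the held-out
    fold's own mean; JMP's internal convention may differ.
    """
    sse, sst = [], []
    for fold in k_fold_splits(len(y), k):
        train = np.ones(len(y), dtype=bool)
        train[fold] = False
        # Fit on the k-1 training folds, then predict the held-out fold.
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        resid = y[fold] - X[fold] @ beta
        sse.append(float(resid @ resid))
        dev = y[fold] - y[fold].mean()
        sst.append(float(dev @ dev))
    return sse, sst
```

Each pass through the loop fits one of the k models, so the two returned lists each hold k entries, one per fold.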

Note: K-fold cross validation is available only for continuous responses.

In JMP, click the Stepwise Fit red triangle and select K-Fold Crossvalidation.

In JMP Pro, you can access k-fold cross validation in two ways:

Click the Stepwise Fit red triangle and select K-Fold Crossvalidation.

Specify a validation column with four or more distinct values.

RSquare K-Fold Statistic

If you conduct k-fold cross validation, the RSquare K-Fold statistic appears to the right of the other statistics in the Stepwise Regression Control panel. RSquare K-Fold is calculated as follows:

RSquare K-Fold = 1 - Sum(SSE)/Sum(SST)

where:

SSE is a vector containing the error sum of squares for each of the k folds

SST is a vector containing the total sum of squares for each of the k folds
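Given the per-fold vectors, the statistic is a one-line computation. A minimal sketch (the function name is hypothetical):

```python
def rsquare_k_fold(sse, sst):
    """RSquare K-Fold = 1 - Sum(SSE)/Sum(SST), where sse and sst hold the
    per-fold error and total sums of squares across the k folds."""
    return 1.0 - sum(sse) / sum(sst)
```

For example, with two folds whose error sums of squares are 2 and 3 and whose total sums of squares are 10 and 15:

```python
rsquare_k_fold([2.0, 3.0], [10.0, 15.0])  # → 0.8
```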

Max K-Fold RSquare

When you use k-fold cross validation, the Stopping Rule defaults to Max K-Fold RSquare. This rule attempts to maximize the RSquare K-Fold statistic.

Note: Max K-Fold RSquare considers only the models defined by p-value entry (Forward direction) or removal (Backward direction). It does not consider all possible models.
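The note above can be made concrete with a small sketch. Assume the p-value entry (or removal) rule has already produced an ordered path of candidate models, each paired with its RSquare K-Fold value; the stopping rule then simply picks the path model with the largest statistic. The function and data below are hypothetical illustrations, not JMP internals:

```python
def max_k_fold_rsquare(path):
    """Pick the model along the stepwise path with the largest RSquare K-Fold.

    `path` is a list of (model_terms, rsquare_k_fold) pairs in the order the
    p-value rule entered (or removed) terms. Models not on this path are
    never considered.
    """
    return max(path, key=lambda step: step[1])
```

For example, if forward selection entered X1, then X2, then X3:

```python
path = [(("X1",), 0.61), (("X1", "X2"), 0.74), (("X1", "X2", "X3"), 0.70)]
max_k_fold_rsquare(path)  # → (("X1", "X2"), 0.74)
```

Here the two-term model is chosen even though a third term was available, because adding X3 lowered the cross-validated fit.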

The Max K-Fold RSquare stopping rule behaves similarly to the Max Validation RSquare stopping rule. See Max Validation RSquare, replacing references to RSquare Validation with RSquare K-Fold.

Want more information? Have questions? Get answers in the JMP User Community (community.jmp.com).