Publication date: 07/24/2024

Overview of Self-Validated Ensemble Models

In the Generalized Regression platform, you can apply the self-validated ensemble modeling (SVEM) method to forward selection or Lasso models. SVEM is a resampling method in which every row receives a nonzero weight in both the training set and the validation set. The training and validation weights for a given row are anti-correlated, so a row that strongly influences the training set only weakly influences the validation set, and vice versa. Because no rows have to be held out for validation, this approach can be useful for analyzing the results of a designed experiment.
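The anti-correlation between training and validation weights can be illustrated with a short Python sketch. It assumes one common construction, which this page does not itself specify: draw U uniformly on (0, 1) for each row and set the training weight to -log(U) and the validation weight to -log(1 - U). Both weights then follow an exponential distribution with mean 1. The names `w_train`, `w_valid`, and `corr` are illustrative only.

```python
import math
import random

rng = random.Random(0)

# Assumed construction: one uniform draw per row, kept strictly inside (0, 1)
# so that both logarithms are finite.
u = [rng.uniform(1e-12, 1.0 - 1e-12) for _ in range(10000)]
w_train = [-math.log(ui) for ui in u]        # large when u is near 0
w_valid = [-math.log(1.0 - ui) for ui in u]  # large when u is near 1

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / math.sqrt(va * vb)

print("mean training weight:", sum(w_train) / len(w_train))
print("correlation:", corr(w_train, w_valid))
```

Every weight is strictly positive, so no row is ever dropped from either set. The negative correlation means a row that is heavily weighted for training is lightly weighted for validation.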

The method can be summarized by the following steps:

1. Append a copy of the design matrix to itself vertically. The original design matrix has n rows, so the new design matrix has 2n rows and the same number of columns as the original design matrix.

2. Append a copy of the response vector to itself vertically. The new response vector has twice the number of rows as the original response vector.

3. Create n random values from an exponential distribution with mean equal to 1, where n is the number of rows in the original design matrix. These are the weights assigned to the first n rows of the design matrix.

4. Create n random values that are anti-correlated with the first n random values. These are the weights assigned to the last n rows of the design matrix.

5. Fit either a forward selection or Lasso model to the generated training and validation sets to obtain a set of parameter estimates.

6. Repeat steps 3 through 5 for each individual model in the ensemble. The number of individual models is specified by the Samples option in the Model Launch control panel.

7. Average the parameter estimates from the individual models to form the ensemble model parameter estimates.

8. Obtain debiasing parameters by performing a simple linear regression of the original response on the linear predictor formed from the ensemble model parameter estimates. The intercept and slope of this regression are applied to the ensemble model prediction formula to produce the final SVEM prediction. This step is skipped if the value of the Samples option is less than 10.
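The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not JMP's implementation. The weights are generated as -log(U) and -log(1 - U) from a shared uniform draw, which is an assumption. Forward selection here adds the term that most reduces the weighted training error and keeps the model along the path with the lowest weighted validation error. Instead of physically stacking the design matrix, the sketch applies the two weight vectors to the same n rows, which is equivalent. All function names (`solve`, `wls`, `forward_select`, `svem`) are hypothetical.

```python
import math
import random

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def predict(row, beta):
    return sum(b * x for b, x in zip(beta, row))

def wls(X, y, w, cols):
    """Weighted least squares on the chosen columns; other coefficients stay 0."""
    A = [[sum(wi * r[a] * r[b] for wi, r in zip(w, X)) for b in cols] for a in cols]
    rhs = [sum(wi * r[a] * yi for wi, r, yi in zip(w, X, y)) for a in cols]
    beta = [0.0] * len(X[0])
    for c, s in zip(cols, solve(A, rhs)):
        beta[c] = s
    return beta

def wsse(X, y, w, beta):
    """Weighted sum of squared errors."""
    return sum(wi * (yi - predict(r, beta)) ** 2 for wi, r, yi in zip(w, X, y))

def forward_select(X, y, w_train, w_valid):
    """Step 5: fit on training weights, choose the path step by validation weights."""
    active = [0]  # column 0 is assumed to be the intercept
    best = wls(X, y, w_train, active)
    best_err = wsse(X, y, w_valid, best)
    while len(active) < len(X[0]):
        cands = [c for c in range(len(X[0])) if c not in active]
        fits = [(wls(X, y, w_train, active + [c]), c) for c in cands]
        beta, c = min(fits, key=lambda f: wsse(X, y, w_train, f[0]))
        active.append(c)
        err = wsse(X, y, w_valid, beta)
        if err < best_err:
            best, best_err = beta, err
    return best

def svem(X, y, samples=50, seed=0):
    rng = random.Random(seed)
    total = [0.0] * len(X[0])
    for _ in range(samples):                           # step 6
        u = [rng.uniform(1e-12, 1.0 - 1e-12) for _ in X]
        w_train = [-math.log(ui) for ui in u]          # step 3 (assumed construction)
        w_valid = [-math.log(1.0 - ui) for ui in u]    # step 4 (assumed construction)
        beta = forward_select(X, y, w_train, w_valid)  # step 5
        total = [t + b for t, b in zip(total, beta)]
    beta = [t / samples for t in total]                # step 7
    intercept, slope = 0.0, 1.0
    if samples >= 10:                                  # step 8: debiasing regression
        pred = [predict(r, beta) for r in X]
        mp, my = sum(pred) / len(pred), sum(y) / len(y)
        sxx = sum((p - mp) ** 2 for p in pred)
        if sxx > 0:
            slope = sum((p - mp) * (yi - my) for p, yi in zip(pred, y)) / sxx
            intercept = my - slope * mp
    return beta, intercept, slope

# Illustrative data: a 2^3 full factorial with an intercept column; only factor 1 is active.
X = [[1.0, a, b, c] for a in (-1.0, 1.0) for b in (-1.0, 1.0) for c in (-1.0, 1.0)]
noise = [0.1, -0.2, 0.05, 0.15, -0.1, 0.2, -0.05, -0.15]
y = [2.0 + 3.0 * row[1] + e for row, e in zip(X, noise)]
beta, intercept, slope = svem(X, y, samples=50, seed=1)
final = [intercept + slope * predict(row, beta) for row in X]
print(beta, intercept, slope)
```

The final SVEM prediction for a row is `intercept + slope * predict(row, beta)`. When `samples` is less than 10, the sketch leaves the intercept at 0 and the slope at 1, which mirrors skipping the debiasing step.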

Note: When a Validation column is specified in the Fit Model launch window, the SVEM method is applied only to the rows in the Training set. The rows in the Validation and Test sets are held back together as a test set.

For more information about the self-validated ensemble modeling (SVEM) method, see Lemkus et al. (2021).
