Example of Bagging to Improve Prediction

Bagging (bootstrap aggregating) has several uses, one of which is improving predictive power. It is especially helpful for unstable models, where small changes in the training data can produce large changes in the predictions. This example uses the Tiretread.jmp sample data table. There are three factors (SILICA, SILANE, and SULFUR) and four responses (ABRASION, MODULUS, ELONG, and HARDNESS). First, you fit a neural network model to simultaneously predict the four response variables as a function of the three factors. Then, you perform bagging on the neural network model. Finally, you compare the two sets of predictions to show the improvement obtained through bagging.
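The idea behind bagging can be sketched outside of JMP in a few lines of Python. This is a minimal illustration, not the platform's implementation: the 1-nearest-neighbor model stands in for any unstable model, and the function names and toy data are hypothetical (they are not taken from Tiretread.jmp).

```python
import random
from statistics import mean

def fit_1nn(xs, ys):
    """Fit a deliberately unstable model: 1-nearest-neighbor regression."""
    pairs = list(zip(xs, ys))

    def predict(x):
        # Return the response of the closest training point.
        return min(pairs, key=lambda p: abs(p[0] - x))[1]

    return predict

def bagged_predict(xs, ys, x_new, n_boot=100, seed=2121):
    """Average the predictions of models fit to bootstrap samples."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        # Draw n row indices with replacement (one bootstrap sample).
        idx = [rng.randrange(n) for _ in range(n)]
        model = fit_1nn([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(model(x_new))
    return mean(preds)

# Hypothetical toy data, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.0, 3.0, 2.0, 5.0, 4.0, 6.0]

single = fit_1nn(xs, ys)(2.4)         # prediction from one model
bagged = bagged_predict(xs, ys, 2.4)  # average over 100 bootstrap models
```

The single model returns the response of one neighboring point, whereas the bagged value averages over many resampled fits, which is what smooths out the instability.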

Fit Neural Network Model

1. Select Help > Sample Data Library and open Tiretread.jmp.

2. Select Analyze > Predictive Modeling > Neural.

3. Select ABRASION, MODULUS, ELONG, and HARDNESS and click Y, Response.

4. Select SILICA, SILANE, and SULFUR and click X, Factor.

5. Click OK.

6. (Optional) Enter 2121 next to Random Seed.

Note: Results vary due to the random nature of choosing a validation set in the Neural Network model. Entering the seed above enables you to reproduce the results shown in this example.

7. Click Go.

8. Select Save Formulas from the red triangle menu next to Model NTanH(3).

Note: This option saves the predicted values for all response variables from the neural network model to the data table. Later, these values are compared to the predictions that are obtained from bagging.
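The role of the random seed in choosing a validation set can be illustrated with a short Python sketch. It assumes a simple random holdback of one third of the rows; the function name, the holdback proportion, and the row count of 20 are illustrative assumptions and do not describe how the Neural platform draws its validation set internally.

```python
import random

def holdback_split(n_rows, holdback=1/3, seed=None):
    """Return (training_rows, validation_rows) as sorted index lists."""
    rng = random.Random(seed)
    rows = list(range(n_rows))
    rng.shuffle(rows)  # random order decides which rows are held back
    n_valid = round(n_rows * holdback)
    return sorted(rows[n_valid:]), sorted(rows[:n_valid])

# The same seed always yields the same split, so results are reproducible.
train_a, valid_a = holdback_split(20, seed=2121)
train_b, valid_b = holdback_split(20, seed=2121)
```

Without a fixed seed, each run would hold back a different set of rows, and the fitted model and its predictions would change accordingly.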

Perform Bagging

Now that the initial model has been constructed, you can perform bagging using that model. Access the Bagging feature through the Profiler.

1. From the red triangle menu next to Model NTanH(3), select Profiler.

The Prediction Profiler appears at the bottom of the report.

2. From the red triangle menu next to Prediction Profiler, select Save Bagged Predictions.

3. Enter 100 next to Number of Bootstrap Samples.

4. (Optional) Enter 2121 next to Random Seed.

Note: Results vary due to the random nature of sampling with replacement. To reproduce the exact results in this example, set the Random Seed.

5. Click OK.

Return to the data table. For each response variable, there are three new columns: Pred Formula <colname> Bagged Mean, StdError <colname> Bagged Mean, and <colname> Bagged Std Dev. The Pred Formula <colname> Bagged Mean columns contain the final predictions.

Columns Added to Data Table After Bagging
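As a rough guide to how the three saved columns relate to one another, the following Python sketch summarizes the bootstrap predictions for a single row. It assumes that the bagged mean is the average of the M bootstrap predictions, that the standard deviation is the sample standard deviation (n - 1 denominator), and that the standard error is that standard deviation divided by the square root of M. The function name and the four prediction values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def bagged_summary(preds):
    """Summarize one row's M bootstrap predictions.

    Returns (bagged mean, standard error of the mean, standard deviation).
    """
    m = len(preds)
    bagged_mean = mean(preds)
    bagged_sd = stdev(preds)         # assumed: sample standard deviation
    std_error = bagged_sd / sqrt(m)  # assumed: SD / sqrt(M)
    return bagged_mean, std_error, bagged_sd

# Hypothetical predictions for one row from M = 4 bootstrap models.
preds = [10.0, 12.0, 11.0, 13.0]
bagged_mean, std_error, bagged_sd = bagged_summary(preds)
```

With 100 bootstrap samples, as in this example, the standard error would be one tenth of the standard deviation under these assumptions.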

Compare the Predictions

To see how bagging improves predictive power, compare the predictions from the bagged model to the original model predictions. Use the Model Comparison platform to look at one response variable at a time.

1. Select Analyze > Predictive Modeling > Model Comparison.

2. Select Predicted ABRASION and click Y, Predictors.

3. Select Pred Formula ABRASION Bagged Mean and click Y, Predictors.

4. Click OK.

A window that contains a list of columns appears.

5. Select ABRASION and click OK.

6. From the red triangle menu next to Model Comparison, select Plot Actual by Predicted.

Comparison of Predictions for ABRASION

The Measures of Fit report and the Actual by Predicted Plot are shown in Comparison of Predictions for ABRASION. The predictions that were obtained from bagging are shown in blue. The predictions that were obtained from the original neural network model are shown in red. In general, the bagged predictions lie closer to the diagonal line, where the predicted value equals the actual value, than the original model predictions do. Accordingly, the RSquare value of 0.6699 for the bagged predictions is higher than the RSquare value for the original model predictions. You conclude that bagging has improved predictions for ABRASION.
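The link between closeness to the diagonal and RSquare can be made concrete with a small Python sketch. It assumes the usual definition, RSquare = 1 - SSE/SST, which might differ in detail from what the Model Comparison platform reports. The function name and all values are hypothetical and are not the Tiretread.jmp results.

```python
def r_square(actual, predicted):
    """Compute 1 - SSE/SST for one set of predictions."""
    mean_actual = sum(actual) / len(actual)
    # Squared distances of the predictions from the actual values.
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Squared distances of the actual values from their mean.
    sst = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - sse / sst

# Hypothetical values for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
original = [1.5, 1.5, 3.5, 3.5]  # farther from the diagonal
bagged = [1.2, 1.8, 3.2, 3.8]    # closer to the diagonal

r2_original = r_square(actual, original)
r2_bagged = r_square(actual, bagged)
```

Predictions that sit closer to the actual values have a smaller SSE and therefore a higher RSquare, which is the pattern described for the bagged predictions.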

This example compared the predictions for ABRASION. To compare predictions for another response variable, follow step 2 through step 6, replacing ABRASION with the desired response variable. As another example, Comparison of Predictions for HARDNESS shows the Measures of Fit report for HARDNESS. The findings are similar to those in the Measures of Fit report for ABRASION. The RSquare value for the bagged predictions is slightly higher than the RSquare value for the original model predictions, which indicates a better fit and improved predictions.

Comparison of Predictions for HARDNESS