Example of Bagging to Indicate the Accuracy of Predictions

Bagging is also used to indicate the accuracy of a prediction through standard errors and other distributional measures. In platforms where the Save Prediction Formulas option is available in Bagging, you can make predictions on new observations and determine how accurate they are. The Save Prediction Formulas option is available in the Standard Least Squares, Generalized Regression, and Generalized Linear Models platforms.

In the Tiretread.jmp data table, suppose that you are interested only in predicting ABRASION as a function of the three factor variables. In this example, you fit a generalized regression model to predict ABRASION and then perform bagging on that model. Finally, you make a prediction for a new observation and assess its accuracy by obtaining a confidence interval for the prediction.

Fit a Generalized Regression Model

Select Help > Sample Data Library and open Tiretread.jmp.

Select Analyze > Fit Model.

Select ABRASION and click Y.

Select Generalized Regression from the Personality list.

Select SILICA, SILANE, and SULFUR and click Add.

Click Run.

Click Go.

Perform Bagging

Select Profilers > Profiler from the red triangle menu next to Adaptive Lasso with AICc Validation.

The Prediction Profiler appears at the bottom of the report.

From the red triangle menu next to Prediction Profiler, select Save Bagged Predictions.

Enter 500 next to Number of Bootstrap Samples.

(Optional) Enter 4321 next to Random Seed.

Note: Results vary due to the random nature of sampling with replacement. To reproduce the exact results in this example, set the Random Seed.

Confirm that Save Prediction Formulas is selected.

Click OK.

Note: This might take longer to run than the Example of Bagging to Improve Prediction. The larger number of samples gives a better estimate of the prediction distributions.

Return to the data table. For each response variable, there are three new columns: Pred Formula <colname> Bagged Mean, StdError <colname> Bagged Mean, and <colname> Bagged Std Dev. The Pred Formula ABRASION Bagged Mean column is the final prediction.
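The three saved columns can be read as row-wise summaries of the bagged predictions. The following Python sketch is illustrative only and is not JMP's implementation. The function name `summarize_bags` and the example values are hypothetical, and the sketch assumes that the standard error of the bagged mean is the standard deviation divided by the square root of the number of bagged models, which might differ from the formula that JMP uses.

```python
import numpy as np

def summarize_bags(bag_preds):
    """Summarize an (n_rows, M) array of bagged predictions row by row.

    Returns (bagged_mean, bagged_std_dev, std_error_of_mean).
    """
    bag_preds = np.asarray(bag_preds, dtype=float)
    m = bag_preds.shape[1]  # number of bagged models
    bagged_mean = bag_preds.mean(axis=1)
    bagged_std_dev = bag_preds.std(axis=1, ddof=1)
    # ASSUMPTION: standard error = std dev / sqrt(M); JMP's exact formula may differ.
    std_error = bagged_std_dev / np.sqrt(m)
    return bagged_mean, bagged_std_dev, std_error

# Hypothetical predictions: 3 observations, M = 4 bagged models.
example = np.array([
    [100.0, 102.0, 98.0, 100.0],
    [120.0, 120.0, 120.0, 120.0],
    [90.0, 94.0, 92.0, 92.0],
])
mean, sd, se = summarize_bags(example)
```

In this sketch, `mean` plays the role of the Bagged Mean column, `sd` the Bagged Std Dev column, and `se` the StdError column.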

Prediction for a New Observation

You now have predictions for ABRASION for each observation in the data table, as well as the standard errors for those predictions. Suppose that you have a new observation with values of 0.9, 43, and 2 for SILICA, SILANE, and SULFUR, respectively. You can predict the ABRASION response and obtain a confidence interval for that prediction because the Save Prediction Formulas option saves the regression equation for each bagged model. Each of the M bagged models (here, M = 500, the number of bootstrap samples) therefore makes its own prediction from the new factor values, and together these M predictions form a distribution of possible predictions. The mean of that distribution is the final prediction, and its spread tells you how accurate the prediction is.
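The idea of producing one prediction per bagged model can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure that JMP runs: it substitutes ordinary least squares for the adaptive lasso fit, it uses synthetic data instead of Tiretread.jmp, and the function name `bagged_predictions` is hypothetical.

```python
import numpy as np

def bagged_predictions(X, y, x_new, n_bags=500, seed=4321):
    """Return one prediction for x_new from each of n_bags bootstrap fits.

    ASSUMPTION: each bagged model is an ordinary least squares fit,
    used here as a stand-in for the generalized regression model.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.column_stack([np.ones(n), X])  # design matrix with intercept
    a_new = np.concatenate([[1.0], np.asarray(x_new, dtype=float)])
    preds = np.empty(n_bags)
    for b in range(n_bags):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        coef, *_ = np.linalg.lstsq(A[idx], y[idx], rcond=None)
        preds[b] = a_new @ coef
    return preds

# Synthetic data (hypothetical, not the Tiretread.jmp values).
data_rng = np.random.default_rng(0)
X = data_rng.uniform([0.5, 40.0, 1.5], [2.0, 60.0, 3.0], size=(20, 3))
y = 100 + 20 * X[:, 0] + 0.5 * X[:, 1] - 5 * X[:, 2] + data_rng.normal(0, 3, size=20)

preds = bagged_predictions(X, y, x_new=[0.9, 43.0, 2.0])
final_prediction = preds.mean()  # the bagged mean is the final prediction
```

Fixing the seed makes the bootstrap samples, and therefore the predictions, reproducible from run to run.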

In the data table, select Rows > Add Rows.

Enter 1 in the How many rows to add box and click OK.

Under the SILICA column, type 0.9 in the box for the new row.

Under the SILANE column, type 43 in the box for the new row.

Under the SULFUR column, type 2 in the box for the new row.

The prediction columns for the new row are automatically calculated.

Figure 2.34 Values for New Row

Select Tables > Transpose.

Select ABRASION Bags (500/0) and click Transpose Columns.

Click OK.

Select Analyze > Distribution.

Select Row 21 and click Y, Columns.

Note: Row 21 corresponds to the predictions from the new observation.

Click OK.

From the red triangle menu next to Row 21, select Display Options > Horizontal Layout.

Figure 2.35 Distribution Report

The Distribution report in Figure 2.35 contains information about the distribution of the predicted values of ABRASION from each bagged model. The final prediction of ABRASION for the new observation is 104.45, which is the mean of the M bagged predictions. This prediction has a standard error of 4.56. You can also use the quantiles to create confidence intervals for the new prediction. For example, a 95% confidence interval for the new prediction, taken from the 2.5% and 97.5% quantiles, is 95.89 to 113.00.
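A quantile-based interval of this kind can be sketched in Python. The function name `percentile_interval` and the input values are hypothetical, and NumPy's default quantile interpolation might not match the method used in the Distribution report, so the sketch shows the idea and does not reproduce the reported numbers.

```python
import numpy as np

def percentile_interval(preds, level=0.95):
    """Return (lower, upper) quantiles that bracket the central `level` mass."""
    preds = np.asarray(preds, dtype=float)
    alpha = 1.0 - level
    lower, upper = np.quantile(preds, [alpha / 2, 1 - alpha / 2])
    return lower, upper

# Hypothetical bagged predictions: the values 1, 2, ..., 101.
preds = np.arange(1.0, 102.0)
lower, upper = percentile_interval(preds)
```

For a 95% level, the function reads off the 2.5% and 97.5% quantiles of the bagged predictions.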