Model Validation Set Summaries

Publication date: 02/06/2025

In the Bootstrap Forest platform, the Model Validation Set Summaries report shows fit statistics for all of the fitted models. This report is available only when you select the Multiple Fits over Number of Terms option in the Bootstrap Forest Specification window. See Figure 5.10 and Multiple Fits Panel.

Specifications

In the Bootstrap Forest platform, the Specifications report shows the settings used in fitting the model.

Overall Statistics

In the Bootstrap Forest platform, the Overall Statistics report provides fit statistics for the training set and for the validation and test sets if they are specified. The specific form of the report depends on the modeling type of the response.

Suppose that multiple models are fit using the Multiple Fits over Number of Terms option in the Bootstrap Forest Specification window. Then the Overall Statistics and Cumulative Validation reports display results for the model with the largest validation set Entropy RSquare value (for a categorical response) or RSquare value (for a continuous response).
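The selection rule described above can be sketched in a few lines of Python. This is only an illustration: the list of fits, the field names `n_terms` and `validation_rsquare`, and the numeric values are hypothetical and are not JMP output. How ties are broken is also an assumption (here the first fit with the largest value is kept).

```python
# Hypothetical validation-set summaries, one entry per fitted model.
# Field names and values are illustrative only, not JMP output.
fits = [
    {"n_terms": 2, "validation_rsquare": 0.61},
    {"n_terms": 3, "validation_rsquare": 0.68},
    {"n_terms": 4, "validation_rsquare": 0.66},
]


def select_best_fit(fits, key="validation_rsquare"):
    """Return the fit whose validation-set statistic is the largest.

    For a categorical response the statistic would be Entropy RSquare;
    for a continuous response it would be RSquare.
    """
    return max(fits, key=lambda fit: fit[key])


best = select_best_fit(fits)
print(best["n_terms"])
```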

Categorical Response

Measures Report

Gives the following statistics for the training set, and for the validation and test sets if they are specified.

Note: For Entropy RSquare and Generalized RSquare, values closer to 1 indicate a better fit. For Mean -Log p, RASE, Mean Abs Dev, and Misclassification Rate, smaller values indicate a better fit.

Entropy RSquare

A measure of fit that compares the log-likelihoods from the fitted model and the constant probability model. It ranges from 0 to 1. See “Entropy RSquare”.

Generalized RSquare

A measure that can be applied to general regression models. It is based on the likelihood function L and is scaled to have a maximum value of 1. The value is 1 for a perfect model, and 0 for a model no better than a constant model. The Generalized RSquare measure simplifies to the traditional RSquare for continuous normal responses in the standard least squares setting. Generalized RSquare is also known as the Nagelkerke or Cragg and Uhler R², which is a normalized version of Cox and Snell’s pseudo R².

Mean -Log p

The average of negative log(p), where p is the fitted probability associated with the event that occurred.

RASE

The square root of the average squared prediction error. Each prediction error is the difference between 1 and p, the fitted probability for the response level that actually occurred.

Mean Abs Dev

The average of the absolute values of the differences between the response and the predicted response. The differences are between 1 and p, the fitted probability for the response level that actually occurred.

Misclassification Rate

The proportion of observations for which the response category with the highest fitted probability is not the observed category.

N

The number of observations.

Number of Trees

Shows the actual number of trees used in the model.
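As a rough illustration of how the measures above relate to the fitted probabilities, the following Python sketch computes them for a small made-up data set. It is not JMP's implementation. It assumes that Entropy RSquare is 1 minus the ratio of the model's negative log-likelihood to that of the constant probability model, that Generalized RSquare takes the Nagelkerke form (1 - (L0/LM)^(2/n)) / (1 - L0^(2/n)), and that the constant probability model uses the observed level proportions of the same set. The function name and the data are hypothetical.

```python
import math


def categorical_fit_measures(probs, observed):
    """Compute fit measures from fitted probabilities.

    probs:    list of dicts mapping response level -> fitted probability
    observed: list of observed response levels, same length as probs
    """
    n = len(observed)
    # p = fitted probability for the level that actually occurred
    p_obs = [row[y] for row, y in zip(probs, observed)]
    neg_log_lik = -sum(math.log(p) for p in p_obs)

    # Constant probability model (assumption): each level gets its
    # observed proportion. Undefined if only one level is observed.
    counts = {}
    for y in observed:
        counts[y] = counts.get(y, 0) + 1
    neg_log_lik_const = -sum(c * math.log(c / n) for c in counts.values())

    # Likelihood-based measures
    entropy_rsq = 1 - neg_log_lik / neg_log_lik_const
    gen_rsq = (1 - math.exp(2 * (neg_log_lik - neg_log_lik_const) / n)) / (
        1 - math.exp(-2 * neg_log_lik_const / n)
    )

    # Measures based on the difference between 1 and p
    rase = math.sqrt(sum((1 - p) ** 2 for p in p_obs) / n)
    mean_abs_dev = sum(abs(1 - p) for p in p_obs) / n

    # Predicted level = level with the highest fitted probability
    predicted = [max(row, key=row.get) for row in probs]
    misclass = sum(yhat != y for yhat, y in zip(predicted, observed)) / n

    return {
        "Entropy RSquare": entropy_rsq,
        "Generalized RSquare": gen_rsq,
        "Mean -Log p": neg_log_lik / n,
        "RASE": rase,
        "Mean Abs Dev": mean_abs_dev,
        "Misclassification Rate": misclass,
        "N": n,
    }


# Made-up fitted probabilities for a two-level response
probs = [
    {"a": 0.8, "b": 0.2},
    {"a": 0.3, "b": 0.7},
    {"a": 0.6, "b": 0.4},
    {"a": 0.4, "b": 0.6},
]
observed = ["a", "b", "b", "b"]
measures = categorical_fit_measures(probs, observed)
for name, value in measures.items():
    print(name, value)
```

In this example, the third observation is misclassified (its highest fitted probability is for level a, but level b occurred), so the Misclassification Rate is 1 out of 4.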

Confusion Matrix Report

(Available only for categorical responses.) Shows classification statistics for the training set, and for the validation and test sets if they are specified. The Confusion Matrix Report contains confusion matrices and confusion rates matrices. A confusion matrix is a two-way classification of actual and predicted responses. A confusion rates matrix is equal to the confusion matrix, with the numbers divided by the row totals.
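The relationship between a confusion matrix and a confusion rates matrix can be sketched as follows. This is a minimal Python illustration with made-up data and hypothetical function names, not the code that JMP uses. Rows correspond to actual levels and columns to predicted levels.

```python
def confusion_matrix(actual, predicted, levels):
    """Two-way table of counts: rows are actual levels, columns are predicted."""
    counts = {a: {p: 0 for p in levels} for a in levels}
    for a, p in zip(actual, predicted):
        counts[a][p] += 1
    return counts


def confusion_rates(counts):
    """Divide each count by its row total (the number of actual cases)."""
    rates = {}
    for a, row in counts.items():
        total = sum(row.values())
        # A level with no actual observations has undefined rates
        rates[a] = {
            p: (c / total if total else float("nan")) for p, c in row.items()
        }
    return rates


# Made-up actual and predicted responses
actual = ["a", "a", "b", "b", "b"]
predicted = ["a", "b", "b", "b", "a"]

counts = confusion_matrix(actual, predicted, levels=["a", "b"])
rates = confusion_rates(counts)
print(counts)
print(rates)
```

Because each row is divided by its own total, every row of the rates matrix sums to 1.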

Decision Matrix

(Available only for categorical responses and if the response has a Profit Matrix column property or if you specify costs using the Specify Profit Matrix option.) Gives Decision Count and Decision Rate matrices for the training set, and for the validation and test sets if they are specified. See “Additional Examples of the Partition Platform”.

Continuous Response

RSquare and RASE Report

Gives RSquare, the root average squared prediction error (RASE), and the number of observations for the training set, and for the validation and test sets if they are specified.

Number of Trees

Gives the actual number of trees used in the model.
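For a continuous response, the statistics in this report can be illustrated with the following Python sketch. It assumes the usual definitions, RSquare = 1 - SSE/SST and RASE = sqrt(SSE/N), with SST computed about the mean of the same set of observations. Whether JMP uses exactly this form for validation and test sets is an assumption here. The function name and the data are hypothetical.

```python
import math


def rsquare_and_rase(actual, predicted):
    """RSquare, RASE, and N for one set of observations."""
    n = len(actual)
    mean_y = sum(actual) / n
    # Sum of squared prediction errors
    sse = sum((y - yhat) ** 2 for y, yhat in zip(actual, predicted))
    # Total sum of squares about the mean response (assumed baseline)
    sst = sum((y - mean_y) ** 2 for y in actual)
    return {
        "RSquare": 1 - sse / sst,
        "RASE": math.sqrt(sse / n),
        "N": n,
    }


# Made-up responses and predictions
stats = rsquare_and_rase([1, 2, 3, 4], [1.5, 2, 2.5, 4])
print(stats)
```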

Individual Trees Report

Gives RASE values, which are averaged over all trees, for In Bag and Out of Bag observations. Training set observations that are used to construct a tree are called in-bag observations. Training observations that are not used to construct a tree are called out-of-bag (OOB) observations.

For each tree, the Out of Bag RASE is computed as the square root of the ratio of the sum of squared errors to the number of OOB observations, that is, the square root of SSE/N. The squared Out of Bag RASE for each tree is given in the Per-Tree Summaries report as OOB SSE/N.
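The per-tree Out of Bag calculation can be sketched in Python as follows. This is an illustration only: the function name, the `in_bag` indicator list, and the data are hypothetical, and the sketch assumes that the predictions come from a single tree.

```python
import math


def oob_sse_over_n_and_rase(actual, tree_predictions, in_bag):
    """Return (OOB SSE/N, OOB RASE) for a single tree.

    in_bag[i] is True if observation i was used to construct the tree.
    Only observations that were not used (out-of-bag) contribute.
    """
    squared_errors = [
        (y - yhat) ** 2
        for y, yhat, used in zip(actual, tree_predictions, in_bag)
        if not used
    ]
    n_oob = len(squared_errors)
    sse_over_n = sum(squared_errors) / n_oob
    return sse_over_n, math.sqrt(sse_over_n)


# Made-up training responses, one tree's predictions, and in-bag flags
actual = [1, 2, 3, 4, 5]
tree_predictions = [1, 2.5, 3, 3, 5]
in_bag = [True, False, True, False, False]

sse_over_n, rase = oob_sse_over_n_and_rase(actual, tree_predictions, in_bag)
print(sse_over_n, rase)
```

Here three observations are out-of-bag, with squared errors 0.25, 1, and 0, so OOB SSE/N is 1.25/3 and the Out of Bag RASE is its square root.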
