In the Boosted Tree platform, the Overall Statistics report shows fit statistics for the training set, and for the validation and test sets if they are specified.
Suppose that you fit multiple models using the Multiple Fits over Splits and Learning Rate option in the Boosted Tree Specification window. The Overall Statistics and Cumulative Validation reports then show results for the model with the largest validation set Entropy RSquare (for a categorical response) or RSquare (for a continuous response).
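The selection rule above amounts to taking the fit with the largest validation statistic. The following Python sketch illustrates that rule only; the `fits` list, its field names, and the numeric values are hypothetical placeholders, not output from the platform.

```python
# Hypothetical results from a grid of fits (illustrative values only).
fits = [
    {"splits": 1, "learning_rate": 0.1, "validation_rsquare": 0.52},
    {"splits": 3, "learning_rate": 0.1, "validation_rsquare": 0.64},
    {"splits": 3, "learning_rate": 0.2, "validation_rsquare": 0.61},
]

# Pick the fit whose validation set statistic is largest.
best = max(fits, key=lambda fit: fit["validation_rsquare"])
print(best["splits"], best["learning_rate"])
```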
(Available only for categorical responses.) Gives the following statistics for the training set, and for the validation and test sets if they are specified.
Note: For Entropy RSquare and Generalized RSquare, values closer to 1 indicate a better fit. For Mean -Log p, RASE, Mean Abs Dev, and Misclassification Rate, smaller values indicate a better fit.
Entropy RSquare
A measure of fit that compares the log-likelihoods from the fitted model and the constant probability model. It ranges from 0 to 1. See Entropy RSquare.
Generalized RSquare
A measure that can be applied to general regression models. It is based on the likelihood function L and is scaled to have a maximum value of 1. The value is 1 for a perfect model and 0 for a model that is no better than a constant model. Generalized RSquare simplifies to the traditional RSquare for continuous normal responses in the standard least squares setting. Generalized RSquare is also known as the Nagelkerke or Cragg and Uhler R², which is a normalized version of Cox and Snell’s pseudo R².
Mean -Log p
The average of -log(p), where p is the fitted probability associated with the event that occurred.
RASE
The root average squared error, that is, the square root of the mean of the squared differences. The differences are between 1 and p, the fitted probability for the response level that actually occurred.
Mean Abs Dev
The average of the absolute values of the differences between the response and the predicted response. The differences are between 1 and p, the fitted probability for the response level that actually occurred.
Misclassification Rate
The proportion of observations for which the response category with the highest fitted probability is not the observed category.
N
The number of observations.
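As a rough illustration of the definitions above, the following Python sketch computes each measure from observed levels and fitted probabilities. It is not the platform's implementation. The function name and data layout are hypothetical, and the constant probability model is assumed to assign each level its observed proportion in the same data set. The Generalized RSquare line uses the Nagelkerke form, (1 - (L0/L)^(2/n)) / (1 - L0^(2/n)), where L0 is the likelihood of the constant model.

```python
import math
from collections import Counter

def categorical_fit_measures(actual, probs):
    """actual: observed levels; probs: one dict per row mapping level -> fitted probability."""
    n = len(actual)
    # p: fitted probability of the level that actually occurred, per row.
    # Assumes every p > 0 and at least two distinct observed levels.
    p = [probs[i][actual[i]] for i in range(n)]

    loglik_model = sum(math.log(pi) for pi in p)
    # Assumption: the constant model uses the observed proportion of each level.
    counts = Counter(actual)
    loglik_const = sum(c * math.log(c / n) for c in counts.values())

    # Level with the highest fitted probability, per row.
    predicted = [max(row, key=row.get) for row in probs]

    return {
        "Entropy RSquare": 1 - loglik_model / loglik_const,
        "Generalized RSquare": (1 - math.exp(2 * (loglik_const - loglik_model) / n))
                               / (1 - math.exp(2 * loglik_const / n)),
        "Mean -Log p": -loglik_model / n,
        "RASE": math.sqrt(sum((1 - pi) ** 2 for pi in p) / n),
        "Mean Abs Dev": sum(abs(1 - pi) for pi in p) / n,
        "Misclassification Rate": sum(a != b for a, b in zip(actual, predicted)) / n,
        "N": n,
    }

# Small made-up example with two levels.
actual = ["a", "a", "b", "b"]
probs = [
    {"a": 0.9, "b": 0.1},
    {"a": 0.6, "b": 0.4},
    {"a": 0.2, "b": 0.8},
    {"a": 0.7, "b": 0.3},
]
measures = categorical_fit_measures(actual, probs)
for name, value in measures.items():
    print(name, value)
```

In this example the last row is misclassified, because level "a" has the highest fitted probability but "b" was observed.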
(Available only for categorical responses.) Shows classification statistics for the training set, and for the validation and test sets if they are specified. The Confusion Matrix report contains confusion matrices and confusion rates matrices. A confusion matrix is a two-way classification of actual and predicted responses. A confusion rates matrix is the confusion matrix with each count divided by its row total.
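The relationship between the two matrices can be sketched in a few lines of Python. The function names and the sample data are hypothetical. Rows correspond to actual levels and columns to predicted levels.

```python
from collections import Counter

def confusion_matrix(actual, predicted, levels):
    """Two-way table of counts: rows are actual levels, columns are predicted levels."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in levels] for a in levels]

def confusion_rates(matrix):
    """Divide each count by its row total (rows with no observations stay at 0)."""
    rates = []
    for row in matrix:
        total = sum(row)
        rates.append([count / total if total else 0.0 for count in row])
    return rates

# Made-up example with two levels.
levels = ["a", "b"]
actual = ["a", "a", "a", "b", "b"]
predicted = ["a", "a", "b", "b", "a"]

matrix = confusion_matrix(actual, predicted, levels)
rates = confusion_rates(matrix)
print(matrix)
print(rates)
```

Each row of the rates matrix sums to 1, so its off-diagonal entries give the misclassification rate for that actual level.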
(Available only for categorical responses and if the response has a Profit Matrix column property or if you specify costs using the Specify Profit Matrix option.) Gives Decision Count and Decision Rate matrices for the training set, and for the validation and test sets if they are specified. See Additional Examples of the Partition Platform.
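The following Python sketch shows one way that decision matrices could be formed from a profit matrix. It assumes that the decision for each row is the one with the largest expected profit under the fitted probabilities; that rule, the function names, and the profit values are assumptions made for illustration and are not taken from the platform.

```python
from collections import Counter

def most_profitable_decision(row_probs, profit, levels):
    """Assumed rule: choose the decision with the largest expected profit.
    profit[actual][decision] is the profit when `actual` occurs and `decision` is made."""
    def expected_profit(decision):
        return sum(row_probs[a] * profit[a][decision] for a in levels)
    return max(levels, key=expected_profit)

def decision_matrices(actual, probs, profit, levels):
    decisions = [most_profitable_decision(row, profit, levels) for row in probs]
    counts = Counter(zip(actual, decisions))
    # Rows are actual levels, columns are decisions.
    count_matrix = [[counts[(a, d)] for d in levels] for a in levels]
    rate_matrix = []
    for row in count_matrix:
        total = sum(row)
        rate_matrix.append([c / total if total else 0.0 for c in row])
    return decisions, count_matrix, rate_matrix

# Hypothetical profit matrix: acting on a true "yes" is worth 5, acting on a "no" costs 1.
levels = ["yes", "no"]
profit = {
    "yes": {"yes": 5.0, "no": 0.0},
    "no": {"yes": -1.0, "no": 0.0},
}
actual = ["no", "no", "yes"]
probs = [
    {"yes": 0.3, "no": 0.7},
    {"yes": 0.1, "no": 0.9},
    {"yes": 0.8, "no": 0.2},
]

decisions, decision_count, decision_rate = decision_matrices(actual, probs, profit, levels)
print(decisions)
print(decision_count)
print(decision_rate)
```

In the first row the most likely level is "no", but the expected profit of deciding "yes" is positive, so the decision differs from the most likely level. This is why decision matrices can differ from confusion matrices.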