Launch the Bootstrap Forest platform by selecting Analyze > Predictive Modeling > Bootstrap Forest.
Figure 5.7 Bootstrap Forest Launch Window
For more information about the options in the Select Columns red triangle menu, see Column Filter Menu in Using JMP.
The Bootstrap Forest platform launch provides the following options:
Y, Response
The response variable or variables that you want to analyze.
X, Factor
The predictor variables.
Weight
A column whose numeric values assign a weight to each row in the analysis.
Freq
A column whose numeric values assign a frequency to each row in the analysis.
Validation
A numeric column that defines the validation sets. This column should contain at most three distinct values:
– If the validation column has two levels, the smaller value defines the training set and the larger value defines the validation set.
– If the validation column has three levels, the values, in order of increasing size, define the training, validation, and test sets.
– If the validation column has more than three levels, the rows that contain the smallest three values define the validation sets. All other rows are excluded from the analysis.
The Bootstrap Forest platform uses the validation column to train and tune the model or to train, tune, and evaluate the model. For more information about validation, see Validation in JMP Modeling.
If you click the Validation button with no columns selected in the Select Columns list, you can add a validation column to your data table. For more information about the Make Validation Column utility, see Make Validation Column.
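The level rules above can be sketched in a few lines of Python. This is an illustration only, not JMP code: the function name `assign_validation_roles` is hypothetical, and because the text does not describe a column with fewer than two levels, the sketch rejects that case.

```python
def assign_validation_roles(values):
    """Map each row's validation value to a role, following the level rules.

    Hypothetical helper for illustration; not part of JMP.
    """
    levels = sorted(set(values))
    if len(levels) < 2:
        # Assumption: the documented rules start at two levels.
        raise ValueError("expected at least two distinct validation values")
    if len(levels) == 2:
        names = ["training", "validation"]
    else:
        # With three or more levels, only the three smallest values are used.
        names = ["training", "validation", "test"]
    role_of = dict(zip(levels[:3], names))
    # Rows whose value is not among the three smallest are excluded (None).
    return [role_of.get(v) for v in values]


print(assign_validation_roles([0, 1, 2, 3, 0, 1]))
```

In this sketch, the row whose value is 3 maps to `None`, which stands in for a row that is excluded from the analysis.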
By
A column or columns whose levels define separate analyses. For each level of the specified column, the corresponding rows are analyzed using the other variables that you have specified. The results are presented in separate reports. If more than one By variable is assigned, a separate report is produced for each possible combination of the levels of the By variables.
Method
Enables you to select the partition method (Decision Tree, Bootstrap Forest, Boosted Tree, K Nearest Neighbors, or Naive Bayes). All of these methods except Decision Tree are available only in JMP Pro.
For more information about these methods, see Partition Models, Boosted Tree, K Nearest Neighbors, and Naive Bayes.
Validation Portion
The portion of the data to be used as the validation set.
Informative Missing
If selected, enables missing value categorization for categorical predictors and informative treatment of missing values for continuous predictors. See Informative Missing.
Ordinal Restricts Order
If selected, restricts consideration of splits to those that preserve the ordering.
After you select OK in the launch window, the Bootstrap Forest Specification window appears.
Figure 5.8 Bootstrap Forest Specification Window
Number of Rows
The number of rows in the data table.
Number of Terms
The number of columns that are specified as predictors.
Number of Trees in the Forest
The number of trees to grow and then average.
Number of Terms Sampled per Split
The number of predictors to consider as splitting candidates at each split. For each split, a new random sample of predictors is taken as the candidate set.
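As a rough illustration of per-split sampling, the Python sketch below draws a fresh candidate set for each split. It is not JMP's implementation: the function name, the predictor names, and the choice to sample without replacement within a single split are all assumptions.

```python
import random


def candidate_terms(predictors, n_sampled, rng):
    """Draw a new random subset of predictors to consider at one split.

    Assumption: predictors are drawn without replacement within a split.
    """
    return rng.sample(predictors, n_sampled)


rng = random.Random(12345)  # seeded only so that the sketch is repeatable
predictors = ["age", "height", "weight", "sex", "dose"]  # hypothetical columns
for split in range(3):
    # Each split gets its own candidate set.
    print(split, candidate_terms(predictors, 2, rng))
```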
Bootstrap Sample Rate
The proportion of observations to sample (with replacement) for growing each tree. A new random sample is generated for each tree.
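Sampling with replacement means that a row can appear more than once in the sample for a tree, and some rows might not appear at all. A minimal Python sketch, assuming the sample size is the row count times the rate, rounded to the nearest integer (the rounding rule and the function name are assumptions, not documented JMP behavior):

```python
import random


def bootstrap_rows(n_rows, sample_rate, rng):
    """Return row indices drawn with replacement for one tree."""
    n_draws = round(n_rows * sample_rate)  # assumed rounding rule
    return [rng.randrange(n_rows) for _ in range(n_draws)]


rng = random.Random(2024)
for tree in range(3):
    # A new random sample is generated for each tree.
    print(tree, sorted(bootstrap_rows(10, 0.8, rng)))
```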
Minimum Splits Per Tree
The minimum number of splits for each tree.
Maximum Splits Per Tree
The maximum number of splits for each tree.
Minimum Size Split
The minimum number of observations needed on a candidate split.
Early Stopping
(Available only if validation is used.) If selected, the process stops growing additional trees if the additional trees do not improve the validation statistic. The validation statistic is the validation set’s Entropy RSquare value for a categorical response and its RSquare value for a continuous response. If not selected, the process continues until the specified number of trees is reached.
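The general idea of early stopping can be sketched as follows. The exact stopping rule that JMP applies is not described here, so the rule below (stop once `patience` consecutive trees fail to raise the best validation statistic) and the names `trees_grown` and `patience` are assumptions for illustration.

```python
def trees_grown(validation_stats, patience=1):
    """Return how many trees are grown before the process stops.

    validation_stats[i] is the validation statistic (RSquare or Entropy
    RSquare) of the forest after i + 1 trees. Hypothetical helper.
    """
    best = float("-inf")
    since_best = 0
    for n_trees, stat in enumerate(validation_stats, start=1):
        if stat > best:
            best = stat
            since_best = 0
        else:
            since_best += 1
            if since_best >= patience:
                return n_trees  # stop: recent trees brought no improvement
    # Without a stop, the specified number of trees is reached.
    return len(validation_stats)


# Made-up validation statistics for forests of 1 to 5 trees.
print(trees_grown([0.50, 0.58, 0.61, 0.61, 0.60]))  # 4
```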
Multiple Fits over Number of Terms
If selected, creates a bootstrap forest for each of several values of the number of terms sampled per split. The report displays results for the model with the largest validation set Entropy RSquare value (for a categorical response) or RSquare value (for a continuous response).
The lower bound is the Number of Terms Sampled per Split specification. The upper bound is specified by the following option:
Max Number of Terms
The maximum number of terms to consider for a split.
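The selection among multiple fits can be pictured as a simple search over the number of terms. The Python sketch below assumes that every integer from the lower bound to the upper bound is tried, which the text does not state, and it uses made-up validation statistics in place of real model fits.

```python
def best_number_of_terms(lower, upper, validation_stat):
    """Pick the number of terms per split with the largest validation statistic.

    validation_stat is a callable that returns the validation RSquare (or
    Entropy RSquare) of a forest fit with the given number of terms.
    Hypothetical helper; assumes every integer in [lower, upper] is tried.
    """
    scores = {m: validation_stat(m) for m in range(lower, upper + 1)}
    best = max(scores, key=scores.get)
    return best, scores


# Made-up statistics standing in for real fits.
fake_stats = {2: 0.61, 3: 0.66, 4: 0.64}
best, scores = best_number_of_terms(2, 4, fake_stats.get)
print(best, scores)
```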
Use Tuning Table Design
Opens a window where you can select a data table containing values for the Forest panel tuning parameters, called a tuning design table. A tuning design table has a column for each option that you want to specify and has one or multiple rows that each represent a single Bootstrap Forest model design. If an option is not specified in the tuning design table, the default value is used.
For each row in the table, JMP creates a Bootstrap Forest model using the tuning parameters specified. If more than one model is specified in the tuning design table, the Model Validation-Set Summaries report lists the RSquare value for each model. The Bootstrap Forest report shows the fit statistics for the model with the largest RSquare value.
You can create a tuning design table using the Design of Experiments facilities. A bootstrap forest tuning design table can contain the following case-insensitive columns in any order:
– Number Trees
– Number Terms
– Portion Bootstrap
– Minimum Splits per Tree
– Maximum Splits per Tree
– Minimum Size Split
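If you prefer to build a tuning design table outside of JMP, one possible approach is to generate it as a CSV file and then open that file as a data table. The Python sketch below builds a full factorial grid over two of the columns listed above. The parameter values are hypothetical, and the CSV route is an assumption about workflow, not a documented requirement. Columns that are left out, such as Minimum Size Split, use their default values, as described above.

```python
import csv
import io
import itertools

# Hypothetical tuning values; each row of the table is one model design.
number_trees = [50, 100, 200]
number_terms = [2, 4]
columns = ["Number Trees", "Number Terms", "Portion Bootstrap"]

rows = [
    {"Number Trees": t, "Number Terms": m, "Portion Bootstrap": 1.0}
    for t, m in itertools.product(number_trees, number_terms)
]

# Write to an in-memory buffer; replace it with an open file to save to disk.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns)
writer.writeheader()
writer.writerows(rows)
tuning_csv = buffer.getvalue()
print(tuning_csv)
```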
Suppress Multithreading
If selected, all calculations are performed on a single thread.
Random Seed
Specify a nonzero numeric random seed to reproduce the results for future launches of the platform. By default, the Random Seed is set to zero, which does not produce reproducible results. When you save the analysis to a script, the random seed that you enter is saved to the script.