In the Generalized Regression report, each model fit has a red triangle menu that contains these options:
Caution: Many options in the platform are not available if you specify a column that has the Expression data type or Vector modeling type in the launch window. For SVEM model fit options, see Model Fit Options for Self-Validated Ensemble Models.
Regression Reports
Enables you to customize the reports that are shown for the specified model fit. All of the following reports are shown by default except for the Parameter Estimates for Centered and Scaled Predictors report and the Active Parameter Estimates report.
Model Summary
Shows or hides the Model Summary report that includes information about the specification and goodness of fit statistics for the model. This option also displays the Estimation Details report for applicable models. See Model Summary and Estimation Details.
Solution Path
(Not available for Maximum Likelihood models.) Shows or hides the Solution Path and Validation Path plots. See Solution Path.
Parameter Estimates for Centered and Scaled Predictors
Shows or hides a table of centered and scaled parameter estimates. See Parameter Estimates for Centered and Scaled Predictors.
Parameter Estimates for Original Predictors
Shows or hides a table of parameter estimates in the original scale of the data. See Parameter Estimates for Original Predictors.
Active Parameter Estimates
(Not available for Maximum Likelihood or Ridge Regression models.) Shows or hides a table of active, or nonzero, parameter estimates for the currently selected model.
Show Solution Path Summary
(Not available for Maximum Likelihood or Ridge Regression models.) Shows or hides a report that contains a table of fit statistics for the points on the Solution Path and Validation Path plots where the active set changes. The statistics that are available depend on the estimation method. For more information about the conditional model probabilities that are available for Normal Lasso with BIC Validation models, see Hu et al. (2019). When BIC, AICc, or ERIC Validation is specified, the cells in the BIC and AICc columns are colored in the same manner as the Validation Path plot. See Comparable Model Zones.
Effect Tests
Shows or hides tests for each effect. Each effect test tests the null hypothesis that all parameters associated with that effect are zero. A nominal or ordinal effect can have several associated parameters, based on its number of levels; its effect test tests whether all of those parameters are jointly zero. When the Distribution is Multinomial, the effects are combined over the levels of the response. See Effect Tests.
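The sketch below shows what a joint test of several parameters involves. It forms a Wald chi-square statistic W = b′V⁻¹b for a hypothetical three-level nominal effect, which has two parameters. Several things are assumptions made for illustration and not taken from this documentation: the use of a Wald statistic, the estimates, and the covariance values. With two degrees of freedom, the chi-square upper-tail probability has the closed form exp(−W/2), so the example needs no statistics library.

```python
import math

def inverse_2x2(m):
    """Invert a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def wald_effect_test_2df(estimates, cov):
    """Wald chi-square test that both parameters of one effect are zero.

    Returns (W, p), where W = b' V^-1 b and p = P(ChiSquare with 2 DF > W).
    """
    v_inv = inverse_2x2(cov)
    b0, b1 = estimates
    w = (b0 * (v_inv[0][0] * b0 + v_inv[0][1] * b1)
         + b1 * (v_inv[1][0] * b0 + v_inv[1][1] * b1))
    p = math.exp(-w / 2.0)  # exact upper-tail probability of a chi-square with 2 DF
    return w, p

# Hypothetical estimates and covariance for the two parameters of a three-level effect.
w, p = wald_effect_test_2df([0.8, -0.3], [[0.04, 0.01], [0.01, 0.05]])
print(f"Wald ChiSquare = {w:.3f}, Prob > ChiSquare = {p:.3g}")
```

An effect with more levels works the same way. The only changes are a larger estimate vector and covariance matrix, and a chi-square reference distribution with one degree of freedom per parameter.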
Show Prediction Expression
Shows or hides the Prediction Expression report that contains the equation for the estimated model. See “Show Prediction Expression” for an example.
Select Nonzero Terms
(Not available when the specified Estimation Method is Ridge Regression.) Highlights terms with nonzero coefficients in the report. Also selects all associated columns in the data table.
Select Zeroed Terms
(Not available when the specified Estimation Method is Ridge Regression.) Highlights terms with zero coefficients in the report. Also selects all associated columns in the data table.
Relaunch Active Set
(Not available for models that contain a predictor that has the Vector modeling type.) Contains options that open a Fit Model launch window where the Construct Model Effects list contains a set of terms based on the terms that have nonzero parameter estimates. These terms are the active effects. All other specifications in the launch window are those used in the original analysis.
Note: If you select any of the Relaunch Active Set options in a report that contains a By variable, the By variable is not added to the Fit Model launch window.
Relaunch with Active Effects
Populates the Construct Model Effects list only with the active effects.
Relaunch Active Main Effects and Second Degree Factorial
Populates the Construct Model Effects list with a second degree factorial constructed with the active effects.
Relaunch Active Main Effects and Third Degree Factorial
Populates the Construct Model Effects list with a third degree factorial constructed with the active effects.
Relaunch Active Main Effects and Full Factorial
Populates the Construct Model Effects list with a full factorial constructed with the active effects.
Relaunch Active Main Effects and Second Degree Polynomial
Populates the Construct Model Effects list with a second degree polynomial constructed with the active effects.
Relaunch Active Main Effects and Third Degree Polynomial
Populates the Construct Model Effects list with a third degree polynomial constructed with the active effects.
Relaunch Active Main Effects and Response Surface Model
Populates the Construct Model Effects list with a response surface model constructed with the active effects.
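The relaunch options differ only in which terms they build from the active effects. The sketch below generates three of those term lists, for illustration only:

- main effects plus all two-way interactions (second degree factorial);
- every interaction (full factorial);
- the second degree factorial plus squared terms (second degree polynomial).

A second degree response surface model usually contains the same terms as the second degree polynomial. The effect names are hypothetical. The sketch also assumes that all active effects are continuous, because categorical effects cannot be squared. It does not reproduce how the platform names terms or handles categorical effects.

```python
from itertools import combinations

def factorial_to_degree(effects, degree):
    """Main effects plus every interaction of up to `degree` distinct effects."""
    terms = []
    for k in range(1, degree + 1):
        terms.extend("*".join(combo) for combo in combinations(effects, k))
    return terms

def full_factorial(effects):
    """Every interaction, up to the product of all active effects."""
    return factorial_to_degree(effects, len(effects))

def second_degree_polynomial(effects):
    """Second degree factorial plus a squared term for each (assumed continuous) effect."""
    return factorial_to_degree(effects, 2) + [f"{e}*{e}" for e in effects]

active = ["X1", "X2", "X3"]  # hypothetical active effects
print(factorial_to_degree(active, 2))  # ['X1', 'X2', 'X3', 'X1*X2', 'X1*X3', 'X2*X3']
print(len(full_factorial(active)))     # 7
print(second_degree_polynomial(active)[-3:])  # ['X1*X1', 'X2*X2', 'X3*X3']
```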
Hide Inactive Paths
Adjusts the transparency of the inactive paths in the Solution Path Parameter Estimates plot so that the paths that are not currently active appear faded.
Odds Ratios
(Available only when the specified Distribution is Binomial and the model contains an intercept. Not available for models that contain a predictor that has the Vector modeling type.) Shows or hides a report that contains odds ratios for categorical predictors, and unit odds ratios and range odds ratios for continuous predictors. An odds ratio is the ratio of the odds for two events. The odds of an event is the ratio of the probability that the event of interest occurs to the probability that it does not occur. The event of interest is defined by the Target Level in the Fit Model launch window.
For each categorical predictor, an Odds Ratios report appears. Odds ratios are shown for all combinations of levels of a categorical model term.
If there are continuous predictors, two additional reports appear:
– Unit Odds Ratios Report. The unit odds ratio is calculated over a one-unit change in a continuous model term.
– Range Odds Ratios Report. The range odds ratio is calculated over the entire range of a continuous model term.
The confidence intervals in the Odds Ratios report are Wald-based intervals. Note that the odds ratio for a model term is meaningful only if the model term is not involved in any higher-order effects.
Note: If there are interactions in the model, you can use the Multiple Comparisons option to obtain odds ratios. See Multiple Comparisons.
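For a continuous term in a logistic model with no higher-order effects, the unit odds ratio is exp(β). The range odds ratio is exp(β × (max − min)). The sketch below illustrates these relationships and a Wald interval built on the log-odds scale. The coefficients, the standard error, and the predictor range are hypothetical. The sketch also checks that the odds ratio does not depend on the starting value of x.

```python
import math
from statistics import NormalDist

def odds(p):
    """Odds of an event: P(event) / P(no event)."""
    return p / (1.0 - p)

def logistic_prob(b0, b1, x):
    """Fitted probability of the target level for a logistic model b0 + b1 * x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def odds_ratio_wald(beta, se, delta=1.0, alpha=0.05):
    """Odds ratio for a change of `delta` in a continuous term, with a Wald interval.

    The interval is beta +/- z * se on the log-odds scale, scaled by delta,
    and then exponentiated.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    lower, upper = sorted((math.exp((beta - z * se) * delta),
                           math.exp((beta + z * se) * delta)))
    return math.exp(beta * delta), lower, upper

b0, b1, se_b1 = -1.0, 0.4, 0.1  # hypothetical estimates
x_min, x_max = 2.0, 7.0         # hypothetical observed range of x
unit_or = odds_ratio_wald(b1, se_b1)                  # exp(0.4) is about 1.49
range_or = odds_ratio_wald(b1, se_b1, x_max - x_min)  # exp(2.0) is about 7.39
# The unit odds ratio equals the ratio of odds one unit apart, wherever x starts.
print(odds(logistic_prob(b0, b1, 3.0)) / odds(logistic_prob(b0, b1, 2.0)), unit_or[0])
```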
Incidence Rate Ratios
(Available only when the specified Distribution is Poisson or Negative Binomial and the model contains an intercept.) Shows or hides a report that contains incidence rate ratios for categorical predictors, and unit incidence rate ratios and range incidence rate ratios for continuous predictors. An incidence rate ratio is the ratio of the incidence rates for two events. The incidence rate is the number of new events that occur over a given time period.
For each categorical predictor, an Incidence Rate Ratios report appears. Incidence rate ratios are shown for all combinations of levels of a categorical model term.
If there are continuous predictors, two additional reports appear:
– Unit Incidence Rate Ratios Report. The unit incidence rate ratio is calculated over a one-unit change in a continuous model term.
– Range Incidence Rate Ratios Report. The range incidence rate ratio is calculated over the entire range of a continuous model term.
The confidence intervals in the Incidence Rate Ratios report are Wald-based intervals. Note that the incidence rate ratio for a model term is meaningful only if the model term is not involved in any higher-order effects.
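The caution that a ratio is meaningful only when the term has no higher-order effects can be seen numerically. The sketch assumes a log link and uses hypothetical coefficients, including an optional x*z interaction. Without the interaction, a one-unit increase in x multiplies the expected count by exp(b1) everywhere. With the interaction, the multiplier becomes exp(b1 + b3·z), so no single incidence rate ratio describes x.

```python
import math

def mean_count(x, z, b0=0.5, b1=0.2, b2=-0.1, b3=0.0):
    """Expected count under a log link with an optional x*z interaction (hypothetical coefficients)."""
    return math.exp(b0 + b1 * x + b2 * z + b3 * x * z)

def unit_rate_ratio_for_x(x, z, **coefs):
    """Ratio of expected counts when x increases by one unit and z is held fixed."""
    return mean_count(x + 1.0, z, **coefs) / mean_count(x, z, **coefs)

# Without an interaction, the ratio is exp(b1) no matter where x and z are.
print(unit_rate_ratio_for_x(2.0, 0.0), unit_rate_ratio_for_x(5.0, 3.0))
# With an x*z term, the ratio becomes exp(b1 + b3 * z), so it changes with z.
print(unit_rate_ratio_for_x(2.0, 0.0, b3=0.3), unit_rate_ratio_for_x(2.0, 3.0, b3=0.3))
```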
Hazard Ratios
(Available only when the specified Distribution is Cox Proportional Hazards.) Shows or hides a report that contains hazard ratios for categorical predictors, and unit hazard ratios and range hazard ratios for continuous predictors. A hazard ratio is the ratio of the hazard rates for two events. The hazard rate at time t is the instantaneous rate at which the event occurs at time t, given that it has not occurred before t. For a short interval after t, it is approximately the conditional probability that the event occurs in that interval, given survival to time t, divided by the length of the interval.
For each categorical predictor, a Hazard Ratios report appears. Hazard ratios are shown for all combinations of levels of a categorical model term.
If there are continuous predictors, two additional reports appear:
– Unit Hazard Ratios Report. The unit hazard ratio is calculated over a one-unit change in a continuous model term.
– Range Hazard Ratios Report. The range hazard ratio is calculated over the entire range of a continuous model term.
The confidence intervals in the Hazard Ratios report are Wald-based intervals. Note that the hazard ratio for a model term is meaningful only if the model term is not involved in any higher-order effects.
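Under proportional hazards, the hazard ratio for a one-unit change in a continuous predictor is exp(β) at every time point. The sketch below checks this property numerically. It approximates the hazard with the conditional-probability definition above and a small time step. The Weibull baseline and the coefficient are hypothetical choices made only to have concrete numbers; a Cox model leaves the baseline hazard unspecified.

```python
import math

def baseline_survival(t, shape=1.5, scale=10.0):
    """Hypothetical Weibull baseline survival S0(t)."""
    return math.exp(-((t / scale) ** shape))

def survival(t, x, beta):
    """Proportional hazards: H(t | x) = H0(t) * exp(beta * x), so S(t | x) = S0(t) ** exp(beta * x)."""
    return baseline_survival(t) ** math.exp(beta * x)

def approx_hazard(t, x, beta, dt=1e-6):
    """P(event in [t, t + dt) | survived to t), divided by dt."""
    s_now = survival(t, x, beta)
    return (s_now - survival(t + dt, x, beta)) / (s_now * dt)

beta = 0.7  # hypothetical coefficient for a continuous predictor
for t in (1.0, 5.0, 20.0):
    # The hazard ratio for a one-unit increase in x stays close to exp(beta) at every time.
    print(t, approx_hazard(t, 1.0, beta) / approx_hazard(t, 0.0, beta), math.exp(beta))
```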
Covariance of Estimates
Shows or hides the matrix of covariances of the parameter estimates. These are calculated using M-estimation and a sandwich formula (Zou 2006 and Huber and Ronchetti 2009). The covariance matrix does not contain zeroed terms.
Correlation of Estimates
Shows or hides the matrix of correlations of the parameter estimates. These are calculated using M-estimation and a sandwich formula (Zou 2006 and Huber and Ronchetti 2009). The correlation matrix does not contain zeroed terms.
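To show the general shape of a sandwich covariance estimate (bread × meat × bread), the sketch below uses the simplest case. It computes the HC0 sandwich covariance for an unpenalized least squares line, then rescales it into a correlation matrix. This is a stand-in chosen for illustration, not the M-estimation computation that the platform applies to penalized fits. The data are made up.

```python
import math

def matmul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def inverse2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def sandwich_cov_simple_ols(x, y):
    """Estimates and HC0 sandwich covariance for y = b0 + b1 * x fit by least squares."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    rows = [(1.0, xi) for xi in x]  # design matrix rows: intercept and x
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Bread: (X'X)^-1
    bread = inverse2([[sum(r[i] * r[j] for r in rows) for j in range(2)]
                      for i in range(2)])
    # Meat: sum over rows of e_i^2 * x_i x_i'
    meat = [[sum(e * e * r[i] * r[j] for e, r in zip(resid, rows)) for j in range(2)]
            for i in range(2)]
    return (b0, b1), matmul2(matmul2(bread, meat), bread)

def correlation_from_cov(cov):
    """Scale a covariance matrix into a correlation matrix."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))] for i in range(len(cov))]

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.3, 11.9]  # made-up data
estimates, cov = sandwich_cov_simple_ols(x, y)
corr = correlation_from_cov(cov)
print(estimates, cov, corr)
```

In the report, terms with zero estimates are left out of both matrices. In a sketch like this one, that would correspond to dropping their rows and columns before reporting.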
Inverse Prediction
(Not available for models that contain a predictor that has the Vector modeling type.) Predicts an X value, given specific values for Y and the other X variables. This option can be used only to predict continuous variables. For more information about Inverse Prediction, see “Inverse Prediction”.
Multiple Comparisons
(Not available for models that contain a predictor that has the Vector modeling type or for models that do not contain any categorical predictors.) Shows the Multiple Comparisons launch window. For more information about the Multiple Comparisons launch window and report, see “Multiple Comparisons”. Note that the multiple comparisons are performed on the linear predictor scale. When the specified Distribution is Binomial, the multiple comparisons are performed on the odds ratios. When the specified Distribution is Poisson, the multiple comparisons are performed on the incidence rate ratios. When the specified Distribution is Cox Proportional Hazards, the multiple comparisons are performed on the hazard ratios.
Confusion Matrix
(Available only when the specified Distribution is Binomial, Multinomial, or Ordinal Logistic.) Shows or hides a matrix that tabulates the actual response levels and the predicted response levels. For a good model, predicted response levels should be the same as the actual response levels. The confusion matrix enables you to assess how the predicted responses align with the actual responses. The misclassification rate summarizes the off-diagonal results. If you used validation, a confusion matrix is shown for each of the Training, Validation, and Test sets.
Set Probability Threshold
(Available only when the specified Distribution is Binomial.) Specify a cutoff probability for classifying the response. By default, an observation is classified into the Target Level when its predicted probability exceeds 0.5. Change the threshold to specify a value other than 0.5 as the cutoff for classification into the Target Level. The Predicted Rate in the confusion matrix and the misclassification rate are updated to reflect classification according to the specified threshold.
If the response has a Profit Matrix column property, the initial value for the probability threshold is determined by the profit matrix.
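The sketch below illustrates how the probability threshold feeds the confusion matrix and the misclassification rate. An observation is classified into the target level when its fitted probability exceeds the threshold. The level names "Target" and "Other", the actual responses, and the fitted probabilities are all hypothetical.

```python
def classify(prob_target, threshold=0.5):
    """Assign the target level when the fitted probability exceeds the threshold."""
    return ["Target" if p > threshold else "Other" for p in prob_target]

def confusion_matrix(actual, predicted, levels=("Target", "Other")):
    """Counts of (actual level, predicted level) pairs."""
    counts = {(a, p): 0 for a in levels for p in levels}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts

def misclassification_rate(counts):
    """Share of observations that fall off the diagonal of the confusion matrix."""
    wrong = sum(n for (a, p), n in counts.items() if a != p)
    return wrong / sum(counts.values())

actual = ["Target", "Target", "Other", "Other", "Target", "Other"]
probs = [0.9, 0.45, 0.35, 0.6, 0.55, 0.1]  # hypothetical fitted probabilities of "Target"
for threshold in (0.5, 0.4):
    counts = confusion_matrix(actual, classify(probs, threshold))
    print(threshold, counts, misclassification_rate(counts))
```

With these numbers, lowering the threshold from 0.5 to 0.4 moves the row with probability 0.45 into the target level. That reduces the misclassification rate from 2/6 to 1/6.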
Profilers
(Not available for models that contain a predictor that has the Vector modeling type.) Provides various profilers that enable you to explore the fitted model.
Note: When the number of rows is less than or equal to 500 and the number of predictors is less than or equal to 30, the Profiler plots update continuously as you drag the current model indicator in either Solution Path plot. Otherwise, they update when you release the mouse button.
Profiler
Shows or hides the Prediction Profiler. Predictors that have parameter estimates of zero and that are not involved in any interaction terms with nonzero coefficients do not appear in the profiler. For more information about the prediction profiler, see “Profiler” in Profilers.
Distribution Profiler
(Not available when the specified Distribution is Binomial or Quantile Regression.) Shows or hides a profiler of the cumulative distribution function as a function of the predictors and the response. The response is shown in the right-most cell.
Quantile Profiler
(Not available when the specified Distribution is Binomial or Quantile Regression.) Shows or hides a profiler that shows the predicted response as a function of the predictors and the quantile of the cumulative distribution function. The quantile is called Probability and is shown in the right-most cell.
Survival Profiler
(Available only when the specified Distribution is Normal, Exponential, Weibull, Lognormal, or Cox Proportional Hazards.) Shows or hides a profiler that shows the survival function as a function of the predictors and the response. The response is shown in the right-most cell.
Hazard Profiler
(Available only when the specified Distribution is Normal, Exponential, Weibull, Lognormal, or Cox Proportional Hazards.) Shows or hides a profiler that shows the hazard rate as a function of the predictors and the response. The response is shown in the right-most cell.
Custom Test
Shows or hides a Custom Test report that enables you to test a custom hypothesis. If the model has a Solution Path, the custom test results update as you update the solution. For more information about custom tests, see “Custom Test”. The Custom Test red triangle menu contains an option to remove the Custom Test report.
Diagnostic Plots
Provides various plots to help assess how well the current model fits. If a Validation column is specified or if KFold, Holdback, or Leave-One-Out is selected as the Validation Method, the options below enable you to view the training, validation, and, if applicable, test sets, or they construct separate plots for these sets. If KFold or Leave-One-Out is selected, then the plots correspond to the validation set that optimizes prediction error, and its corresponding training set. See KFold.
Note: All Diagnostic plots update continuously as you drag the current model indicator in either Solution Path plot.
Diagnostic Bundle
(Not available when the specified Distribution is Binomial, Multinomial, Ordinal Logistic, or Cox Proportional Hazards.) Shows or hides a set of four graphs: a plot of residuals by predicted values, a plot of residuals by row number, a histogram of the residuals, and a histogram of the probability of observing a response larger than the observed response.
The graphs are constructed using all observations. If you used a Validation Column or if you selected KFold, Holdback, or Leave-One-Out as the Validation Method, check boxes enable you to select the Training, Validation, and, if applicable, Test sets. Rows corresponding to these sets are selected in the data table and the corresponding points and areas are highlighted in the graphs. Use this option to determine whether the model fit is similar across the sets.
The Fitted Probability of Observing a Larger Response histogram helps you assess goodness of fit of the model. Different criteria apply based on the distribution:
– For distributions other than zero-inflated distributions and quantile regression, the “correct” model should display an approximately uniform distribution of values.
– For a zero-inflated distribution, the histogram should display a point mass at zero and an approximately uniform distribution elsewhere.
– For quantile regression, the histogram should display an approximately uniform distribution of values to the left of the specified quantile and an approximately uniform distribution of slightly higher values to the right of the specified quantile.
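The uniformity criterion for the Fitted Probability of Observing a Larger Response histogram can be checked by simulation. The sketch covers the non-zero-inflated, non-quantile case, for a Normal model. It simulates data from a known Normal linear model (coefficients and σ are hypothetical). It then computes P(Y > yᵢ) for each row under that same model and bins the values into ten intervals. When the model is correct, the bin counts come out roughly equal.

```python
import random
from statistics import NormalDist

def prob_larger_response(y, mu, sigma):
    """P(Y > y) under a fitted Normal(mu, sigma) for one observation."""
    return 1.0 - NormalDist(mu, sigma).cdf(y)

rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(2000)]
# Data generated from the same model that is treated as "fitted" below (hypothetical values).
ys = [rng.gauss(3.0 + 0.5 * x, 2.0) for x in xs]
probs = [prob_larger_response(y, 3.0 + 0.5 * x, 2.0) for x, y in zip(xs, ys)]

bins = [0] * 10
for p in probs:
    bins[min(int(p * 10), 9)] += 1
print(bins)  # counts near 200 in every bin suggest an approximately uniform histogram
```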
Plot Baseline Survival and Hazard
(Available only when the specified Distribution is Cox Proportional Hazards.) Shows or hides the Baseline Survival and Hazard plots, which plot the survival and hazard functions for the baseline proportional hazards function versus the response variable. Below the plots, there is a table that contains the plotted values.
Note: If the specified Distribution is Cox Proportional Hazards, the Plot Baseline Survival and Hazard option is the only available Diagnostic Plot.
ROC Curve
(Available only when the specified Distribution is Binomial, Multinomial, or Ordinal Logistic.) Shows or hides the Receiver Operating Characteristic (ROC) curve. If you used validation, an ROC curve is shown for each of the Training, Validation, and Test sets.
The ROC curve measures the ability of the fitted probabilities to classify response levels correctly. The farther the curve lies above the diagonal, the better the fit. For an introduction to ROC curves, see “ROC Curves” in Basic Analysis.
If the response has more than two levels, the ROC Curve plot displays an ROC curve for each response level. For a given response level, this curve is the ROC curve for correct classification into that level. See “ROC Curve” in Predictive and Specialized Modeling for more information about ROC curves.
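An ROC curve is traced by moving the classification cutoff down through the sorted fitted probabilities. At each step the sketch records the false positive rate and true positive rate. It then computes the area under the curve; an area of 0.5 corresponds to the diagonal. The data are hypothetical, and ties in the fitted probabilities are assumed not to occur. For a response with more than two levels, the same function would be applied once per level, using that level's probabilities and an indicator of whether the actual response equals that level.

```python
def roc_points(prob_target, is_target):
    """(false positive rate, true positive rate) as the cutoff moves down through
    the fitted probabilities. Assumes no tied probabilities."""
    pos = sum(is_target)
    neg = len(is_target) - pos
    ranked = sorted(zip(prob_target, is_target), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, target in ranked:
        if target:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def area_under_curve(points):
    """Trapezoidal area under a piecewise-linear curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0 for (x0, y0), (x1, y1) in zip(points, points[1:]))

probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # hypothetical fitted probabilities
is_target = [True, True, False, True, False, False]
pts = roc_points(probs, is_target)
print(pts, area_under_curve(pts))
```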
Precision Recall Curve
(Available only when the specified Distribution is Binomial, Multinomial, or Ordinal Logistic.) Shows or hides the Precision-Recall Curve plot. A precision-recall curve plots the precision values against the recall values at a variety of thresholds. If you used validation, a plot is shown for each of the Training, Validation, and Test sets.
If the response has more than two levels, the plot contains a precision-recall curve for each level of the response. For a given response level, this curve is the precision-recall curve for correct classification into that level. See “Precision-Recall Curve” in Predictive and Specialized Modeling for more information about precision-recall curves.
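The sketch below computes the points of a precision-recall curve. At each cutoff, the k rows with the highest fitted probabilities are classified into the target level. Recall is the share of actual target rows captured, and precision is the share of classified rows that are actually target rows. The data are hypothetical, and tied probabilities are assumed not to occur.

```python
def precision_recall_points(prob_target, is_target):
    """(recall, precision) when the k rows with the highest fitted probabilities are
    classified into the target level, for k = 1..n. Assumes no tied probabilities."""
    pos = sum(is_target)
    ranked = sorted(zip(prob_target, is_target), key=lambda pair: pair[0], reverse=True)
    tp = 0
    points = []
    for k, (_, target) in enumerate(ranked, start=1):
        tp += int(target)
        points.append((tp / pos, tp / k))
    return points

probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # hypothetical fitted probabilities
is_target = [True, True, False, True, False, False]
print(precision_recall_points(probs, is_target))
```

When every row is classified into the target level, recall is 1 and precision falls to the overall proportion of target rows (0.5 here).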
Lift Curve
(Available only when the specified Distribution is Binomial, Multinomial, or Ordinal Logistic.) Shows or hides the lift curve for the model. If you used validation, a Lift curve is shown for each of the Training, Validation, and Test sets.
A lift curve shows how effectively response levels are classified as their fitted probabilities decrease. The fitted probabilities are plotted along the horizontal axis in descending order. The vertical coordinate for a fitted probability is the proportion of correct classifications for that probability or higher, divided by the overall correct classification rate. Use the lift curve to see whether you can correctly classify a large proportion of observations if you select only those with a fitted probability that exceeds a threshold value.
If the response has more than two levels, the Lift Curve plot displays a lift curve for each response level. For a given response level, this curve is the lift curve for correct classification into that level. See “Lift Curve” in Predictive and Specialized Modeling for more information about lift curves.
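The sketch below computes lift for a single response level. It reads "correct classification" as "the actual response is the level whose curve is drawn", which is an assumption. Rows are sorted by descending fitted probability. The lift at each probability is the target rate among rows at or above that probability, divided by the overall target rate. The data are hypothetical.

```python
def lift_points(prob_target, is_target):
    """(fitted probability, portion of rows, lift) with rows sorted by descending
    fitted probability. Lift = target rate among rows at or above that probability
    divided by the overall target rate."""
    n = len(is_target)
    base_rate = sum(is_target) / n
    ranked = sorted(zip(prob_target, is_target), key=lambda pair: pair[0], reverse=True)
    hits = 0
    points = []
    for k, (p, target) in enumerate(ranked, start=1):
        hits += int(target)
        points.append((p, k / n, (hits / k) / base_rate))
    return points

probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # hypothetical fitted probabilities
is_target = [True, True, False, True, False, False]
print(lift_points(probs, is_target))
```

A lift of 2 for the top rows means that selecting only those rows finds target rows at twice the overall rate. The lift always returns to 1 once every row is included.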
Decision Threshold
(Available only for binary categorical responses.) Shows or hides Decision Thresholds reports for the training, validation, and test sets, if specified. Each report contains a graph of the distribution of fitted probabilities for each model, confusion matrices for each model, and classification graphs to compare the model fits. See “Decision Thresholds Report” in Predictive and Specialized Modeling for more information about the Decision Thresholds report.
Plot Actual by Predicted
(Not available when the specified Distribution is Binomial, Multinomial, Ordinal Logistic, or Cox Proportional Hazards.) Plots actual Y values on the vertical axis and predicted Y values on the horizontal axis. If you used validation, a plot is shown for each of the Training, Validation, and Test sets.
Plot Residual by Predicted
(Not available when the specified Distribution is Binomial, Multinomial, Ordinal Logistic, or Cox Proportional Hazards.) Plots the residuals on the vertical axis and the predicted Y values on the horizontal axis. If you used validation, a plot is shown for each of the Training, Validation, and Test sets.
Plot Residual by Predictor
(Not available when the specified Distribution is Binomial, Multinomial, Ordinal Logistic, or Cox Proportional Hazards. Not available for models that contain a predictor that has the Vector modeling type.) For each predictor in the model, plots the residuals on the vertical axis and the predictor values on the horizontal axis. There is a plot for each of the predictors in the model. If you used validation, a set of plots is shown for each of the Training, Validation, and Test sets.
Normal Quantile Plot
(Available only when the specified Distribution is Normal and there is no censoring.) Shows or hides a plot of normal quantiles on the vertical axis and standardized residuals on the horizontal axis. If you used validation, a plot is shown for each of the Training, Validation, and Test sets.
Save Columns
Enables you to save columns based on the fitted model to the data table. See Save Columns Options for Cox Proportional Hazards Models for the options that are available if Cox Proportional Hazards is selected as the Distribution. For all other Distributions, the following columns can be saved to the data table:
Save Functional Prediction Formulas
(Available only when the response columns contain the FDE FPC Num column property.) Saves new columns to the original data table. A new column is added for each FDE principal component response; each new column contains the prediction formula for that functional principal component. A final column is added that contains a model prediction formula that is a linear combination of the prediction formulas and the eigenfunction columns from the Functional Data Explorer platform. A script is added to the data table. The script enables you to use the model prediction formula to profile the original response, which is specified in the FDE Output column property of the FDE principal component response columns. For more information about functional principal components, see “Functional Data Explorer” in Predictive and Specialized Modeling.
Note: The Save Functional Prediction Formulas option saves formula columns for all FDE principal component responses in the report window. If multiple models are fit for a single response, the final model for each response is used to create the prediction formula for that response.
Save Prediction Formula
Saves a new formula column to the original data table. The new column contains the prediction formula, given in terms of the observed (unstandardized) data values. The prediction formula does not contain zeroed terms. See Statistical Details for Distributions for mean formulas.
When the response column is categorical, this option creates a probability column for each response level as well as a column that contains the most likely response. The Most Likely response column contains the level with the highest probability based on the model. If the Probability Threshold is a value other than 0.5, this option creates an additional column that contains the most likely response based on the probability threshold value.
Mean Confidence Interval
Saves two new formula columns to the original data table. The new columns contain the lower and upper 95% confidence limits for the mean response.
Note: You can change the α level for the confidence interval in the Fit Model window by selecting Set Alpha Level from the Model Specification red triangle menu.
Std Error of Predicted
Saves a new column to the original data table. The new column contains the standard errors of the predicted mean response.
Std Error of Predicted Formula
Saves a new formula column to the original data table. The new column contains a formula for the standard errors of the predicted mean response.
Save Residual Formula
Saves a new formula column to the original data table. The new column contains a formula for the residuals, given in the form Y minus the prediction formula. The residual formula does not contain zeroed terms. Not available if Binomial is selected as the Distribution.
Save Variance Formula
Saves a new formula column to the original data table. The new column contains a formula for the variance of the prediction. The variance of the prediction is calculated using the formula for the variance of the selected Distribution. The value of the parameter involved in the link function is estimated by applying the inverse of the link function to the estimated linear component. Other parameters are replaced by their estimates. See Statistical Details for Distributions for variance formulas. Not available if Binomial is selected as the Distribution.
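The sketch below traces the steps in the variance formula. It forms the linear predictor x′β, applies the inverse link to get the mean, and substitutes that mean into the distribution's variance function. It assumes a log link and covers only two distributions whose variance is determined by the mean: Poisson, with Var(Y) = μ, and Exponential, with Var(Y) = μ². The row and the estimates are hypothetical. For the parameterizations the platform actually uses, see Statistical Details for Distributions.

```python
import math

# Variance of Y as a function of its mean mu for two distributions whose
# variance is fully determined by the mean.
VARIANCE_FUNCTIONS = {
    "Poisson": lambda mu: mu,           # Var(Y) = mu
    "Exponential": lambda mu: mu ** 2,  # Var(Y) = mu^2
}

def linear_predictor(x_row, beta):
    """x'beta for one row of the design matrix."""
    return sum(xi * bi for xi, bi in zip(x_row, beta))

def prediction_variance(x_row, beta, distribution):
    """Apply the inverse log link to x'beta, then plug the mean into Var(Y)."""
    mu = math.exp(linear_predictor(x_row, beta))  # inverse of the (assumed) log link
    return VARIANCE_FUNCTIONS[distribution](mu)

row = [1.0, 2.0]   # intercept column and one predictor value (hypothetical)
beta = [0.1, 0.3]  # hypothetical estimates, so x'beta is about 0.7
print(prediction_variance(row, beta, "Poisson"), prediction_variance(row, beta, "Exponential"))
```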
Save Linear Predictor
Saves a new formula column to the original data table. The new column contains a formula for the product of the design matrix and the vector of parameter estimates. This is commonly referred to as Xβ. The formula does not contain zeroed terms.
Save Validation Column
(Available only if the specified Validation Method is KFold, Holdback, or Leave-One-Out.) Saves a new column to the original data table. The new column describes the assignment of rows to folds. For KFold, the column lists the fold to which the row was assigned. For Holdback, each row is identified as belonging to the Training or Validation set. For Leave-One-Out, the row’s value indicates its order in being left out.
Note: If you selected a Validation column in the launch window, the Save Validation Column option does not appear.
Save Distribution Formula
(Not available when the specified Distribution is Binomial or Quantile Regression.) Saves a new formula column to the original data table. The new column contains a formula for the cumulative distribution function.
Save Survival Formula
(Available only when the specified Distribution is continuous.) Saves a new formula column to the original data table. The new column contains a formula for the probability of survival at the observed time. The survival function is equal to 1 minus the cumulative distribution function.
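The sketch below illustrates the relationship between the saved distribution and survival columns for a Weibull model. It uses the common shape/scale parameterization, and it assumes that the scale depends on a single predictor through exp(b0 + b1·x). That link, the parameterization, and the estimates are illustrative assumptions, not the platform's definitions. Each row's survival value is 1 minus its CDF value at the observed time.

```python
import math

def weibull_cdf(t, shape, scale):
    """Weibull CDF in the common shape/scale parameterization: 1 - exp(-(t/scale)^shape)."""
    return 1.0 - math.exp(-((t / scale) ** shape))

def weibull_survival(t, shape, scale):
    """Survival function: 1 minus the CDF."""
    return 1.0 - weibull_cdf(t, shape, scale)

b0, b1, shape = 2.0, 0.3, 1.5      # hypothetical estimates; scale = exp(b0 + b1 * x) is assumed
rows = [(1.0, 12.0), (3.0, 30.0)]  # (predictor value, observed time)
for x, t in rows:
    scale = math.exp(b0 + b1 * x)
    print(x, t, weibull_cdf(t, shape, scale), weibull_survival(t, shape, scale))
```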
Save Simulation Formula
Saves a new formula column to the original data table. The new column contains a formula that generates simulated values using the estimated parameters for the model that you fit. This column can be used in the Simulate utility as a Column to Switch In. See “Simulate” in Basic Analysis.
Cook’s D Influence
(Available only if the specified Distribution is Normal and the specified Estimation Method is Standard Least Squares.) Saves a new column to the original data table. The new column contains the values of the Cook’s D influence statistic.
Hats
(Available only if the specified Distribution is Normal and the specified Estimation Method is Standard Least Squares.) Saves a new column to the original data table. The new column contains the diagonal elements of the hat matrix X(X′X)⁻¹X′. These values are sometimes called hat values.
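For a straight-line least squares fit, the diagonal of X(X′X)⁻¹X′ has the closed form hᵢ = 1/n + (xᵢ − x̄)²/Sxx. The sketch below uses that form, restricted to an intercept and one predictor. It also computes the textbook Cook's D, Dᵢ = eᵢ²hᵢ / (p·s²·(1 − hᵢ)²), where p is the number of parameters and s² is the residual mean square. The data are made up. The last point sits far from the others in x, so it gets the largest hat value.

```python
def hats_and_cooks_d(x, y):
    """Hat values and Cook's D for a least squares line y = b0 + b1 * x."""
    n, p = len(x), 2  # p = number of parameters (intercept and slope)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual mean square
    # Diagonal of X(X'X)^-1 X' for a design with an intercept and one predictor.
    hats = [1.0 / n + (xi - xbar) ** 2 / sxx for xi in x]
    cooks = [e * e * h / (p * s2 * (1.0 - h) ** 2) for e, h in zip(resid, hats)]
    return hats, cooks

x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.2, 1.9, 3.1, 4.2, 4.8, 12.5]  # made-up data
hats, cooks = hats_and_cooks_d(x, y)
print(hats, cooks)
```

The hat values always sum to the number of parameters, which is 2 here, because that sum is the trace of the hat matrix.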
Publish Prediction Formula
Creates a prediction formula and saves it as a formula column script in the Formula Depot platform. If a Formula Depot report is not open, this option creates a Formula Depot report. See “Formula Depot” in Predictive and Specialized Modeling.
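When Cox Proportional Hazards is selected as the Distribution, the following Save Columns options are available:
Save Survival Formula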
Saves a new formula column to the original data table. The new column contains a formula for the probability of survival at the observed time.
Save Cox Snell Residual Formula
Saves a new formula column to the original data table. The new column contains a formula for the Cox-Snell residuals. The Cox-Snell residuals are strictly positive. See Meeker and Escobar (1998, sec. 17.6.1) for a discussion of Cox-Snell residuals.
Save Martingale Residual Formula
Saves a new formula column to the original data table. The new column contains a formula for the martingale residuals. The martingale residual is defined as the difference between the observed number of events for an individual and a conditionally expected number of events. The martingale residuals have a mean of zero and range between negative infinity and 1. See Fleming and Harrington (1991).
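The sketch below relates the two residuals. The Cox-Snell residual is the estimated cumulative hazard at the observed time, −log S(tᵢ | xᵢ), which is strictly positive whenever 0 < S < 1. The martingale residual is the event indicator (1 for an event, 0 for a censored row) minus that expected count, which is why it never exceeds 1. The fitted survival probabilities and event indicators are hypothetical and do not come from an actual fit.

```python
import math

def cox_snell(survival_at_observed_time):
    """Cox-Snell residual: estimated cumulative hazard, -log S(t_i | x_i)."""
    return [-math.log(s) for s in survival_at_observed_time]

def martingale(event_indicator, cox_snell_residuals):
    """Martingale residual: observed events (1 or 0) minus the expected count."""
    return [d - r for d, r in zip(event_indicator, cox_snell_residuals)]

surv = [0.9, 0.5, 0.2, 0.75]  # hypothetical fitted S(t_i | x_i) at each observed time
event = [1, 1, 0, 1]          # 1 = event observed, 0 = censored
cs = cox_snell(surv)
mg = martingale(event, cs)
print(cs, mg)
```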
Save Linear Predictor
Saves a new formula column to the original data table. The new column contains a formula for the product of the design matrix and the vector of parameter estimates. This is commonly referred to as Xβ. The formula does not contain zeroed terms.
Remove Fit
Removes the specified model fit from the report.