Estimation Method Options

(Available only when the specified Distribution is Normal and the No Intercept option is not selected.) Computes parameter estimates by applying an l1 penalty using a linear programming approach. See Candes and Tao (2007). The Dantzig Selector is useful for analyzing the results of designed experiments. For orthogonal problems, the Dantzig Selector and Lasso give identical results. For more information, see Dantzig Selector.

Lasso

Computes parameter estimates by applying an l1 penalty. Due to the l1 penalty, some coefficients can be estimated as zero. Thus, variable selection is performed as part of the fitting procedure. In the ordinary Lasso, all coefficients are equally penalized.

Adaptive Lasso

Computes parameter estimates by penalizing a weighted sum of the absolute values of the regression coefficients. The weights in the l1 penalty are determined by the data in such as way as to guarantee the oracle property (Zou 2006). This option uses the MLEs to weight the l1 penalty. MLEs cannot be computed when the number of predictors exceeds the number of observations or when there are strict linear dependencies among the predictors. If MLEs for the regression parameters cannot be computed, a generalized inverse solution or a ridge solution is used for the l1 penalty weights. See Adaptive Methods.

The Lasso and the adaptive Lasso options generally choose parsimonious models when predictors are highly correlated. These techniques tend to select only one of a group of correlated predictors. High-dimensional data tend to have highly correlated predictors. For this type of data, the Elastic Net might be a better choice than the Lasso. For more information, see Lasso Regression.

Elastic Net

Computes parameter estimates by applying both an l1 penalty and an l2 penalty. The l1 penalty ensures that variable selection is performed. The l2 penalty improves predictive ability by shrinking the coefficients as ridge does.

Adaptive Elastic Net

Computes parameter estimates using an adaptive l1 penalty as well as an l2 penalty. This option uses the MLEs to weight the l1 penalty. MLEs cannot be computed when the number of predictors exceeds the number of observations or when there are strict linear dependencies among the predictors. If MLEs for the regression parameters cannot be computed, a generalized inverse solution or a ridge solution is used for the l1 penalty weights. You can set a value for the Elastic Net Alpha in the Advanced Controls panel. See Adaptive Methods.

The Elastic Net tends to provide better prediction accuracy than the Lasso when predictors are highly correlated. (In fact, both Ridge and the Lasso are special cases of the Elastic Net.) In terms of predictive ability, the adaptive Elastic Net often outperforms both the Elastic Net and the adaptive Lasso. The Elastic Net has the ability to select groups of correlated predictors and to assign appropriate parameter estimates to the predictors involved. For more information, see Elastic Net.

Note: If you select an Elastic Net fit and set the Elastic Net Alpha to missing, the algorithm computes the Lasso, Elastic Net, and Ridge fits, in that order. If a fit is time intensive, a progress bar appears. When you click Accept Current Estimates, the calculation stops and the reported parameter estimates correspond to the best model fit at that point. The progress bar indicates when the algorithm is fitting Lasso, Elastic Net, and Ridge. You can use this information to decide when to click Accept Current Estimates.

Ridge

Computes parameter estimates using ridge regression. Ridge regression is a biased regression technique that applies an l2 penalty and does not result in zero parameter estimates. It is useful when you want to retain all predictors in your model. For more details, see Ridge Regression.

Double Lasso

Computes parameter estimates in two stages. In the first stage, a Lasso model is fit to determine the terms to be used in the second stage. In the second stage, a Lasso model is fit using the terms from the first stage. The Solution Path results and the parameter estimate reports that appear are for the second-stage fit. If none of the variables enters the model in the first stage, there is no second stage, and the results of the first stage appear in the report.

The Double Lasso is especially useful when the number of observations is less than the number of predictors. By breaking the variable selection and shrinkage operations into two stages, the Lasso in the second stage is less likely to overly penalize the terms that should be included in the model. The double lasso is similar to the relaxed lasso, which is described in Hastie et al. (2009, p. 91).

Adaptive Double Lasso

Computes parameter estimates in two stages. In the first stage, an adaptive Lasso model is fit to determine the terms to be used in the second stage. In the second stage, an adaptive Lasso model is fit using the terms from the first stage. The second stage considers only the terms that are included in the first stage model and uses weights based on the parameter estimates in the first stage. You can choose the method of calculating the weights using the Adaptive Penalty Weights option in the Advanced Controls. See Advanced Control Options. The results that are shown are for the second-stage fit. If none of the variables enters the model in the first stage, there is no second stage, and the results of the first stage appear in the report. See Adaptive Methods.