Ridge regression, LASSO

When a model includes many variables, collinearity (strong correlation among the predictors) is often a problem. A typical symptom is that the regression coefficients are very large and have very large standard errors. The coefficients are then poorly determined: small changes in the data can change them drastically.

Several techniques have been devised to address this problem. One of the most successful is regularization. Instead of minimizing only the sum of the squared residuals, regularized regression minimizes that sum plus a penalty term that measures the magnitude of the coefficients. Large coefficients are penalized, so the estimates are biased toward zero, but they are also much more stable, and the overall results are often more reliable.
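The effect described above can be reproduced with a few lines of plain Python. The sketch below is not the library's API: the function names are hypothetical, the data are made up for illustration, and it fits two predictors without an intercept or standardization so that the algebra reduces to a 2-by-2 linear system. With two nearly collinear predictors, ordinary least squares produces large coefficients of opposite sign, while adding a quadratic penalty to the sum of squared residuals gives small, stable ones.

```python
def solve_2x2(a11, a12, a22, b1, b2):
    """Solve the symmetric 2x2 system [[a11, a12], [a12, a22]] x = [b1, b2]."""
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


def fit_two_predictors(x1, x2, y, ridge=0.0):
    """Least squares (ridge=0) or penalized fit of y on x1, x2, no intercept.

    Minimizes sum((y - b1*x1 - b2*x2)**2) + ridge * (b1**2 + b2**2)
    by solving the normal equations (X'X + ridge*I) b = X'y.
    """
    s11 = sum(a * a for a in x1) + ridge
    s12 = sum(a * b for a, b in zip(x1, x2))
    s22 = sum(b * b for b in x2) + ridge
    t1 = sum(a * c for a, c in zip(x1, y))
    t2 = sum(b * c for b, c in zip(x2, y))
    return solve_2x2(s11, s12, s22, t1, t2)


# x2 is almost an exact copy of x1, so the two predictors are nearly collinear.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.01, 1.98, 3.02, 3.99, 5.01]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

print(fit_two_predictors(x1, x2, y))             # roughly (-7.2, 9.2)
print(fit_two_predictors(x1, x2, y, ridge=1.0))  # roughly (0.99, 1.00)
```

Only the sum of the two least-squares coefficients (about 2) is well determined by these data; the penalty resolves the ambiguity by preferring the split with the smallest coefficients.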

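Different forms of the penalty term lead to different methods. If the penalty term is quadratic (the sum of the squares of the coefficients), the method is called ridge regression. If the penalty term is the sum of the absolute values of the coefficients, it is called LASSO (Least Absolute Shrinkage and Selection Operator). If it is a combination of the two, it is called elastic net. Each method has its advantages and disadvantages. For example, ridge regression shrinks all coefficients but never sets any of them exactly to zero, while the LASSO can set some coefficients exactly to zero and so also performs variable selection.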
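The three penalties can be written out explicitly. The notation is ours, not the library's: RSS is the sum of squared residuals, λ is the regularization parameter, r is the regularization ratio, and, following common practice (an assumption here), the intercept β₀ is not penalized.

```latex
\begin{aligned}
\mathrm{RSS}(\beta) &= \sum_{i=1}^{n} \Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^2 \\
\text{ridge:}\quad &\min_{\beta}\; \mathrm{RSS}(\beta) + \lambda \sum_{j=1}^{p} \beta_j^2 \\
\text{LASSO:}\quad &\min_{\beta}\; \mathrm{RSS}(\beta) + \lambda \sum_{j=1}^{p} \lvert\beta_j\rvert \\
\text{elastic net:}\quad &\min_{\beta}\; \mathrm{RSS}(\beta) + \lambda \Bigl( r \sum_{j=1}^{p} \lvert\beta_j\rvert + (1 - r) \sum_{j=1}^{p} \beta_j^2 \Bigr)
\end{aligned}
```

Mathematically, setting r = 1 in the elastic net objective gives the LASSO, and setting r = 0 gives ridge regression.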

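In general, standard errors and confidence intervals for the regression coefficients are not available for regularized regression. The reason is that regularization deliberately biases the coefficients toward zero, which makes the actual uncertainty in the coefficients hard to determine: the size of the bias depends on the unknown true coefficients, so a standard error alone would not describe how far an estimate may be from the true value.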
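To see where the bias comes from, consider the simplest case, assumed here purely for illustration: a single centered predictor x, a centered response y, and no intercept. Minimizing the sum of (y_i − βx_i)² plus λβ² gives a closed form:

```latex
\hat\beta_\lambda
  = \frac{\sum_{i} x_i y_i}{\sum_{i} x_i^2 + \lambda}
  = \frac{\sum_{i} x_i^2}{\sum_{i} x_i^2 + \lambda}\,\hat\beta_{\mathrm{OLS}},
\qquad
\operatorname{E}\bigl[\hat\beta_\lambda\bigr] - \beta
  = -\frac{\lambda}{\sum_{i} x_i^2 + \lambda}\,\beta .
```

For any λ > 0 the estimate is pulled toward zero, and the bias is proportional to the unknown true coefficient β, so it cannot be computed from the data. The expectation assumes the usual setting in which the x_i are fixed and the ordinary least squares estimate is unbiased.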

Computing regularized regression

Regularized regression is implemented in two ways. Ridge regression is available as an option of the LinearRegressionModel class. LASSO and elastic net are implemented in a separate class, RegularizedRegressionModel.

Ridge regression

To compute a ridge regression, set up the model as for ordinary linear regression and set the RidgeParameter property to a strictly positive value. The value of this parameter is used as the coefficient of the quadratic penalty term that is added to the sum of the squared residuals. Because the predictors are standardized by default to have zero mean and unit standard deviation, the size of the ridge parameter should be judged relative to 1, not relative to the scale of the original predictors.
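The role of standardization can be illustrated with a small plain-Python sketch. It is not the library's implementation: the function names are hypothetical, and it follows the text literally by adding the penalty to the unscaled sum of squared residuals (the library may scale the terms differently, which would change the numeric meaning of a given parameter value). The predictors are standardized, the response is centered so that the intercept is not penalized, the penalized normal equations are solved, and the coefficients are transformed back to the original scale.

```python
import statistics


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def ridge_fit(columns, y, ridge):
    """Ridge regression on standardized predictors (hypothetical helper).

    columns: list of predictor columns, each a list of n floats.
    Returns (intercept, coefficients) on the ORIGINAL scale of the predictors.
    """
    means = [statistics.fmean(c) for c in columns]
    sds = [statistics.stdev(c) for c in columns]
    z = [[(v - mu) / sd for v in c] for c, mu, sd in zip(columns, means, sds)]
    y_mean = statistics.fmean(y)
    yc = [v - y_mean for v in y]
    p = len(columns)
    # Normal equations (Z'Z + ridge*I) g = Z'yc for the standardized coefficients g.
    a = [[sum(zi * zk for zi, zk in zip(z[j], z[k])) + (ridge if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(zi * v for zi, v in zip(z[j], yc)) for j in range(p)]
    g = solve(a, b)
    # Undo the standardization: beta_j = g_j / sd_j; the intercept absorbs the means.
    beta = [gj / sd for gj, sd in zip(g, sds)]
    intercept = y_mean - sum(bj * mu for bj, mu in zip(beta, means))
    return intercept, beta
```

Because the penalty acts on the standardized coefficients, rescaling a predictor (for example, changing its units from meters to millimeters) rescales its coefficient but leaves the fitted model unchanged for the same ridge parameter.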

LASSO

LASSO and elastic net are implemented by the RegularizedRegressionModel class. The RegularizationParameter property should be set to the coefficient of the penalty term. A second parameter, RegularizationRatio, determines the relative weight of the linear and quadratic penalty terms. For LASSO, it should be set to, or kept at, its default value of 1. The value of the regularization parameter is then used as the coefficient of the sum of the absolute values of the coefficients, which is added to the sum of the squared residuals.
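The LASSO objective has no closed-form solution, but it is commonly minimized by cyclic coordinate descent, in which each coefficient in turn is updated by a soft-thresholding step. The sketch below illustrates that technique in plain Python; it is not necessarily the algorithm RegularizedRegressionModel uses, the function names are hypothetical, and it assumes the predictor columns and the response have already been centered (for example, standardized), so no intercept is fitted. The soft-threshold is what sets some coefficients exactly to zero.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly zero inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(z, y, lam, iterations=1000, tol=1e-10):
    """LASSO by cyclic coordinate descent (hypothetical helper).

    z: list of p centered predictor columns; y: centered response.
    Minimizes sum((y - sum_j beta_j * z_j)**2) + lam * sum(abs(beta_j)).
    """
    p, n = len(z), len(y)
    beta = [0.0] * p
    residual = list(y)  # y - Z beta, with beta = 0 initially
    sq = [sum(v * v for v in col) for col in z]
    for _ in range(iterations):
        max_change = 0.0
        for j in range(p):
            # Inner product of column j with the partial residual
            # (the residual with beta_j's own contribution added back).
            rho = sum(z[j][i] * residual[i] for i in range(n)) + sq[j] * beta[j]
            new = soft_threshold(rho, lam / 2.0) / sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * z[j][i]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


# Two orthogonal, centered predictors; least squares would give [3.0, 0.5].
z = [[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]]
y = [3.0, -3.0, 0.5, -0.5]
print(lasso_fit(z, y, lam=4.0))  # [2.0, 0.0]
```

On an orthogonal design like this one, the LASSO solution is simply the soft-thresholded least-squares estimate: the first coefficient shrinks from 3 to 2, and the second, smaller one is removed from the model entirely.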

Elastic net

Elastic net is a generalization of both ridge regression and the LASSO that includes both a linear and a quadratic term in the penalty. Once again, the RegularizationParameter property should be set to a strictly positive value. The RegularizationRatio should be set to a value between 0 and 1; it specifies the fraction of the penalty that is linear. For example, a value of 0.4 means that the linear term (the sum of the absolute values of the coefficients) has a coefficient of 0.4 times RegularizationParameter, while the quadratic term (the sum of the squares of the coefficients) has a coefficient of 0.6 times RegularizationParameter.
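The split of the penalty can be made concrete with a short plain-Python sketch. The function names are hypothetical, and the second function is the one-coefficient update used by coordinate descent for this penalty (a common solution technique, not necessarily the library's). With a regularization parameter of 2 and a ratio of 0.4, the linear term gets a coefficient of 0.8 and the quadratic term a coefficient of 1.2; a ratio of 1 recovers the LASSO and a ratio of 0 recovers ridge regression.

```python
def elastic_net_penalty(beta, reg_parameter, reg_ratio):
    """Penalty as described in the text (hypothetical helper):
    reg_ratio * reg_parameter * sum(|b|) + (1 - reg_ratio) * reg_parameter * sum(b**2).
    """
    if reg_parameter <= 0:
        raise ValueError("reg_parameter must be strictly positive")
    if not 0.0 <= reg_ratio <= 1.0:
        raise ValueError("reg_ratio must lie between 0 and 1")
    linear = reg_ratio * reg_parameter * sum(abs(b) for b in beta)
    quadratic = (1.0 - reg_ratio) * reg_parameter * sum(b * b for b in beta)
    return linear + quadratic


def elastic_net_coordinate_update(rho, sq, reg_parameter, reg_ratio):
    """Minimizer over b of  sq*b**2 - 2*rho*b + penalty(b)  for one coefficient.

    rho: inner product of the predictor column with the partial residual.
    sq:  sum of squares of the predictor column.
    The linear part soft-thresholds rho; the quadratic part enlarges the denominator.
    """
    threshold = reg_parameter * reg_ratio / 2.0
    if rho > threshold:
        shrunk = rho - threshold
    elif rho < -threshold:
        shrunk = rho + threshold
    else:
        shrunk = 0.0
    return shrunk / (sq + reg_parameter * (1.0 - reg_ratio))


print(elastic_net_penalty([1.0, -2.0], 2.0, 0.4))  # 0.8 * 3 + 1.2 * 5, about 8.4
```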

Regularization paths

For LASSO and elastic net, it is possible to obtain the regularization path. The regularization path shows how the values of the regression coefficients change as the regularization parameter changes. Above a certain value, all regression coefficients will be zero. The regularization path is only interesting up to this value.
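The threshold mentioned above can be computed directly. Assume centered predictor columns z_j, a centered response y, and the penalty written as λ(r Σ|β_j| + (1 − r) Σβ_j²) added to the unscaled sum of squared residuals. Then all coefficients are zero exactly when λr ≥ 2 max_j |Σ_i z_ij y_i|: the quadratic part of the penalty has zero slope at β = 0, so only the linear part can hold the coefficients at zero. For r = 0 (pure ridge) no finite value works, which is why the path is offered only for LASSO and elastic net. The function name below is hypothetical, and the factor 2 depends on the assumed scaling of the objective.

```python
def max_regularization_parameter(z, y, reg_ratio):
    """Smallest regularization parameter for which every coefficient is zero.

    z: list of centered predictor columns; y: centered response.
    """
    if not 0.0 < reg_ratio <= 1.0:
        raise ValueError("reg_ratio must be in (0, 1]; ridge never zeroes coefficients")
    largest = max(abs(sum(zi * yi for zi, yi in zip(col, y))) for col in z)
    return 2.0 * largest / reg_ratio


z = [[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]]
y = [3.0, -3.0, 0.5, -0.5]
print(max_regularization_parameter(z, y, 1.0))  # 12.0
```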

The regularization path is obtained in two steps. First, after the regularization ratio has been set to its desired value, a call to GetRegularizationPathParameters returns a vector of suitable values for the regularization parameter. This method takes two arguments: the number of points, and the ratio of the smallest value to the largest. The largest value is chosen automatically as an approximation of the smallest value that makes all coefficients zero.
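How GetRegularizationPathParameters spaces its values is not specified here. A common choice, assumed in the hypothetical sketch below, is to space them evenly on a logarithmic scale, from the largest value down to the largest value times the given ratio, so that each step changes the parameter by the same factor. Listing the values from largest to smallest is also an assumption; it suits solvers that reuse the previous solution as a starting point.

```python
import math


def regularization_path_parameters(lam_max, count, ratio):
    """count values, evenly spaced on a log scale, from lam_max down to ratio * lam_max."""
    if count < 2:
        raise ValueError("count must be at least 2")
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be between 0 and 1")
    step = math.log(ratio) / (count - 1)  # negative, so the values decrease
    return [lam_max * math.exp(k * step) for k in range(count)]


print(regularization_path_parameters(12.0, 5, 0.01))
```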

The path itself is then computed by a separate method, which takes the vector returned from GetRegularizationPathParameters as its first argument and returns a matrix whose rows contain the regression coefficients computed for the corresponding values of the regularization parameter.
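A regularization path can be traced by solving the elastic net problem at each value of the parameter in turn, starting each solve from the previous solution (a "warm start"), which is cheap when consecutive values are close and the values run from large to small. The sketch below is plain Python with hypothetical names, uses coordinate descent as the solver, and assumes centered predictors and response; it mirrors only the shape of the result described above, one row of coefficients per regularization parameter.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly zero inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_fit(z, y, lam, ratio, beta, iterations=1000, tol=1e-10):
    """Coordinate descent for
    sum((y - Z beta)**2) + lam * (ratio * sum|beta| + (1 - ratio) * sum(beta**2)),
    starting from the given beta (warm start). z: centered columns; y: centered.
    """
    n, p = len(y), len(z)
    beta = list(beta)  # copy, so earlier path rows are not modified
    residual = [y[i] - sum(z[j][i] * beta[j] for j in range(p)) for i in range(n)]
    sq = [sum(v * v for v in col) for col in z]
    for _ in range(iterations):
        max_change = 0.0
        for j in range(p):
            rho = sum(z[j][i] * residual[i] for i in range(n)) + sq[j] * beta[j]
            new = soft_threshold(rho, lam * ratio / 2.0) / (sq[j] + lam * (1.0 - ratio))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= delta * z[j][i]
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def regularization_path(z, y, parameters, ratio):
    """One row of coefficients per regularization parameter, in the given order."""
    beta = [0.0] * len(z)
    rows = []
    for lam in parameters:
        beta = elastic_net_fit(z, y, lam, ratio, beta)  # warm start from previous row
        rows.append(beta)
    return rows


z = [[1.0, -1.0, 0.0, 0.0], [0.0, 0.0, 1.0, -1.0]]
y = [3.0, -3.0, 0.5, -0.5]
print(regularization_path(z, y, [12.0, 4.0, 0.0], 1.0))
# [[0.0, 0.0], [2.0, 0.0], [3.0, 0.5]]
```

Reading the rows from top to bottom shows the coefficients entering the model one at a time as the regularization parameter decreases: all are zero at the largest value, and at zero they equal the least-squares estimates.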