Lasso and Elastic Net

Dr. D’Agostino McGowan

Ridge Review

What are we minimizing with Ridge Regression?

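$$\text{RSS} + \lambda \sum_{j=1}^p \beta_j^2$$

What is the resulting estimate for $\hat{\beta}_{ridge}$?

$$\hat{\beta}_{ridge} = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}$$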

Why is this useful?
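
One way to explore this question is to compute the closed-form ridge estimate on simulated data with p>n, where the least squares solution is not unique. This is a minimal NumPy sketch: the function name `ridge_estimate`, the simulated data, and the omission of an intercept are assumptions made for illustration.

```python
import numpy as np

def ridge_estimate(X, y, lam):
    """Closed-form ridge estimate (X^T X + lam * I)^{-1} X^T y (no intercept)."""
    p = X.shape[1]
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Simulated data with more predictors than observations (p > n).
rng = np.random.default_rng(0)
n, p = 5, 10
X = rng.normal(size=(n, p))
y = rng.normal(size=n)

# X^T X is p x p but has rank at most n, so it cannot be inverted on its own.
rank_xtx = np.linalg.matrix_rank(X.T @ X)

# Adding lam * I with lam > 0 makes the system solvable.
beta_ridge = ridge_estimate(X, y, lam=1.0)
print(rank_xtx, beta_ridge.shape)
```

With λ > 0 the matrix being inverted is positive definite, so the estimate exists even when X^T X alone is singular.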

Ridge Review

How is λ determined?

$$\text{RSS} + \lambda \sum_{j=1}^p \beta_j^2$$

What is the bias-variance trade-off?
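
A common way to pick λ is k-fold cross-validation over a grid of candidate values. The sketch below assumes simulated data, an arbitrary grid, and hypothetical helper names (`ridge_estimate`, `cv_mse`); it is an illustration, not a prescribed procedure.

```python
import numpy as np

def ridge_estimate(X, y, lam):
    """Closed-form ridge estimate (no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_mse(X, y, lam, k=5):
    """Average held-out mean squared error over k folds for one lambda."""
    n = X.shape[0]
    folds = np.array_split(np.arange(n), k)
    errors = []
    for test_idx in folds:
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False
        # Fit on the training folds, evaluate on the held-out fold.
        beta = ridge_estimate(X[train_mask], y[train_mask], lam)
        resid = y[test_idx] - X[test_idx] @ beta
        errors.append(np.mean(resid ** 2))
    return float(np.mean(errors))

# Simulated data: only the first three predictors matter.
rng = np.random.default_rng(1)
n, p = 60, 20
X = rng.normal(size=(n, p))
true_beta = np.concatenate([np.array([3.0, -2.0, 1.5]), np.zeros(p - 3)])
y = X @ true_beta + rng.normal(scale=2.0, size=n)

# Candidate values of lambda (an arbitrary grid for illustration).
lambdas = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_errors = [cv_mse(X, y, lam) for lam in lambdas]
best_lambda = lambdas[int(np.argmin(cv_errors))]
print(best_lambda)
```

A small λ gives low bias but high variance; a large λ gives the reverse. The cross-validated error is one estimate of where that trade-off balances.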

Ridge Regression

Pros

  • Can be used when p>n
  • Can be used to help with multicollinearity
  • Will decrease variance (as λ increases)

Cons

  • Will have increased bias (compared to least squares)
  • Does not really help with variable selection (all variables are included in some regard, even if their β coefficients are really small)
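
The last point can be checked numerically: ridge coefficients shrink as λ grows, but they do not become exactly 0. A minimal sketch, assuming simulated data and a hypothetical helper `ridge_estimate` with no intercept:

```python
import numpy as np

def ridge_estimate(X, y, lam):
    """Closed-form ridge estimate (no intercept)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Simulated data in which three of the five true coefficients are 0.
rng = np.random.default_rng(2)
n, p = 40, 5
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, 0.0, 0.0, -1.0, 0.0]) + rng.normal(size=n)

lambdas = [0.0, 1.0, 10.0, 100.0, 1000.0]
fits = [ridge_estimate(X, y, lam) for lam in lambdas]

# Sum of squared coefficients for each lambda: it shrinks as lambda grows.
sizes = [float(np.sum(b ** 2)) for b in fits]
# Number of coefficients that are exactly 0 for each lambda.
n_exact_zeros = [int(np.sum(b == 0.0)) for b in fits]
print(sizes, n_exact_zeros)
```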

Lasso!

  • The lasso is similar to ridge, but it actually drives some β coefficients to 0! (So it helps with variable selection)

$$\text{RSS} + \lambda \sum_{j=1}^p |\beta_j|$$

  • We say lasso uses an $\ell_1$ penalty, ridge uses an $\ell_2$ penalty
  • $||\beta||_1 = \sum |\beta_j|$
  • $||\beta||_2 = \sqrt{\sum \beta_j^2}$
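
As a small worked example of the two norms, here they are for an arbitrary, made-up coefficient vector. Note that the ridge penalty uses the squared ℓ2 norm, which is the plain sum of squares.

```python
import numpy as np

# An arbitrary coefficient vector chosen for illustration.
beta = np.array([3.0, -4.0, 0.0])

l1_norm = np.sum(np.abs(beta))        # |3| + |-4| + |0|
l2_norm = np.sqrt(np.sum(beta ** 2))  # sqrt(9 + 16 + 0)

# The lasso penalty multiplies lambda by the l1 norm;
# the ridge penalty multiplies lambda by the squared l2 norm.
lasso_penalty_term = l1_norm
ridge_penalty_term = l2_norm ** 2
print(l1_norm, l2_norm, ridge_penalty_term)
```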

Lasso

  • Like Ridge regression, lasso shrinks the coefficients towards 0
  • In lasso, the $\ell_1$ penalty forces some of the coefficient estimates to be exactly zero when the tuning parameter λ is sufficiently large
  • Therefore, lasso can be used for variable selection
  • The lasso can help create smaller, simpler models
  • As with ridge, λ is chosen via cross-validation
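
To see the exact zeros appear, here is a sketch of one standard way to fit the lasso: cyclic coordinate descent with soft-thresholding, minimizing RSS + λ Σ|βj|. The slides do not name a fitting algorithm, so this choice is an assumption, as are the simulated data, the helper names (`soft_threshold`, `lasso_cd`), the fixed iteration count, and the omission of an intercept.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z towards 0 by t, returning exactly 0 when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for RSS + lam * sum(|beta_j|), no intercept."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = np.sum(X ** 2, axis=0)  # x_j^T x_j for each column
    for _ in range(n_iter):
        for j in range(p):
            # Partial residual: leave predictor j out of the current fit.
            r_j = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r_j
            # Minimizing over beta_j alone gives a soft-thresholded update.
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return beta

# Simulated data: only the first two predictors matter.
rng = np.random.default_rng(3)
n, p = 50, 8
X = rng.normal(size=(n, p))
y = X @ np.array([3.0, -2.0, 0, 0, 0, 0, 0, 0]) + rng.normal(size=n)

# For lam above 2 * max|x_j^T y| every coefficient is exactly 0.
lam_max = 2.0 * np.max(np.abs(X.T @ y))
lambdas = [0.0, 0.1 * lam_max, 0.5 * lam_max, 1.1 * lam_max]
fits = [lasso_cd(X, y, lam) for lam in lambdas]
zero_counts = [int(np.sum(b == 0.0)) for b in fits]
print(zero_counts)
```

At λ = 0 the updates reduce to least squares; once λ is large enough every coefficient is set to exactly 0, with sparse models in between.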

Lasso

Pros

  • Can be used when p>n
  • Can be used to help with multicollinearity
  • Will decrease variance (as λ increases)
  • Can be used for variable selection, since it will make some β coefficients exactly 0

Cons

  • Will have increased bias (compared to least squares)
  • If p>n the lasso can select at most n variables

Ridge versus lasso

  • Neither Ridge nor lasso will universally dominate
  • Cross-validation can also be used to determine which method (Ridge or lasso) should be used
  • Cross-validation is also used to select λ in either method. You choose the λ value for which the cross-validation error is the smallest

What if we want to do both?

  • Elastic net!

$$\text{RSS} + \lambda_1 \sum_{j=1}^p \beta_j^2 + \lambda_2 \sum_{j=1}^p |\beta_j|$$

What is the $\ell_1$ part of the penalty?

What is the $\ell_2$ part of the penalty?

Elastic net

$$\text{RSS} + \lambda_1 \sum_{j=1}^p \beta_j^2 + \lambda_2 \sum_{j=1}^p |\beta_j|$$

When will this be equivalent to Ridge Regression?

When will this be equivalent to Lasso?
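
One way to explore both questions is to write the penalty as a function and set one tuning parameter to 0 at a time. A minimal sketch; the function name `elastic_net_penalty` and the coefficient vector are made up for illustration.

```python
import numpy as np

def elastic_net_penalty(beta, lam1, lam2):
    """lam1 * sum(beta_j^2) + lam2 * sum(|beta_j|), as in the formula above."""
    beta = np.asarray(beta, dtype=float)
    return lam1 * np.sum(beta ** 2) + lam2 * np.sum(np.abs(beta))

# An arbitrary coefficient vector chosen for illustration.
beta = np.array([1.0, -2.0, 0.0])

# With lam2 = 0 only the squared (ridge) term remains.
ridge_only = elastic_net_penalty(beta, lam1=0.5, lam2=0.0)
# With lam1 = 0 only the absolute-value (lasso) term remains.
lasso_only = elastic_net_penalty(beta, lam1=0.0, lam2=0.5)
# With both positive, the two penalties add.
both = elastic_net_penalty(beta, lam1=0.5, lam2=0.5)
print(ridge_only, lasso_only, both)
```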

Elastic Net

$$\text{RSS} + \lambda_1 \sum_{j=1}^p \beta_j^2 + \lambda_2 \sum_{j=1}^p |\beta_j|$$

  • The $\ell_1$ part of the penalty will generate a sparse model (shrink some β coefficients to exactly 0)
  • The $\ell_2$ part of the penalty removes the limitation on the number of variables selected (can be >n now)

    How do you think $\lambda_1$ and $\lambda_2$ are chosen?
