class: center, middle, inverse, title-slide

# Midterm 02 Review

### Dr. D’Agostino McGowan

---
layout: true

<div class="my-footer">
  <span>
    Dr. Lucy D'Agostino McGowan
  </span>
</div>

---

## Ridge Review

.question[
What are we minimizing with Ridge Regression?
]

--

`$$RSS + \lambda\sum_{j=1}^p\beta_j^2$$`

--

.question[
What is the resulting estimate for `\(\hat\beta_{ridge}\)`?
]

--

`$$\hat\beta_{ridge} = (\mathbf{X}^{T}\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}$$`

--

.question[
Why is this useful?
]
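---

## Aside: the closed form in code

Not something you need for the exam, but seeing the estimate computed directly can help it stick. A minimal base-R sketch with simulated data (the values here are just for illustration):

```r
# Closed-form ridge estimate: (X'X + lambda I)^{-1} X'y
beta_ridge <- function(X, y, lambda) {
  solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% y)
}

set.seed(363)
X <- matrix(rnorm(100 * 3), 100, 3)
y <- X %*% c(2, -1, 0.5) + rnorm(100)

beta_ridge(X, y, lambda = 0)   # lambda = 0 recovers least squares
beta_ridge(X, y, lambda = 10)  # larger lambda shrinks coefficients toward 0
```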
---
class: inverse

**05:00**
## <i class="fas fa-edit"></i> `Derive the` `\(\beta_{ridge}\)` `values` Practice deriving the `\(\beta\)` coefficients by minimizing `\(RSS + \lambda\sum_{j=1}^p\beta_j^2\)`. Be sure you understand what each step is doing. --- ## Answer * Start by FOILing the thing we're minimizing (Don't remember how do to that? Check out [Slide #8 from the Ridge Regression lecture](https://sta-363-s20.lucymcgowan.com/slides/11-ridge.html#16)) -- * Take the derivative, set it equal to 0 (Don't remember how do to that? Check out [Slide #10 from the Ridge Regression lecture](https://sta-363-s20.lucymcgowan.com/slides/11-ridge.html#20)) -- * Solve for `\(\beta\)` (Don't remember how do to that? Check out [Slide #17 from the Ridge Regression lecture](https://sta-363-s20.lucymcgowan.com/slides/11-ridge.html#29)) --- ## Ridge Review .question[ How is `\(\lambda\)` determined? ] `$$RSS + \lambda\sum_{j=1}^p\beta_j^2$$` -- .question[ What is the bias-variance trade-off? ] --- ## Ridge Regression .pull-left[ ## Pros * Can be used when `\(p > n\)` * Can be used to help with multicollinearity * Will decrease variance (as `\(\lambda \rightarrow \infty\)` ) ] -- .pull-right[ ## Cons * Will have increased bias (compared to least squares) * Does not really help with variable selection (all variables are included in _some_ regard, even if their `\(\beta\)` coefficients are really small) ] --- ## Lasso! * The lasso is similar to ridge, but it actually drives some `\(\beta\)` coefficients to 0! (So it helps with variable selection) -- `$$RSS + \lambda\sum_{j=1}^p|\beta_j|$$` --- ## Lasso .pull-left[ ## Pros * Can be used when `\(p > n\)` * Can be used to help with multicollinearity * Will decrease variance (as `\(\lambda \rightarrow \infty\)` ) * Can be used for variable selection, since it will make some `\(\beta\)` coefficients exactly 0 ] -- .pull-right[ ## Cons * Will have increased bias (compared to least squares) * If `\(p>n\)` the lasso can select **at most** `\(n\)` variables ] --- ## What if we want to do both? * Elastic net! -- `$$RSS + \lambda_1\sum_{j=1}^p\beta^2_j+\lambda_2\sum_{j=1}^p|\beta_j|$$` --- ## Elastic net `$$RSS + \lambda_1\sum_{j=1}^p\beta^2_j+\lambda_2\sum_{j=1}^p|\beta_j|$$` .question[ When will this be equivalent to Ridge Regression? ] --- ## Elastic net `$$RSS + \lambda_1\sum_{j=1}^p\beta^2_j+\lambda_2\sum_{j=1}^p|\beta_j|$$` .question[ When will this be equivalent to Lasso? ] --- class: center, middle ## Nonlinear models --- class: inverse
---
class: center, middle

## Nonlinear models

---
class: inverse

**03:00**
## <i class="fas fa-laptop"></i> `Polynomial Regression` `$$pop = \beta_0 + \beta_1age + \beta_2age^2 + \beta_3age^3 + \epsilon$$` Using the information below, write out the equation to predict change in population from a change in age from the 25th percentile (24.5) to a 75th percentile (73.5). |term | estimate| std.error| statistic| p.value| |:-----------|---------:|---------:|---------:|-------:| |(Intercept) | 1807.8528| 56.1241| 32.2117| 0.0000| |age | -39.6783| 4.9849| -7.9596| 0.0000| |I(age^2) | 0.2064| 0.1185| 1.7414| 0.0849| |I(age^3) | 0.0001| 0.0008| 0.1869| 0.8522| --- class: inverse
---
class: inverse

**03:00**
## <i class="fas fa-edit"></i> `Nonlinear models` What is the difference between: * Polynomial regression * Linear Spline * Cubic Spline * Natural Spline --- class: center, middle ## Degrees of freedom --- ## Example A model predicting `mpg` from `horsepower` and `weight`. `$$mpg = \beta_0 + \beta_1 horsepower + \beta_2 weight + \epsilon$$` -- .question[ How many degrees of freedom are used for the `horsepower` variable? ] -- * 1 --- ## Example A model predicting `mpg` from `horsepower` and `weight`. `$$mpg = \beta_0 + \beta_1 horsepower + \beta_2 horsepower^2 + \beta_3 weight + \epsilon$$` -- .question[ How many degrees of freedom are used for the `horsepower` variable? ] -- * 2 --- ## Example A model predicting `mpg` from `horsepower` and `weight`. _cubic spline with 3 knots_ `$$mpg = \beta_0 + \beta_1 horsepower + \beta_2 horsepower^2 + \beta_3 horsepower^3 +\\ \beta_4 b_4(horsepower) + \beta_5 b_5(horsepower) + \beta_6 b_6(horsepower) + \\\beta_7 weight + \epsilon$$` -- .question[ How many degrees of freedom are used for the `horsepower` variable? ] -- * 6 -- _Don't remember what those `\(b_i()\)` are? Review [Non-linear Slide #17](file:///Users/lucymcgowan/wonderland/courses/2020s-sta363/website/static/slides/13-non-linear.html#38)_ --- ## Example A model predicting `mpg` from `horsepower` and `weight`. _**natural** cubic spline with 3 knots_ `$$mpg = \beta_0 + \beta_1 horsepower + \beta_2 r_2(horsepower) + \\\beta_3 r_3(horsepower) + \beta_4 weight + \epsilon$$` * `\(r_i(horsepower)\)` is a function of `\(horespower\)` similar to `\(b_i()\)` from the cubic spline, but slightly different due to the **restriction**, you don't need to know this specification -- .question[ How many degrees of freedom are used for the `horsepower` variable? ] -- * 3 --- ## Other things to review * Make sure you are familiar with how **tidymodels** works * Remember your _matrix facts_