
Ridge Regression

Dr. D’Agostino McGowan

1 / 47

📖 Canvas

2 / 47

Linear Regression Review

In linear regression, what are we minimizing? How can I write this in matrix form?

3 / 47

Linear Regression Review

In linear regression, what are we minimizing? How can I write this in matrix form?

  • RSS!

$(y - X\hat{\beta})^T(y - X\hat{\beta})$

3 / 47

Linear Regression Review

In linear regression, what are we minimizing? How can I write this in matrix form?

  • RSS!

$(y - X\hat{\beta})^T(y - X\hat{\beta})$

What is the solution ($\hat{\beta}$) to this?

3 / 47

Linear Regression Review

In linear regression, what are we minimizing? How can I write this in matrix form?

  • RSS!

$(y - X\hat{\beta})^T(y - X\hat{\beta})$

What is the solution ($\hat{\beta}$) to this?

$(X^TX)^{-1}X^Ty$

3 / 47
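This closed form is easy to check numerically. A minimal sketch, assuming a small made-up design (the data below are invented for illustration):

# sketch: compare the closed-form solution to lm()
set.seed(1)
x <- rnorm(10)
X <- cbind(1, x)                  # design matrix: intercept column plus one predictor
y <- 2 + 3 * x + rnorm(10)
solve(t(X) %*% X) %*% t(X) %*% y  # (X^T X)^{-1} X^T y
coef(lm(y ~ x))                   # lm() returns the same estimates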

Linear Regression Review

What is X?

4 / 47

Linear Regression Review

What is X?

  • the design matrix!
4 / 47

Matrix fact

$C = AB \Rightarrow C^T = B^TA^T$

5 / 47

Matrix fact

$C = AB \Rightarrow C^T = B^TA^T$

Try it!

  • Distribute (FOIL / get rid of the parentheses) the RSS equation

$\text{RSS} = (y - X\hat{\beta})^T(y - X\hat{\beta})$

5 / 47

Matrix fact

$C = AB \Rightarrow C^T = B^TA^T$

Try it!

  • Distribute (FOIL / get rid of the parentheses) the RSS equation

$\begin{aligned} \text{RSS} &= (y - X\hat{\beta})^T(y - X\hat{\beta}) \\ &= y^Ty - \hat{\beta}^TX^Ty - y^TX\hat{\beta} + \hat{\beta}^TX^TX\hat{\beta} \end{aligned}$

6 / 47

Matrix fact

  • the transpose of a scalar is a scalar
7 / 47

Matrix fact

  • the transpose of a scalar is a scalar
  • $\hat{\beta}^TX^Ty$ is a scalar

Why? What are the dimensions of $\hat{\beta}^T$? What are the dimensions of $X$? What are the dimensions of $y$?

7 / 47

Matrix fact

  • the transpose of a scalar is a scalar
  • $\hat{\beta}^TX^Ty$ is a scalar

Why? What are the dimensions of $\hat{\beta}^T$? What are the dimensions of $X$? What are the dimensions of $y$?

  • $(y^TX\hat{\beta})^T = \hat{\beta}^TX^Ty$
7 / 47

Matrix fact

  • the transpose of a scalar is a scalar
  • $\hat{\beta}^TX^Ty$ is a scalar

Why? What are the dimensions of $\hat{\beta}^T$? What are the dimensions of $X$? What are the dimensions of $y$?

  • $(y^TX\hat{\beta})^T = \hat{\beta}^TX^Ty$

$\begin{aligned} \text{RSS} &= (y - X\hat{\beta})^T(y - X\hat{\beta}) \\ &= y^Ty - \hat{\beta}^TX^Ty - y^TX\hat{\beta} + \hat{\beta}^TX^TX\hat{\beta} \\ &= y^Ty - 2\hat{\beta}^TX^Ty + \hat{\beta}^TX^TX\hat{\beta} \end{aligned}$

7 / 47
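The dimension-counting argument is easy to verify numerically. A quick sketch with made-up values (both products below are $1 \times 1$):

# sketch: beta_hat^T X^T y and its transpose y^T X beta_hat are the same scalar
beta_hat <- c(1, 2)
X <- matrix(c(1, 1, 1, 2, 3, 4), nrow = 3)  # 3 x 2 design matrix
y <- c(1, 0, 2)
t(beta_hat) %*% t(X) %*% y                  # 1 x 1
t(y) %*% X %*% beta_hat                     # the same 1 x 1 value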

Linear Regression Review

To find the $\hat{\beta}$ that minimizes this RSS, what do we do? Why?

$\begin{aligned} \text{RSS} &= (y - X\hat{\beta})^T(y - X\hat{\beta}) \\ &= y^Ty - \hat{\beta}^TX^Ty - y^TX\hat{\beta} + \hat{\beta}^TX^TX\hat{\beta} \\ &= y^Ty - 2\hat{\beta}^TX^Ty + \hat{\beta}^TX^TX\hat{\beta} \end{aligned}$

8 / 47

Matrix fact

  • When $a$ and $b$ are $p \times 1$ vectors

$\frac{\partial a^Tb}{\partial b} = \frac{\partial b^Ta}{\partial b} = a$

9 / 47

Matrix fact

  • When $a$ and $b$ are $p \times 1$ vectors

$\frac{\partial a^Tb}{\partial b} = \frac{\partial b^Ta}{\partial b} = a$

  • When $A$ is a symmetric matrix

$\frac{\partial b^TAb}{\partial b} = 2Ab = 2b^TA$

9 / 47

Matrix fact

  • When $a$ and $b$ are $p \times 1$ vectors

$\frac{\partial a^Tb}{\partial b} = \frac{\partial b^Ta}{\partial b} = a$

  • When $A$ is a symmetric matrix

$\frac{\partial b^TAb}{\partial b} = 2Ab = 2b^TA$

Try it!

$\frac{\partial \text{RSS}}{\partial \hat{\beta}} = $

  • $\text{RSS} = y^Ty - 2\hat{\beta}^TX^Ty + \hat{\beta}^TX^TX\hat{\beta}$
9 / 47

Linear Regression Review

How did we get $(X^TX)^{-1}X^Ty$?

$\text{RSS} = y^Ty - 2\hat{\beta}^TX^Ty + \hat{\beta}^TX^TX\hat{\beta}$

$\frac{\partial \text{RSS}}{\partial \hat{\beta}} = -2X^Ty + 2X^TX\hat{\beta} = 0$

10 / 47

Matrix fact

$AA^{-1} = I$

11 / 47

Matrix fact

$AA^{-1} = I$

What is $I$?

11 / 47

Matrix fact

$AA^{-1} = I$

What is $I$?

  • identity matrix

$I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$

$AI = A$

11 / 47
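In R, `diag(n)` builds the $n \times n$ identity. A quick sketch of the $AI = A$ fact:

# sketch: multiplying by the identity returns the matrix unchanged
A <- matrix(c(2, 0, 1, 5, 3, 4, 7, 6, 8), nrow = 3)
I <- diag(3)  # 3 x 3 identity matrix
all.equal(A %*% I, A)
## [1] TRUE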

Try it!

  • Solve for $\hat{\beta}$

$-2X^Ty + 2X^TX\hat{\beta} = 0$

12 / 47

Linear Regression Review

How did we get $(X^TX)^{-1}X^Ty$?

$\begin{aligned} -2X^Ty + 2X^TX\hat{\beta} &= 0 \\ 2X^TX\hat{\beta} &= 2X^Ty \\ X^TX\hat{\beta} &= X^Ty \end{aligned}$

13 / 47

Linear Regression Review

How did we get $(X^TX)^{-1}X^Ty$?

$\begin{aligned} -2X^Ty + 2X^TX\hat{\beta} &= 0 \\ 2X^TX\hat{\beta} &= 2X^Ty \\ X^TX\hat{\beta} &= X^Ty \\ (X^TX)^{-1}X^TX\hat{\beta} &= (X^TX)^{-1}X^Ty \end{aligned}$

14 / 47

Linear Regression Review

How did we get $(X^TX)^{-1}X^Ty$?

$\begin{aligned} -2X^Ty + 2X^TX\hat{\beta} &= 0 \\ 2X^TX\hat{\beta} &= 2X^Ty \\ X^TX\hat{\beta} &= X^Ty \\ (X^TX)^{-1}X^TX\hat{\beta} &= (X^TX)^{-1}X^Ty \\ \underbrace{(X^TX)^{-1}X^TX}_{I}\hat{\beta} &= (X^TX)^{-1}X^Ty \end{aligned}$

15 / 47

Linear Regression Review

How did we get $(X^TX)^{-1}X^Ty$?

$\begin{aligned} -2X^Ty + 2X^TX\hat{\beta} &= 0 \\ 2X^TX\hat{\beta} &= 2X^Ty \\ X^TX\hat{\beta} &= X^Ty \\ (X^TX)^{-1}X^TX\hat{\beta} &= (X^TX)^{-1}X^Ty \\ \underbrace{(X^TX)^{-1}X^TX}_{I}\hat{\beta} &= (X^TX)^{-1}X^Ty \\ I\hat{\beta} &= (X^TX)^{-1}X^Ty \end{aligned}$

16 / 47

Linear Regression Review

How did we get $(X^TX)^{-1}X^Ty$?

$\begin{aligned} -2X^Ty + 2X^TX\hat{\beta} &= 0 \\ 2X^TX\hat{\beta} &= 2X^Ty \\ X^TX\hat{\beta} &= X^Ty \\ (X^TX)^{-1}X^TX\hat{\beta} &= (X^TX)^{-1}X^Ty \\ \underbrace{(X^TX)^{-1}X^TX}_{I}\hat{\beta} &= (X^TX)^{-1}X^Ty \\ I\hat{\beta} &= (X^TX)^{-1}X^Ty \\ \hat{\beta} &= (X^TX)^{-1}X^Ty \end{aligned}$

17 / 47
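The same algebra, carried out numerically on a small made-up design. A sketch; note that `solve(A, b)` solves $A\beta = b$ directly, mirroring the step $X^TX\hat{\beta} = X^Ty$, and is numerically preferable to forming the inverse explicitly:

# sketch: solve the normal equations X^T X beta = X^T y directly
X <- cbind(1, c(3, 4, 5, 2))   # intercept plus one predictor
y <- c(1, 2, 3, 2)
solve(t(X) %*% X, t(X) %*% y)  # same answer as solve(t(X) %*% X) %*% t(X) %*% y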

Linear Regression Review

Let's try to find an $X$ for which it would be impossible to calculate $\hat{\beta}$

18 / 47

Ridge

  • Go to RStudio Pro: rstudio.hpc.ar53.wfu.edu:8787
  • pw: R2D2Star!
19 / 47

Estimating $\hat{\beta}$

$\hat{\beta} = (X^TX)^{-1}X^Ty$

Under what circumstances is this equation not estimable?

20 / 47

Estimating $\hat{\beta}$

$\hat{\beta} = (X^TX)^{-1}X^Ty$

Under what circumstances is this equation not estimable?

  • when we can't invert $X^TX$
20 / 47

Estimating $\hat{\beta}$

$\hat{\beta} = (X^TX)^{-1}X^Ty$

Under what circumstances is this equation not estimable?

  • when we can't invert $X^TX$
    • $p > n$
    • multicollinearity
20 / 47

Estimating $\hat{\beta}$

$\hat{\beta} = (X^TX)^{-1}X^Ty$

Under what circumstances is this equation not estimable?

  • when we can't invert $X^TX$
    • $p > n$
    • multicollinearity

A guaranteed way to check whether a square matrix is not invertible is to check whether its determinant is equal to zero

20 / 47

Estimating $\hat{\beta}$

$X = \begin{bmatrix} 1 & 2 & 3 & 1 \\ 1 & 3 & 4 & 0 \end{bmatrix}$

What is $n$ here? What is $p$?

21 / 47

Estimating $\hat{\beta}$

$X = \begin{bmatrix} 1 & 2 & 3 & 1 \\ 1 & 3 & 4 & 0 \end{bmatrix}$

What is $n$ here? What is $p$?

Is $X^TX$ going to be invertible?

21 / 47

Estimating $\hat{\beta}$

$X = \begin{bmatrix} 1 & 2 & 3 & 1 \\ 1 & 3 & 4 & 0 \end{bmatrix}$

What is $n$ here? What is $p$?

Is $X^TX$ going to be invertible?

X <- matrix(c(1, 1, 2, 3, 3, 4, 1, 0), nrow = 2)
det(t(X) %*% X)
## [1] 0
21 / 47

Estimating $\hat{\beta}$

$X = \begin{bmatrix} 1 & 3 & 6 \\ 1 & 4 & 8 \\ 1 & 5 & 10 \\ 1 & 2 & 4 \end{bmatrix}$

22 / 47

Estimating $\hat{\beta}$

$X = \begin{bmatrix} 1 & 3 & 6 \\ 1 & 4 & 8 \\ 1 & 5 & 10 \\ 1 & 2 & 4 \end{bmatrix}$

Is $X^TX$ going to be invertible?

22 / 47

Estimating $\hat{\beta}$

$X = \begin{bmatrix} 1 & 3 & 6 \\ 1 & 4 & 8 \\ 1 & 5 & 10 \\ 1 & 2 & 4 \end{bmatrix}$

Is $X^TX$ going to be invertible?

X <- matrix(c(1, 1, 1, 1, 3, 4, 5, 2, 6, 8, 10, 4), nrow = 4)
det(t(X) %*% X)
## [1] 0
cor(X[, 2], X[, 3])
## [1] 1
22 / 47

Estimating $\hat{\beta}$

$X = \begin{bmatrix} 1 & 3 & 6 \\ 1 & 4 & 8 \\ 1 & 5 & 10 \\ 1 & 2 & 4 \end{bmatrix}$

What was the problem this time?

X <- matrix(c(1, 1, 1, 1, 3, 4, 5, 2, 6, 8, 10, 4), nrow = 4)
det(t(X) %*% X)
## [1] 0
cor(X[, 2], X[, 3])
## [1] 1
23 / 47

Estimating $\hat{\beta}$

What is a sure-fire way to tell whether $X^TX$ will be invertible?

24 / 47

Estimating $\hat{\beta}$

What is a sure-fire way to tell whether $X^TX$ will be invertible?

  • Take the determinant!
24 / 47

Estimating $\hat{\beta}$

What is a sure-fire way to tell whether $X^TX$ will be invertible?

  • Take the determinant!

$|A|$ means the determinant of matrix $A$

24 / 47

Estimating $\hat{\beta}$

What is a sure-fire way to tell whether $X^TX$ will be invertible?

  • Take the determinant!

$|A|$ means the determinant of matrix $A$

  • For a $2 \times 2$ matrix:

$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \qquad |A| = ad - bc$

24 / 47

Estimating $\hat{\beta}$

What is a sure-fire way to tell whether $X^TX$ will be invertible?

  • Take the determinant!

$|A|$ means the determinant of matrix $A$

  • For a $3 \times 3$ matrix:

$A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \qquad |A| = a(ei - fh) - b(di - fg) + c(dh - eg)$

25 / 47

Determinants

It looks funky, but it follows a nice pattern!

$A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \qquad |A| = a(ei - fh) - b(di - fg) + c(dh - eg)$

26 / 47

Determinants

It looks funky, but it follows a nice pattern!

$A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \qquad |A| = a(ei - fh) - b(di - fg) + c(dh - eg)$

  • (1) multiply $a$ by the determinant of the submatrix whose entries are not in $a$'s row or column
  • do the same for $b$ (2) and $c$ (3)
  • put it together as plus (1) minus (2) plus (3)
26 / 47

Determinants

It looks funky, but it follows a nice pattern!

$A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \qquad |A| = a(ei - fh) - b(di - fg) + c(dh - eg)$

  • (1) multiply $a$ by the determinant of the submatrix whose entries are not in $a$'s row or column
  • do the same for $b$ (2) and $c$ (3)
  • put it together as plus (1) minus (2) plus (3)

$|A| = a\begin{vmatrix} e & f \\ h & i \end{vmatrix} - b\begin{vmatrix} d & f \\ g & i \end{vmatrix} + c\begin{vmatrix} d & e \\ g & h \end{vmatrix}$

26 / 47
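A sketch checking the expansion against R's det() on a made-up $3 \times 3$ matrix:

# sketch: first-row cofactor expansion agrees with det()
A <- matrix(c(2, 1, 0,   # column 1: a, d, g
              3, 5, 2,   # column 2: b, e, h
              1, 4, 6),  # column 3: c, f, i
            nrow = 3)
a <- A[1, 1]; b <- A[1, 2]; c <- A[1, 3]
d <- A[2, 1]; e <- A[2, 2]; f <- A[2, 3]
g <- A[3, 1]; h <- A[3, 2]; i <- A[3, 3]
a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
## [1] 28
det(A)
## [1] 28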

Determinants

  • Calculate the determinant of the following matrices in R using the det() function:

$A = \begin{bmatrix} 1 & 2 \\ 4 & 5 \end{bmatrix}$

$B = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 6 & 9 \\ 2 & 5 & 7 \end{bmatrix}$

  • Are these both invertible?
27 / 47
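If you want to check your answers afterward, a sketch of the matching calls (the matrices are entered column by column):

A <- matrix(c(1, 4, 2, 5), nrow = 2)
B <- matrix(c(1, 3, 2, 2, 6, 5, 3, 9, 7), nrow = 3)
det(A)
det(B)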

Estimating $\hat{\beta}$

$X = \begin{bmatrix} 1 & 3.01 & 6 \\ 1 & 4 & 8 \\ 1 & 5 & 10 \\ 1 & 2 & 4 \end{bmatrix}$

28 / 47

Estimating $\hat{\beta}$

$X = \begin{bmatrix} 1 & 3.01 & 6 \\ 1 & 4 & 8 \\ 1 & 5 & 10 \\ 1 & 2 & 4 \end{bmatrix}$

Is $X^TX$ going to be invertible?

28 / 47

Estimating $\hat{\beta}$

$X = \begin{bmatrix} 1 & 3.01 & 6 \\ 1 & 4 & 8 \\ 1 & 5 & 10 \\ 1 & 2 & 4 \end{bmatrix}$

Is $X^TX$ going to be invertible?

X <- matrix(c(1, 1, 1, 1, 3.01, 4, 5, 2, 6, 8, 10, 4), nrow = 4)
det(t(X) %*% X)
## [1] 0.0056
cor(X[, 2], X[, 3])
## [1] 0.999993
28 / 47

Estimating $\hat{\beta}$

$X = \begin{bmatrix} 1 & 3.01 & 6 \\ 1 & 4 & 8 \\ 1 & 5 & 10 \\ 1 & 2 & 4 \end{bmatrix}$

Is $X^TX$ going to be invertible?

y <- c(1, 2, 3, 2)
solve(t(X) %*% X) %*% t(X) %*% y
## [,1]
## [1,] 1.285714
## [2,] -114.285714
## [3,] 57.285714
29 / 47

Estimating $\hat{\beta}$

$X = \begin{bmatrix} 1 & 3.01 & 6 \\ 1 & 4 & 8 \\ 1 & 5 & 10 \\ 1 & 2 & 4 \end{bmatrix}$

Is $X^TX$ going to be invertible?

$\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} 1.28 \\ -114.29 \\ 57.29 \end{bmatrix}$

30 / 47

Estimating $\hat{\beta}$

$X = \begin{bmatrix} 1 & 3.01 & 6 \\ 1 & 4 & 8 \\ 1 & 5 & 10 \\ 1 & 2 & 4 \end{bmatrix}$

What is the equation for the variance of $\hat{\beta}$?

$\text{var}(\hat{\beta}) = \sigma^2(X^TX)^{-1}$

31 / 47

Estimating $\hat{\beta}$

$X = \begin{bmatrix} 1 & 3.01 & 6 \\ 1 & 4 & 8 \\ 1 & 5 & 10 \\ 1 & 2 & 4 \end{bmatrix}$

What is the equation for the variance of $\hat{\beta}$?

$\text{var}(\hat{\beta}) = \sigma^2(X^TX)^{-1}$

  • $\hat{\sigma}^2 = \frac{\text{RSS}}{n - p - 1}$
31 / 47

Estimating $\hat{\beta}$

$X = \begin{bmatrix} 1 & 3.01 & 6 \\ 1 & 4 & 8 \\ 1 & 5 & 10 \\ 1 & 2 & 4 \end{bmatrix}$

What is the equation for the variance of $\hat{\beta}$?

$\text{var}(\hat{\beta}) = \sigma^2(X^TX)^{-1}$

  • $\hat{\sigma}^2 = \frac{\text{RSS}}{n - p - 1}$

$\text{var}(\hat{\beta}) = \begin{bmatrix} 0.918 & -24.489 & 12.132 \\ -24.489 & 4081.571 & -2038.745 \\ 12.132 & -2038.745 & 1018.367 \end{bmatrix}$

31 / 47
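A sketch that reproduces this variance matrix from the pieces above (here $p = 2$ predictors, so $\hat{\sigma}^2 = \text{RSS}/(n - p - 1)$; with so little data this close to singularity, the printed values can wobble numerically):

# sketch: var(beta_hat) = sigma^2 (X^T X)^{-1}, with sigma^2 estimated from the RSS
X <- matrix(c(1, 1, 1, 1, 3.01, 4, 5, 2, 6, 8, 10, 4), nrow = 4)
y <- c(1, 2, 3, 2)
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
rss <- sum((y - X %*% beta_hat)^2)
sigma2_hat <- rss / (nrow(X) - 2 - 1)  # n - p - 1
sigma2_hat * solve(t(X) %*% X)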

Estimating $\hat{\beta}$

$X = \begin{bmatrix} 1 & 3.01 & 6 \\ 1 & 4 & 8 \\ 1 & 5 & 10 \\ 1 & 2 & 4 \end{bmatrix}$

$\text{var}(\hat{\beta}) = \begin{bmatrix} 0.918 & -24.489 & 12.132 \\ -24.489 & 4081.571 & -2038.745 \\ 12.132 & -2038.745 & 1018.367 \end{bmatrix}$

What is the variance for $\hat{\beta}_0$?

32 / 47

Estimating $\hat{\beta}$

$X = \begin{bmatrix} 1 & 3.01 & 6 \\ 1 & 4 & 8 \\ 1 & 5 & 10 \\ 1 & 2 & 4 \end{bmatrix}$

$\text{var}(\hat{\beta}) = \begin{bmatrix} 0.918 & -24.489 & 12.132 \\ -24.489 & 4081.571 & -2038.745 \\ 12.132 & -2038.745 & 1018.367 \end{bmatrix}$

What is the variance for $\hat{\beta}_0$?

33 / 47

Estimating $\hat{\beta}$

$X = \begin{bmatrix} 1 & 3.01 & 6 \\ 1 & 4 & 8 \\ 1 & 5 & 10 \\ 1 & 2 & 4 \end{bmatrix}$

$\text{var}(\hat{\beta}) = \begin{bmatrix} 0.918 & -24.489 & 12.132 \\ -24.489 & 4081.571 & -2038.745 \\ 12.132 & -2038.745 & 1018.367 \end{bmatrix}$

What is the variance for $\hat{\beta}_1$?

34 / 47

Estimating $\hat{\beta}$

$X = \begin{bmatrix} 1 & 3.01 & 6 \\ 1 & 4 & 8 \\ 1 & 5 & 10 \\ 1 & 2 & 4 \end{bmatrix}$

$\text{var}(\hat{\beta}) = \begin{bmatrix} 0.918 & -24.489 & 12.132 \\ -24.489 & 4081.571 & -2038.745 \\ 12.132 & -2038.745 & 1018.367 \end{bmatrix}$

What is the variance for $\hat{\beta}_1$? 😱

35 / 47

What's the problem?

  • Sometimes we can't solve for $\hat{\beta}$

Why?

36 / 47

What's the problem?

  • Sometimes we can't solve for $\hat{\beta}$
    • $X^TX$ is not invertible
37 / 47

What's the problem?

  • Sometimes we can't solve for $\hat{\beta}$
    • $X^TX$ is not invertible
    • We have more variables than observations ($p > n$)
    • The variables are linear combinations of one another
37 / 47

What's the problem?

  • Sometimes we can't solve for $\hat{\beta}$
    • $X^TX$ is not invertible
    • We have more variables than observations ($p > n$)
    • The variables are linear combinations of one another
  • Even when we can invert $X^TX$, things can go wrong
37 / 47

What's the problem?

  • Sometimes we can't solve for $\hat{\beta}$
    • $X^TX$ is not invertible
    • We have more variables than observations ($p > n$)
    • The variables are linear combinations of one another
  • Even when we can invert $X^TX$, things can go wrong
    • The variance can blow up, like we just saw!
37 / 47

What can we do about this?

38 / 47

Ridge Regression

  • What if we add an additional penalty to keep the $\hat{\beta}$ coefficients small (this will keep the variance from blowing up!)
39 / 47

Ridge Regression

  • What if we add an additional penalty to keep the $\hat{\beta}$ coefficients small (this will keep the variance from blowing up!)
  • Instead of minimizing RSS, like we do with linear regression, let's minimize RSS PLUS some penalty function
39 / 47

Ridge Regression

  • What if we add an additional penalty to keep the $\hat{\beta}$ coefficients small (this will keep the variance from blowing up!)
  • Instead of minimizing RSS, like we do with linear regression, let's minimize RSS PLUS some penalty function

$\text{RSS} + \underbrace{\lambda\sum_{j=1}^p \beta_j^2}_{\text{shrinkage penalty}}$

39 / 47

Ridge Regression

  • What if we add an additional penalty to keep the $\hat{\beta}$ coefficients small (this will keep the variance from blowing up!)
  • Instead of minimizing RSS, like we do with linear regression, let's minimize RSS PLUS some penalty function

$\text{RSS} + \underbrace{\lambda\sum_{j=1}^p \beta_j^2}_{\text{shrinkage penalty}}$

What happens when $\lambda = 0$? What happens as $\lambda \to \infty$?

39 / 47

Ridge Regression

Let's solve for the $\hat{\beta}$ coefficients using Ridge Regression. What are we minimizing?

40 / 47

Ridge Regression

Let's solve for the $\hat{\beta}$ coefficients using Ridge Regression. What are we minimizing?

$(y - X\beta)^T(y - X\beta) + \lambda\beta^T\beta$

40 / 47

Try it!

  • Find $\hat{\beta}$ that minimizes this:

$(y - X\beta)^T(y - X\beta) + \lambda\beta^T\beta$
41 / 47

Ridge Regression

$\hat{\beta}_{\text{ridge}} = (X^TX + \lambda I)^{-1}X^Ty$

42 / 47

Ridge Regression

$\hat{\beta}_{\text{ridge}} = (X^TX + \lambda I)^{-1}X^Ty$

  • Not only does this help with the variance, it solves our problem when $X^TX$ isn't invertible!
42 / 47
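A sketch using the earlier rank-deficient design, where $(X^TX)^{-1}$ did not exist: once $\lambda I$ is added, the matrix becomes invertible (the value of $\lambda$ here is arbitrary, just for illustration):

# sketch: ridge estimates exist even though det(X^T X) = 0 here
X <- matrix(c(1, 1, 1, 1, 3, 4, 5, 2, 6, 8, 10, 4), nrow = 4)
y <- c(1, 2, 3, 2)
lambda <- 0.1
solve(t(X) %*% X + lambda * diag(ncol(X))) %*% t(X) %*% y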

Choosing λ

  • $\lambda$ is known as a tuning parameter and is selected using cross validation
  • For example, choose the $\lambda$ that results in the smallest estimated test error
43 / 47
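One common workflow (a sketch, assuming the glmnet package; `alpha = 0` is glmnet's setting for ridge, and the simulated data are made up):

# sketch: pick lambda by cross validation with glmnet
library(glmnet)
set.seed(1)
x <- matrix(rnorm(100 * 5), ncol = 5)
y <- x[, 1] + rnorm(100)
cv_fit <- cv.glmnet(x, y, alpha = 0)  # 10-fold CV over a grid of lambda values
cv_fit$lambda.min                     # lambda with the smallest estimated test error
coef(cv_fit, s = "lambda.min")        # ridge coefficients at that lambda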

Bias-variance tradeoff

How do you think ridge regression fits into the bias-variance tradeoff?

44 / 47

Bias-variance tradeoff

How do you think ridge regression fits into the bias-variance tradeoff?

  • As $\lambda$ ☝️, bias ☝️, variance 👇
44 / 47

Bias-variance tradeoff

How do you think ridge regression fits into the bias-variance tradeoff?

  • As $\lambda$ ☝️, bias ☝️, variance 👇
  • Bias($\hat{\beta}_{\text{ridge}}$) $= -\lambda(X^TX + \lambda I)^{-1}\beta$
44 / 47

Bias-variance tradeoff

How do you think ridge regression fits into the bias-variance tradeoff?

  • As $\lambda$ ☝️, bias ☝️, variance 👇
  • Bias($\hat{\beta}_{\text{ridge}}$) $= -\lambda(X^TX + \lambda I)^{-1}\beta$

    What would this be if $\lambda$ were 0?

44 / 47

Bias-variance tradeoff

How do you think ridge regression fits into the bias-variance tradeoff?

  • As $\lambda$ ☝️, bias ☝️, variance 👇
  • Bias($\hat{\beta}_{\text{ridge}}$) $= -\lambda(X^TX + \lambda I)^{-1}\beta$

    What would this be if $\lambda$ were 0?

  • Var($\hat{\beta}_{\text{ridge}}$) $= \sigma^2(X^TX + \lambda I)^{-1}X^TX(X^TX + \lambda I)^{-1}$
44 / 47

Bias-variance tradeoff

How do you think ridge regression fits into the bias-variance tradeoff?

  • As $\lambda$ ☝️, bias ☝️, variance 👇
  • Bias($\hat{\beta}_{\text{ridge}}$) $= -\lambda(X^TX + \lambda I)^{-1}\beta$

    What would this be if $\lambda$ were 0?

  • Var($\hat{\beta}_{\text{ridge}}$) $= \sigma^2(X^TX + \lambda I)^{-1}X^TX(X^TX + \lambda I)^{-1}$

    Is this bigger or smaller than $\sigma^2(X^TX)^{-1}$? What is this when $\lambda = 0$? As $\lambda \to \infty$, does this go up or down?

44 / 47
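A sketch tracing both formulas as $\lambda$ grows, using the near-singular design from earlier and a made-up true $\beta$ and $\sigma^2$:

# sketch: squared bias rises and total variance falls as lambda grows
X <- matrix(c(1, 1, 1, 1, 3.01, 4, 5, 2, 6, 8, 10, 4), nrow = 4)
beta <- c(1, 2, 3)  # made-up true coefficients
sigma2 <- 1         # made-up error variance
for (lambda in c(0, 0.1, 1, 10)) {
  S <- solve(t(X) %*% X + lambda * diag(3))
  bias <- -lambda * S %*% beta
  v <- sigma2 * S %*% (t(X) %*% X) %*% S
  cat("lambda =", lambda,
      " sum(bias^2) =", round(sum(bias^2), 3),
      " sum of variances =", round(sum(diag(v)), 3), "\n")
}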

Ridge Regression

  • IMPORTANT: When doing ridge regression, it is important to standardize your variables (divide by the standard deviation)
45 / 47

Ridge Regression

  • IMPORTANT: When doing ridge regression, it is important to standardize your variables (divide by the standard deviation)

Why?

45 / 47
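Because the penalty $\lambda\sum_{j=1}^p \beta_j^2$ treats every coefficient alike, a predictor measured on a large scale gets a small coefficient and is barely penalized, while the same predictor in small units gets a large coefficient and is penalized hard. A sketch of the fix with R's scale() (the columns and their scales below are made up; note that glmnet's `standardize` argument defaults to TRUE, so it does this for you):

# sketch: put all columns on a common scale before penalizing
set.seed(1)
x <- sweep(matrix(rnorm(20 * 3), ncol = 3), 2, c(1, 10, 100), "*")  # wildly different scales
x_std <- scale(x)    # center each column and divide by its standard deviation
apply(x_std, 2, sd)  # every column now has standard deviation 1
## [1] 1 1 1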
