In linear regression, what are we minimizing? How can I write this in matrix form?

$$(\mathbf{y} - \mathbf{X}\hat{\beta})^T(\mathbf{y} - \mathbf{X}\hat{\beta})$$

What is the solution ($\hat{\beta}$) to this?

$$(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$
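As a sanity check on made-up data, the closed-form solution matches what `lm()` computes:

```r
# Made-up data: n = 5 observations, an intercept column, and one predictor
X <- cbind(1, c(1, 2, 3, 4, 5))
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

# Closed-form OLS solution: (X^T X)^{-1} X^T y
beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y

# lm() gives the same coefficients (drop its automatic intercept,
# since X already contains a column of 1s)
fit <- lm(y ~ X - 1)
all.equal(as.numeric(beta_hat), unname(coef(fit)))
```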
What is $\mathbf{X}$?
Matrix fact

If $\mathbf{C} = \mathbf{A}\mathbf{B}$, then $\mathbf{C}^T = \mathbf{B}^T\mathbf{A}^T$
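This transpose fact is easy to verify numerically (the matrices here are arbitrary made-up examples):

```r
# (AB)^T = B^T A^T: check on small made-up matrices
A <- matrix(1:6, nrow = 2)    # 2 x 3
B <- matrix(1:12, nrow = 3)   # 3 x 4
C <- A %*% B                  # C is 2 x 4, so C^T is 4 x 2
all.equal(t(C), t(B) %*% t(A))
```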
Try it! (02:00)

$$\textrm{RSS} = (\mathbf{y} - \mathbf{X}\hat{\beta})^T(\mathbf{y} - \mathbf{X}\hat{\beta})$$

$$\textrm{RSS} = (\mathbf{y} - \mathbf{X}\hat{\beta})^T(\mathbf{y} - \mathbf{X}\hat{\beta}) = \mathbf{y}^T\mathbf{y} - \hat{\beta}^T\mathbf{X}^T\mathbf{y} - \mathbf{y}^T\mathbf{X}\hat{\beta} + \hat{\beta}^T\mathbf{X}^T\mathbf{X}\hat{\beta}$$
Matrix fact

A $1 \times 1$ matrix equals its own transpose, so $\hat{\beta}^T\mathbf{X}^T\mathbf{y} = \mathbf{y}^T\mathbf{X}\hat{\beta}$.

Why? What are the dimensions of $\hat{\beta}^T$? What are the dimensions of $\mathbf{X}$? What are the dimensions of $\mathbf{y}$?

$$\textrm{RSS} = \mathbf{y}^T\mathbf{y} - \hat{\beta}^T\mathbf{X}^T\mathbf{y} - \mathbf{y}^T\mathbf{X}\hat{\beta} + \hat{\beta}^T\mathbf{X}^T\mathbf{X}\hat{\beta} = \mathbf{y}^T\mathbf{y} - 2\hat{\beta}^T\mathbf{X}^T\mathbf{y} + \hat{\beta}^T\mathbf{X}^T\mathbf{X}\hat{\beta}$$
To find the $\hat{\beta}$ that is going to minimize this RSS, what do we do? Why?
Matrix fact

$$\frac{\partial\, \mathbf{a}^T\mathbf{b}}{\partial \mathbf{b}} = \frac{\partial\, \mathbf{b}^T\mathbf{a}}{\partial \mathbf{b}} = \mathbf{a}$$

$$\frac{\partial\, \mathbf{b}^T\mathbf{A}\mathbf{b}}{\partial \mathbf{b}} = 2\mathbf{A}\mathbf{b} \quad \text{(for symmetric } \mathbf{A}\text{, which } \mathbf{X}^T\mathbf{X} \text{ always is)}$$
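A quick numerical check of the quadratic-form fact, using a made-up symmetric matrix and vector:

```r
# d(b^T A b)/db should equal 2 A b when A is symmetric
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # symmetric made-up matrix
b <- c(0.5, -1)
f <- function(b) as.numeric(t(b) %*% A %*% b)

# Central-difference numerical gradient
eps <- 1e-6
num_grad <- sapply(1:2, function(i) {
  e <- replace(numeric(2), i, eps)
  (f(b + e) - f(b - e)) / (2 * eps)
})
all.equal(num_grad, as.numeric(2 * A %*% b), tolerance = 1e-6)
```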
Try it! (02:00)

$$\frac{\partial\, \textrm{RSS}}{\partial \hat{\beta}} = $$
How did we get $(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$?

$$\textrm{RSS} = \mathbf{y}^T\mathbf{y} - 2\hat{\beta}^T\mathbf{X}^T\mathbf{y} + \hat{\beta}^T\mathbf{X}^T\mathbf{X}\hat{\beta}$$

$$\frac{\partial\, \textrm{RSS}}{\partial \hat{\beta}} = -2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\hat{\beta} = 0$$
Matrix fact

$$\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$$

What is $\mathbf{I}$?

$$\mathbf{I} = \begin{bmatrix} 1 & 0 & \dots & 0 \\ 0 & 1 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & 1 \end{bmatrix}$$

$$\mathbf{A}\mathbf{I} = \mathbf{A}$$
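In R, `solve(A)` gives $\mathbf{A}^{-1}$ and `diag(n)` gives $\mathbf{I}$, so both facts can be checked directly (on a made-up invertible matrix):

```r
A <- matrix(c(2, 0, 1, 3), nrow = 2)  # made-up invertible matrix (det = 6)
all.equal(A %*% solve(A), diag(2))    # A A^{-1} = I
all.equal(A %*% diag(2), A)           # A I = A
```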
Try it! (02:00)

$$-2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\hat{\beta} = 0$$
How did we get $(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$?

$$\begin{aligned}
-2\mathbf{X}^T\mathbf{y} + 2\mathbf{X}^T\mathbf{X}\hat{\beta} &= 0 \\
2\mathbf{X}^T\mathbf{X}\hat{\beta} &= 2\mathbf{X}^T\mathbf{y} \\
\mathbf{X}^T\mathbf{X}\hat{\beta} &= \mathbf{X}^T\mathbf{y} \\
(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{X}\hat{\beta} &= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} \\
\mathbf{I}\hat{\beta} &= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} \\
\hat{\beta} &= (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}
\end{aligned}$$
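Putting the derivation into R with small made-up data: the $\hat{\beta}$ from the closed form satisfies the normal equations $\mathbf{X}^T\mathbf{X}\hat{\beta} = \mathbf{X}^T\mathbf{y}$.

```r
# Made-up full-rank design: intercept plus two non-collinear predictors
X <- cbind(1, c(2, 4, 6, 8), c(1, 3, 2, 5))
y <- c(1.2, 3.4, 2.8, 5.9)

beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y

# The normal equations from the derivation should hold (up to rounding)
all.equal(t(X) %*% X %*% beta_hat, t(X) %*% y)
```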
Let's try to find an $\mathbf{X}$ for which it would be impossible to calculate $\hat{\beta}$ (05:00)

Ridge
$$\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$$

Under what circumstances is this equation not estimable?
A guaranteed way to check whether a square matrix is not invertible is to check whether the determinant is equal to zero
$$\mathbf{X} = \begin{bmatrix} 1 & 2 & 3 & 1 \\ 1 & 3 & 4 & 0 \end{bmatrix}$$

What is $n$ here? What is $p$?

Is $\mathbf{X}^T\mathbf{X}$ going to be invertible?

```r
X <- matrix(c(1, 1, 2, 3, 3, 4, 1, 0), nrow = 2)
det(t(X) %*% X)
## [1] 0
```
$$\mathbf{X} = \begin{bmatrix} 1 & 3 & 6 \\ 1 & 4 & 8 \\ 1 & 5 & 10 \\ 1 & 2 & 4 \end{bmatrix}$$

Is $\mathbf{X}^T\mathbf{X}$ going to be invertible?

```r
X <- matrix(c(1, 1, 1, 1, 3, 4, 5, 2, 6, 8, 10, 4), nrow = 4)
det(t(X) %*% X)
## [1] 0
cor(X[, 2], X[, 3])
## [1] 1
```
What was the problem this time?
What is a sure-fire way to tell whether $\mathbf{X}^T\mathbf{X}$ will be invertible?

$|\mathbf{A}|$ means the determinant of matrix $\mathbf{A}$

$$\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \qquad |\mathbf{A}| = ad - bc$$

$$\mathbf{A} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \qquad |\mathbf{A}| = a(ei - fh) - b(di - fg) + c(dh - eg)$$

It looks funky, but it follows a nice pattern!

$$|\mathbf{A}| = a\begin{vmatrix} e & f \\ h & i \end{vmatrix} - b\begin{vmatrix} d & f \\ g & i \end{vmatrix} + c\begin{vmatrix} d & e \\ g & h \end{vmatrix}$$
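That pattern is a cofactor expansion along the first row. A sketch in R (on a made-up 3 × 3 matrix), comparing the by-hand expansion to `det()`:

```r
M <- matrix(c(2, 1, 4,
              0, 3, 5,
              1, 2, 6), nrow = 3, byrow = TRUE)

# minor(i, j): determinant of M with row i and column j removed
minor <- function(i, j) det(M[-i, -j, drop = FALSE])

# Expand along the first row: + a|..| - b|..| + c|..|
by_hand <- M[1, 1] * minor(1, 1) - M[1, 2] * minor(1, 2) + M[1, 3] * minor(1, 3)
all.equal(by_hand, det(M))
```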
Determinants

Find the determinant of each matrix using the `det()` function:

$$\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 4 & 5 \end{bmatrix} \qquad \mathbf{B} = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 6 & 9 \\ 2 & 5 & 7 \end{bmatrix}$$

(01:00)
$$\mathbf{X} = \begin{bmatrix} 1 & 3.01 & 6 \\ 1 & 4 & 8 \\ 1 & 5 & 10 \\ 1 & 2 & 4 \end{bmatrix}$$

Is $\mathbf{X}^T\mathbf{X}$ going to be invertible?

```r
X <- matrix(c(1, 1, 1, 1, 3.01, 4, 5, 2, 6, 8, 10, 4), nrow = 4)
det(t(X) %*% X)
## [1] 0.0056
cor(X[, 2], X[, 3])
## [1] 0.999993
```
Is $\mathbf{X}^T\mathbf{X}$ going to be invertible?

```r
y <- c(1, 2, 3, 2)
solve(t(X) %*% X) %*% t(X) %*% y
##            [,1]
## [1,]   1.285714
## [2,] -114.285714
## [3,]  57.285714
```

$$\begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \hat{\beta}_2 \end{bmatrix} = \begin{bmatrix} 1.29 \\ -114.29 \\ 57.29 \end{bmatrix}$$
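Those huge, opposite-signed coefficients are a symptom of near-collinearity. Nudging the 3.01 entry to 3.02 (a made-up perturbation) swings the estimates dramatically:

```r
# Same data as the slide, with one entry perturbed from 3.01 to 3.02
X1 <- matrix(c(1, 1, 1, 1, 3.01, 4, 5, 2, 6, 8, 10, 4), nrow = 4)
X2 <- matrix(c(1, 1, 1, 1, 3.02, 4, 5, 2, 6, 8, 10, 4), nrow = 4)
y <- c(1, 2, 3, 2)

b1 <- solve(t(X1) %*% X1) %*% t(X1) %*% y
b2 <- solve(t(X2) %*% X2) %*% t(X2) %*% y

# A 0.01 change in one data value moves coefficients by dozens of units
cbind(b1, b2)
```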
$$\mathbf{X} = \begin{bmatrix} 1 & 3.01 & 6 \\ 1 & 4 & 8 \\ 1 & 5 & 10 \\ 1 & 2 & 4 \end{bmatrix}$$

What is the equation for the variance of $\hat{\beta}$?

$$\textrm{var}(\hat{\beta}) = \sigma^2(\mathbf{X}^T\mathbf{X})^{-1}$$

$$\textrm{var}(\hat{\beta}) = \begin{bmatrix} 0.918 & -24.489 & 12.132 \\ -24.489 & 4081.571 & -2038.745 \\ 12.132 & -2038.745 & 1018.367 \end{bmatrix}$$
What is the variance for $\hat{\beta}_0$?

What is the variance for $\hat{\beta}_1$? 😱
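This variance matrix can be approximately reproduced by plugging an estimate of $\sigma^2$ into the formula (a sketch, estimating $\hat{\sigma}^2 = \textrm{RSS}/(n - p)$ from the residuals):

```r
X <- matrix(c(1, 1, 1, 1, 3.01, 4, 5, 2, 6, 8, 10, 4), nrow = 4)
y <- c(1, 2, 3, 2)

beta_hat <- solve(t(X) %*% X) %*% t(X) %*% y
resid <- y - X %*% beta_hat
sigma2_hat <- sum(resid^2) / (nrow(X) - ncol(X))  # RSS / (n - p)

V <- sigma2_hat * solve(t(X) %*% X)
diag(V)  # variances of beta_0, beta_1, beta_2 sit on the diagonal
```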
Why?
$$\textrm{RSS} + \underbrace{\lambda\sum_{j=1}^p \beta_j^2}_{\text{shrinkage penalty}}$$

What happens when $\lambda = 0$? What happens as $\lambda \rightarrow \infty$?
Let's solve for the $\hat{\beta}$ coefficients using Ridge Regression. What are we minimizing?

$$(\mathbf{y} - \mathbf{X}\beta)^T(\mathbf{y} - \mathbf{X}\beta) + \lambda\beta^T\beta$$

Try it! (02:00)
$$\hat{\beta}_{\textrm{ridge}} = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}$$
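A minimal sketch of this closed form on the near-collinear $\mathbf{X}$ from earlier, with a made-up $\lambda$ (note this version penalizes the intercept too, matching the formula above; in practice the intercept is usually left unpenalized):

```r
X <- matrix(c(1, 1, 1, 1, 3.01, 4, 5, 2, 6, 8, 10, 4), nrow = 4)
y <- c(1, 2, 3, 2)

lambda <- 0.1  # made-up penalty; in practice chosen by cross-validation
beta_ridge <- solve(t(X) %*% X + lambda * diag(ncol(X))) %*% t(X) %*% y
beta_ridge  # far tamer than the OLS estimates of roughly -114 and 57
```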
How do you think ridge regression fits into the bias-variance tradeoff?

What would this be if $\lambda$ was 0?

$$\textrm{var}(\hat{\beta}_{\textrm{ridge}}) = \sigma^2(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{X}(\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}$$

Is this bigger or smaller than $\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}$? What is this when $\lambda = 0$? As $\lambda \rightarrow \infty$ does this go up or down?
Why?