LDA
appex-02-lda
$$\hat{\pi}_k = \frac{n_k}{n}$$
$$\hat{\mu}_k = \frac{1}{n_k} \sum_{i: y_i = k} x_i$$
$$\hat{\sigma}^2 = \frac{1}{n - K} \sum_{k=1}^{K} \sum_{i: y_i = k} (x_i - \hat{\mu}_k)^2 = \sum_{k=1}^{K} \frac{n_k - 1}{n - K} \hat{\sigma}_k^2$$
$$\hat{\sigma}_k^2 = \frac{1}{n_k - 1} \sum_{i: y_i = k} (x_i - \hat{\mu}_k)^2$$
x | -1.6 | 0.2 | -0.9 | -2.0 | -3.0 | 1.9 | 1.2 | 2.2 | 2.7 | -0.5 | 1.8 | 3.3 | 5.0 | 3.4 | 4.2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
y | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 | 3 |
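The code on the following slides refers to a data frame called `df` that is never defined in the slides themselves. A minimal sketch of how it could be built from the table above, assuming the column names `x` and `y` used in the later code (base `data.frame()` is an assumption here; a tibble would work equally well):

```r
# Toy data copied from the table above: 15 observations in 3 classes
df <- data.frame(
  x = c(-1.6, 0.2, -0.9, -2.0, -3.0,
         1.9, 1.2,  2.2,  2.7, -0.5,
         1.8, 3.3,  5.0,  3.4,  4.2),
  y = rep(c(1, 2, 3), each = 5)
)

# Quick look at the class sizes
table(df$y)
```

Each class has five observations, which is why every estimated prior on the next slide comes out to one third.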
$$\hat{\pi}_k = \frac{n_k}{n}$$
df %>%
  group_by(y) %>%
  summarise(n = n()) %>%
  mutate(pi = n / sum(n))
## # A tibble: 3 x 3
##       y     n    pi
##   <dbl> <int> <dbl>
## 1     1     5 0.333
## 2     2     5 0.333
## 3     3     5 0.333
group_by(): do calculations on groups
summarise(): reduce variables to values
mutate(): add new variables
How do we pull $\hat{\pi}_k$ out into their own R object?
df %>% group_by(y) %>% summarise(n = n()) %>% mutate(pi = n / sum(n)) %>% pull(pi) -> pi
pi
## [1] 0.3333333 0.3333333 0.3333333
$$\hat{\mu}_k = \frac{1}{n_k} \sum_{i: y_i = k} x_i$$
df %>%
  group_by(y) %>%
  summarise(mu = mean(x))
## # A tibble: 3 x 2
##       y    mu
##   <dbl> <dbl>
## 1     1 -1.46
## 2     2  1.5
## 3     3  3.54
df %>% group_by(y) %>% summarise(mu = mean(x)) %>% pull(mu) -> mu
$$\hat{\sigma}^2 = \sum_{k=1}^{K} \frac{n_k - 1}{n - K} \hat{\sigma}_k^2$$
df %>%
  group_by(y) %>%
  summarise(var_k = var(x), n = n()) %>%
  mutate(v = ((n - 1) / (sum(n) - 3)) * var_k) %>%  # 3 is K, the number of classes
  summarise(sigma_sq = sum(v))
## # A tibble: 1 x 1
##   sigma_sq
##      <dbl>
## 1     1.47
df %>% group_by(y) %>% summarise(var_k = var(x), n = n()) %>% mutate(v = ((n - 1) / (sum(n) - 3)) * var_k) %>% summarise(sigma_sq = sum(v)) %>% pull(sigma_sq) -> sigma_sq
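The pooled variance was defined earlier in two equivalent forms: a pooled sum of squared deviations, and a weighted average of the per-class variances. A minimal base R sketch checking that both give the same number on the toy data (the variable names are illustrative, and `K` is computed from the data rather than hard-coded as 3):

```r
# Toy data from the table above
x <- c(-1.6, 0.2, -0.9, -2.0, -3.0,
        1.9, 1.2,  2.2,  2.7, -0.5,
        1.8, 3.3,  5.0,  3.4,  4.2)
y <- rep(1:3, each = 5)

n <- length(x)
K <- length(unique(y))

# Form 1: squared deviations from each observation's own class mean
class_means <- ave(x, y)  # ave() defaults to the group mean
sigma_sq_direct <- sum((x - class_means)^2) / (n - K)

# Form 2: weighted average of the per-class sample variances
var_k <- sapply(split(x, y), var)
n_k   <- sapply(split(x, y), length)
sigma_sq_weighted <- sum((n_k - 1) / (n - K) * var_k)

all.equal(sigma_sq_direct, sigma_sq_weighted)  # TRUE
round(sigma_sq_direct, 2)                      # 1.47
```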
$$\delta_k(x) = x \frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log(\pi_k)$$
x <- 2
x * (mu / sigma_sq) - mu^2 / (2 * sigma_sq) + log(pi)
## [1] -3.8155857  0.1795063 -0.5436021
Which class should we give this point?
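As a check on the middle value in the output above, the class 2 score at $x = 2$ can be worked by hand from the estimates computed earlier ($\hat{\mu}_2 = 1.5$, $\hat{\sigma}^2 \approx 1.467$, $\hat{\pi}_2 = 1/3$):

$$\hat{\delta}_2(2) = \frac{2 \cdot 1.5}{1.467} - \frac{1.5^2}{2 \cdot 1.467} + \log\tfrac{1}{3} \approx 2.0450 - 0.7669 - 1.0986 = 0.1795$$

This is the largest of the three scores, so the rule "pick the class with the largest discriminant" assigns $x = 2$ to class 2.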
x <- 6
x * (mu / sigma_sq) - mu^2 / (2 * sigma_sq) + log(pi)
## [1] -7.796499  4.269486  9.108750
Which class should we give this point?
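The classification step itself can be sketched in a few lines of base R. The helper names `lda_scores()` and `classify()` are hypothetical, and the prior is stored as `prior` rather than `pi` so that R's built-in constant is not overwritten:

```r
# Toy data from the table above
x_train <- c(-1.6, 0.2, -0.9, -2.0, -3.0,
              1.9, 1.2,  2.2,  2.7, -0.5,
              1.8, 3.3,  5.0,  3.4,  4.2)
y_train <- rep(1:3, each = 5)

# Plug-in estimates, named by class label
mu       <- sapply(split(x_train, y_train), mean)
n_k      <- sapply(split(x_train, y_train), length)
prior    <- n_k / sum(n_k)
sigma_sq <- sum((x_train - ave(x_train, y_train))^2) /
  (length(x_train) - length(mu))

# Hypothetical helper: discriminant score of every class at one point x0
lda_scores <- function(x0, mu, sigma_sq, prior) {
  x0 * mu / sigma_sq - mu^2 / (2 * sigma_sq) + log(prior)
}

# Hypothetical helper: label of the class with the largest score
classify <- function(x0) {
  scores <- lda_scores(x0, mu, sigma_sq, prior)
  names(scores)[which.max(scores)]
}

classify(2)  # "2"
classify(6)  # "3"
```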
We can turn $\hat{\delta}_k(x)$ into estimates for class probabilities:
$$\hat{P}(Y = k \mid X = x) = \frac{e^{\hat{\delta}_k(x)}}{\sum_{l=1}^{K} e^{\hat{\delta}_l(x)}}$$
x <- 6
d <- x * (mu / sigma_sq) - mu^2 / (2 * sigma_sq) + log(pi)
exp(d) / sum(exp(d))
## [1] 4.515655e-08 7.850755e-03 9.921492e-01
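With larger scores, `exp(d)` can overflow to `Inf`. A common remedy, sketched here with a hypothetical `softmax()` helper, is to subtract the largest score before exponentiating. This leaves the ratio unchanged because the common factor cancels between numerator and denominator:

```r
# Discriminant scores for x = 6, copied from the earlier output
d <- c(-7.796499, 4.269486, 9.108750)

# Hypothetical helper: softmax with the maximum subtracted for numerical stability
softmax <- function(d) {
  z <- exp(d - max(d))
  z / sum(z)
}

p <- softmax(d)
p        # class probabilities, matching exp(d) / sum(exp(d))
sum(p)   # 1
```

On scores such as `c(1000, 1001)` the naive formula returns `NaN`, while this version still returns finite probabilities.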
lda() in the MASS package
library(MASS)
model <- lda(y ~ x, data = df)
predict(model, newdata = data.frame(x = 6))
## $class
## [1] 3
## Levels: 1 2 3
##
## $posterior
##              1           2         3
## 1 4.515655e-08 0.007850755 0.9921492
##
## $x
##        LD1
## 1 3.968523
$$f(x) = \frac{1}{(2\pi)^{p/2} |\Sigma|^{1/2}} e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}$$
$$\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \log \pi_k$$
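A minimal base R sketch of the multivariate discriminant, using a hypothetical helper `lda_score_mv()`. As a sanity check it is run with $p = 1$ on the toy data from the earlier slides, where it should reduce to the one-dimensional scores already computed for $x = 6$:

```r
# Hypothetical helper: multivariate LDA score for a single class k
lda_score_mv <- function(x, mu_k, Sigma, pi_k) {
  Sigma_inv_mu <- solve(Sigma, mu_k)  # Sigma^{-1} mu_k without forming the inverse
  drop(crossprod(x, Sigma_inv_mu) -
         0.5 * crossprod(mu_k, Sigma_inv_mu) +
         log(pi_k))
}

# Toy data from the earlier slides (one predictor, three classes)
x_train <- c(-1.6, 0.2, -0.9, -2.0, -3.0,
              1.9, 1.2,  2.2,  2.7, -0.5,
              1.8, 3.3,  5.0,  3.4,  4.2)
y_train <- rep(1:3, each = 5)

mu       <- sapply(split(x_train, y_train), mean)
prior    <- sapply(split(x_train, y_train), length) / length(x_train)
sigma_sq <- sum((x_train - ave(x_train, y_train))^2) / (length(x_train) - 3)

# With p = 1, Sigma is the 1 x 1 matrix holding the pooled variance
scores <- sapply(1:3, function(k) {
  lda_score_mv(6, mu[[k]], matrix(sigma_sq), prior[[k]])
})
scores
```

Using `solve(Sigma, mu_k)` instead of `solve(Sigma) %*% mu_k` is a design choice: it avoids computing the explicit inverse, which is generally the more stable option when $p$ is large.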
| | True Default (No) | True Default (Yes) | Total |
|---|---|---|---|
| Predicted Default (No) | 9644 | 252 | 9896 |
| Predicted Default (Yes) | 23 | 81 | 104 |
| Total | 9667 | 333 | 10000 |
What is the misclassification rate?
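One way to work this out is to put the four cell counts from the table into a matrix and divide the off-diagonal total by the grand total. A minimal sketch (the object names are illustrative):

```r
# Counts copied from the table above (rows = predicted, columns = true)
conf <- matrix(c(9644, 252,
                   23,  81),
               nrow = 2, byrow = TRUE,
               dimnames = list(predicted = c("No", "Yes"),
                               truth     = c("No", "Yes")))

# Misclassified observations are the off-diagonal cells
misclassification_rate <- (sum(conf) - sum(diag(conf))) / sum(conf)
misclassification_rate  # 0.0275, i.e. (23 + 252) / 10000
```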
Since this is training error, what is a possible concern?
What would the error rate be if we classified to the prior, No default?
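Always predicting No gets every true No right and every true Yes wrong, so the comparison with the LDA training error from the table works out as:

$$\text{error}_{\text{always No}} = \frac{333}{10000} = 3.33\%, \qquad \text{error}_{\text{LDA}} = \frac{23 + 252}{10000} = 2.75\%$$

The overall error rates are close, which is why the per-class error rates are worth examining separately.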
Of the true No's, we make 23/9667 = 0.2% errors; of the true Yes's, we make 252/333 = 75.7% errors!
What is the false positive rate in the Credit Default example?
What is the false negative rate in the Credit Default example?
Assign an observation to the Yes class if
$$\hat{P}(\text{Default} \mid \text{Balance}, \text{Student}) \geq 0.5$$
We can change the two error rates by changing the threshold from 0.5 to some other number between 0 and 1:
$$\hat{P}(\text{Default} \mid \text{Balance}, \text{Student}) \geq \text{threshold}$$
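The effect of moving the threshold can be sketched with a small helper. The function name `error_rates()` and the probabilities below are made up purely for illustration; they are not taken from the Default data:

```r
# Hypothetical helper: error rates when "Yes" is predicted for prob_yes >= threshold
error_rates <- function(prob_yes, truth, threshold = 0.5) {
  predicted <- ifelse(prob_yes >= threshold, "Yes", "No")
  c(fpr = sum(truth == "No"  & predicted == "Yes") / sum(truth == "No"),
    fnr = sum(truth == "Yes" & predicted == "No")  / sum(truth == "Yes"))
}

# Made-up posterior probabilities and labels, for illustration only
prob_yes <- c(0.05, 0.10, 0.30, 0.60, 0.20, 0.40, 0.80, 0.90)
truth    <- c("No", "No", "No", "No", "Yes", "Yes", "Yes", "Yes")

error_rates(prob_yes, truth, threshold = 0.5)   # fpr 0.25, fnr 0.50
error_rates(prob_yes, truth, threshold = 0.25)  # fpr 0.50, fnr 0.25
```

Lowering the threshold trades false negatives for false positives. With a fitted `lda()` model for the Default data, the `prob_yes` argument would presumably be the "Yes" column of the `posterior` matrix returned by `predict()`.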
Which do you think is better, higher or lower AUC?
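AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, so values closer to 1 are better and 0.5 corresponds to ranking at random. A minimal sketch using the rank-sum (Mann-Whitney) identity, with a hypothetical helper `auc_rank()` and made-up scores:

```r
# Hypothetical helper: AUC from ranks of the scores
auc_rank <- function(score, is_positive) {
  r  <- rank(score)        # ties receive average ranks
  n1 <- sum(is_positive)   # number of positives
  n0 <- sum(!is_positive)  # number of negatives
  (sum(r[is_positive]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Made-up scores, for illustration only
labels <- c(FALSE, FALSE, TRUE, TRUE)
auc_rank(c(0.1, 0.2, 0.8, 0.9), labels)  # 1: positives always ranked higher
auc_rank(c(0.9, 0.8, 0.2, 0.1), labels)  # 0: positives always ranked lower
auc_rank(c(0.5, 0.5, 0.5, 0.5), labels)  # 0.5: scores carry no information
```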
lda() function in R from the MASS package
predict() function
mutate() function
summarise() function to add the false positive and false negative rates

library(MASS)
library(ISLR)   # assumed source of the Default data
library(dplyr)  # provides %>%, mutate(), summarise()
model <- lda(default ~ balance + student + income, data = Default)
predictions <- predict(model)
Default %>%
  mutate(predicted_class = predictions$class) %>%
  summarise(
    fpr = sum(default == "No" & predicted_class == "Yes") / sum(default == "No"),
    fnr = sum(default == "Yes" & predicted_class == "No") / sum(default == "Yes")
  )
##           fpr       fnr
## 1 0.002275784 0.7627628

conf_mat() expects your outcome to be a factor variable

library(MASS)
library(tidymodels)
model <- lda(default ~ balance + student + income, data = Default)
predictions <- predict(model)
Default %>%
  mutate(predicted_class = predictions$class, default = as.factor(default)) %>%
  conf_mat(default, predicted_class) %>%
  autoplot(type = "heatmap")