
Logistic regression, LDA, QDA - Part 2

Dr. D’Agostino McGowan

1 / 39

LDA

2 / 39

  • $\mu_1 = -1.5$
  • $\mu_2 = 1.5$
  • $\pi_1 = \pi_2 = 0.5$
  • $\sigma^2 = 1$
  • typically we don't know the true parameters; we just use our training data to estimate them (a simulation of this setup follows below)
3 / 39
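
As a quick illustration (not from the slides), here is a minimal base-R sketch of this two-class setup; with equal priors and equal variances the Bayes rule classifies to class 2 whenever $x$ exceeds the midpoint $(\mu_1 + \mu_2)/2 = 0$:

set.seed(1)
n <- 1000
y <- sample(1:2, n, replace = TRUE)                      # pi_1 = pi_2 = 0.5
x <- rnorm(n, mean = ifelse(y == 1, -1.5, 1.5), sd = 1)  # sigma^2 = 1
pred <- ifelse(x > 0, 2, 1)   # Bayes boundary at (mu_1 + mu_2) / 2 = 0
mean(pred != y)               # estimated Bayes error rate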

Estimating parameters

$$\hat{\pi}_k = \frac{n_k}{n} \qquad \hat{\mu}_k = \frac{1}{n_k}\sum_{i:\,y_i = k} x_i$$

$$\hat{\sigma}^2 = \frac{1}{n - K}\sum_{k=1}^{K}\sum_{i:\,y_i = k} (x_i - \hat{\mu}_k)^2 = \sum_{k=1}^{K} \frac{n_k - 1}{n - K}\,\hat{\sigma}^2_k$$

$$\hat{\sigma}^2_k = \frac{1}{n_k - 1}\sum_{i:\,y_i = k} (x_i - \hat{\mu}_k)^2$$

4 / 39

Estimating parameters (in R!)

x -1.6 0.2 -0.9 -2.0 -3.0 1.9 1.2 2.2 2.7 -0.5 1.8 3.3 5.0 3.4 4.2
y 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

$$\hat{\pi}_k = \frac{n_k}{n}$$

df %>%
  group_by(y) %>%
  summarise(n = n()) %>%
  mutate(pi = n / sum(n))
## # A tibble: 3 x 3
##       y     n    pi
##   <dbl> <int> <dbl>
## 1     1     5 0.333
## 2     2     5 0.333
## 3     3     5 0.333
  • group_by(): do calculations on groups
  • summarise(): reduce variables to values
  • mutate(): add new variables
8 / 39

Estimating parameters (in R!)

x -1.6 0.2 -0.9 -2.0 -3.0 1.9 1.2 2.2 2.7 -0.5 1.8 3.3 5.0 3.4 4.2
y 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

$$\hat{\pi}_k = \frac{n_k}{n}$$

How do we pull the $\hat{\pi}_k$ out into their own R object?

df %>%
  group_by(y) %>%
  summarise(n = n()) %>%
  mutate(pi = n / sum(n)) %>%
  pull(pi) -> pi

pi
## [1] 0.3333333 0.3333333 0.3333333

11 / 39

Estimating parameters (in R!)

x -1.6 0.2 -0.9 -2.0 -3.0 1.9 1.2 2.2 2.7 -0.5 1.8 3.3 5.0 3.4 4.2
y 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

$$\hat{\mu}_k = \frac{1}{n_k}\sum_{i:\,y_i = k} x_i$$

df %>%
  group_by(y) %>%
  summarise(mu = mean(x))
## # A tibble: 3 x 2
##       y    mu
##   <dbl> <dbl>
## 1     1 -1.46
## 2     2  1.5
## 3     3  3.54
12 / 39

Estimating parameters (in R!)

x -1.6 0.2 -0.9 -2.0 -3.0 1.9 1.2 2.2 2.7 -0.5 1.8 3.3 5.0 3.4 4.2
y 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

$$\hat{\mu}_k = \frac{1}{n_k}\sum_{i:\,y_i = k} x_i$$

df %>%
  group_by(y) %>%
  summarise(mu = mean(x)) %>%
  pull(mu) -> mu
13 / 39

Estimating parameters (in R!)

x -1.6 0.2 -0.9 -2.0 -3.0 1.9 1.2 2.2 2.7 -0.5 1.8 3.3 5.0 3.4 4.2
y 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

$$\hat{\sigma}^2 = \sum_{k=1}^{K} \frac{n_k - 1}{n - K}\,\hat{\sigma}^2_k$$

df %>%
  group_by(y) %>%
  summarise(var_k = var(x),
            n = n()) %>%
  mutate(v = ((n - 1) / (sum(n) - 3)) * var_k) %>%
  summarise(sigma_sq = sum(v))
## # A tibble: 1 x 1
##   sigma_sq
##      <dbl>
## 1     1.47
14 / 39

Estimating parameters (in R!)

x -1.6 0.2 -0.9 -2.0 -3.0 1.9 1.2 2.2 2.7 -0.5 1.8 3.3 5.0 3.4 4.2
y 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

$$\hat{\sigma}^2 = \sum_{k=1}^{K} \frac{n_k - 1}{n - K}\,\hat{\sigma}^2_k$$

df %>%
  group_by(y) %>%
  summarise(var_k = var(x),
            n = n()) %>%
  mutate(v = ((n - 1) / (sum(n) - 3)) * var_k) %>%
  summarise(sigma_sq = sum(v)) %>%
  pull(sigma_sq) -> sigma_sq
15 / 39

Estimating parameters (in R!)

x -1.6 0.2 -0.9 -2.0 -3.0 1.9 1.2 2.2 2.7 -0.5 1.8 3.3 5.0 3.4 4.2
y 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

$$\delta_k(x) = x\,\frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log(\pi_k)$$

  • Let's predict the class for $x = 2$
x <- 2
x * (mu / sigma_sq) - mu^2 / (2 * sigma_sq) + log(pi)
## [1] -3.8155857  0.1795063 -0.5436021

Which class should we give this point?

16 / 39

Estimating parameters (in R!)

x -1.6 0.2 -0.9 -2.0 -3.0 1.9 1.2 2.2 2.7 -0.5 1.8 3.3 5.0 3.4 4.2
y 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

$$\delta_k(x) = x\,\frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log(\pi_k)$$

  • Let's predict the class for $x = 6$
x <- 6
x * (mu / sigma_sq) - mu^2 / (2 * sigma_sq) + log(pi)
## [1] -7.796499  4.269486  9.108750

Which class should we give this point?

17 / 39

From the discriminant score to probabilities

We can turn $\hat{\delta}_k(x)$ into estimates for class probabilities

$$\hat{P}(Y = k \mid X = x) = \frac{e^{\hat{\delta}_k(x)}}{\sum_{l=1}^{K} e^{\hat{\delta}_l(x)}}$$

  • Classifying to the class with the largest $\hat{\delta}_k(x)$ is the same as classifying to the class with the largest $\hat{P}(Y = k \mid X = x)$ (a quick check in R follows below)
  • For $K = 2$:
    • classify to class 2 if $\hat{P}(Y = 2 \mid X = x) \geq 0.5$
    • classify to class 1 otherwise
18 / 39
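
A minimal sketch of that equivalence (not from the slides), reusing the mu, sigma_sq, and pi objects estimated earlier: the softmax of the discriminant scores gives the posterior probabilities, and which.max() picks the same class either way.

x <- 2
d <- x * (mu / sigma_sq) - mu^2 / (2 * sigma_sq) + log(pi)
posterior <- exp(d) / sum(exp(d))      # softmax of the scores
which.max(d) == which.max(posterior)   # TRUE: same classification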

Estimating parameters (in R!)

x -1.6 0.2 -0.9 -2.0 -3.0 1.9 1.2 2.2 2.7 -0.5 1.8 3.3 5.0 3.4 4.2
y 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

$$\hat{P}(Y = k \mid X = x) = \frac{e^{\hat{\delta}_k(x)}}{\sum_{l=1}^{K} e^{\hat{\delta}_l(x)}}$$

  • Let's get the posterior probability of each class for $x = 6$
x <- 6
d <- x * (mu / sigma_sq) - mu^2 / (2 * sigma_sq) + log(pi)
exp(d) / sum(exp(d))
## [1] 4.515655e-08 7.850755e-03 9.921492e-01
19 / 39

Estimating parameters (in R!)

x -1.6 0.2 -0.9 -2.0 -3.0 1.9 1.2 2.2 2.7 -0.5 1.8 3.3 5.0 3.4 4.2
y 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
  • There is a function to do this in R called lda() in the MASS package
library(MASS)
model <- lda(y ~ x, data = df)
predict(model, newdata = data.frame(x = 6))
## $class
## [1] 3
## Levels: 1 2 3
##
## $posterior
##              1           2         3
## 1 4.515655e-08 0.007850755 0.9921492
##
## $x
##        LD1
## 1 3.968523
21 / 39

LDA

22 / 39

Linear discriminant analysis $p > 1$

  • When $p > 1$ the density takes on the multivariate normal form; a short R sketch of evaluating it follows below

$$f(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}$$

23 / 39
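
As an aside (a minimal sketch, assuming the mvtnorm package, which these slides do not use), this density can be evaluated directly in R:

library(mvtnorm)
mu <- c(0, 0)                                  # made-up mean vector
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)   # made-up covariance matrix
dmvnorm(c(1, 1), mean = mu, sigma = Sigma)     # f(x) at x = (1, 1)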

Linear discriminant analysis $p > 1$

  • The discriminant function is now

$$\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2}\mu_k^T \Sigma^{-1} \mu_k + \log \pi_k$$

  • This is still a linear function of $x$, as the sketch below illustrates!
24 / 39
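
A minimal sketch of this score as an R function; the mu_k, Sigma, and pi_k values below are illustrative, not from the slides:

delta_k <- function(x, mu_k, Sigma, pi_k) {
  Sigma_inv <- solve(Sigma)
  drop(t(x) %*% Sigma_inv %*% mu_k) -              # x^T Sigma^{-1} mu_k
    0.5 * drop(t(mu_k) %*% Sigma_inv %*% mu_k) +   # -(1/2) mu_k^T Sigma^{-1} mu_k
    log(pi_k)
}
delta_k(x = c(1, 1), mu_k = c(0.5, -0.5),
        Sigma = matrix(c(1, 0.3, 0.3, 1), 2, 2), pi_k = 1/3)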

Example: $p = 2$, $K = 3$

  • Here $\pi_1 = \pi_2 = \pi_3 = 1/3$
  • The dashed lines are the Bayes decision boundaries
    • If they were known, they would yield the fewest misclassification errors among all possible classifiers.
25 / 39

LDA on Credit Data

                          True Default (No)   True Default (Yes)   Total
Predicted Default (No)                 9644                  252    9895
Predicted Default (Yes)                  23                   81     104
Total                                  9667                  333   10000

What is the misclassification rate?

  • $(23 + 252)/10000$ errors, a $2.75\%$ misclassification rate

    Since this is training error, what is a possible concern?

  • This could be overfit
26 / 39

LDA on Credit Data

                          True Default (No)   True Default (Yes)   Total
Predicted Default (No)                 9644                  252    9895
Predicted Default (Yes)                  23                   81     104
Total                                  9667                  333   10000
  • $(23 + 252)/10000$ errors, a $2.75\%$ misclassification rate
  • This could be overfit
  • Since we have a large $n$ and small $p$ ($n = 10{,}000$, $p = 4$) we aren't too worried about overfitting
27 / 39

LDA on Credit Data

                          True Default (No)   True Default (Yes)   Total
Predicted Default (No)                 9644                  252    9895
Predicted Default (Yes)                  23                   81     104
Total                                  9667                  333   10000
  • $(23 + 252)/10000$ errors, a $2.75\%$ misclassification rate

    What would the error rate be if we classified to the prior, No default?

  • $333/10000$, a $3.33\%$ error rate (checked in R below)
28 / 39
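
As a quick check of the arithmetic above:

(23 + 252) / 10000   # 0.0275 = 2.75% overall training misclassification
333 / 10000          # 0.0333 = 3.33% when classifying everyone to No default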

LDA on Credit Data

                          True Default (No)   True Default (Yes)   Total
Predicted Default (No)                 9644                  252    9895
Predicted Default (Yes)                  23                   81     104
Total                                  9667                  333   10000
  • $(23 + 252)/10000$ errors, a $2.75\%$ misclassification rate
  • Since we have a large $n$ and small $p$ ($n = 10{,}000$, $p = 4$) we aren't too worried about overfitting
  • Of the true No's, we make $23/9667 = 0.2\%$ errors; of the true Yes's, we make $252/333 = 75.7\%$ errors! (verified below)
29 / 39
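
These per-class error rates are easy to verify from the table:

23 / 9667    # ~0.0024: errors among the true No's
252 / 333    # ~0.757: errors among the true Yes's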

Types of errors

  • False positive rate: the fraction of truly negative observations that are classified as positive
  • False negative rate: the fraction of truly positive observations that are classified as negative

What is the false positive rate in the Credit Default example?

  • 0.2%

What is the false negative rate in the Credit Default example?

  • 75.7%
30 / 39

Types of errors

  • False positive rate: the fraction of truly negative observations that are classified as positive
  • False negative rate: the fraction of truly positive observations that are classified as negative
  • The Credit Default table was created by predicting the Yes class if

$$\hat{P}(\text{Default} \mid \text{Balance, Student}) \geq 0.5$$

  • We can change the two error rates by changing the threshold from 0.5 to some other number between 0 and 1:

$$\hat{P}(\text{Default} \mid \text{Balance, Student}) \geq \text{threshold}$$

31 / 39

Varying the threshold

  • To reduce the false negative rate, we may want the threshold to be 0.1 or less, as in the sketch below
32 / 39
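
A minimal sketch of re-classifying at a 0.1 threshold. This reuses the lda() fit that appears on the later slides and assumes the Default data (e.g. from the ISLR package) is loaded:

library(MASS)
model <- lda(default ~ balance + student + income, data = Default)
posterior_yes <- predict(model)$posterior[, "Yes"]
pred_class <- ifelse(posterior_yes >= 0.1, "Yes", "No")
# false negative rate at the 0.1 threshold
sum(Default$default == "Yes" & pred_class == "No") /
  sum(Default$default == "Yes")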

ROC

  • A receiver operating characteristic (ROC) curve looks at both error rates simultaneously
  • The area under the ROC curve (AUC) is sometimes used as a metric for performance (computed in the sketch below)

Which do you think is better, higher or lower AUC?

33 / 39
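
A sketch of the ROC curve and AUC for the upcoming Default model, assuming the yardstick package (loaded with tidymodels on the later slides); event_level = "second" marks Yes, the second factor level, as the event of interest:

library(MASS)
library(tidymodels)
model <- lda(default ~ balance + student + income, data = Default)
roc_df <- Default %>%
  mutate(default = as.factor(default),
         .pred_Yes = predict(model)$posterior[, "Yes"])
roc_df %>% roc_auc(truth = default, .pred_Yes, event_level = "second")
roc_df %>% roc_curve(truth = default, .pred_Yes, event_level = "second") %>%
  autoplot()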

Let's see it in R

library(MASS)
model <- lda(default ~ balance + student + income, data = Default)
predictions <- predict(model)
Default %>%
  mutate(predicted_class = predictions$class)
  • Use the lda() function in R from the MASS package
  • Get the predicted classes along with posterior probabilities using the predict() function
  • Add the predicted class using the mutate() function
36 / 39

Let's see it in R

library(MASS)
model <- lda(default ~ balance + student + income, data = Default)
predictions <- predict(model)
Default %>%
  mutate(predicted_class = predictions$class) %>%
  summarise(fpr =
              sum(default == "No" & predicted_class == "Yes") /
              sum(default == "No"),
            fnr =
              sum(default == "Yes" & predicted_class == "No") /
              sum(default == "Yes"))
##           fpr       fnr
## 1 0.002275784 0.7627628
  • Use the summarise() function to compute the false positive and false negative rates
37 / 39

Let's see it in R

library(MASS)
library(tidymodels)
model <- lda(default ~ balance + student + income, data = Default)
predictions <- predict(model)
Default %>%
  mutate(predicted_class = predictions$class) %>%
  conf_mat(default, predicted_class) %>%
  autoplot(type = "heatmap")

38 / 39

Let's see it in R

  • conf_mat() expects your outcome to be a factor variable
library(MASS)
library(tidymodels)
model <- lda(default ~ balance + student + income, data = Default)
predictions <- predict(model)
Default %>%
  mutate(predicted_class = predictions$class,
         default = as.factor(default)) %>%
  conf_mat(default, predicted_class) %>%
  autoplot(type = "heatmap")
39 / 39
