
Logistic regression, LDA, QDA - Part 2

Dr. D’Agostino McGowan

1 / 39

LDA

2 / 39

  • $\mu_1 = -1.5$
  • $\mu_2 = 1.5$
  • $\pi_1 = \pi_2 = 0.5$
  • $\sigma^2 = 1$
  • typically we don't know the true parameters; we just use our training data to estimate them (a simulation of this setup follows below)
3 / 39
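
As a quick illustration (not from the slides), here is a minimal base-R sketch of this two-class setup; with equal priors and equal variances the Bayes rule classifies to class 2 whenever $x$ exceeds the midpoint $(\mu_1 + \mu_2)/2 = 0$:

set.seed(1)
n <- 1000
y <- sample(1:2, n, replace = TRUE)                      # pi_1 = pi_2 = 0.5
x <- rnorm(n, mean = ifelse(y == 1, -1.5, 1.5), sd = 1)  # sigma^2 = 1
pred <- ifelse(x > 0, 2, 1)   # Bayes boundary at (mu_1 + mu_2) / 2 = 0
mean(pred != y)               # estimated Bayes error rate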

Estimating parameters

$$\hat{\pi}_k = \frac{n_k}{n} \qquad \hat{\mu}_k = \frac{1}{n_k}\sum_{i:\,y_i = k} x_i$$

$$\hat{\sigma}^2 = \frac{1}{n - K}\sum_{k=1}^{K}\sum_{i:\,y_i = k} (x_i - \hat{\mu}_k)^2 = \sum_{k=1}^{K} \frac{n_k - 1}{n - K}\,\hat{\sigma}^2_k$$

$$\hat{\sigma}^2_k = \frac{1}{n_k - 1}\sum_{i:\,y_i = k} (x_i - \hat{\mu}_k)^2$$

4 / 39

Estimating parameters (in R!)

x -1.6 0.2 -0.9 -2.0 -3.0 1.9 1.2 2.2 2.7 -0.5 1.8 3.3 5.0 3.4 4.2
y 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

$$\hat{\pi}_k = \frac{n_k}{n}$$

df %>%
  group_by(y) %>%
  summarise(n = n()) %>%
  mutate(pi = n / sum(n))
## # A tibble: 3 x 3
##       y     n    pi
##   <dbl> <int> <dbl>
## 1     1     5 0.333
## 2     2     5 0.333
## 3     3     5 0.333
  • group_by(): do calculations on groups
  • summarise(): reduce variables to values
  • mutate(): add new variables
8 / 39

Estimating parameters (in R!)

x -1.6 0.2 -0.9 -2.0 -3.0 1.9 1.2 2.2 2.7 -0.5 1.8 3.3 5.0 3.4 4.2
y 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

$$\hat{\pi}_k = \frac{n_k}{n}$$

How do we pull the $\hat{\pi}_k$ out into their own R object?

df %>%
  group_by(y) %>%
  summarise(n = n()) %>%
  mutate(pi = n / sum(n)) %>%
  pull(pi) -> pi

pi
## [1] 0.3333333 0.3333333 0.3333333

11 / 39

Estimating parameters (in R!)

x -1.6 0.2 -0.9 -2.0 -3.0 1.9 1.2 2.2 2.7 -0.5 1.8 3.3 5.0 3.4 4.2
y 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

$$\hat{\mu}_k = \frac{1}{n_k}\sum_{i:\,y_i = k} x_i$$

df %>%
  group_by(y) %>%
  summarise(mu = mean(x))
## # A tibble: 3 x 2
##       y    mu
##   <dbl> <dbl>
## 1     1 -1.46
## 2     2  1.5
## 3     3  3.54
12 / 39

Estimating parameters (in R!)

x -1.6 0.2 -0.9 -2.0 -3.0 1.9 1.2 2.2 2.7 -0.5 1.8 3.3 5.0 3.4 4.2
y 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

$$\hat{\mu}_k = \frac{1}{n_k}\sum_{i:\,y_i = k} x_i$$

df %>%
  group_by(y) %>%
  summarise(mu = mean(x)) %>%
  pull(mu) -> mu
13 / 39

Estimating parameters (in R!)

x -1.6 0.2 -0.9 -2.0 -3.0 1.9 1.2 2.2 2.7 -0.5 1.8 3.3 5.0 3.4 4.2
y 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

$$\hat{\sigma}^2 = \sum_{k=1}^{K} \frac{n_k - 1}{n - K}\,\hat{\sigma}^2_k$$

df %>%
  group_by(y) %>%
  summarise(var_k = var(x),
            n = n()) %>%
  mutate(v = ((n - 1) / (sum(n) - 3)) * var_k) %>%
  summarise(sigma_sq = sum(v))
## # A tibble: 1 x 1
##   sigma_sq
##      <dbl>
## 1     1.47
14 / 39

Estimating parameters (in R!)

x -1.6 0.2 -0.9 -2.0 -3.0 1.9 1.2 2.2 2.7 -0.5 1.8 3.3 5.0 3.4 4.2
y 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

$$\hat{\sigma}^2 = \sum_{k=1}^{K} \frac{n_k - 1}{n - K}\,\hat{\sigma}^2_k$$

df %>%
  group_by(y) %>%
  summarise(var_k = var(x),
            n = n()) %>%
  mutate(v = ((n - 1) / (sum(n) - 3)) * var_k) %>%
  summarise(sigma_sq = sum(v)) %>%
  pull(sigma_sq) -> sigma_sq
15 / 39

Estimating parameters (in R!)

x -1.6 0.2 -0.9 -2.0 -3.0 1.9 1.2 2.2 2.7 -0.5 1.8 3.3 5.0 3.4 4.2
y 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

$$\delta_k(x) = x\,\frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log(\pi_k)$$

  • Let's predict the class for $x = 2$
x <- 2
x * (mu / sigma_sq) - mu^2 / (2 * sigma_sq) + log(pi)
## [1] -3.8155857  0.1795063 -0.5436021

Which class should we give this point?

16 / 39

Estimating parameters (in R!)

x -1.6 0.2 -0.9 -2.0 -3.0 1.9 1.2 2.2 2.7 -0.5 1.8 3.3 5.0 3.4 4.2
y 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

$$\delta_k(x) = x\,\frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log(\pi_k)$$

  • Let's predict the class for $x = 6$
x <- 6
x * (mu / sigma_sq) - mu^2 / (2 * sigma_sq) + log(pi)
## [1] -7.796499  4.269486  9.108750

Which class should we give this point?

17 / 39

From the discriminant score to probabilities

We can turn $\hat{\delta}_k(x)$ into estimates for class probabilities

$$\hat{P}(Y = k \mid X = x) = \frac{e^{\hat{\delta}_k(x)}}{\sum_{l=1}^{K} e^{\hat{\delta}_l(x)}}$$

  • Classifying to the class with the largest $\hat{\delta}_k(x)$ is the same as classifying to the class with the largest $\hat{P}(Y = k \mid X = x)$ (a quick check in R follows below)
  • For $K = 2$:
    • classify to class 2 if $\hat{P}(Y = 2 \mid X = x) \geq 0.5$
    • classify to class 1 otherwise
18 / 39
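
A minimal sketch of that equivalence (not from the slides), reusing the mu, sigma_sq, and pi objects estimated earlier: the softmax of the discriminant scores gives the posterior probabilities, and which.max() picks the same class either way.

x <- 2
d <- x * (mu / sigma_sq) - mu^2 / (2 * sigma_sq) + log(pi)
posterior <- exp(d) / sum(exp(d))      # softmax of the scores
which.max(d) == which.max(posterior)   # TRUE: same classification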

Estimating parameters (in R!)

x -1.6 0.2 -0.9 -2.0 -3.0 1.9 1.2 2.2 2.7 -0.5 1.8 3.3 5.0 3.4 4.2
y 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3

$$\hat{P}(Y = k \mid X = x) = \frac{e^{\hat{\delta}_k(x)}}{\sum_{l=1}^{K} e^{\hat{\delta}_l(x)}}$$

  • Let's get the posterior probability of each class for $x = 6$
x <- 6
d <- x * (mu / sigma_sq) - mu^2 / (2 * sigma_sq) + log(pi)
exp(d) / sum(exp(d))
## [1] 4.515655e-08 7.850755e-03 9.921492e-01
19 / 39

Estimating parameters (in R!)

x -1.6 0.2 -0.9 -2.0 -3.0 1.9 1.2 2.2 2.7 -0.5 1.8 3.3 5.0 3.4 4.2
y 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
  • There is a function to do this in R called lda() in the MASS package
library(MASS)
model <- lda(y ~ x, data = df)
predict(model, newdata = data.frame(x = 6))
## $class
## [1] 3
## Levels: 1 2 3
##
## $posterior
##              1           2         3
## 1 4.515655e-08 0.007850755 0.9921492
##
## $x
##        LD1
## 1 3.968523
21 / 39

LDA

22 / 39

Linear discriminant analysis $p > 1$

  • When $p > 1$ the density takes on the multivariate normal form; a short R sketch of evaluating it follows below

$$f(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)}$$

23 / 39
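
As an aside (a minimal sketch, assuming the mvtnorm package, which these slides do not use), this density can be evaluated directly in R:

library(mvtnorm)
mu <- c(0, 0)                                  # made-up mean vector
Sigma <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)   # made-up covariance matrix
dmvnorm(c(1, 1), mean = mu, sigma = Sigma)     # f(x) at x = (1, 1)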

Linear discriminant analysis $p > 1$

  • The discriminant function is now

$$\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2}\mu_k^T \Sigma^{-1} \mu_k + \log \pi_k$$

  • This is still a linear function of $x$, as the sketch below illustrates!
24 / 39
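
A minimal sketch of this score as an R function; the mu_k, Sigma, and pi_k values below are illustrative, not from the slides:

delta_k <- function(x, mu_k, Sigma, pi_k) {
  Sigma_inv <- solve(Sigma)
  drop(t(x) %*% Sigma_inv %*% mu_k) -              # x^T Sigma^{-1} mu_k
    0.5 * drop(t(mu_k) %*% Sigma_inv %*% mu_k) +   # -(1/2) mu_k^T Sigma^{-1} mu_k
    log(pi_k)
}
delta_k(x = c(1, 1), mu_k = c(0.5, -0.5),
        Sigma = matrix(c(1, 0.3, 0.3, 1), 2, 2), pi_k = 1/3)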

Example: $p = 2$, $K = 3$

  • Here $\pi_1 = \pi_2 = \pi_3 = 1/3$
  • The dashed lines are the Bayes decision boundaries
    • If they were known, they would yield the fewest misclassification errors among all possible classifiers.
25 / 39

LDA on Credit Data

                          True Default (No)   True Default (Yes)   Total
Predicted Default (No)                 9644                  252    9895
Predicted Default (Yes)                  23                   81     104
Total                                  9667                  333   10000

What is the misclassification rate?

  • $(23 + 252)/10000$ errors, a $2.75\%$ misclassification rate

    Since this is training error, what is a possible concern?

  • This could be overfit
26 / 39

LDA on Credit Data

                          True Default (No)   True Default (Yes)   Total
Predicted Default (No)                 9644                  252    9895
Predicted Default (Yes)                  23                   81     104
Total                                  9667                  333   10000
  • $(23 + 252)/10000$ errors, a $2.75\%$ misclassification rate
  • This could be overfit
  • Since we have a large $n$ and small $p$ ($n = 10{,}000$, $p = 4$) we aren't too worried about overfitting
27 / 39

LDA on Credit Data

                          True Default (No)   True Default (Yes)   Total
Predicted Default (No)                 9644                  252    9895
Predicted Default (Yes)                  23                   81     104
Total                                  9667                  333   10000
  • $(23 + 252)/10000$ errors, a $2.75\%$ misclassification rate

    What would the error rate be if we classified to the prior, No default?

  • $333/10000$, a $3.33\%$ error rate (checked in R below)
28 / 39
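
As a quick check of the arithmetic above:

(23 + 252) / 10000   # 0.0275 = 2.75% overall training misclassification
333 / 10000          # 0.0333 = 3.33% when classifying everyone to No default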

LDA on Credit Data

                          True Default (No)   True Default (Yes)   Total
Predicted Default (No)                 9644                  252    9895
Predicted Default (Yes)                  23                   81     104
Total                                  9667                  333   10000
  • $(23 + 252)/10000$ errors, a $2.75\%$ misclassification rate
  • Since we have a large $n$ and small $p$ ($n = 10{,}000$, $p = 4$) we aren't too worried about overfitting
  • Of the true No's, we make $23/9667 = 0.2\%$ errors; of the true Yes's, we make $252/333 = 75.7\%$ errors! (verified below)
29 / 39
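
These per-class error rates are easy to verify from the table:

23 / 9667    # ~0.0024: errors among the true No's
252 / 333    # ~0.757: errors among the true Yes's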

Types of errors

  • False positive rate: the fraction of truly negative observations that are classified as positive
  • False negative rate: the fraction of truly positive observations that are classified as negative

What is the false positive rate in the Credit Default example?

  • 0.2%

What is the false negative rate in the Credit Default example?

  • 75.7%
30 / 39

Types of errors

  • False positive rate: the fraction of truly negative observations that are classified as positive
  • False negative rate: the fraction of truly positive observations that are classified as negative
  • The Credit Default table was created by predicting the Yes class if

$$\hat{P}(\text{Default} \mid \text{Balance, Student}) \geq 0.5$$

  • We can change the two error rates by changing the threshold from 0.5 to some other number between 0 and 1:

$$\hat{P}(\text{Default} \mid \text{Balance, Student}) \geq \text{threshold}$$

31 / 39

Varying the threshold

  • To reduce the false negative rate, we may want the threshold to be 0.1 or less, as in the sketch below
32 / 39
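
A minimal sketch of re-classifying at a 0.1 threshold. This reuses the lda() fit that appears on the later slides and assumes the Default data (e.g. from the ISLR package) is loaded:

library(MASS)
model <- lda(default ~ balance + student + income, data = Default)
posterior_yes <- predict(model)$posterior[, "Yes"]
pred_class <- ifelse(posterior_yes >= 0.1, "Yes", "No")
# false negative rate at the 0.1 threshold
sum(Default$default == "Yes" & pred_class == "No") /
  sum(Default$default == "Yes")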

ROC

  • A receiver operating characteristic (ROC) curve looks at both error rates simultaneously
  • The area under the ROC curve (AUC) is sometimes used as a metric for performance (computed in the sketch below)

Which do you think is better, higher or lower AUC?

33 / 39
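
A sketch of the ROC curve and AUC for the upcoming Default model, assuming the yardstick package (loaded with tidymodels on the later slides); event_level = "second" marks Yes, the second factor level, as the event of interest:

library(MASS)
library(tidymodels)
model <- lda(default ~ balance + student + income, data = Default)
roc_df <- Default %>%
  mutate(default = as.factor(default),
         .pred_Yes = predict(model)$posterior[, "Yes"])
roc_df %>% roc_auc(truth = default, .pred_Yes, event_level = "second")
roc_df %>% roc_curve(truth = default, .pred_Yes, event_level = "second") %>%
  autoplot()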

Let's see it in R

library(MASS)
model <- lda(default ~ balance + student + income, data = Default)
predictions <- predict(model)
Default %>%
  mutate(predicted_class = predictions$class)
  • Use the lda() function in R from the MASS package
  • Get the predicted classes along with posterior probabilities using the predict() function
  • Add the predicted class using the mutate() function
36 / 39

Let's see it in R

library(MASS)
model <- lda(default ~ balance + student + income, data = Default)
predictions <- predict(model)
Default %>%
  mutate(predicted_class = predictions$class) %>%
  summarise(fpr =
              sum(default == "No" & predicted_class == "Yes") /
              sum(default == "No"),
            fnr =
              sum(default == "Yes" & predicted_class == "No") /
              sum(default == "Yes"))
##           fpr       fnr
## 1 0.002275784 0.7627628
  • Use the summarise() function to compute the false positive and false negative rates
37 / 39

Let's see it in R

library(MASS)
library(tidymodels)
model <- lda(default ~ balance + student + income, data = Default)
predictions <- predict(model)
Default %>%
  mutate(predicted_class = predictions$class) %>%
  conf_mat(default, predicted_class) %>%
  autoplot(type = "heatmap")

38 / 39

Let's see it in R

  • conf_mat() expects your outcome to be a factor variable
library(MASS)
library(tidymodels)
model <- lda(default ~ balance + student + income, data = Default)
predictions <- predict(model)
Default %>%
  mutate(predicted_class = predictions$class,
         default = as.factor(default)) %>%
  conf_mat(default, predicted_class) %>%
  autoplot(type = "heatmap")
39 / 39
