class: center, middle, inverse, title-slide

# Logistic regression, LDA, QDA - Part 3

### Dr. D’Agostino McGowan

---
layout: true

<div class="my-footer">
<span>
Dr. Lucy D'Agostino McGowan <i>adapted from slides by Hastie & Tibshirani</i>
</span>
</div>

---

## Other forms of discriminant analysis

`$$P(Y|X) = \frac{\pi_kf_k(x)}{\sum_{l=1}^K\pi_lf_l(x)}$$`

--

* When the `\(f_k(x)\)` are **normal** densities with the **same covariance** matrix `\(\mathbf\Sigma\)` in each class, this is **linear discriminant analysis**

--

* When the `\(f_k(x)\)` are **normal** densities with **different covariance** matrices `\(\mathbf\Sigma_k\)` in each class, this is **quadratic discriminant analysis**

--

* Lots of other forms are possible!

---

## Quadratic Discriminant Analysis

<img src="img/05/qda.png" height="250">

`$$\delta_k(x) = -\frac{1}{2}(x-\mu_k)^T\mathbf\Sigma_k^{-1}(x-\mu_k)+\log\pi_k$$`

.question[
Why do you think this is called **quadratic** discriminant analysis?
]

--

* Because the `\(\mathbf\Sigma_k\)` are different, the quadratic terms matter

---

## Let's see it in R

.small[


```r
library(MASS) # qda() lives here
library(ISLR) # the Default data come from the ISLR package
*model <- qda(default ~ balance + student, data = Default)
predictions <- predict(model)
```

]

* Use the `qda()` function from the **MASS** package
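---

## Let's see it in R

* A minimal sketch of checking the QDA fit (assuming the `Default` data have been loaded from the **ISLR** package): cross-tabulate the predicted classes against the observed ones

.small[


```r
library(MASS) # qda()
library(ISLR) # Default data (assumed source)

model <- qda(default ~ balance + student, data = Default)
predictions <- predict(model)

# predict() returns the predicted class ($class) and the
# posterior probabilities ($posterior) for each observation
table(predicted = predictions$class, observed = Default$default)

# overall training error rate
mean(predictions$class != Default$default)
```

]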
---

## Let's see it in R

* Let's use LDA to visualize the data

.small[


```r
*model <- lda(Species ~ ., data = iris)
predictions <- predict(model)
```

]

---

## Let's see it in R

* Let's use LDA to visualize the data

.small[


```r
model <- lda(Species ~ ., data = iris)
predictions <- predict(model)
*plot_data <- data.frame(outcome = iris$Species,
*                        lda = predictions$x)
head(plot_data)
```

```
##   outcome  lda.LD1    lda.LD2
## 1  setosa 8.061800  0.3004206
## 2  setosa 7.128688 -0.7866604
## 3  setosa 7.489828 -0.2653845
## 4  setosa 6.813201 -0.6706311
## 5  setosa 8.132309  0.5144625
## 6  setosa 7.701947  1.4617210
```

]

---

## Let's see it in R

* Let's use LDA to visualize the data

.small[


```r
*ggplot(data = plot_data,
*       mapping = aes(x = lda.LD1, y = lda.LD2, color = outcome)) +
*  geom_point()
```

![](07-logistic-lda-qda_files/figure-html/unnamed-chunk-6-1.png)<!-- -->

]

---

## ggplot2 `\(\in\)` tidyverse

.pull-left[
![](img/02/ggplot2-part-of-tidyverse.png)
]

.pull-right[
- **ggplot2** is tidyverse's data visualization package
- The `gg` in "ggplot2" stands for Grammar of Graphics
- It is inspired by the book **The Grammar of Graphics** by Leland Wilkinson<sup>†</sup>
- A grammar of graphics is a tool that enables us to concisely describe the components of a graphic

![](img/02/grammar-of-graphics.png)
]

.footnote[
<sup>†</sup> Source: [BloggoType](http://bloggotype.blogspot.com/2016/08/holiday-notes2-grammar-of-graphics.html)
]

---

## ggplot2

.question[
What function creates the plot?
]

.small[


```r
ggplot(data = plot_data,
       mapping = aes(x = lda.LD1, y = lda.LD2, color = outcome)) +
  geom_point() +
  labs(x = "LD1", y = "LD2")
```

![](07-logistic-lda-qda_files/figure-html/unnamed-chunk-7-1.png)<!-- -->

]

---

## ggplot2

.question[
What data set is being plotted?
]

.small[


```r
ggplot(data = plot_data,
       mapping = aes(x = lda.LD1, y = lda.LD2, color = outcome)) +
  geom_point() +
  labs(x = "LD1", y = "LD2")
```

![](07-logistic-lda-qda_files/figure-html/unnamed-chunk-8-1.png)<!-- -->

]

---

## ggplot2

.question[
Which variables are on the x- and y-axis?
]

.small[


```r
ggplot(data = plot_data,
       mapping = aes(x = lda.LD1, y = lda.LD2, color = outcome)) +
  geom_point() +
  labs(x = "LD1", y = "LD2")
```

![](07-logistic-lda-qda_files/figure-html/unnamed-chunk-9-1.png)<!-- -->

]

---

## ggplot2

.question[
What variable in the dataset determines the color?
]

.small[


```r
ggplot(data = plot_data,
       mapping = aes(x = lda.LD1, y = lda.LD2, color = outcome)) +
  geom_point() +
  labs(x = "LD1", y = "LD2")
```

![](07-logistic-lda-qda_files/figure-html/unnamed-chunk-10-1.png)<!-- -->

]

---

## ggplot2

.question[
What does `geom_point()` mean?
]

.small[


```r
ggplot(data = plot_data,
       mapping = aes(x = lda.LD1, y = lda.LD2, color = outcome)) +
  geom_point() +
  labs(x = "LD1", y = "LD2")
```

![](07-logistic-lda-qda_files/figure-html/unnamed-chunk-11-1.png)<!-- -->

]

---

## Hello ggplot2!

- `ggplot()` is the main function in ggplot2 and plots are constructed in layers
- The structure of the code for plots can often be summarized as


```r
ggplot +
  geom_xxx
```

--

or, more precisely

.small[


```r
ggplot(data = [dataset],
       mapping = aes(x = [x-variable], y = [y-variable])) +
  geom_xxx() +
  other options
```

]

---

## Hello ggplot2!

- To use ggplot2 functions, first load the tidyverse

--

- For help with ggplot2, see [ggplot2.tidyverse.org](http://ggplot2.tidyverse.org/)

---

![](07-logistic-lda-qda_files/figure-html/unnamed-chunk-15-1.png)<!-- -->

* What is going on here?

--

* LDA is projecting the samples `\(X\)` onto a _hyperplane_ with `\(K-1\)` dimensions.

--

.question[
What is `\(K\)` here?
]

---

![](07-logistic-lda-qda_files/figure-html/unnamed-chunk-16-1.png)<!-- -->

* What is going on here?
* LDA is projecting the samples `\(X\)` onto a _hyperplane_ with `\(K-1\)` dimensions.
* Why does this work?
* LDA essentially classifies to the closest centroid, and the `\(K\)` centroids span a `\(K-1\)` dimensional plane.

--

* Even when `\(K > 3\)`, we can find the "best" 2-dimensional plane for visualizing the discriminant rule by using the first two discriminant variables (LD1 and LD2)

---

## Logistic Regression versus LDA

* For the two-class problem ( `\(K=2\)` ), LDA takes the form

`$$\log\left(\frac{p_1(x)}{1-p_1(x)}\right)=\log\left(\frac{p_1(x)}{p_2(x)}\right) = c_0 + c_1x_1 + \dots+ c_px_p$$`

--

* This is the same form as logistic regression

--

* The difference is in how the parameters are estimated

--

* Logistic regression uses the conditional likelihood based on `\(P(Y|X)\)` (**discriminative learning**)

--

* LDA uses the full likelihood based on `\(P(X,Y)\)` (**generative learning**)

--

* The results are often similar (see the short R sketch at the end of these slides)

---

## Summary

* Logistic regression is very popular for classification, especially when `\(K = 2\)`

* LDA is useful when `\(n\)` is small or the classes are well separated and the normality assumptions are reasonable; it also handles `\(K > 2\)` naturally

--

* QDA is _similar_ to LDA, but it is more flexible because it allows the covariance of the predictors to differ for each class, `\(k\)`

--

* See Section 4.5 in your book for comparisons of logistic regression, LDA, and KNN
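---

## Logistic Regression versus LDA in R

* A minimal sketch comparing the two fits on a two-class problem, assuming the `Default` data from the **ISLR** package; the two decision rules typically agree on nearly all observations

.small[


```r
library(MASS) # lda()
library(ISLR) # Default data (assumed source)

# discriminative: maximizes the conditional likelihood P(Y | X)
glm_fit <- glm(default ~ balance + student, data = Default,
               family = binomial)

# generative: maximizes the full likelihood P(X, Y)
lda_fit <- lda(default ~ balance + student, data = Default)

# both imply a linear decision rule; compare the predicted classes
glm_class <- ifelse(predict(glm_fit, type = "response") > 0.5, "Yes", "No")
lda_class <- predict(lda_fit)$class

# proportion of observations where the two classifiers agree
mean(glm_class == lda_class)
```

]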