What are some examples of classification problems?

- eye color ∈ {blue, brown, green}
- email ∈ {spam, not spam}
We can code `Default` as

$$Y = \begin{cases} 0 & \text{if No} \\ 1 & \text{if Yes} \end{cases}$$

Can we fit a linear regression of $Y$ on $X$ and classify as Yes if $\hat{Y} > 0.5$?

What may do a better job?
Which does a better job at predicting the probability of default?
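To see the difference concretely, here is a minimal sketch in R comparing the two fits, assuming the `Default` data from the ISLR package:

```r
library(ISLR)  # provides the Default data

# Linear regression on a 0/1 coding of default
lm_fit <- lm(I(default == "Yes") ~ balance, data = Default)

# Logistic regression on the same predictor
glm_fit <- glm(default ~ balance, data = Default, family = "binomial")

# The linear fit can produce "probabilities" below 0 (or above 1);
# the logistic fit always stays between 0 and 1
range(fitted(lm_fit))
range(fitted(glm_fit))
```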
What if we have more than 2 possible outcomes? For example, someone comes to the emergency room and we need to classify them according to their symptoms:

$$Y = \begin{cases} 1 & \text{if stroke} \\ 2 & \text{if drug overdose} \\ 3 & \text{if epileptic seizure} \end{cases}$$

What could go wrong here?

This coding implies an ordering of the outcomes and insists that the difference between stroke and drug overdose is the same as the difference between drug overdose and epileptic seizure.
$$p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}$$

What is this transformation called?

Logistic regression ensures that our estimates for $p(X)$ are between 0 and 1 🎉
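A quick numeric sketch (coefficients made up for illustration) shows the transformation always lands strictly between 0 and 1:

```r
# Hypothetical coefficients, for illustration only
beta0 <- -10.65
beta1 <- 0.0055
x <- c(0, 1000, 2000, 3000)

# p(X) = e^(b0 + b1 x) / (1 + e^(b0 + b1 x))
p <- exp(beta0 + beta1 * x) / (1 + exp(beta0 + beta1 * x))
p  # every value is strictly between 0 and 1
# (base R's plogis() computes the same transformation stably)
```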
Refresher: How did we estimate $\hat\beta$ in linear regression? There we minimized the residual sum of squares; in logistic regression we instead choose the $\beta$ that maximize the likelihood:

$$\ell(\beta_0, \beta_1) = \prod_{i: y_i = 1} p(x_i) \prod_{i: y_i = 0} \left(1 - p(x_i)\right)$$
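As a sketch of what maximizing this likelihood looks like numerically (assuming the ISLR `Default` data; rescaling `balance` just helps the optimizer):

```r
library(ISLR)  # provides the Default data

y <- as.numeric(Default$default == "Yes")
x <- Default$balance / 1000  # rescale dollars to thousands for conditioning

# Negative log-likelihood of the logistic model;
# plogis(eta, log.p = TRUE) is a numerically stable log(p)
neg_loglik <- function(beta) {
  eta <- beta[1] + beta[2] * x
  -sum(y * plogis(eta, log.p = TRUE) + (1 - y) * plogis(-eta, log.p = TRUE))
}

# Minimizing the negative log-likelihood = maximizing the likelihood
optim(c(0, 0), neg_loglik, method = "BFGS")$par
# Should land near glm()'s estimates: about -10.65 and 5.50
# (5.50 per $1000 of balance, i.e. 0.0055 per dollar)
```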
We let R do the heavy lifting here, via the `glm()` function with the `family = "binomial"` argument:

```r
library(ISLR)   # provides the Default data
library(broom)  # tidy()
library(dplyr)  # %>%

glm(default ~ balance, data = Default, family = "binomial") %>%
  tidy()
## # A tibble: 2 x 5
##   term        estimate std.error statistic   p.value
##   <chr>          <dbl>     <dbl>     <dbl>     <dbl>
## 1 (Intercept) -10.7     0.361        -29.5 3.62e-191
## 2 balance       0.00550 0.000220      25.0 1.98e-137
```

What is our estimated probability of default for someone with a balance of $1000?
term | estimate | std.error | statistic | p.value
---|---|---|---|---
(Intercept) | -10.6513306 | 0.3611574 | -29.49221 | 0
balance | 0.0054989 | 0.0002204 | 24.95309 | 0

$$\hat{p}(X) = \frac{e^{\hat\beta_0 + \hat\beta_1 X}}{1 + e^{\hat\beta_0 + \hat\beta_1 X}} = \frac{e^{-10.65 + 0.0055 \times 1000}}{1 + e^{-10.65 + 0.0055 \times 1000}} = 0.006$$
What is our estimated probability of default for someone with a balance of $2000?

$$\hat{p}(X) = \frac{e^{\hat\beta_0 + \hat\beta_1 X}}{1 + e^{\hat\beta_0 + \hat\beta_1 X}} = \frac{e^{-10.65 + 0.0055 \times 2000}}{1 + e^{-10.65 + 0.0055 \times 2000}} = 0.586$$
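These hand calculations can be checked with `predict()`; a sketch that refits the model and queries it:

```r
fit <- glm(default ~ balance, data = Default, family = "binomial")

# type = "response" returns probabilities rather than log-odds
predict(fit, newdata = data.frame(balance = c(1000, 2000)), type = "response")
# roughly 0.006 and 0.586, matching the calculations above
```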
Let's refit the model to predict the probability of default given the customer is a student.

term | estimate | std.error | statistic | p.value
---|---|---|---|---
(Intercept) | -3.5041278 | 0.0707130 | -49.554219 | 0.0000000
studentYes | 0.4048871 | 0.1150188 | 3.520181 | 0.0004313

$$P(\text{default = Yes} \mid \text{student = Yes}) = \frac{e^{-3.5041 + 0.4049 \times 1}}{1 + e^{-3.5041 + 0.4049 \times 1}} = 0.0431$$

How will this change if `student = No`?

$$P(\text{default = Yes} \mid \text{student = No}) = \frac{e^{-3.5041 + 0.4049 \times 0}}{1 + e^{-3.5041 + 0.4049 \times 0}} = 0.0292$$
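A sketch of this refit in R, again assuming the ISLR `Default` data:

```r
student_fit <- glm(default ~ student, data = Default, family = "binomial")

# Predicted default probabilities for a student and a non-student
predict(student_fit,
        newdata = data.frame(student = c("Yes", "No")),
        type = "response")
# roughly 0.0431 and 0.0292
```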
With multiple predictors, the model becomes

$$\log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p$$

$$p(X) = \frac{e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}}$$

term | estimate | std.error | statistic | p.value
---|---|---|---|---
(Intercept) | -10.8690452 | 0.4922555 | -22.080088 | 0.0000000
balance | 0.0057365 | 0.0002319 | 24.737563 | 0.0000000
income | 0.0000030 | 0.0000082 | 0.369815 | 0.7115203
studentYes | -0.6467758 | 0.2362525 | -2.737646 | 0.0061881

Why is the coefficient for `student` negative now when it was positive before? What is going on here?
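A sketch of the corresponding fit, assuming the same `Default` data and the broom/dplyr setup from earlier:

```r
glm(default ~ balance + income + student,
    data = Default, family = "binomial") %>%
  tidy()
```

Holding `balance` and `income` fixed, a student is less likely to default; the earlier positive coefficient arose because students tend to carry higher balances.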
With more than two classes, this generalizes to

$$P(Y = k \mid X) = \frac{e^{\beta_{0k} + \beta_{1k} X_1 + \cdots + \beta_{pk} X_p}}{\sum_{l=1}^{K} e^{\beta_{0l} + \beta_{1l} X_1 + \cdots + \beta_{pl} X_p}}$$
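One way to fit this in R is `multinom()` from the nnet package; a minimal sketch on the built-in `iris` data (three classes, so $K = 3$):

```r
library(nnet)  # multinom() fits multinomial logistic regression

fit <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris)

# Each row of fitted probabilities sums to 1 across the K classes
head(predict(fit, type = "probs"))
```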
What is Bayes' theorem?

$$\underbrace{P(Y = k \mid X = x)}_{\text{posterior}} = \frac{\overbrace{P(X = x \mid Y = k)}^{\text{likelihood}} \times \overbrace{P(Y = k)}^{\text{prior}}}{P(X = x)}$$
$$P(\text{Sick} \mid +) = \frac{P(+ \mid \text{Sick})\,P(\text{Sick})}{P(+)} = \frac{P(+ \mid \text{Sick})\,P(\text{Sick})}{P(+ \mid \text{Sick})\,P(\text{Sick}) + P(+ \mid \text{Healthy})\,P(\text{Healthy})}$$
What is my probability of having the disease given I tested positive? Suppose the test is positive for 99% of sick people, positive for 1% of healthy people, and 20% of the population is sick:

$$P(\text{Sick} \mid +) = \frac{P(+ \mid \text{Sick})\,P(\text{Sick})}{P(+)} \quad\Rightarrow\quad 0.96 = \frac{0.99 \times 0.2}{0.99 \times 0.2 + 0.01 \times 0.8}$$
What is my probability of having the disease given I tested positive, if the disease is much rarer, with only 0.1% of the population sick?

$$P(\text{Sick} \mid +) = \frac{P(+ \mid \text{Sick})\,P(\text{Sick})}{P(+)} \quad\Rightarrow\quad 0.09 = \frac{0.99 \times 0.001}{0.99 \times 0.001 + 0.01 \times 0.999}$$
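Both calculations are one-liners in R; the helper name and argument names below are mine:

```r
# P(Sick | +) via Bayes' theorem
bayes_posterior <- function(sensitivity, false_pos, prevalence) {
  (sensitivity * prevalence) /
    (sensitivity * prevalence + false_pos * (1 - prevalence))
}

bayes_posterior(sensitivity = 0.99, false_pos = 0.01, prevalence = 0.2)    # ~0.96
bayes_posterior(sensitivity = 0.99, false_pos = 0.01, prevalence = 0.001)  # ~0.09
```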
$$P(Y \mid X) = \frac{P(X \mid Y) \times P(Y)}{P(X)}$$

This same equation is used for discriminant analysis with slightly different notation:

$$p_k(x) = P(Y = k \mid X = x) = \frac{\pi_k f_k(x)}{\sum_{l=1}^{K} \pi_l f_l(x)}$$

where $\pi_k = P(Y = k)$ is the prior probability of class $k$ and $f_k(x) = P(X = x \mid Y = k)$ is the density of $X$ within class $k$.
The density for the normal distribution is

$$f_k(x) = \frac{1}{\sqrt{2\pi}\,\sigma_k} e^{-\frac{1}{2}\left(\frac{x - \mu_k}{\sigma_k}\right)^2}$$

Plugging this into Bayes' theorem gives

$$p_k(x) = \frac{\pi_k \frac{1}{\sqrt{2\pi}\,\sigma_k} e^{-\frac{1}{2}\left(\frac{x - \mu_k}{\sigma_k}\right)^2}}{\sum_{l=1}^{K} \pi_l \frac{1}{\sqrt{2\pi}\,\sigma_l} e^{-\frac{1}{2}\left(\frac{x - \mu_l}{\sigma_l}\right)^2}}$$
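Numerically this is just `dnorm()` weighted by the priors; a sketch with made-up class means:

```r
# Hypothetical two-class setup: shared sd = 1, equal priors
mu    <- c(0, 3)
prior <- c(0.5, 0.5)

posterior <- function(x) {
  f <- dnorm(x, mean = mu, sd = 1)  # f_k(x) for each class
  prior * f / sum(prior * f)
}

posterior(1)    # closer to class 1's mean, so class 1 dominates
posterior(2.5)  # closer to class 2's mean, so class 2 dominates
```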
😅 Luckily things cancel!
Taking logs and dropping the terms that don't depend on $k$ (assuming a shared variance $\sigma^2$ across classes) leaves the discriminant function

$$\delta_k(x) = x\frac{\mu_k}{\sigma^2} - \frac{\mu_k^2}{2\sigma^2} + \log(\pi_k)$$

If $K = 2$, how do you think we would calculate the decision boundary?
Set $\delta_1(x) = \delta_2(x)$ and solve for $x$ (here with equal priors, $\pi_1 = \pi_2 = 0.5$):

$$
\begin{aligned}
x\frac{\mu_1}{\sigma^2} - \frac{\mu_1^2}{2\sigma^2} + \log(0.5) &= x\frac{\mu_2}{\sigma^2} - \frac{\mu_2^2}{2\sigma^2} + \log(0.5) \\
x\frac{\mu_1}{\sigma^2} - x\frac{\mu_2}{\sigma^2} &= \frac{\mu_1^2}{2\sigma^2} - \frac{\mu_2^2}{2\sigma^2} + \log(0.5) - \log(0.5) \\
x(\mu_1 - \mu_2) &= \frac{\mu_1^2 - \mu_2^2}{2} \\
x &= \frac{\mu_1^2 - \mu_2^2}{2(\mu_1 - \mu_2)} = \frac{(\mu_1 - \mu_2)(\mu_1 + \mu_2)}{2(\mu_1 - \mu_2)} \\
x &= \frac{\mu_1 + \mu_2}{2}
\end{aligned}
$$
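We can sanity-check $x = (\mu_1 + \mu_2)/2$ against `lda()` from the MASS package; a sketch on simulated data (all values made up):

```r
library(MASS)  # lda()

set.seed(1)
n <- 1000
class <- factor(rep(c(1, 2), each = n))
x <- c(rnorm(n, mean = 0), rnorm(n, mean = 3))  # shared sd = 1, equal priors

fit <- lda(class ~ x)

# Theory puts the boundary at (0 + 3) / 2 = 1.5, so predictions
# should flip from class 1 to class 2 right around x = 1.5
predict(fit, data.frame(x = c(1.4, 1.6)))$class
```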