What would be an example of a regression problem?
What would be an example of a classification problem?
Above are mpg vs horsepower, weight, and acceleration, with a blue linear-regression line fit separately to each. Can we predict mpg using these three?
Maybe we can do better using a model:
mpg≈f(horsepower,weight,acceleration)
mpg is the response variable (the outcome variable); we refer to this as Y.
horsepower is a feature (input, predictor); we refer to this as X1.
weight is X2.
acceleration is X3.

Our input vector is X = (X1, X2, X3)ᵀ
Y=f(X)+ϵ
How do we choose f(X)? What is a good value for f(X) at any selected value of X, say X = 100? There can be many Y values at X = 100.

A good value is

f(100) = E(Y | X = 100)

E(Y | X = 100) means the expected value (average) of Y given X = 100.
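A small simulation sketch of this idea (hypothetical data; the true regression function f(x) = 0.2x and the noise level are assumptions for illustration): with many Y values observed at X = 100, their average estimates E(Y | X = 100).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: assume the true regression function is f(x) = 0.2 * x,
# and we observe many noisy Y values at the single point X = 100.
y_at_100 = 0.2 * 100 + rng.normal(0, 2, size=500)

# A good value for f(100) is the average of the Y values at X = 100:
f_hat_100 = y_at_100.mean()
print(f_hat_100)  # close to the true E(Y | X = 100) = 20
```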
This ideal f(x)=E(Y|X=x) is called the regression function
f(x) = f(x1, x2, x3) = E[Y | X1 = x1, X2 = x2, X3 = x3]

f(x) = E(Y | X = x) is the function that minimizes E[(Y − g(X))² | X = x] over all functions g at all points X = x.
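A quick numerical check of that minimization claim (simulated Y values at one fixed x; the distribution is an arbitrary assumption): among constant guesses g(x) = c, the average squared error is smallest at the sample mean.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(5.0, 2.0, size=10_000)  # Y values observed at some fixed x

# Try many candidate constants c for g(x) and record the average squared error
candidates = np.linspace(0, 10, 201)
mse = [np.mean((y - c) ** 2) for c in candidates]
best = candidates[np.argmin(mse)]

print(best, y.mean())  # the minimizing constant sits at the sample mean of Y
```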
Using these points, how would I calculate the regression function?
This point has a Y value of 32.9. What is ϵ?
For any estimate, ^f(x), of f(x), we have

E[(Y − ^f(x))² | X = x] = [f(x) − ^f(x)]² + Var(ϵ)

where [f(x) − ^f(x)]² is the reducible error and Var(ϵ) is the irreducible error.
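A simulation sketch of this decomposition (all numbers are made-up assumptions): fix a point x with known f(x), pick some estimate ^f(x), and check that the average squared error matches reducible plus irreducible error.

```python
import numpy as np

rng = np.random.default_rng(2)

f_x = 20.0      # assumed true f(x) at a fixed point x
f_hat_x = 22.0  # some fixed estimate ^f(x)
sigma = 2.0     # sd of the irreducible noise epsilon

# Many draws of Y at X = x
y = f_x + rng.normal(0, sigma, size=200_000)

lhs = np.mean((y - f_hat_x) ** 2)        # E[(Y - ^f(x))^2 | X = x]
rhs = (f_x - f_hat_x) ** 2 + sigma ** 2  # reducible + irreducible error
print(lhs, rhs)  # the two agree up to simulation noise
```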
💡 We can relax the definition and let

^f(x) = E[Y | X ∈ N(x)]

Read this as: the expectation of Y, given that X is in the neighborhood of x.
If you need a notation pause at any point during this class, please let me know!
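The relaxed definition can be sketched in a few lines (hypothetical simulated data; the neighborhood radius is an arbitrary choice): average the yᵢ whose xᵢ land in the neighborhood N(x).

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=2_000)
y = 0.2 * x + rng.normal(0, 1, size=2_000)  # assumed true f(x) = 0.2 * x

def f_hat(x0, radius=0.5):
    """Neighborhood estimate: average the y_i whose x_i fall in N(x0)."""
    in_nbhd = np.abs(x - x0) <= radius
    return y[in_nbhd].mean()

print(f_hat(5.0))  # close to the true f(5) = 1.0
```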
What do I mean by p? What do I mean by n?
A common parametric model is a linear model
f(X) = β0 + β1X1 + β2X2 + ⋯ + βpXp
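A minimal sketch of fitting such a linear model by least squares (simulated data with p = 3 predictors; the true coefficients are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 500, 3
X = rng.normal(size=(n, p))             # three predictors X1, X2, X3
beta = np.array([1.0, 2.0, -1.0, 0.5])  # assumed true beta0, ..., beta3
y = beta[0] + X @ beta[1:] + rng.normal(0, 1, size=n)

# Least-squares fit of f(X) = beta0 + beta1*X1 + beta2*X2 + beta3*X3
design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
print(beta_hat)  # close to the true coefficients
```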
income from the model:

income = f(education, seniority) + ϵ
Linear regression model fit to the simulated data:

^f_L(education, seniority) = ^β0 + ^β1 × education + ^β2 × seniority
And an even MORE flexible 😱 model ^f(education, seniority)
We could compute the average squared prediction error over the train data:

MSE_train = Ave_{i∈train} [yi − ^f(xi)]²

What can go wrong here?
I have some train data, plotted above. What ^f(x) would minimize MSE_train?

MSE_train = Ave_{i∈train} [yi − ^f(xi)]²
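One answer: any ^f that passes through every training point drives MSE_train all the way to zero. A sketch with hypothetical data (8 points, interpolated exactly by a degree-7 polynomial):

```python
import numpy as np

rng = np.random.default_rng(5)
x_train = np.linspace(0, 1, 8)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, size=8)

# A degree-7 polynomial has 8 coefficients, so it can pass through all
# 8 training points exactly: solve the Vandermonde system V @ coefs = y.
V = np.vander(x_train, 8)
coefs = np.linalg.solve(V, y_train)

mse_train = np.mean((y_train - V @ coefs) ** 2)
print(mse_train)  # essentially zero: the curve interpolates every point
```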
What is wrong with this?
It's overfit!
If we get a new sample, that overfit model is probably going to be terrible!
Instead of computing the MSE over the train data, we can compute it using fresh test data test = {xi, yi}₁ᴹ:

MSE_test = Ave_{i∈test} [yi − ^f(xi)]²
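A sketch of train vs test MSE (hypothetical sine-plus-noise data; the polynomial degrees are arbitrary stand-ins for models of increasing flexibility): train MSE always falls as flexibility grows, while test MSE need not.

```python
import numpy as np

rng = np.random.default_rng(6)

def sample(n):
    """Hypothetical data: Y = sin(2*pi*X) + noise."""
    x = rng.uniform(0, 1, size=n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0, 0.4, size=n)

x_tr, y_tr = sample(20)    # train data
x_te, y_te = sample(200)   # fresh test data

def mses(deg):
    """Train and test MSE for a degree-`deg` polynomial fit to the train data."""
    c = np.polyfit(x_tr, y_tr, deg)
    return (np.mean((y_tr - np.polyval(c, x_tr)) ** 2),
            np.mean((y_te - np.polyval(c, x_te)) ** 2))

results = {deg: mses(deg) for deg in (1, 3, 10)}
for deg, (tr, te) in results.items():
    # the most flexible fit always wins on train MSE; check its test MSE
    print(deg, tr, te)
```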
Black curve on the left is the "truth". Red curve on the right is MSE_test; grey curve is MSE_train. Orange, blue, and green curves/squares correspond to fits of different flexibility.
Here the truth is smoother, so the smoother fit and linear model do really well
Here the truth is wiggly and the noise is low, so the more flexible fits do the best
E(y0 − ^f(x0))² = Var(^f(x0)) + [Bias(^f(x0))]² + Var(ϵ)

The expectation averages over the variability of y0 as well as the variability of the training data. Bias(^f(x0)) = E[^f(x0)] − f(x0)
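A simulation sketch of this decomposition (hypothetical linear truth; all constants are assumptions): refit a linear model on many independent training sets, record ^f(x0) each time, and compare both sides.

```python
import numpy as np

rng = np.random.default_rng(7)

def f(x):
    return 2.0 + 3.0 * x  # assumed true regression function

sigma = 1.0  # sd of the irreducible noise
x0 = 0.5     # query point

# Fit a linear model on many independent training sets; record ^f(x0)
f_hats = []
for _ in range(5_000):
    x = rng.uniform(0, 1, size=30)
    y = f(x) + rng.normal(0, sigma, size=30)
    b1, b0 = np.polyfit(x, y, 1)  # slope, intercept
    f_hats.append(b0 + b1 * x0)
f_hats = np.array(f_hats)

var_term = f_hats.var()
bias_term = f_hats.mean() - f(x0)  # Bias(^f(x0)) = E[^f(x0)] - f(x0)

# Independent draws of y0 at x0 give the left-hand side
y0 = f(x0) + rng.normal(0, sigma, size=5_000)
lhs = np.mean((y0 - f_hats) ** 2)
rhs = var_term + bias_term ** 2 + sigma ** 2
print(lhs, rhs)  # the two agree up to simulation noise
```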
What is the goal?
Suppose there are K elements in C, numbered 1, 2, …, K. Let

p_k(x) = P(Y = k | X = x), k = 1, 2, …, K

These are the conditional class probabilities at x.

How do you think we could calculate this?

C(x) = j if p_j(x) = max{p_1(x), p_2(x), …, p_K(x)}
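A sketch of this classifier on hypothetical two-class data (the logistic shape of P(Y = 1 | X = x) and the neighborhood radius are assumptions): estimate p_k(x) by the class proportions near x, then pick the largest.

```python
import numpy as np

rng = np.random.default_rng(8)

# Hypothetical 2-class data: P(Y = 1 | X = x) rises smoothly with x
x = rng.uniform(0, 10, size=5_000)
p1_true = 1 / (1 + np.exp(-(x - 5)))
y = (rng.uniform(size=5_000) < p1_true).astype(int)

def classify(x0, radius=0.5):
    """C(x0) = argmax_k p_k(x0), with p_k estimated from a neighborhood of x0."""
    nbhd = np.abs(x - x0) <= radius
    p1 = y[nbhd].mean()  # estimated P(Y = 1 | X near x0)
    return 1 if p1 >= 0.5 else 0

print(classify(2.0), classify(8.0))  # 0 then 1
```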
What if this was our data and there were no points at exactly x = 5? Then how could we calculate this?
Err_test = Ave_{i∈test} I[yi ≠ ^C(xi)]
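Computing this misclassification rate is a one-liner (the labels below are made up for illustration): average the indicator of "predicted class differs from true class" over the test set.

```python
import numpy as np

# Err_test = Ave_{i in test} I[y_i != ^C(x_i)], on hypothetical labels
y_test = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 0, 1])

err_test = np.mean(y_test != y_pred)
print(err_test)  # 2 of 8 misclassified -> 0.25
```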