Bagging Decision Trees

Dr. D’Agostino McGowan

1 / 12

Decision trees

Pros

  • simple
  • easy to interpret

Cons

  • not often competitive in terms of predictive accuracy
  • we will discuss how to combine multiple trees to improve accuracy, known as ensemble methods
2 / 12

Bagging

  • bagging is a general-purpose procedure for reducing the variance of a statistical learning method (not just for trees)
  • it is particularly useful and frequently used in the context of decision trees
  • also called bootstrap aggregation
3 / 12

Bagging

  • Mathematically, why does this work? Let's go back to intro to stat!
  • If you have a set of $n$ independent observations $Z_1, \ldots, Z_n$, each with variance $\sigma^2$, what would the variance of the mean $\bar{Z}$ be?
  • The variance of $\bar{Z}$ is $\sigma^2/n$
  • In other words, averaging a set of observations reduces the variance (the simulation sketch below checks this).
  • This is not practical in general, though, because we usually do not have multiple training sets
4 / 12
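
A quick simulation makes the variance claim concrete. This is a minimal Python sketch (not part of the slides); the normal distribution, $\sigma$, and $n$ are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n, reps = 2.0, 25, 10_000

# Draw `reps` independent samples of n observations each,
# then look at the spread of the resulting sample means.
samples = rng.normal(loc=0.0, scale=sigma, size=(reps, n))
means = samples.mean(axis=1)

print(means.var())   # empirical variance of Z-bar, close to...
print(sigma**2 / n)  # ...the theoretical value sigma^2 / n
```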

Bagging

  • Averaging a set of observations reduces the variance. This is not practical in general, though, because we usually do not have multiple training sets.

What can we do?

  • Bootstrap! We can take repeated samples from the single training data set (one resampling step is sketched below).
5 / 12
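
In code, drawing one bootstrapped training set is a single resampling step. A minimal sketch, with hypothetical toy data standing in for the training set:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # hypothetical training features
y = rng.normal(size=100)       # hypothetical training response

# A bootstrapped training set: n rows drawn with replacement
# from the original n rows, so some rows repeat and some are left out.
idx = rng.integers(0, len(X), size=len(X))
X_boot, y_boot = X[idx], y[idx]
```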

Bagging process

  • generate $B$ different bootstrapped training sets
  • train our method on the $b$th bootstrapped training set to get $\hat{f}^{b}(x)$, the prediction at point $x$
  • average all predictions to get:

$$\hat{f}_{\text{bag}}(x) = \frac{1}{B}\sum_{b=1}^{B}\hat{f}^{b}(x)$$

  • This is bagging! (a from-scratch sketch follows below)
6 / 12
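
Here is a minimal from-scratch sketch of that process in Python. The function name `bag_predict` and the use of scikit-learn's `clone` to copy an arbitrary estimator are my own choices for illustration, not from the slides.

```python
import numpy as np
from sklearn.base import clone

def bag_predict(estimator, X_train, y_train, X_test, B=100, seed=0):
    """Fit B copies of `estimator`, one per bootstrapped training set,
    and average their predictions at the test points."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = np.zeros((B, len(X_test)))
    for b in range(B):
        idx = rng.integers(0, n, size=n)                  # bootstrapped training set b
        fit_b = clone(estimator).fit(X_train[idx], y_train[idx])
        preds[b] = fit_b.predict(X_test)                  # f-hat^b(x)
    return preds.mean(axis=0)                             # f-hat_bag(x)
```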

Bagging regression trees

  • generate $B$ different bootstrapped training sets
  • fit a regression tree on the $b$th bootstrapped training set to get $\hat{f}^{b}(x)$, the prediction at point $x$
  • average all predictions to get:

$$\hat{f}_{\text{bag}}(x) = \frac{1}{B}\sum_{b=1}^{B}\hat{f}^{b}(x)$$

7 / 12
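
scikit-learn wraps this whole loop up: by default, `BaggingRegressor` bags decision trees. A sketch on a made-up regression dataset, purely for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

# Toy data standing in for a real training set.
X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)

# Fits 100 regression trees, each on a bootstrap sample of the rows,
# and predicts by averaging the trees' predictions.
model = BaggingRegressor(n_estimators=100, random_state=0).fit(X, y)
y_hat = model.predict(X)
```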

Bagging classification trees

  • for each test observation, record the class predicted by each of the $B$ trees
  • take a majority vote: the overall prediction is the most commonly occurring class among the $B$ predictions (see the sketch below)
8 / 12
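
A minimal majority-vote sketch, assuming `preds` is a hypothetical $B \times n_{\text{test}}$ array of class labels, one row per bagged tree; the function name is my own.

```python
import numpy as np

def majority_vote(preds):
    """Column-wise majority vote over a (B, n_test) array of class labels."""
    classes = np.unique(preds)
    # counts[c, j] = number of trees voting class c for test observation j
    counts = np.array([(preds == c).sum(axis=0) for c in classes])
    return classes[counts.argmax(axis=0)]

preds = np.array([[0, 1, 1],
                  [1, 1, 0],
                  [0, 1, 1]])   # B = 3 trees, 3 test observations
print(majority_vote(preds))    # -> [0 1 1]
```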

Out-of-bag Error Estimation

  • You can estimate the test error of a bagged model
  • The key to bagging is that trees are repeatedly fit to bootstrapped subsets of the observations
  • On average, each bagged tree makes use of about 2/3 of the observations (you can prove this if you'd like; it's not required for this course, but a short derivation is sketched below)
  • The remaining 1/3 of observations not used to fit a given bagged tree are the out-of-bag (OOB) observations
9 / 12
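
For the curious, the 2/3 falls out in two lines (this derivation is not from the slides):

```latex
% Each of the n bootstrap draws misses observation i with probability (1 - 1/n),
% so the chance observation i appears at least once in a given bootstrap sample is
\[
  1 - \left(1 - \tfrac{1}{n}\right)^{n}
  \;\longrightarrow\; 1 - e^{-1} \approx 0.632 \approx \tfrac{2}{3}
  \quad \text{as } n \to \infty.
\]
```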

Out-of-bag Error Estimation

  • You can predict the response for the $i$th observation using each of the trees in which that observation was OOB

How many predictions do you think this will yield for the $i$th observation?

  • This will yield around $B/3$ predictions for the $i$th observation. We can average these!

  • This estimate is essentially the LOOCV error for bagging as long as $B$ is large 🎉 (a scikit-learn sketch follows below)

10 / 12
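
In scikit-learn, the bagging estimators can compute this for you: `oob_score=True` scores each training observation using only the trees for which it was out-of-bag. Toy data again, purely for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)

# Each observation is predicted only by the trees that did not see it,
# giving a built-in estimate of test performance.
model = BaggingRegressor(n_estimators=200, oob_score=True, random_state=0).fit(X, y)
print(model.oob_score_)  # R^2 computed from the OOB predictions
```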

Describing Bagging

See if you can draw a diagram to describe the bagging process to someone who has never heard of this before.

11 / 12