Bagging Decision Trees

Dr. D’Agostino McGowan

1 / 12

Decision trees

Pros

  • simple
  • easy to interpret

Cons

  • not often competitive in terms of predictive accuracy
  • we will discuss how to combine multiple trees to improve accuracy, known as ensemble methods
2 / 12

Bagging

  • bagging is a general-purpose procedure for reducing the variance of a statistical learning method (not just for trees)
  • it is particularly useful and frequently used in the context of decision trees
  • also called bootstrap aggregation
3 / 12

Bagging

  • Mathematically, why does this work? Let's go back to intro to stat!
  • If you have a set of $n$ independent observations $Z_1, \ldots, Z_n$, each with variance $\sigma^2$, what would the variance of the mean $\bar{Z}$ be?
  • The variance of $\bar{Z}$ is $\sigma^2/n$
  • In other words, averaging a set of observations reduces the variance (the simulation sketch below checks this).
  • This is not practical in general, though, because we usually do not have multiple training sets
4 / 12
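
A quick simulation makes the variance claim concrete. This is a minimal Python sketch (not part of the slides); the normal distribution, $\sigma$, and $n$ are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n, reps = 2.0, 25, 10_000

# Draw `reps` independent samples of n observations each,
# then look at the spread of the resulting sample means.
samples = rng.normal(loc=0.0, scale=sigma, size=(reps, n))
means = samples.mean(axis=1)

print(means.var())   # empirical variance of Z-bar, close to...
print(sigma**2 / n)  # ...the theoretical value sigma^2 / n
```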

Bagging

  • Averaging a set of observations reduces the variance. This is not practical in general, though, because we usually do not have multiple training sets.

What can we do?

  • Bootstrap! We can take repeated samples from the single training data set (one resampling step is sketched below).
5 / 12
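
In code, drawing one bootstrapped training set is a single resampling step. A minimal sketch, with hypothetical toy data standing in for the training set:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # hypothetical training features
y = rng.normal(size=100)       # hypothetical training response

# A bootstrapped training set: n rows drawn with replacement
# from the original n rows, so some rows repeat and some are left out.
idx = rng.integers(0, len(X), size=len(X))
X_boot, y_boot = X[idx], y[idx]
```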

Bagging process

  • generate $B$ different bootstrapped training sets
  • train our method on the $b$th bootstrapped training set to get $\hat{f}^{b}(x)$, the prediction at point $x$
  • average all predictions to get:

$$\hat{f}_{\text{bag}}(x) = \frac{1}{B}\sum_{b=1}^{B}\hat{f}^{b}(x)$$

  • This is bagging! (a from-scratch sketch follows below)
6 / 12
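
Here is a minimal from-scratch sketch of that process in Python. The function name `bag_predict` and the use of scikit-learn's `clone` to copy an arbitrary estimator are my own choices for illustration, not from the slides.

```python
import numpy as np
from sklearn.base import clone

def bag_predict(estimator, X_train, y_train, X_test, B=100, seed=0):
    """Fit B copies of `estimator`, one per bootstrapped training set,
    and average their predictions at the test points."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    preds = np.zeros((B, len(X_test)))
    for b in range(B):
        idx = rng.integers(0, n, size=n)                  # bootstrapped training set b
        fit_b = clone(estimator).fit(X_train[idx], y_train[idx])
        preds[b] = fit_b.predict(X_test)                  # f-hat^b(x)
    return preds.mean(axis=0)                             # f-hat_bag(x)
```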

Bagging regression trees

  • generate $B$ different bootstrapped training sets
  • fit a regression tree on the $b$th bootstrapped training set to get $\hat{f}^{b}(x)$, the prediction at point $x$
  • average all predictions to get:

$$\hat{f}_{\text{bag}}(x) = \frac{1}{B}\sum_{b=1}^{B}\hat{f}^{b}(x)$$

7 / 12
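
scikit-learn wraps this whole loop up: by default, `BaggingRegressor` bags decision trees. A sketch on a made-up regression dataset, purely for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

# Toy data standing in for a real training set.
X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)

# Fits 100 regression trees, each on a bootstrap sample of the rows,
# and predicts by averaging the trees' predictions.
model = BaggingRegressor(n_estimators=100, random_state=0).fit(X, y)
y_hat = model.predict(X)
```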

Bagging classification trees

  • for each test observation, record the class predicted by each of the $B$ trees
  • take a majority vote: the overall prediction is the most commonly occurring class among the $B$ predictions (see the sketch below)
8 / 12
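
A minimal majority-vote sketch, assuming `preds` is a hypothetical $B \times n_{\text{test}}$ array of class labels, one row per bagged tree; the function name is my own.

```python
import numpy as np

def majority_vote(preds):
    """Column-wise majority vote over a (B, n_test) array of class labels."""
    classes = np.unique(preds)
    # counts[c, j] = number of trees voting class c for test observation j
    counts = np.array([(preds == c).sum(axis=0) for c in classes])
    return classes[counts.argmax(axis=0)]

preds = np.array([[0, 1, 1],
                  [1, 1, 0],
                  [0, 1, 1]])   # B = 3 trees, 3 test observations
print(majority_vote(preds))    # -> [0 1 1]
```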

Out-of-bag Error Estimation

  • You can estimate the test error of a bagged model
  • The key to bagging is that trees are repeatedly fit to bootstrapped subsets of the observations
  • On average, each bagged tree makes use of about 2/3 of the observations (you can prove this if you'd like; it's not required for this course, but a short derivation is sketched below)
  • The remaining 1/3 of observations not used to fit a given bagged tree are the out-of-bag (OOB) observations
9 / 12
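
For the curious, the 2/3 falls out in two lines (this derivation is not from the slides):

```latex
% Each of the n bootstrap draws misses observation i with probability (1 - 1/n),
% so the chance observation i appears at least once in a given bootstrap sample is
\[
  1 - \left(1 - \tfrac{1}{n}\right)^{n}
  \;\longrightarrow\; 1 - e^{-1} \approx 0.632 \approx \tfrac{2}{3}
  \quad \text{as } n \to \infty.
\]
```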

Out-of-bag Error Estimation

  • You can predict the response for the $i$th observation using each of the trees in which that observation was OOB

How many predictions do you think this will yield for the $i$th observation?

  • This will yield around $B/3$ predictions for the $i$th observation. We can average these!

  • This estimate is essentially the LOOCV error for bagging as long as $B$ is large 🎉 (a scikit-learn sketch follows below)

10 / 12
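
In scikit-learn, the bagging estimators can compute this for you: `oob_score=True` scores each training observation using only the trees for which it was out-of-bag. Toy data again, purely for illustration:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=1.0, random_state=0)

# Each observation is predicted only by the trees that did not see it,
# giving a built-in estimate of test performance.
model = BaggingRegressor(n_estimators=200, oob_score=True, random_state=0).fit(X, y)
print(model.oob_score_)  # R^2 computed from the OOB predictions
```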

Describing Bagging

See if you can draw a diagram to describe the bagging process to someone who has never heard of this before.

11 / 12