Boosting Decision Trees: The Algorithm

Dr. D’Agostino McGowan


Boosting

  • Like bagging, boosting is an approach that can be applied to many statistical learning methods
  • We will discuss how to use boosting for decision trees

Boosting

Refresher: What does bagging do?

  • Bagging involves (sketched in code below):
    • resampling from the original training data to make many bootstrapped training data sets
    • fitting a separate decision tree to each bootstrapped training data set
    • combining all trees to make one predictive model
  • ☝️ Note: each tree is built on a bootstrap dataset, independent of the other trees
  • Boosting is similar, except the trees are grown sequentially, using information from the previously grown trees
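
To make the refresher concrete, here is a minimal sketch of bagging for regression, assuming scikit-learn's DecisionTreeRegressor and that X, y are NumPy arrays; the number of trees B is a placeholder.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bag_trees(X, y, B=100, random_state=0):
    """Fit B regression trees, each on its own bootstrap sample of (X, y)."""
    rng = np.random.default_rng(random_state)
    n = len(y)
    trees = []
    for _ in range(B):
        idx = rng.integers(0, n, size=n)  # sample n rows with replacement
        trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
    return trees

def bag_predict(trees, X):
    """Combine the trees into one predictive model by averaging."""
    return np.mean([t.predict(X) for t in trees], axis=0)
```

Because each bootstrap sample is drawn independently, the trees never see one another; that independence is exactly what boosting gives up.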

Boosting algorithm for regression trees

Step 1

  • Set \(\hat{f}(x)= 0\) and \(r_i= y_i\) for all \(i\) in the training set

Boosting algorithm for regression trees

Step 2

For \(b = 1, 2, \dots, B\), repeat:

  • Fit a tree \(\hat{f}^b\) with \(d\) splits (\(d + 1\) terminal nodes) to the training data \((X, r)\)
  • Update \(\hat{f}\) by adding in a shrunken version of the new tree:

$$\hat{f}(x)\leftarrow \hat{f}(x)+\lambda \hat{f}^b(x)$$

  • Update the residuals:

$$r_i \leftarrow r_i - \lambda \hat{f}^b(x_i)$$

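One way to write the body of this loop, as a sketch assuming scikit-learn trees and mapping "\(d\) splits" to max_leaf_nodes = d + 1 (an assumption, since scikit-learn limits leaves rather than splits; for a binary tree the two are equivalent):

```python
from sklearn.tree import DecisionTreeRegressor

def boost_step(X, r, lam, d):
    """One pass of Step 2: fit a d-split tree to the residuals, then shrink."""
    tree_b = DecisionTreeRegressor(max_leaf_nodes=d + 1)  # d splits => d + 1 leaves
    tree_b.fit(X, r)                   # fit f^b to (X, r)
    r = r - lam * tree_b.predict(X)    # r_i <- r_i - lambda * f^b(x_i)
    return tree_b, r
```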

Boosting algorithm for regression trees

Step 3

  • Output the boosted model

$$\hat{f}(x)=\sum_{b = 1}^B\lambda\hat{f}^b(x)$$

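Assembling Steps 1 through 3 gives a self-contained sketch of the whole algorithm (again assuming scikit-learn and NumPy; \(B\), \(\lambda\) as lam, and \(d\) are the tuning parameters from the slides):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_trees(X, y, B=1000, lam=0.01, d=1):
    """Boosting for regression trees, following Steps 1-3."""
    r = y.astype(float)          # Step 1: f_hat = 0, so the residuals start at y
    trees = []
    for _ in range(B):           # Step 2: grow B small trees sequentially
        tree_b = DecisionTreeRegressor(max_leaf_nodes=d + 1).fit(X, r)
        r -= lam * tree_b.predict(X)   # update the residuals
        trees.append(tree_b)
    return trees

def boost_predict(trees, X, lam=0.01):
    """Step 3: the boosted model is the sum of the B shrunken trees."""
    return lam * np.sum([t.predict(X) for t in trees], axis=0)
```

Note that lam has to match between fitting and prediction; in practice you would store \(\lambda\) alongside the trees rather than pass it twice.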

Big picture

  • Given the current model, we are fitting a decision tree to the residuals
  • We then add this new decision tree into the fitted function to update the residuals
  • Each of these trees can be small (just a few terminal nodes), determined by \(d\)
  • Instead of fitting a single large decision tree, which could result in overfitting, boosting learns slowly

Big picture

  • By fitting small trees to the residuals, we slowly improve \(\hat{f}\) in the areas where it does not perform well
  • The shrinkage parameter \(\lambda\) slows the process down even further, allowing more (and differently shaped) trees to work on minimizing those residuals
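
These knobs have direct analogues in off-the-shelf implementations; for example, in scikit-learn's GradientBoostingRegressor (which generalizes the residual-fitting idea on these slides), learning_rate plays the role of \(\lambda\), n_estimators of \(B\), and max_depth limits tree size. A small illustration on synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# learning_rate ~ lambda, n_estimators ~ B, max_depth=1 gives stumps
for lam in (0.5, 0.01):
    gbm = GradientBoostingRegressor(learning_rate=lam, n_estimators=1000,
                                    max_depth=1, random_state=0).fit(X_tr, y_tr)
    print(f"lambda={lam}: test MSE = {mean_squared_error(y_te, gbm.predict(X_te)):.1f}")
```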

Boosting for classification

  • Boosting for classification is similar, but a bit more complex
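
The details are beyond these slides, but as a pointer, scikit-learn exposes the classification analogue as GradientBoostingClassifier, used the same way; this is an illustrative call on synthetic data, not the algorithm derived here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, random_state=0)
clf = GradientBoostingClassifier(learning_rate=0.1, n_estimators=100,
                                 max_depth=1, random_state=0).fit(X, y)
print(clf.predict_proba(X[:3]))   # class probabilities for the first few rows
```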