
Decision trees - Regression tree prediction and pruning

Dr. D’Agostino McGowan


Decision tree predictions

  • Predict the response for a test observation using the mean of the training observations in the region that the test observation belongs to

    What could potentially go wrong with what we have described so far?

  • The process may produce good predictions on the training set but is likely to overfit!
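
Below is a minimal sketch of this prediction rule, assuming Python with scikit-learn; the synthetic data and variable names are illustrative, not from the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical training data: one predictor, noisy response
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(200, 1))
y_train = np.sin(X_train[:, 0]) + rng.normal(0, 0.3, size=200)

# Grow a small regression tree
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_train, y_train)

# A test observation falls into one terminal region (leaf); its prediction
# is the mean response of the training observations in that same region
x_test = np.array([[2.5]])
leaf = tree.apply(x_test)[0]                  # which leaf the test point lands in
same_region = tree.apply(X_train) == leaf     # training points in that region
print(tree.predict(x_test)[0])                # the tree's prediction
print(y_train[same_region].mean())            # equals the region's training mean
```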

Pruning a tree

Do you love the tree puns? I DO!

  • A smaller tree (with fewer splits, that is, fewer regions $R_1, \ldots, R_J$) may lead to lower variance and better interpretation at the cost of a little bias
  • A good strategy is to grow a very large tree, T0, and then prune it back to obtain a subtree
  • For this, we use cost complexity pruning (also known as weakest link 🔗 pruning)
  • Consider a sequence of trees indexed by a nonnegative tuning parameter, α. For each α there is a subtree $T \subset T_0$ such that

$$\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2 + \alpha |T|$$

is as small as possible.
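
As a sketch of how this looks in practice, assuming Python with scikit-learn (which exposes cost complexity pruning through cost_complexity_pruning_path and the ccp_alpha parameter) and hypothetical data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical training data
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(200, 1))
y_train = np.sin(X_train[:, 0]) + rng.normal(0, 0.3, size=200)

# Grow a very large tree T0 on the training data
big_tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# Weakest-link pruning: the sequence of effective alphas, each indexing a subtree of T0
path = big_tree.cost_complexity_pruning_path(X_train, y_train)

for alpha in path.ccp_alphas:
    # Larger alpha penalizes |T| more heavily, giving a smaller subtree
    subtree = DecisionTreeRegressor(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    print(f"alpha = {alpha:.4f}, terminal nodes |T| = {subtree.get_n_leaves()}")
```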


Pruning

$$\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2 + \alpha |T|$$

  • $|T|$ indicates the number of terminal nodes of the tree $T$
  • $R_m$ is the box (the subset of the predictor space) corresponding to the $m$th terminal node
  • $\hat{y}_{R_m}$ is the mean of the training observations in $R_m$
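
As a small worked instance of this notation (with made-up numbers): suppose a subtree $T$ has $|T| = 3$ terminal nodes whose within-region residual sums of squares are 4.0, 2.5, and 1.5, and $\alpha = 0.5$. Then

$$\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2 + \alpha |T| = (4.0 + 2.5 + 1.5) + 0.5 \times 3 = 9.5$$

so an extra terminal node is worth keeping only when it reduces the training RSS by more than α.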

Choosing the best subtree

  • The tuning parameter, α, controls the trade-off between the subtree's complexity and its fit to the training data

    How do you think you could select α?

  • You can select an optimal value, $\hat{\alpha}$, using cross-validation!
  • Then return to the full dataset and obtain the subtree using $\hat{\alpha}$
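
A minimal sketch of this selection step, assuming Python with scikit-learn (cross-validating over the ccp_alpha values from the pruning path; the data and names are hypothetical):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=300)

# Candidate alphas come from the pruning path of the full tree
path = DecisionTreeRegressor(random_state=0).fit(X, y).cost_complexity_pruning_path(X, y)

# K-fold cross-validation: pick the alpha that minimizes average squared error
cv = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"ccp_alpha": path.ccp_alphas},
    scoring="neg_mean_squared_error",
    cv=5,
).fit(X, y)

alpha_hat = cv.best_params_["ccp_alpha"]
final_tree = cv.best_estimator_  # refit on the full data with the chosen alpha
print(alpha_hat, final_tree.get_n_leaves())
```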

Summary regression tree algorithm

  • Use recursive binary splitting to grow a large tree on the training data, stopping when you reach some stopping criterion
  • Apply cost complexity pruning to the larger tree to obtain a sequence of best subtrees, as a function of α
  • Use K-fold cross-validation to choose α. Pick α to minimize the average error
  • Return the subtree that corresponds to the chosen α
