
Decision trees - Regression tree prediction and pruning

Dr. D’Agostino McGowan


Decision tree predictions

  • Predict the response for a test observation using the mean of the training observations in the region that the test observation belongs to

    What could potentially go wrong with what we have described so far?

  • The process may produce good predictions on the training set but is likely to overfit!
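
Below is a minimal sketch of this prediction rule, assuming Python with scikit-learn; the synthetic data and variable names are illustrative, not from the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical training data: one predictor, noisy response
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(200, 1))
y_train = np.sin(X_train[:, 0]) + rng.normal(0, 0.3, size=200)

# Grow a small regression tree
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X_train, y_train)

# A test observation falls into one terminal region (leaf); its prediction
# is the mean response of the training observations in that same region
x_test = np.array([[2.5]])
leaf = tree.apply(x_test)[0]                  # which leaf the test point lands in
same_region = tree.apply(X_train) == leaf     # training points in that region
print(tree.predict(x_test)[0])                # the tree's prediction
print(y_train[same_region].mean())            # equals the region's training mean
```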

Pruning a tree

Do you love the tree puns? I DO!

  • A smaller tree (with fewer splits, that is, fewer regions $R_1, \ldots, R_J$) may lead to lower variance and better interpretation at the cost of a little bias
  • A good strategy is to grow a very large tree, T0, and then prune it back to obtain a subtree
  • For this, we use cost complexity pruning (also known as weakest link 🔗 pruning)
  • Consider a sequence of trees indexed by a nonnegative tuning parameter, α. For each α there is a subtree $T \subset T_0$ such that

$$\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2 + \alpha |T|$$

is as small as possible.
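
As a sketch of how this looks in practice, assuming Python with scikit-learn (which exposes cost complexity pruning through cost_complexity_pruning_path and the ccp_alpha parameter) and hypothetical data:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical training data
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(200, 1))
y_train = np.sin(X_train[:, 0]) + rng.normal(0, 0.3, size=200)

# Grow a very large tree T0 on the training data
big_tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# Weakest-link pruning: the sequence of effective alphas, each indexing a subtree of T0
path = big_tree.cost_complexity_pruning_path(X_train, y_train)

for alpha in path.ccp_alphas:
    # Larger alpha penalizes |T| more heavily, giving a smaller subtree
    subtree = DecisionTreeRegressor(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    print(f"alpha = {alpha:.4f}, terminal nodes |T| = {subtree.get_n_leaves()}")
```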


Pruning

$$\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2 + \alpha |T|$$

  • $|T|$ indicates the number of terminal nodes of the tree $T$
  • $R_m$ is the box (the subset of the predictor space) corresponding to the $m$th terminal node
  • $\hat{y}_{R_m}$ is the mean of the training observations in $R_m$
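
As a small worked instance of this notation (with made-up numbers): suppose a subtree $T$ has $|T| = 3$ terminal nodes whose within-region residual sums of squares are 4.0, 2.5, and 1.5, and $\alpha = 0.5$. Then

$$\sum_{m=1}^{|T|} \sum_{i:\, x_i \in R_m} \left( y_i - \hat{y}_{R_m} \right)^2 + \alpha |T| = (4.0 + 2.5 + 1.5) + 0.5 \times 3 = 9.5$$

so an extra terminal node is worth keeping only when it reduces the training RSS by more than α.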

Choosing the best subtree

  • The tuning parameter, α, controls the trade-off between the subtree's complexity and its fit to the training data

    How do you think you could select α?

  • You can select an optimal value, $\hat{\alpha}$, using cross-validation!
  • Then return to the full dataset and obtain the subtree using $\hat{\alpha}$
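
A minimal sketch of this selection step, assuming Python with scikit-learn (cross-validating over the ccp_alpha values from the pruning path; the data and names are hypothetical):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.3, size=300)

# Candidate alphas come from the pruning path of the full tree
path = DecisionTreeRegressor(random_state=0).fit(X, y).cost_complexity_pruning_path(X, y)

# K-fold cross-validation: pick the alpha that minimizes average squared error
cv = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={"ccp_alpha": path.ccp_alphas},
    scoring="neg_mean_squared_error",
    cv=5,
).fit(X, y)

alpha_hat = cv.best_params_["ccp_alpha"]
final_tree = cv.best_estimator_  # refit on the full data with the chosen alpha
print(alpha_hat, final_tree.get_n_leaves())
```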

Summary regression tree algorithm

  • Use recursive binary splitting to grow a large tree on the training data, stopping when you reach some stopping criterion
  • Apply cost complexity pruning to the larger tree to obtain a sequence of best subtrees, as a function of α
  • Use K-fold cross-validation to choose α. Pick α to minimize the average error
  • Return the subtree that corresponds to the chosen α
