Decision trees - Intro + Regression trees

Dr. D’Agostino McGowan

Decision trees

  • Can be applied to regression problems
  • Can be applied to classification problems

What is the difference?

Regression trees

Decision tree - Baseball Salary Example

How would you stratify this?

[Figure: regression tree for the baseball salary example]

Let's walk through the figure

  • This uses the Hitters data from the ISLR 📦
  • I fit a regression tree predicting the salary of a baseball player from:
      • Number of years they played in the major leagues
      • Number of hits they made in the previous year
  • At each node, the label (e.g., X_j < t_k) describes the left branch coming from that split; the right branch is the opposite, e.g., X_j ≥ t_k.
  • For example, the first internal node indicates that those to the left have fewer than 4.5 years in the major leagues; those to the right have ≥ 4.5 years.
  • The number at the top of each node is the predicted Salary. For example, before any splitting, the average Salary for the whole dataset is 536 thousand dollars.
  • This tree has two internal nodes and three terminal nodes
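
The walk-through above can be sketched in a few lines of Python. This is only an illustration, not the code used to produce the figure: the rows are made-up toy values rather than the Hitters data, `HITS_CUT` is a hypothetical cutoff (the text above states only the 4.5-year split), and the names R1, R2, R3 for the three terminal nodes are an assumed labeling. The point is that each node's prediction is simply the mean Salary of the players who land in it.

```python
from statistics import mean

# Toy rows: (years, hits, salary in thousands of dollars).
# Made-up values for illustration, NOT the Hitters data.
players = [
    (2, 80, 100.0),
    (3, 150, 200.0),
    (6, 90, 400.0),
    (8, 100, 500.0),
    (7, 140, 900.0),
    (10, 160, 1100.0),
]

YEARS_CUT = 4.5   # first split, as stated in the walk-through
HITS_CUT = 120.0  # hypothetical cutoff for the second split


def region(years, hits):
    """Return the terminal node a player falls into (assumed labels)."""
    if years < YEARS_CUT:   # left branch of the first internal node
        return "R1"
    if hits < HITS_CUT:     # left branch of the second internal node
        return "R2"
    return "R3"             # right branch of the second internal node


def fit_leaf_means(rows):
    """Group salaries by terminal node and average them."""
    groups = {}
    for years, hits, salary in rows:
        groups.setdefault(region(years, hits), []).append(salary)
    return {name: mean(values) for name, values in groups.items()}


# Prediction at the root, before any splitting: the overall mean.
root_mean = mean(salary for _, _, salary in players)

# Prediction in each terminal node: the mean of the rows it contains.
leaf_means = fit_leaf_means(players)


def predict(years, hits):
    """Predicted salary is the mean of the terminal node the player lands in."""
    return leaf_means[region(years, hits)]
```

With these toy rows, `predict(5, 200)` follows the right branch twice and returns the mean of the third terminal node.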

Terminology

🎋 The final regions, R1, R2, R3, are called terminal nodes (or leaves)

🎄 You can think of the trees as upside down: the leaves are at the bottom

🎋 The splits are called internal nodes
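
To make the terminology concrete, here is a minimal sketch that stores a tree as nested dictionaries and counts both kinds of node. The dictionary layout, the key names, and the placeholder cutoff `t` are assumptions chosen for illustration; they are not how any particular tree library represents a fitted tree.

```python
# Internal nodes hold a split; terminal nodes hold a region name.
tree = {
    "split": "Years < 4.5",
    "left": {"region": "R1"},
    "right": {
        "split": "Hits < t",  # t is a placeholder cutoff, not a value from the slides
        "left": {"region": "R2"},
        "right": {"region": "R3"},
    },
}


def count_nodes(node):
    """Return (internal, terminal) counts for the subtree rooted at node."""
    if "region" in node:  # a terminal node (leaf) has no further split
        return (0, 1)
    left_internal, left_terminal = count_nodes(node["left"])
    right_internal, right_terminal = count_nodes(node["right"])
    # This node is itself one internal node.
    return (1 + left_internal + right_internal, left_terminal + right_terminal)


internal, terminal = count_nodes(tree)
```

Because every split has exactly two branches, the number of terminal nodes is always one more than the number of internal nodes.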

Interpretation of results

  • Years is the most important factor in determining Salary: players with less experience earn lower salaries
  • Among less experienced players, the number of Hits seems to play little role in Salary
  • Among players who have been in the major leagues for 4.5 years or more, the number of Hits made in the previous year does affect Salary: players with more Hits tend to have higher salaries
  • This is probably an oversimplification, but see how easy it is to interpret!

Interpreting decision trees

  • How many internal nodes does this plot have? How many terminal nodes?
  • What is the average Salary for players who have more than 6.5 years in the major leagues but fewer than 118 Hits? What % of the dataset falls in this category?
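
If you want to check this kind of answer against raw data rather than reading it off the plot, the calculation is a filter followed by a mean and a proportion. The sketch below uses made-up toy rows, not the Hitters data, and `summarize_region` is a hypothetical helper name, so it shows the method and not the answer.

```python
# Toy rows: (years, hits, salary in thousands of dollars).
# Made-up values for illustration, NOT the Hitters data.
rows = [
    (2, 80, 100.0),
    (7, 100, 400.0),
    (9, 110, 600.0),
    (8, 150, 900.0),
    (5, 90, 300.0),
]


def summarize_region(data, in_region):
    """Return (mean salary, percent of rows) for rows where in_region is true."""
    selected = [salary for years, hits, salary in data if in_region(years, hits)]
    if not selected:  # an empty region has no mean
        return None, 0.0
    avg = sum(selected) / len(selected)
    pct = 100 * len(selected) / len(data)
    return avg, pct


# The region described in the question: Years > 6.5 and Hits < 118.
avg_salary, pct_of_data = summarize_region(
    rows, lambda years, hits: years > 6.5 and hits < 118
)
```

On a tree plot, the same two numbers are usually what a terminal node displays: its predicted value and the share of observations it contains.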
