Variable Importance

Dr. D’Agostino McGowan


Variable importance

  • For bagged or random forest regression trees, we can record the total amount by which the RSS decreases due to splits on a given predictor Xi, averaged over all B trees
  • A large value indicates that the variable is important
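
As a rough illustration of the regression case, the sketch below fits a random forest with ranger and extracts its impurity-based importance. The built-in mtcars data, the response mpg, and the object names rf_reg and reg_imp are assumptions made only for this example. The ranger documentation describes its regression impurity measure as the variance of the responses, which we treat here as the counterpart of the RSS decrease described above.

```r
library(ranger)

# mtcars ships with R and is used purely for illustration
rf_reg <- ranger(
  mpg ~ .,
  data = mtcars,
  num.trees = 500,
  importance = "impurity",  # record impurity-based importance
  seed = 1                  # make the fit reproducible
)

# Named numeric vector with one entry per predictor
reg_imp <- importance(rf_reg)

# Larger values suggest more important predictors
sort(reg_imp, decreasing = TRUE)
```

The exact values depend on the seed and the number of trees, so only the relative ordering should be read from the output.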

Variable importance

  • For bagged or random forest classification trees, we can add up the total amount by which the Gini index decreases due to splits on a given predictor Xi, averaged over all B trees
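
To make the Gini decrease concrete, here is a minimal base R sketch for a single split. The helper names gini() and gini_decrease() are hypothetical, and weighting each child node by its share of the observations is an assumption (a common convention, not necessarily what any particular package does internally).

```r
# Gini index of a node: sum over classes of p_k * (1 - p_k)
gini <- function(y) {
  p <- table(y) / length(y)
  sum(p * (1 - p))
}

# Decrease in Gini index from splitting a node into left and right children,
# weighting each child by its share of the observations (assumed convention)
gini_decrease <- function(y, go_left) {
  left <- y[go_left]
  right <- y[!go_left]
  gini(y) -
    (length(left) / length(y)) * gini(left) -
    (length(right) / length(y)) * gini(right)
}

# Toy node with two classes that the split separates perfectly
y <- c("yes", "yes", "no", "no")
go_left <- c(TRUE, TRUE, FALSE, FALSE)

gini(y)                    # 0.5
gini_decrease(y, go_left)  # 0.5
```

Summing such decreases over every split that uses Xi within a tree, and then averaging over the B trees, gives the importance described above.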

Variable importance in R

library(tidymodels)

# importance = "impurity" asks ranger to record the
# impurity-based (Gini) variable importance
rf_spec <- rand_forest(
  mode = "classification",
  mtry = 3
) %>%
  set_engine(
    "ranger",
    importance = "impurity"
  )

model <- fit(
  rf_spec,
  HD ~ Age + Sex + ChestPain + RestBP + Chol + Fbs +
    RestECG + MaxHR + ExAng + Oldpeak + Slope + Ca + Thal,
  data = heart
)

ranger::importance(model$fit)
##        Age        Sex  ChestPain     RestBP       Chol        Fbs    RestECG
##  9.1131749  3.9700559 16.9967476  7.1194992  7.1286452  0.7984602  1.6766312
##      MaxHR      ExAng    Oldpeak      Slope         Ca       Thal
## 13.6944596  5.8784571 13.0535972  5.6513815 17.7194145 14.7356179

Variable importance

library(ranger)
importance(model$fit)
##        Age        Sex  ChestPain     RestBP       Chol        Fbs    RestECG
##  9.1131749  3.9700559 16.9967476  7.1194992  7.1286452  0.7984602  1.6766312
##      MaxHR      ExAng    Oldpeak      Slope         Ca       Thal
## 13.6944596  5.8784571 13.0535972  5.6513815 17.7194145 14.7356179
var_imp <- ranger::importance(model$fit)
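
Because the result is a named numeric vector, it can be sorted directly. The heart data is not reproduced here, so the sketch below rebuilds the vector from the values printed above instead of refitting the model. The names var_imp_sorted and top3 are hypothetical.

```r
# Importance values copied from the printed output above
var_imp <- c(
  Age = 9.1131749, Sex = 3.9700559, ChestPain = 16.9967476,
  RestBP = 7.1194992, Chol = 7.1286452, Fbs = 0.7984602,
  RestECG = 1.6766312, MaxHR = 13.6944596, ExAng = 5.8784571,
  Oldpeak = 13.0535972, Slope = 5.6513815, Ca = 17.7194145,
  Thal = 14.7356179
)

# Sort from most to least important
var_imp_sorted <- sort(var_imp, decreasing = TRUE)

# Names of the three predictors with the largest importance
top3 <- names(var_imp_sorted)[1:3]
top3  # "Ca" "ChestPain" "Thal"
```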

Plotting variable importance

var_imp_df <- data.frame(
  variable = names(var_imp),
  importance = var_imp
)

var_imp_df %>%
  ggplot(aes(x = variable, y = importance)) +
  geom_col()

How could we make this plot better?


Plotting variable importance

var_imp_df %>%
  ggplot(aes(x = variable, y = importance)) +
  geom_col() +
  coord_flip()

How could we make this plot better?


Plotting variable importance

var_imp_df %>%
  # Order the factor levels by importance so the bars appear sorted
  mutate(variable = factor(
    variable,
    levels = variable[order(importance)]
  )) %>%
  ggplot(aes(x = variable, y = importance)) +
  geom_col() +
  coord_flip()
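
One possible alternative to building the factor by hand is base R's reorder(), which orders the levels of one variable by the values of another. The sketch below is self-contained, so it rebuilds var_imp_df from the importance values printed earlier. It assumes a version of ggplot2 that draws horizontal bars when the discrete variable is mapped to y (3.3.0 or later); the object name p and the axis label are choices made for this example.

```r
library(ggplot2)

# Importance values copied from the printed output earlier in the slides
var_imp <- c(
  Age = 9.1131749, Sex = 3.9700559, ChestPain = 16.9967476,
  RestBP = 7.1194992, Chol = 7.1286452, Fbs = 0.7984602,
  RestECG = 1.6766312, MaxHR = 13.6944596, ExAng = 5.8784571,
  Oldpeak = 13.0535972, Slope = 5.6513815, Ca = 17.7194145,
  Thal = 14.7356179
)

var_imp_df <- data.frame(
  variable = names(var_imp),
  importance = var_imp
)

# reorder() sorts the levels of variable by importance, so the
# largest bar ends up at the top of the horizontal plot
p <- ggplot(var_imp_df, aes(x = importance, y = reorder(variable, importance))) +
  geom_col() +
  labs(x = "Importance (impurity)", y = NULL)

p
```

Mapping the predictor to y removes the need for coord_flip(); the resulting plot should match the sorted version above apart from the axis labels.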
