class: center, middle, inverse, title-slide

# Variable Importance

### Dr. D’Agostino McGowan

---

layout: true

<div class="my-footer">
<span>
Dr. Lucy D'Agostino McGowan <i>adapted from slides by Hastie & Tibshirani</i>
</span>
</div>

---

## Variable importance

* For bagged or random forest _regression trees_, we can record the _total decrease in RSS_ due to splits on a given predictor `\(X_i\)`, averaged over all `\(B\)` trees

--

* A large value indicates that the variable is _important_

---

## Variable importance

* For bagged or random forest _classification trees_, we can add up the total decrease in the Gini index due to splits on a given predictor `\(X_i\)`, averaged over all `\(B\)` trees

---

## Variable importance in R

.small[
```r
library(tidymodels)

rf_spec <- rand_forest(
  mode = "classification",
  mtry = 3
) %>%
  set_engine(
    "ranger",
*   importance = "impurity")

model <- fit(rf_spec,
             HD ~ Age + Sex + ChestPain + RestBP + Chol + Fbs +
               RestECG + MaxHR + ExAng + Oldpeak + Slope + Ca + Thal,
             data = heart)
```
]

```r
ranger::importance(model$fit)
```

```
##        Age        Sex  ChestPain     RestBP       Chol        Fbs    RestECG 
##  9.1131749  3.9700559 16.9967476  7.1194992  7.1286452  0.7984602  1.6766312 
##      MaxHR      ExAng    Oldpeak      Slope         Ca       Thal 
## 13.6944596  5.8784571 13.0535972  5.6513815 17.7194145 14.7356179 
```

---

## Variable importance

.small[
```r
library(ranger)
importance(model$fit)
```

```
##        Age        Sex  ChestPain     RestBP       Chol        Fbs    RestECG 
##  9.1131749  3.9700559 16.9967476  7.1194992  7.1286452  0.7984602  1.6766312 
##      MaxHR      ExAng    Oldpeak      Slope         Ca       Thal 
## 13.6944596  5.8784571 13.0535972  5.6513815 17.7194145 14.7356179 
```
]

--

.small[
```r
var_imp <- ranger::importance(model$fit)
```
]

---

## Plotting variable importance

.small[
```r
var_imp_df <- data.frame(
  variable = names(var_imp),
  importance = var_imp
)

var_imp_df %>%
  ggplot(aes(x = variable, y = importance)) +
  geom_col()
```

![](19-variable-importance_files/figure-html/unnamed-chunk-7-1.png)
]

--

.question[
How could we make this plot better?
]

---

## Plotting variable importance

.small[
```r
var_imp_df %>%
  ggplot(aes(x = variable, y = importance)) +
  geom_col() +
  coord_flip()
```

![](19-variable-importance_files/figure-html/unnamed-chunk-8-1.png)
]

.question[
How could we make this plot better?
]

---

## Plotting variable importance

.small[
```r
var_imp_df %>%
  mutate(variable = factor(variable,
                           levels = variable[order(var_imp_df$importance)])) %>%
  ggplot(aes(x = variable, y = importance)) +
  geom_col() +
  coord_flip()
```

![](19-variable-importance_files/figure-html/unnamed-chunk-9-1.png)
]

---
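## Computing the decrease by hand

A minimal base-R sketch of the quantities behind impurity-based importance, assuming a single split on a numeric predictor at a fixed cutpoint. The helpers `gini()`, `rss()`, `gini_decrease()`, and `rss_decrease()`, the toy data, and the per-tree decreases are hypothetical illustrations, not part of ranger's API; ranger's internal weighting of the decreases may differ, so these numbers are not directly comparable to `ranger::importance()` output.

.small[
```r
# Gini index of a node: sum over classes k of p_k * (1 - p_k)
gini <- function(y) {
  p <- table(y) / length(y)
  sum(p * (1 - p))
}

# RSS of a node: squared deviations from the node mean
rss <- function(y) sum((y - mean(y))^2)

# Decrease in the Gini index from splitting a node on x < cutpoint;
# each child is weighted by its share of the node's observations
gini_decrease <- function(y, x, cutpoint) {
  left  <- y[x < cutpoint]
  right <- y[x >= cutpoint]
  n <- length(y)
  gini(y) - length(left) / n * gini(left) - length(right) / n * gini(right)
}

# Decrease in RSS from the same kind of split
# (RSS is already a sum, so no weighting is needed)
rss_decrease <- function(y, x, cutpoint) {
  rss(y) - rss(y[x < cutpoint]) - rss(y[x >= cutpoint])
}

# Hypothetical toy data: the split at 3.5 separates the groups perfectly
x <- 1:6
y_class <- factor(c("No", "No", "No", "Yes", "Yes", "Yes"))
gini_decrease(y_class, x, cutpoint = 3.5)   # 0.5

y_reg <- c(1, 1, 1, 5, 5, 5)
rss_decrease(y_reg, x, cutpoint = 3.5)      # 24

# Importance of one predictor: sum its decreases within each tree,
# then average over the B = 3 trees (hypothetical decreases;
# the third tree never splits on this predictor)
decreases <- list(c(0.5, 0.25), 0.75, numeric(0))
importance_avg <- mean(sapply(decreases, sum))
importance_avg                              # 0.5
```
]

A tree that never splits on the predictor contributes zero to the average, which is why predictors chosen rarely end up with small importance values.

---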