class: center, middle, inverse, title-slide

# Midterm 02 Review

### Dr. D’Agostino McGowan

---
layout: true

<div class="my-footer">
  <span>
    Dr. Lucy D'Agostino McGowan
  </span>
</div>

---

## Ridge Review

.question[
What are we minimizing with Ridge Regression?
]

--

`$$RSS + \lambda\sum_{j=1}^p\beta_j^2$$`

--

.question[
What is the resulting estimate for `\(\hat\beta_{ridge}\)`?
]

--

`$$\hat\beta_{ridge} = (\mathbf{X}^{T}\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}$$`

--

.question[
Why is this useful?
]
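---

## Aside: the closed form in code

Not something you need for the exam, but seeing the estimate computed directly can help it stick. A minimal base-R sketch with simulated data (the values here are just for illustration):

```r
# Closed-form ridge estimate: (X'X + lambda I)^{-1} X'y
beta_ridge <- function(X, y, lambda) {
  solve(t(X) %*% X + lambda * diag(ncol(X)), t(X) %*% y)
}

set.seed(363)
X <- matrix(rnorm(100 * 3), 100, 3)
y <- X %*% c(2, -1, 0.5) + rnorm(100)

beta_ridge(X, y, lambda = 0)   # lambda = 0 recovers least squares
beta_ridge(X, y, lambda = 10)  # larger lambda shrinks coefficients toward 0
```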
---
class: inverse

**05:00**
## <i class="fas fa-edit"></i> `Derive the` `\(\beta_{ridge}\)` `values` Practice deriving the `\(\beta\)` coefficients by minimizing `\(RSS + \lambda\sum_{j=1}^p\beta_j^2\)`. Be sure you understand what each step is doing. --- ## Answer * Start by FOILing the thing we're minimizing (Don't remember how do to that? Check out [Slide #8 from the Ridge Regression lecture](https://sta-363-s20.lucymcgowan.com/slides/11-ridge.html#16)) -- * Take the derivative, set it equal to 0 (Don't remember how do to that? Check out [Slide #10 from the Ridge Regression lecture](https://sta-363-s20.lucymcgowan.com/slides/11-ridge.html#20)) -- * Solve for `\(\beta\)` (Don't remember how do to that? Check out [Slide #17 from the Ridge Regression lecture](https://sta-363-s20.lucymcgowan.com/slides/11-ridge.html#29)) --- ## Ridge Review .question[ How is `\(\lambda\)` determined? ] `$$RSS + \lambda\sum_{j=1}^p\beta_j^2$$` -- .question[ What is the bias-variance trade-off? ] --- ## Ridge Regression .pull-left[ ## Pros * Can be used when `\(p > n\)` * Can be used to help with multicollinearity * Will decrease variance (as `\(\lambda \rightarrow \infty\)` ) ] -- .pull-right[ ## Cons * Will have increased bias (compared to least squares) * Does not really help with variable selection (all variables are included in _some_ regard, even if their `\(\beta\)` coefficients are really small) ] --- ## Lasso! * The lasso is similar to ridge, but it actually drives some `\(\beta\)` coefficients to 0! (So it helps with variable selection) -- `$$RSS + \lambda\sum_{j=1}^p|\beta_j|$$` --- ## Lasso .pull-left[ ## Pros * Can be used when `\(p > n\)` * Can be used to help with multicollinearity * Will decrease variance (as `\(\lambda \rightarrow \infty\)` ) * Can be used for variable selection, since it will make some `\(\beta\)` coefficients exactly 0 ] -- .pull-right[ ## Cons * Will have increased bias (compared to least squares) * If `\(p>n\)` the lasso can select **at most** `\(n\)` variables ] --- ## What if we want to do both? * Elastic net! -- `$$RSS + \lambda_1\sum_{j=1}^p\beta^2_j+\lambda_2\sum_{j=1}^p|\beta_j|$$` --- ## Elastic net `$$RSS + \lambda_1\sum_{j=1}^p\beta^2_j+\lambda_2\sum_{j=1}^p|\beta_j|$$` .question[ When will this be equivalent to Ridge Regression? ] --- ## Elastic net `$$RSS + \lambda_1\sum_{j=1}^p\beta^2_j+\lambda_2\sum_{j=1}^p|\beta_j|$$` .question[ When will this be equivalent to Lasso? ] --- class: center, middle ## Nonlinear models --- class: inverse
---
class: center, middle

## Nonlinear models

---
class: inverse

**03:00**
## <i class="fas fa-laptop"></i> `Polynomial Regression` `$$pop = \beta_0 + \beta_1age + \beta_2age^2 + \beta_3age^3 + \epsilon$$` Using the information below, write out the equation to predict change in population from a change in age from the 25th percentile (24.5) to a 75th percentile (73.5). |term | estimate| std.error| statistic| p.value| |:-----------|---------:|---------:|---------:|-------:| |(Intercept) | 1807.8528| 56.1241| 32.2117| 0.0000| |age | -39.6783| 4.9849| -7.9596| 0.0000| |I(age^2) | 0.2064| 0.1185| 1.7414| 0.0849| |I(age^3) | 0.0001| 0.0008| 0.1869| 0.8522| --- class: inverse
---
class: inverse

**03:00**
## <i class="fas fa-edit"></i> `Nonlinear models` What is the difference between: * Polynomial regression * Linear Spline * Cubic Spline * Natural Spline --- class: center, middle ## Degrees of freedom --- ## Example A model predicting `mpg` from `horsepower` and `weight`. `$$mpg = \beta_0 + \beta_1 horsepower + \beta_2 weight + \epsilon$$` -- .question[ How many degrees of freedom are used for the `horsepower` variable? ] -- * 1 --- ## Example A model predicting `mpg` from `horsepower` and `weight`. `$$mpg = \beta_0 + \beta_1 horsepower + \beta_2 horsepower^2 + \beta_3 weight + \epsilon$$` -- .question[ How many degrees of freedom are used for the `horsepower` variable? ] -- * 2 --- ## Example A model predicting `mpg` from `horsepower` and `weight`. _cubic spline with 3 knots_ `$$mpg = \beta_0 + \beta_1 horsepower + \beta_2 horsepower^2 + \beta_3 horsepower^3 +\\ \beta_4 b_4(horsepower) + \beta_5 b_5(horsepower) + \beta_6 b_6(horsepower) + \\\beta_7 weight + \epsilon$$` -- .question[ How many degrees of freedom are used for the `horsepower` variable? ] -- * 6 -- _Don't remember what those `\(b_i()\)` are? Review [Non-linear Slide #17](file:///Users/lucymcgowan/wonderland/courses/2020s-sta363/website/static/slides/13-non-linear.html#38)_ --- ## Example A model predicting `mpg` from `horsepower` and `weight`. _**natural** cubic spline with 3 knots_ `$$mpg = \beta_0 + \beta_1 horsepower + \beta_2 r_2(horsepower) + \\\beta_3 r_3(horsepower) + \beta_4 weight + \epsilon$$` * `\(r_i(horsepower)\)` is a function of `\(horespower\)` similar to `\(b_i()\)` from the cubic spline, but slightly different due to the **restriction**, you don't need to know this specification -- .question[ How many degrees of freedom are used for the `horsepower` variable? ] -- * 3 --- ## Other things to review * Make sure you are familiar with how **tidymodels** works * Remember your _matrix facts_