Lab 05 - Nonlinear models

Due: Thursday 2020-03-26 at 11:59pm

Introduction

In this lab we are going to practice fitting nonlinear models. A few reminders:

Getting started

Packages

In this lab we will work with three packages: ISLR for the data, tidyverse (a collection of packages for doing data analysis in a “tidy” way), and tidymodels for statistical modeling.

Now that the necessary packages are installed, you should be able to Knit your document and see the results.

If you’d like to run your code in the Console as well, you’ll also need to load the packages there. To do so, run the following in the Console:

library(tidyverse) 
library(tidymodels)
library(ISLR)

Note that the packages are also loaded with the same commands in your R Markdown document.

Housekeeping

Git configuration

Configure Git with your name and email address. Your email address should be the one tied to your GitHub account, and your name should be your first and last name.

To confirm that the changes have been implemented, run the following:
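As a minimal sketch, assuming the usethis package is installed, both settings can be written and then checked from the R Console. The name and email below are hypothetical placeholders; replace them with your own.

```r
# Placeholder values: use your first and last name and the
# email address tied to your GitHub account.
usethis::use_git_config(
  user.name  = "First Last",
  user.email = "you@example.com"
)

# Print a situation report of your Git/GitHub setup, which includes
# the user.name and user.email values Git now uses.
usethis::git_sitrep()
```

`use_git_config()` writes to your user-level (global) Git configuration by default, so the settings apply to every project on this machine, not only this lab.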

Password caching

If you would like your git password cached for a week for this project, type the following in the Terminal:
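As a sketch, assuming the usethis package is installed and a Unix-like environment (Git's `cache` credential helper relies on Unix sockets), the same one-week cache can also be configured from the R Console. One week is 7 × 24 × 3600 = 604800 seconds.

```r
# Set Git's global credential helper to the in-memory cache and keep
# credentials for one week: 7 days * 24 hours * 3600 seconds = 604800.
usethis::use_git_config(credential.helper = "cache --timeout=604800")
```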

Project name:

Currently your project is called Untitled Project. Update the name of your project to be “Lab 05 - Nonlinear models”.

Warm up

Before we introduce the data, let’s warm up with some simple exercises.

YAML:

Open the R Markdown (Rmd) file in your project, change the author name to your name, and knit the document.

Committing and pushing changes:

Data

For this lab, we are using the Wage data set from the ISLR package.
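A quick way to get oriented is to list the variables and count the rows. This is a minimal sketch assuming ISLR and dplyr (part of the tidyverse) are installed.

```r
library(ISLR)
library(dplyr)

# glimpse() prints every variable in Wage with its type and first few values.
glimpse(Wage)

# dim() returns the number of rows (observations) and columns (variables).
dim(Wage)
```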

Exercises

  1. Examine the Wage data set from the ISLR package. What are the variables? How many observations are there?

  2. Create a linear model specification, setting the engine to lm. Call this model specification linear_spec.

  3. Create a recipe using the Wage data from the ISLR package. We want to predict the variable wage from age, health_ins, jobclass, education, and race. Model age with a natural spline, and use tune() as a placeholder for its degrees of freedom so that the number can be chosen by cross-validation.

  4. Use tune_grid() to fit the linear model specified in Exercise 2 with the recipe created in Exercise 3 using 10-fold cross-validation, similar to Lab 04. Choose the model with the lowest RMSE. How many degrees of freedom does the age natural spline use in this best model? Report that number along with the model’s RMSE.
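The pieces from Exercises 2 to 4 fit together roughly as follows. This is a sketch, not the required solution: the object names `wage_rec`, `wage_wf`, `wage_folds`, and `df_grid`, the seed, and the candidate range of 1 to 10 degrees of freedom are illustrative assumptions.

```r
library(tidymodels)
library(ISLR)

# Exercise 2: a linear regression specification fit with lm().
linear_spec <- linear_reg() %>%
  set_engine("lm")

# Exercise 3: predict wage from the listed predictors, expanding age into a
# natural spline basis whose degrees of freedom are marked for tuning.
wage_rec <- recipe(wage ~ age + health_ins + jobclass + education + race,
                   data = Wage) %>%
  step_ns(age, deg_free = tune())

# Bundle the model and the recipe so they are tuned together.
wage_wf <- workflow() %>%
  add_model(linear_spec) %>%
  add_recipe(wage_rec)

# Exercise 4: 10-fold cross-validation over candidate degrees of freedom.
# The seed and the 1:10 range are illustrative choices.
set.seed(1)
wage_folds <- vfold_cv(Wage, v = 10)
df_grid <- tibble(deg_free = 1:10)

wage_res <- tune_grid(
  wage_wf,
  resamples = wage_folds,
  grid = df_grid
)

# Mean cross-validated RMSE for the top candidates, best first.
show_best(wage_res, metric = "rmse")

# The row of df_grid with the lowest mean RMSE.
best_df <- select_best(wage_res, metric = "rmse")
best_df
```

Because the grid column is named `deg_free`, matching the tuned argument of `step_ns()`, `tune_grid()` can map each candidate value onto the recipe step.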