NC bike crashes
appex-02-nc-bike-crashes
Happy families are all alike; every unhappy family is unhappy in its own way.
Leo Tolstoy
Happy families are all alike; every unhappy family is unhappy in its own way.
Leo Tolstoy
Characteristics of tidy data:
Happy families are all alike; every unhappy family is unhappy in its own way.
Leo Tolstoy
Characteristics of tidy data:
Characteristics of untidy data:
!@#$%^&*()
The pipe operator is implemented in the package magrittr, it's pronounced "and then".
park(drive(start_car(find("keys")), to = "campus"))
find("keys") %>% start_car() %>% drive(to = "campus") %>% park()
To send results to a function argument other than first one or to use the previous result for multiple arguments, use .
:
starwars %>% filter(species == "Human") %>% lm(mass ~ height, data = .)
## ## Call:## lm(formula = mass ~ height, data = .)## ## Coefficients:## (Intercept) height ## -116.58 1.11
The dataset is in the dsbox package:
library(dsbox)ncbikecrash
View the names of variables via
names(ncbikecrash)
## [1] "object_id" "city" "county" ## [4] "region" "development" "locality" ## [7] "on_road" "rural_urban" "speed_limit" ## [10] "traffic_control" "weather" "workzone" ## [13] "bike_age" "bike_age_group" "bike_alcohol" ## [16] "bike_alcohol_drugs" "bike_direction" "bike_injury" ## [19] "bike_position" "bike_race" "bike_sex" ## [22] "driver_age" "driver_age_group" "driver_alcohol" ## [25] "driver_alcohol_drugs" "driver_est_speed" "driver_injury" ## [28] "driver_race" "driver_sex" "driver_vehicle_type" ## [31] "crash_alcohol" "crash_date" "crash_day" ## [34] "crash_group" "crash_hour" "crash_location" ## [37] "crash_month" "crash_severity" "crash_time" ## [40] "crash_type" "crash_year" "ambulance_req" ## [43] "hit_run" "light_condition" "road_character" ## [46] "road_class" "road_condition" "road_configuration" ## [49] "road_defects" "road_feature" "road_surface" ## [52] "num_bikes_ai" "num_bikes_bi" "num_bikes_ci" ## [55] "num_bikes_ki" "num_bikes_no" "num_bikes_to" ## [58] "num_bikes_ui" "num_lanes" "num_units" ## [61] "distance_mi_from" "frm_road" "rte_invd_cd" ## [64] "towrd_road" "geo_point" "geo_shape"
See detailed descriptions with ?ncbikecrash
.
data(ncbikecrash)
, click on the name of the data frame to view it in the data viewerglimpse
function to take a peekglimpse(ncbikecrash)
## Observations: 7,467## Variables: 66## $ object_id <int> 1686, 1674, 1673, 1687, 1653, 1665, 1642, 1675, …## $ city <chr> "None - Rural Crash", "Henderson", "None - Rural…## $ county <chr> "Wayne", "Vance", "Lincoln", "Columbus", "New Ha…## $ region <chr> "Coastal", "Piedmont", "Piedmont", "Coastal", "C…## $ development <chr> "Farms, Woods, Pastures", "Residential", "Farms,…## $ locality <chr> "Rural (<30% Developed)", "Mixed (30% To 70% Dev…## $ on_road <chr> "SR 1915", "NICHOLAS ST", "US 321", "W BURKHEAD …## $ rural_urban <chr> "Rural", "Urban", "Rural", "Urban", "Urban", "Ru…## $ speed_limit <chr> "50 - 55 MPH", "30 - 35 MPH", "50 - 55 MPH", …## $ traffic_control <chr> "No Control Present", "Stop Sign", "Double Yello…## $ weather <chr> "Clear", "Clear", "Clear", "Rain", "Clear", "Clo…## $ workzone <chr> "No", "No", "No", "No", "No", "No", "No", "No", …## $ bike_age <chr> "52", "66", "33", "52", "22", "15", "41", "14", …## $ bike_age_group <chr> "50-59", "60-69", "30-39", "50-59", "20-24", "11…## $ bike_alcohol <chr> "No", "No", "No", "Yes", "No", "No", "No", "No",…## $ bike_alcohol_drugs <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …## $ bike_direction <chr> "With Traffic", "With Traffic", "With Traffic", …## $ bike_injury <chr> "B: Evident Injury", "C: Possible Injury", "C: P…## $ bike_position <chr> "Bike Lane / Paved Shoulder", "Travel Lane", "Tr…## $ bike_race <chr> "Black", "Black", "White", "Black", "White", "Na…## $ bike_sex <chr> "Male", "Male", "Male", "Male", "Female", "Male"…## $ driver_age <chr> "34", NA, "37", "55", "25", "17", NA, "50", "32"…## $ driver_age_group <chr> "30-39", NA, "30-39", "50-59", "25-29", "0-19", …## $ driver_alcohol <chr> "No", "Missing", "No", "No", "No", "No", "Missin…## $ driver_alcohol_drugs <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …## $ driver_est_speed <chr> "51-55 mph", "6-10 mph", "41-45 mph", "11-15 mph…## $ driver_injury <chr> "O: No Injury", "Unknown Injury", "O: No Injury"…## $ driver_race <chr> "White", "Unknown/Missing", "Hispanic", "Black",…## $ driver_sex <chr> "Male", NA, "Female", "Male", "Male", "Female", …## $ driver_vehicle_type <chr> "Single Unit Truck (2-Axle, 6-Tire)", NA, "Passe…## $ crash_alcohol <chr> "No", "No", "No", "Yes", "No", "No", "No", "No",…## $ crash_date <chr> "11DEC2013", "20NOV2013", "03NOV2013", "14DEC201…## $ crash_day <chr> "Wednesday", "Wednesday", "Sunday", "Saturday", …## $ crash_group <chr> "Motorist Overtaking Bicyclist", "Bicyclist Fail…## $ crash_hour <int> 6, 20, 18, 18, 13, 17, 17, 7, 15, 2, 12, 22, 12,…## $ crash_location <chr> "Non-Intersection", "Intersection", "Non-Interse…## $ crash_month <chr> "December", "November", "November", "December", …## $ crash_severity <chr> "B: Evident Injury", "C: Possible Injury", "C: P…## $ crash_time <time> 06:10:00, 20:41:00, 18:05:00, 18:34:00, 13:27:0…## $ crash_type <chr> "Motorist Overtaking - Undetected Bicyclist", "B…## $ crash_year <int> 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, …## $ ambulance_req <chr> "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", …## $ hit_run <chr> "No", "Yes", "No", "No", "No", "No", "Yes", "No"…## $ light_condition <chr> "Dark - Roadway Not Lighted", NA, "Dark - Roadwa…## $ road_character <chr> "Straight - Level", "Straight - Level", "Straigh…## $ road_class <chr> "State Secondary Route", "Local Street", "US Rou…## $ road_condition <chr> "Dry", "Dry", "Dry", "Water (Standing, Moving)",…## $ road_configuration <chr> "Two-Way, Not Divided", "Two-Way, Divided, Unpro…## $ road_defects <chr> "None", NA, "None", "None", "None", "None", "Non…## $ road_feature <chr> "No Special Feature", "T-Intersection", "No Spec…## $ road_surface <chr> "Coarse Asphalt", "Smooth Asphalt", "Smooth Asph…## $ num_bikes_ai <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …## $ num_bikes_bi <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …## $ num_bikes_ci <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …## $ num_bikes_ki <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …## $ num_bikes_no <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …## $ num_bikes_to <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …## $ num_bikes_ui <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …## $ num_lanes <chr> "2 lanes", "2 lanes", "2 lanes", "1 lane", "8 la…## $ num_units <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …## $ distance_mi_from <chr> "0", "0", "0", "0", "0", "0", "0", "0", "0", "0"…## $ frm_road <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …## $ rte_invd_cd <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …## $ towrd_road <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …## $ geo_point <chr> "35.3336070056, -77.9955023901", "36.3151872016,…## $ geo_shape <chr> "{\"type\": \"Point\", \"coordinates\": [-77.995…
dplyr is based on the concepts of functions as verbs that manipulate data frames.
filter
: pick rows matching criteriaslice
: pick rows using index(es)select
: pick columns by namepull
: grab a column as a vectorarrange
: reorder rowsmutate
: add new variablesdistinct
: filter for unique rowssample_n
/ sample_frac
: randomly sample rowssummarise
: reduce variables to values%>%
operator in dplyr functions is called the pipe operator. This means you "pipe" the output of the previous line of code as the first input of the next line of code.%>%
operator in dplyr functions is called the pipe operator. This means you "pipe" the output of the previous line of code as the first input of the next line of code.+
operator in ggplot2 functions is used for "layering". This means you create the plot in layers, separated by +
.filter
to select a subset of rowsfor crashes in Durham County
ncbikecrash %>% filter(county == "Durham")
## # A tibble: 340 x 66## object_id city county region development locality on_road rural_urban## <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> ## 1 2452 Durh… Durham Piedm… Residential Urban (… <NA> Urban ## 2 2441 Durh… Durham Piedm… Commercial Urban (… <NA> Urban ## 3 2466 Durh… Durham Piedm… Commercial Urban (… <NA> Urban ## 4 549 Durh… Durham Piedm… Residential Urban (… PARK A… Urban ## 5 598 Durh… Durham Piedm… Residential Urban (… BELT S… Urban ## 6 603 Durh… Durham Piedm… Residential Urban (… HINSON… Urban ## 7 3974 Durh… Durham Piedm… Commercial Urban (… <NA> Urban ## 8 7134 Durh… Durham Piedm… Commercial Urban (… <NA> Urban ## 9 1670 Durh… Durham Piedm… Commercial Urban (… INFINI… Urban ## 10 1773 Durh… Durham Piedm… Residential Urban (… <NA> Urban ## # … with 330 more rows, and 58 more variables: speed_limit <chr>,## # traffic_control <chr>, weather <chr>, workzone <chr>, bike_age <chr>,## # bike_age_group <chr>, bike_alcohol <chr>, bike_alcohol_drugs <chr>,## # bike_direction <chr>, bike_injury <chr>, bike_position <chr>,## # bike_race <chr>, bike_sex <chr>, driver_age <chr>, driver_age_group <chr>,## # driver_alcohol <chr>, driver_alcohol_drugs <chr>, driver_est_speed <chr>,## # driver_injury <chr>, driver_race <chr>, driver_sex <chr>,## # driver_vehicle_type <chr>, crash_alcohol <chr>, crash_date <chr>,## # crash_day <chr>, crash_group <chr>, crash_hour <int>, crash_location <chr>,## # crash_month <chr>, crash_severity <chr>, crash_time <time>,## # crash_type <chr>, crash_year <int>, ambulance_req <chr>, hit_run <chr>,## # light_condition <chr>, road_character <chr>, road_class <chr>,## # road_condition <chr>, road_configuration <chr>, road_defects <chr>,## # road_feature <chr>, road_surface <chr>, num_bikes_ai <int>,## # num_bikes_bi <int>, num_bikes_ci <int>, num_bikes_ki <int>,## # num_bikes_no <int>, num_bikes_to <int>, num_bikes_ui <int>,## # num_lanes <chr>, num_units <int>, distance_mi_from <chr>, frm_road <chr>,## # rte_invd_cd <int>, towrd_road <chr>, geo_point <chr>, geo_shape <chr>
filter
for many conditions at oncefor crashes in Durham County where biker was 0-5 years old
ncbikecrash %>% filter(county == "Durham", bike_age_group == "0-5")
## # A tibble: 4 x 66## object_id city county region development locality on_road rural_urban## <int> <chr> <chr> <chr> <chr> <chr> <chr> <chr> ## 1 4062 Durh… Durham Piedm… Residential Urban (… <NA> Urban ## 2 414 Durh… Durham Piedm… Residential Urban (… PVA 90… Urban ## 3 3016 Durh… Durham Piedm… Residential Urban (… <NA> Urban ## 4 1383 Durh… Durham Piedm… Residential Urban (… PVA 62… Urban ## # … with 58 more variables: speed_limit <chr>, traffic_control <chr>,## # weather <chr>, workzone <chr>, bike_age <chr>, bike_age_group <chr>,## # bike_alcohol <chr>, bike_alcohol_drugs <chr>, bike_direction <chr>,## # bike_injury <chr>, bike_position <chr>, bike_race <chr>, bike_sex <chr>,## # driver_age <chr>, driver_age_group <chr>, driver_alcohol <chr>,## # driver_alcohol_drugs <chr>, driver_est_speed <chr>, driver_injury <chr>,## # driver_race <chr>, driver_sex <chr>, driver_vehicle_type <chr>,## # crash_alcohol <chr>, crash_date <chr>, crash_day <chr>, crash_group <chr>,## # crash_hour <int>, crash_location <chr>, crash_month <chr>,## # crash_severity <chr>, crash_time <time>, crash_type <chr>,## # crash_year <int>, ambulance_req <chr>, hit_run <chr>,## # light_condition <chr>, road_character <chr>, road_class <chr>,## # road_condition <chr>, road_configuration <chr>, road_defects <chr>,## # road_feature <chr>, road_surface <chr>, num_bikes_ai <int>,## # num_bikes_bi <int>, num_bikes_ci <int>, num_bikes_ki <int>,## # num_bikes_no <int>, num_bikes_to <int>, num_bikes_ui <int>,## # num_lanes <chr>, num_units <int>, distance_mi_from <chr>, frm_road <chr>,## # rte_invd_cd <int>, towrd_road <chr>, geo_point <chr>, geo_shape <chr>
operator | definition | operator | definition | |
---|---|---|---|---|
< |
less than | x | y |
x OR y |
|
<= |
less than or equal to | is.na(x) |
test if x is NA |
|
> |
greater than | !is.na(x) |
test if x is not NA |
|
>= |
greater than or equal to | x %in% y |
test if x is in y |
|
== |
exactly equal to | !(x %in% y) |
test if x is not in y |
|
!= |
not equal to | !x |
not x |
|
x & y |
x AND y |
select
to keep variablesncbikecrash %>% filter(county == "Durham", bike_age_group == "0-5") %>% select(locality, speed_limit)
## # A tibble: 4 x 2## locality speed_limit ## <chr> <chr> ## 1 Urban (>70% Developed) 30 - 35 MPH## 2 Urban (>70% Developed) 5 - 15 MPH ## 3 Urban (>70% Developed) 20 - 25 MPH## 4 Urban (>70% Developed) 20 - 25 MPH
select
to exclude variablesncbikecrash %>% select(-object_id)
## # A tibble: 7,467 x 65## city county region development locality on_road rural_urban speed_limit## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> ## 1 None… Wayne Coast… Farms, Woo… Rural (… SR 1915 Rural 50 - 55 M…## 2 Hend… Vance Piedm… Residential Mixed (… NICHOL… Urban 30 - 35 M…## 3 None… Linco… Piedm… Farms, Woo… Rural (… US 321 Rural 50 - 55 M…## 4 Whit… Colum… Coast… Commercial Urban (… W BURK… Urban 30 - 35 M…## 5 Wilm… New H… Coast… Residential Urban (… RACINE… Urban <NA> ## 6 None… Robes… Coast… Farms, Woo… Rural (… SR 1513 Rural 50 - 55 M…## 7 None… Richm… Piedm… Residential Mixed (… SR 1903 Rural 30 - 35 M…## 8 Rale… Wake Piedm… Commercial Urban (… PERSON… Urban 30 - 35 M…## 9 Whit… Colum… Coast… Residential Rural (… FLOWER… Urban 30 - 35 M…## 10 New … Craven Coast… Residential Urban (… SUTTON… Urban 20 - 25 M…## # … with 7,457 more rows, and 57 more variables: traffic_control <chr>,## # weather <chr>, workzone <chr>, bike_age <chr>, bike_age_group <chr>,## # bike_alcohol <chr>, bike_alcohol_drugs <chr>, bike_direction <chr>,## # bike_injury <chr>, bike_position <chr>, bike_race <chr>, bike_sex <chr>,## # driver_age <chr>, driver_age_group <chr>, driver_alcohol <chr>,## # driver_alcohol_drugs <chr>, driver_est_speed <chr>, driver_injury <chr>,## # driver_race <chr>, driver_sex <chr>, driver_vehicle_type <chr>,## # crash_alcohol <chr>, crash_date <chr>, crash_day <chr>, crash_group <chr>,## # crash_hour <int>, crash_location <chr>, crash_month <chr>,## # crash_severity <chr>, crash_time <time>, crash_type <chr>,## # crash_year <int>, ambulance_req <chr>, hit_run <chr>,## # light_condition <chr>, road_character <chr>, road_class <chr>,## # road_condition <chr>, road_configuration <chr>, road_defects <chr>,## # road_feature <chr>, road_surface <chr>, num_bikes_ai <int>,## # num_bikes_bi <int>, num_bikes_ci <int>, num_bikes_ki <int>,## # num_bikes_no <int>, num_bikes_to <int>, num_bikes_ui <int>,## # num_lanes <chr>, num_units <int>, distance_mi_from <chr>, frm_road <chr>,## # rte_invd_cd <int>, towrd_road <chr>, geo_point <chr>, geo_shape <chr>
select
a range of variablesncbikecrash %>% select(city:locality)
## # A tibble: 7,467 x 5## city county region development locality ## <chr> <chr> <chr> <chr> <chr> ## 1 None - Rural C… Wayne Coastal Farms, Woods, Pas… Rural (<30% Developed)## 2 Henderson Vance Piedmont Residential Mixed (30% To 70% Dev…## 3 None - Rural C… Lincoln Piedmont Farms, Woods, Pas… Rural (<30% Developed)## 4 Whiteville Columbus Coastal Commercial Urban (>70% Developed)## 5 Wilmington New Hanov… Coastal Residential Urban (>70% Developed)## 6 None - Rural C… Robeson Coastal Farms, Woods, Pas… Rural (<30% Developed)## 7 None - Rural C… Richmond Piedmont Residential Mixed (30% To 70% Dev…## 8 Raleigh Wake Piedmont Commercial Urban (>70% Developed)## 9 Whiteville Columbus Coastal Residential Rural (<30% Developed)## 10 New Bern Craven Coastal Residential Urban (>70% Developed)## # … with 7,457 more rows
summarize
to reduce variables to valuesncbikecrash %>% summarize(avg_hr = mean(crash_hour))
## # A tibble: 1 x 1## avg_hr## <dbl>## 1 14.7
group_by
to do calculations on groupsncbikecrash %>% group_by(hit_run) %>% summarise(avg_hr = mean(crash_hour))
## # A tibble: 2 x 2## hit_run avg_hr## <chr> <dbl>## 1 No 14.6## 2 Yes 15.0
mutate
to add new variablesncbikecrash %>% mutate(weather_clear = case_when( weather == "Clear" ~ "Yes", TRUE ~ "No" ))
mutate
Most often when you define a new variable with mutate
you'll also want to save the resulting data frame, often by writing over the original data frame.
ncbikecrash <- ncbikecrash %>% mutate(weather_clear = case_when( weather == "Clear" ~ "Yes", TRUE ~ "No" ))
mutate
Most often when you define a new variable with mutate
you'll also want to save the resulting data frame, often by writing over the original data frame.
ncbikecrash %>% mutate(weather_clear = case_when( weather == "Clear" ~ "Yes", TRUE ~ "No" )) -> ncbikecrash
ncbikecrash %>% count(weather_clear, weather)
## # A tibble: 6 x 3## weather_clear weather n## <chr> <chr> <int>## 1 No Cloudy 951## 2 No Fog, Smog, Smoke 17## 3 No Other 15## 4 No Rain 295## 5 No Snow, Sleet, Hail, Freezing Rain/Drizzle 8## 6 Yes Clear 6181
NC bike crashes
appex-02-nc-bike-crashes
NC bike crashes
appex-02-nc-bike-crashes
Keyboard shortcuts
↑, ←, Pg Up, k | Go to previous slide |
↓, →, Pg Dn, Space, j | Go to next slide |
Home | Go to first slide |
End | Go to last slide |
Number + Return | Go to specific slide |
b / m / f | Toggle blackout / mirrored / fullscreen mode |
c | Clone slideshow |
p | Toggle presenter mode |
t | Restart the presentation timer |
?, h | Toggle this help |
Esc | Back to slideshow |