How to do it:
Open the R Markdown file of this assignment (link) in RStudio.
Right under each question, insert a code chunk (you can use the hotkey Ctrl + Alt + I to add a code chunk) and code the solution for the question.
Knit the R Markdown file (hotkey: Ctrl + Shift + K) to export an HTML file.
Publish the HTML file to your GitHub Page.
Submission: Submit the GitHub link to your assignment on Canvas under Assignment 5 - Extra Credits.
Download the c2015 dataset to your computer at this link. Load the library readxl (library(readxl)), then use the function read_excel() to read the c2015 dataset. The data is from the Fatality Analysis Reporting System (FARS). The data includes vital accident information, such as when, where, and how the accident happened. FARS also includes information about the drivers and passengers, such as age, gender, etc. Some of the fatal accidents had multiple vehicles involved. More information about FARS can be found at: https://www.nhtsa.gov/research-data/fatality-analysis-reporting-system-fars
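A minimal sketch of the loading step. The file name c2015.xlsx and its location in the working directory are assumptions, so change the path to wherever you saved the download. To keep the sketch runnable when that file is absent, it falls back to an example workbook that ships with readxl, which is only a stand-in and not the FARS data.

```r
library(readxl)

# Assumed location of the downloaded file; change this to your own path.
path <- "c2015.xlsx"

# Stand-in only: use a workbook bundled with readxl when the assignment
# file is not present, so the rest of the sketch still runs.
if (!file.exists(path)) {
  path <- readxl_example("datasets.xlsx")
}

df <- read_excel(path)

# Quick checks that the data loaded as expected.
dim(df)
names(df)
```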
Let’s study the variable SEX. How many missing values are in the NA form?
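One way to count values in the NA form, sketched on a toy vector rather than the real data. The values below are made up for illustration.

```r
# Toy stand-in for the SEX column of c2015; the values are illustrative.
df <- data.frame(
  SEX = c("Male", "Female", NA, "Unknown", "Male", NA),
  stringsAsFactors = FALSE
)

# is.na() flags only true NA values, so a label such as "Unknown"
# is not counted here.
n_missing <- sum(is.na(df$SEX))
n_missing
```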
Still with the variable SEX. There are missing values in this variable that are not NAs. Identify the forms of missing values in this variable. Change all the forms of missing values to NAs.
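A possible sketch of this step using na_if from dplyr. The labels "Unknown" and "Not Rep" are assumptions for illustration; inspect the real column to find the labels that actually occur.

```r
library(dplyr)

# Toy stand-in for the SEX column; the missing-value labels are assumed.
sex <- c("Male", "Female", "Unknown", "Not Rep", NA, "Male")

# List the distinct values to spot labels that really mean "missing".
unique(sex)

# na_if() converts one label at a time, so call it once per label.
sex <- na_if(sex, "Unknown")
sex <- na_if(sex, "Not Rep")

sum(is.na(sex))
```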
Still with the variable SEX. After all the missing values are in the NA form, change the missing values of this variable to the majority sex.
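Majority imputation could look roughly like this. The toy data is made up, and using replace_na from tidyr is one choice among several (base R indexing with is.na() would also work).

```r
library(tidyr)

# Toy stand-in for SEX after all missing values are already NA.
df <- data.frame(
  SEX = c("Male", "Female", NA, "Male", NA),
  stringsAsFactors = FALSE
)

# table() ignores NA by default, so the most frequent label is the
# majority sex among the observed values.
majority <- names(which.max(table(df$SEX)))

# Fill every NA with that label.
df$SEX <- replace_na(df$SEX, majority)
df$SEX
```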
Let’s study the variable AGE. Use the table function to check out the values of this variable and the forms of missing values. Use na_if to change all the forms of missing values to NAs.
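A short sketch of the inspect-then-convert pattern for AGE, this time inside a dplyr pipeline. The toy values and the labels "Unknown" and "Not Rep" are assumptions; the output of table on the real column tells you which labels to convert.

```r
library(dplyr)

# Toy stand-in for the AGE column; values and labels are assumed.
df <- tibble(AGE = c("25", "Less than 1", "Unknown", "40", "Not Rep"))

# Frequency of every distinct value, useful for spotting odd labels.
table(df$AGE)

# Convert each assumed missing-value label to NA.
df <- df %>%
  mutate(
    AGE = na_if(AGE, "Unknown"),
    AGE = na_if(AGE, "Not Rep")
  )

df$AGE
```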
Still with the variable AGE. Use str_replace to replace Less than 1 with ‘0’ (character 0, not number 0).
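The replacement might be sketched as follows on a toy vector. The replacement is the string "0" because the column is still of character type at this point.

```r
library(stringr)

# Toy stand-in for AGE while it is still a character vector.
age <- c("25", "Less than 1", "40", NA)

# Replace the text label with the string "0"; NA values stay NA.
age <- str_replace(age, "Less than 1", "0")
age
```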
Still with the variable AGE. Use the class function to check the type of this variable. Use the as.numeric function to change the type of the variable to numeric.
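A small illustration of the type check and conversion, on made-up values. Any string that is not a number would become NA with a warning, which is why the text labels are handled before this step.

```r
# Toy stand-in for AGE after the text labels have been cleaned.
age <- c("25", "0", "40", NA)

class(age)  # "character"

# Convert the strings to numbers; NA stays NA.
age <- as.numeric(age)

class(age)  # "numeric"
```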
Still with the variable AGE. Replace the missing values (NAs) with the mean of the variable.
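Mean imputation can be sketched like this, again on made-up numbers. Note the na.rm = TRUE argument: without it, the mean of a vector that contains NA is itself NA.

```r
library(tidyr)

# Toy stand-in for the numeric AGE column.
age <- c(20, 40, NA, 60)

# Mean of the observed values only.
age_mean <- mean(age, na.rm = TRUE)

# Fill the NA values with that mean.
age <- replace_na(age, age_mean)
age
```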
Let’s fix the variable TRAV_SP. Do the following.
Use the table function to check all the values of this variable. Use str_remove to remove the MPH in each value.
Greater
Use the str_replace function to replace Stopped with ‘0’ (don’t forget the quotation marks around 0).
Use na_if to change all the forms of missing values to NAs.
Use the class function to check the type of this variable. Use as.numeric to change the type to numeric.
Use replace_na to replace the NAs with the median of the variable.
TRAV_SP). Hint: You want to look at the seat positions (SEAT_POS variable) to keep only the observations about the drivers, then calculate the correlation.
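The TRAV_SP cleaning steps and the driver filter from the hint could be chained roughly as below. Everything in the toy data is an assumption: the speed format, the missing-value labels "Unknown" and "Not Rep", the SEAT_POS label "Driver's Seat", and the choice of AGE as the second variable in the correlation. The step that begins with "Greater" is left out because its wording is not recoverable here.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Toy stand-in for c2015; all values and labels are assumed.
df <- tibble(
  AGE      = c(30, 45, 22, 60, 35, 50),
  SEAT_POS = c("Driver's Seat", "Driver's Seat", "Front Seat, Right Side",
               "Driver's Seat", "Driver's Seat", "Driver's Seat"),
  TRAV_SP  = c("55 MPH", "Stopped", "40 MPH", "Unknown", "70 MPH", "Not Rep")
)

# Inspect the raw values first.
table(df$TRAV_SP)

df <- df %>%
  mutate(
    TRAV_SP = str_remove(TRAV_SP, " MPH"),            # drop the unit
    TRAV_SP = str_replace(TRAV_SP, "Stopped", "0"),   # string "0", not number
    TRAV_SP = na_if(TRAV_SP, "Unknown"),              # assumed label
    TRAV_SP = na_if(TRAV_SP, "Not Rep"),              # assumed label
    TRAV_SP = as.numeric(TRAV_SP),                    # character to numeric
    TRAV_SP = replace_na(TRAV_SP, median(TRAV_SP, na.rm = TRUE))
  )

# Keep only the drivers, then correlate speed with the assumed variable AGE.
drivers <- df %>% filter(SEAT_POS == "Driver's Seat")
driver_cor <- cor(drivers$AGE, drivers$TRAV_SP)
driver_cor
```

Each mutate line overwrites TRAV_SP in turn, so the median in the last line is computed on the already numeric column.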