How to do it:
Open the R Markdown file of this assignment (link) in RStudio.
Right under each question, insert a code chunk (hotkey: Ctrl + Alt + I) and code the solution for the question.
Knit the R Markdown file (hotkey: Ctrl + Shift + K) to export an HTML file.
Publish the HTML file to your GitHub Pages site.
Submission: Submit the GitHub link of the assignment to Canvas.
The data:
This assignment works with the Adult Census Data that can be downloaded at this link.
Questions
Use read_csv to import the data. Show the number of NAs for each column.
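As a rough sketch of this step, the snippet below reads a small inline CSV and counts the NAs per column. The toy columns (age, workclass) and the file name mentioned in the comment are assumptions for illustration, not the actual census file.

```r
library(readr)

# Toy CSV standing in for the census data (an assumption); in the assignment
# you would pass the path of the downloaded file instead, for example a
# hypothetical "adult_census.csv".
toy_csv <- I("age,workclass\n39,Private\n,Private\n50,\n")
df <- read_csv(toy_csv, show_col_types = FALSE)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column.
na_counts <- colSums(is.na(df))
print(na_counts)
```

By default read_csv treats empty fields and the string "NA" as missing, so other encodings of missingness will not be counted here.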
Use the function aggr from the VIM package to plot the number of NAs for each column.
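One possible way to call aggr, shown on a toy data frame whose columns are assumptions for illustration. The prop and numbers arguments are optional plotting settings; the assignment does not prescribe them.

```r
library(VIM)

# Toy data with a few missing values (hypothetical columns).
toy <- data.frame(
  age = c(39, NA, 50, NA),
  hours = c(40, 13, NA, 40),
  workclass = c("Private", "Private", NA, "State-gov"),
  stringsAsFactors = FALSE
)

# prop = FALSE plots counts of missing values rather than proportions;
# numbers = TRUE also prints the count of each missingness combination.
na_plot <- aggr(toy, prop = FALSE, numbers = TRUE)
```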
Find other forms of missing values. Hint: You can use the table function to check whether a variable contains suspicious categories (Unknown, for example). What other forms of missing values appear in the data?
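The hint can be sketched as follows on a toy vector. The placeholder values "?" and "Unknown" are assumptions chosen for illustration; the real data may use different ones.

```r
# Toy categorical variable with two hypothetical placeholder values.
workclass <- c("Private", "?", "Private", "Unknown", NA)

# useNA = "ifany" adds a count for true NAs next to the regular categories,
# so placeholders and real NAs can be compared in one table.
counts <- table(workclass, useNA = "ifany")
print(counts)
```

Running this on each categorical column (for example with lapply) makes it easier to spot categories that look like disguised missing values.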
Replace all the forms of missing values found with NA.
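A minimal sketch of the replacement, assuming the placeholders found in the previous question are "?" and "Unknown" (hypothetical values; substitute whatever you actually found).

```r
library(dplyr)

# Toy data with hypothetical placeholder values for missingness.
df <- data.frame(
  age = c(39, 50, 38, 53),
  workclass = c("Private", "?", "Unknown", "State-gov"),
  stringsAsFactors = FALSE
)

# Assumed list of placeholders; adjust to the values found with table().
missing_codes <- c("?", "Unknown")

# For every character column, overwrite the placeholder entries with NA.
cleaned <- df %>%
  mutate(across(where(is.character),
                ~ replace(.x, .x %in% missing_codes, NA)))

print(colSums(is.na(cleaned)))
```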
Replot the number of NAs for each column.
Approach 1 to handle NAs: Remove all rows that have any NAs. Save the result as a separate dataset so that the original data is unchanged (it still has NAs). How many rows are left after the removal?
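Approach 1 might look like the sketch below, using drop_na from tidyr on a toy data frame (the columns are assumptions for illustration). Base R's na.omit would be an alternative.

```r
library(tidyr)

# Toy data with NAs in two different rows (hypothetical columns).
df <- data.frame(
  age = c(39, NA, 50, 38),
  workclass = c("Private", "Private", NA, "State-gov"),
  stringsAsFactors = FALSE
)

# drop_na() returns a new data frame without the rows containing any NA;
# df itself is not modified.
df_complete <- drop_na(df)

print(nrow(df))          # rows in the original data
print(nrow(df_complete)) # rows left after the removal
```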
Approach 2 to handle NAs: Fill all the NAs with the previous or next value. (Hint: Use the fill function.) Save the result as a separate dataset so that the original data is unchanged (it still has NAs).
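A sketch of Approach 2 with tidyr's fill on toy data (hypothetical columns). The direction "downup" is one possible choice: it fills with the previous value first and then uses the next value for any NAs left at the top.

```r
library(tidyr)

# Toy data; the first row has an NA that has no previous value.
df <- data.frame(
  age = c(NA, 25, NA, 40),
  workclass = c("Private", NA, "State-gov", NA),
  stringsAsFactors = FALSE
)

# Fill every column downward (previous value), then upward (next value).
df_filled <- fill(df, everything(), .direction = "downup")

print(df_filled)
```

Note that filling depends on row order, so the result changes if the rows are sorted differently.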
Approach 3 to handle NAs: For numeric variables, replace the NAs with the median. For categorical variables, replace the NAs with the most frequent category (the majority).
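One way to sketch Approach 3 with dplyr, on toy data. The helper most_common is a hypothetical name introduced here; on ties it returns the category that comes first in the sorted table.

```r
library(dplyr)

# Hypothetical helper: the most frequent non-missing value of a vector.
# table() drops NAs by default; which.max() returns the first maximum.
most_common <- function(x) names(which.max(table(x)))

# Toy data (assumed columns) with one NA per column.
df <- data.frame(
  age = c(20, NA, 40, 60),
  workclass = c("Private", NA, "Private", "State-gov"),
  stringsAsFactors = FALSE
)

df_imputed <- df %>%
  mutate(
    # Numeric columns: replace NAs with the column median.
    across(where(is.numeric),
           ~ ifelse(is.na(.x), median(.x, na.rm = TRUE), .x)),
    # Character columns: replace NAs with the most frequent category.
    across(where(is.character),
           ~ ifelse(is.na(.x), most_common(.x), .x))
  )

print(df_imputed)
```

If the categorical columns are stored as factors rather than characters, the second across call would need where(is.factor) and a conversion step.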