How to do it:

Submission: Submit the GitHub link to your assignment on Canvas under Assignment 4.


1. Install the tidyverse package

An R package can be installed with the install.packages function. Install tidyverse if you have not done so already.

install.packages('tidyverse')
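
If you rerun your script often, you may not want to reinstall the package every time. A minimal sketch, assuming you only want to install when the package is missing:

```r
# Install tidyverse only if it is not already available on this machine.
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}
```

requireNamespace returns FALSE instead of raising an error when the package is not installed, which makes it convenient for this kind of check.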

2. Read the data using read_csv

Use the read_csv function to import the US Covid-19 data at link. Don’t forget to load tidyverse (library(tidyverse)) so that you can use read_csv.
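
As an illustration of how read_csv behaves, here is a small sketch that reads a made-up file. The three rows are hypothetical, not taken from the Covid-19 data; only the column names mirror ones used later in this assignment. For the real data, pass the URL or file path to read_csv instead.

```r
library(tidyverse)

# Write a tiny, made-up CSV to a temporary file (illustration only).
path <- tempfile(fileext = ".csv")
writeLines(
  c("date,positiveIncrease,deathIncrease",
    "2020-03-01,10,1",
    "2020-03-02,25,4",
    "2020-03-03,40,16"),
  path
)

# read_csv returns a tibble and guesses each column's type.
df <- read_csv(path)

# Inspect the guessed column types before going further.
glimpse(df)
```

Checking the column types right after import is worthwhile, since the date functions in the next question expect a proper date column.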


3. Fix the date and create some new variables

lubridate is one of the tidyverse packages. We will use it in this question.

library(lubridate)

# month of the year
df$month <- month(df$date)

# day of the week
df$weekday <- wday(df$date)

# day of the month
df$monthday <- mday(df$date)
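
To see what these functions return, here is a short sketch on two hand-picked dates (hypothetical examples, not rows from the data). It also shows ymd, which is one way to turn text such as '2020-03-15' into a date if the column was not imported as one.

```r
library(lubridate)

# Two example dates stored as text, converted to Date with ymd().
dates <- ymd(c("2020-03-15", "2020-12-31"))

months   <- month(dates)  # month of the year: 3 and 12
monthday <- mday(dates)   # day of the month: 15 and 31
weekday  <- wday(dates)   # day of the week as a number; by default 1 = Sunday

# label = TRUE returns day names instead of numbers.
weekday_name <- wday(dates, label = TRUE)
```

Because wday counts from Sunday by default, a weekend corresponds to the values 1 and 7 unless the week_start setting has been changed.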

4. Create new variables with case_when.

The function case_when is a good option for creating a new variable from an existing one. For example, the code below creates a new variable, daily_death, from the deathIncrease variable. deathIncrease is the number of new daily deaths from Covid-19. The new variable daily_death takes three values: low (deathIncrease less than 3), medium (deathIncrease from 3 to 14), and high (deathIncrease more than 14). Note that this can also be done in a different way, as shown in Assignment 3.

# conditions are checked in order; the first one that is TRUE applies
df$daily_death <- case_when(
  df$deathIncrease < 3 ~ 'low',
  df$deathIncrease <= 14 ~ 'medium',
  TRUE ~ 'high'
)
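
One detail worth checking on your own data is how missing values are handled. The sketch below uses a made-up vector x (an assumption for illustration) to show that a missing value fails both comparisons and therefore lands in the final TRUE branch.

```r
library(tidyverse)

# Made-up values covering each boundary, plus one missing value.
x <- c(0, 3, 14, 15, NA)

# Same rules as in the assignment: NA ends up labelled 'high'.
naive <- case_when(
  x < 3   ~ 'low',
  x <= 14 ~ 'medium',
  TRUE    ~ 'high'
)

# Handling the missing case first keeps it as NA.
labelled <- case_when(
  is.na(x) ~ NA_character_,
  x < 3    ~ 'low',
  x <= 14  ~ 'medium',
  TRUE     ~ 'high'
)
```

Whether a missing count should be labelled 'high' or left missing is a judgment call; the second version only makes the choice explicit.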

5. Select function

Use the select function to deselect the column totalTestsViral from the data.
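
As a reminder of the syntax, here is a sketch on a made-up tibble. The table toy and its column names are hypothetical and chosen only for illustration.

```r
library(tidyverse)

# Made-up data with three columns.
toy <- tibble(
  id      = 1:3,
  keep_me = c(2.5, 3.1, 4.0),
  drop_me = c('a', 'b', 'c')
)

# A minus sign in front of a column name removes that column
# and keeps all the others.
trimmed <- select(toy, -drop_me)

names(trimmed)
```

select returns a new tibble, so assign the result back to a variable if you want the change to persist.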


6. Pipe Operator ( %>% )

The pipe operator offers another way to write R code, and it often makes the code more readable. The pipe works very well with all the tidyverse packages. Refer to these slides (slides 15, 16, 17 and 18) to rewrite the code below using the pipe operator.

x <- 1:10

# square root of x
sqrt(x)

# sum of the square roots of x
sum(sqrt(x))

# natural log of the sum of the square roots of x
log(sum(sqrt(x)))

# log base 2 of 16
log(16, 2)
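
To illustrate the idea on different numbers, here is a sketch with a made-up vector. The pipe passes the value on its left as the first argument of the function on its right, so nested calls can be read from left to right.

```r
library(tidyverse)  # makes %>% available

values <- c(4, 9, 16)  # made-up input

# Nested form: read from the inside out.
nested <- sum(sqrt(values))

# Piped form: read from left to right.
piped <- values %>% sqrt() %>% sum()

# Extra arguments go after the piped value:
# 8 %>% log(2) is the same call as log(8, 2).
log_piped <- 8 %>% log(2)
```

Both forms compute the same result; the choice between them is about readability.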

7. Combo 1: group_by + summarise

This combo is used when you want to apply a function or calculation to different groups of the data. For example, to calculate the average number of cases (positiveIncrease) by day of the week (weekday), we use:

df %>% 
  group_by(weekday) %>% 
  summarise(mean(positiveIncrease))
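
The same pattern on a small made-up table makes the result easy to verify by hand. The table toy and its columns are hypothetical.

```r
library(tidyverse)

# Made-up data: two groups with two values each.
toy <- tibble(
  group = c('a', 'a', 'b', 'b'),
  value = c(1, 3, 10, 20)
)

# One output row per group; naming the summary gives a tidy column name.
by_group <- toy %>%
  group_by(group) %>%
  summarise(mean_value = mean(value))
```

Here group a averages to 2 and group b to 15. Without a name, the output column would be called mean(value), which is awkward to refer to later.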

8. Combo 2: filter + group_by + summarise

An example: to calculate the average number of cases (positiveIncrease) in January and February separately, we use:

df %>% 
  filter(month == 1 | month == 2) %>% 
  group_by(month) %>% 
  summarise(positive_increase = mean(positiveIncrease))
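
When a filter compares one column against several values, %in% is a common alternative to chaining conditions with |. A sketch on made-up data (the table toy and the column cases are hypothetical):

```r
library(tidyverse)

# Made-up data covering three months.
toy <- tibble(
  month = c(1, 1, 2, 2, 3),
  cases = c(10, 20, 30, 50, 99)
)

# Keep months 1 and 2 only, then average within each month.
result <- toy %>%
  filter(month %in% c(1, 2)) %>%
  group_by(month) %>%
  summarise(mean_cases = mean(cases))
```

The filter step runs first, so month 3 never reaches the grouping and does not appear in the output.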

9. Combo 3: filter + group_by + summarise + arrange

Use the arrange function to find the month that has the highest number of deaths on the weekend.
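
The general shape of this combo can be sketched on made-up data. The table toy and the columns region, flag, and amount are hypothetical and unrelated to the Covid-19 variables.

```r
library(tidyverse)

# Made-up data: amounts per region, with a logical flag to filter on.
toy <- tibble(
  region = c('north', 'north', 'south', 'south', 'east', 'east'),
  flag   = c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE),
  amount = c(5, 100, 7, 8, 2, 50)
)

ranked <- toy %>%
  filter(flag) %>%                   # keep flagged rows only
  group_by(region) %>%               # one group per region
  summarise(total = sum(amount)) %>% # total within each group
  arrange(desc(total))               # largest total first
```

arrange sorts in ascending order by default; wrapping the column in desc puts the largest value in the first row.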


10. Use your own dataset and implement the following functions or combos. You can use the Adult Census Income or Titanic data.