Six Main Verbs
Brianna Heggeseth
This week in MSCS
Looking Ahead
Getting the data in the tidy format that we want…
select, mutate, filter, arrange, summarize, group_by, lubridate functions
Verbs that change the variables (columns) but not the cases (rows)
Verbs that change the cases (rows) but not the variables (columns)
Grouped summaries
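The sketch below shows the three groups of verbs side by side. It assumes the dplyr and mosaicData packages are installed. The object names (cols_changed, rows_changed, per_state) are made up for illustration. Instead of printing whole tables, it compares row and column counts.

```r
library(dplyr)
library(mosaicData)
data("Birthdays")

# select() and mutate() change the columns; the number of rows is unchanged
cols_changed <- Birthdays %>%
  select(state, date, births) %>%
  mutate(many_births = births > 100)

# filter() and arrange() change the rows; the set of columns is unchanged
rows_changed <- Birthdays %>%
  filter(state == "MA") %>%
  arrange(desc(births))

# group_by() + summarise() collapses each group (here, each state) to one row
per_state <- Birthdays %>%
  group_by(state) %>%
  summarise(total = sum(births))

c(nrow(Birthdays), nrow(cols_changed))   # same number of rows
c(ncol(Birthdays), ncol(rows_changed))   # same number of columns
nrow(per_state)                          # one row per state
```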
Download a template .Rmd of this activity. Put the file in a Day_07 folder within your COMP_STAT_112 folder.
The data table Birthdays in the mosaicData package gives the number of births recorded on each day of the year in each state from 1969 to 1988.
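Here is a quick way to load and inspect these data, assuming mosaicData is installed. The package version may contain more columns than the tables printed later in this activity, which show only state, date, year, and births. Those four columns therefore appear to have been selected beforehand.

```r
library(mosaicData)
data("Birthdays")

# Column names and types
str(Birthdays)

# The years covered by the data
range(Birthdays$year)
```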
Consider the Birthdays data.
Add two new variables to the Birthdays data: one that has only the last two digits of the year, and one that states whether there were more than 100 births in the given state on the given date.
What does select(Birthdays, ends_with("te")) return?
Create a table with only births in Massachusetts in 1979, and sort the days from those with the most births to those with the fewest.
MABirths1979 <- filter(Birthdays, state == "MA", year == 1979)
MABirths1979Sorted <- arrange(MABirths1979, desc(births))
head(MABirths1979Sorted)
state date year births
1 MA 1979-09-28 1979 262
2 MA 1979-09-11 1979 252
3 MA 1979-12-28 1979 249
4 MA 1979-09-26 1979 246
5 MA 1979-07-24 1979 245
6 MA 1979-04-27 1979 243
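Below is one possible sketch for the first two exercises above (adding variables and using a select helper). It assumes dplyr and mosaicData are installed. The new column names short_year and many_births are hypothetical. `year %% 100` keeps the last two digits of the year, and `births > 100` gives TRUE or FALSE for each state and date.

```r
library(dplyr)
library(mosaicData)
data("Birthdays")

# Two new variables (names are illustrative)
BirthdaysNew <- Birthdays %>%
  mutate(short_year = year %% 100,       # e.g. 1979 becomes 79
         many_births = births > 100)     # TRUE when more than 100 births

# ends_with("te") keeps only the columns whose names end in "te"
te_cols <- select(Birthdays, ends_with("te"))
names(te_cols)
```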
Consider the Birthdays data again.
# A tibble: 20 × 2
year average
<int> <dbl>
1 1969 192.
2 1970 200.
3 1971 191.
4 1972 175.
5 1973 169.
6 1974 170.
7 1975 169.
8 1976 170.
9 1977 179.
10 1978 179.
11 1979 188.
12 1980 194.
13 1981 195.
14 1982 198.
15 1983 196.
16 1984 197.
17 1985 202.
18 1986 202.
19 1987 205.
20 1988 210.
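The code behind the 20-row table above is not shown. The sketch below should produce a table with the same shape: one row per year, holding the average births per state-day. It assumes the table comes from grouping by year alone, and that dplyr and mosaicData are installed.

```r
library(dplyr)
library(mosaicData)
data("Birthdays")

# Group by year only, then average births over every state-day in that year
BirthdaysYear <- group_by(Birthdays, year)
YearAverages <- summarise(BirthdaysYear, average = mean(births))
YearAverages
```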
BirthdaysYearState <- group_by(Birthdays, year, state)
summarise(BirthdaysYearState, average = mean(births))
# A tibble: 1,020 × 3
# Groups: year [20]
year state average
<int> <chr> <dbl>
1 1969 AK 18.6
2 1969 AL 174.
3 1969 AR 91.3
4 1969 AZ 93.3
5 1969 CA 954.
6 1969 CO 110.
7 1969 CT 134.
8 1969 DC 75.3
9 1969 DE 27.6
10 1969 FL 292.
# … with 1,010 more rows
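Notice the `# Groups: year [20]` line in the output. After grouping by year and state, summarise() removes only the last grouping variable (state), so the result is still grouped by year. The sketch below shows this behavior. It assumes dplyr 1.0 or later, which added the `.groups` argument; `kept` and `dropped` are illustrative names.

```r
library(dplyr)
library(mosaicData)
data("Birthdays")

BirthdaysYearState <- group_by(Birthdays, year, state)

# Default: the last grouping variable (state) is dropped, year is kept
kept <- summarise(BirthdaysYearState, average = mean(births))
group_vars(kept)

# .groups = "drop" returns a fully ungrouped table instead
dropped <- summarise(BirthdaysYearState, average = mean(births), .groups = "drop")
group_vars(dropped)
```

Dropping the grouping explicitly prevents later verbs such as mutate() or summarise() from silently working within each year.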
With the pipe notation, x %>% f(y) is equivalent to f(x, y). For example, in the first step of the Massachusetts 1979 example, x is Birthdays, the function f is filter, and y is state == "MA", year == 1979.
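Below, the Massachusetts 1979 example is written both ways. This is a sketch assuming dplyr (which re-exports %>%) and mosaicData are installed; the names `nested` and `piped` are illustrative. The two results should be identical.

```r
library(dplyr)
library(mosaicData)
data("Birthdays")

# Nested calls: filter() runs first, and arrange() wraps around it
nested <- arrange(filter(Birthdays, state == "MA", year == 1979), desc(births))

# Piped version: each x %>% f(y) is the call f(x, y)
piped <- Birthdays %>%
  filter(state == "MA", year == 1979) %>%
  arrange(desc(births))

identical(nested, piped)
```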
Make a table showing the five states with the most births between September 9, 1979 and September 11, 1979, inclusive. Arrange the table in descending order of births.
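Here is one possible approach, sketched under two assumptions. First, "most births" means each state's total over the three days. Second, the lubridate package is installed for month() and day(). Calling them as lubridate::month() and lubridate::day() avoids clashes with any month or day columns that Birthdays may contain. The name TopSept1979 is hypothetical.

```r
library(dplyr)
library(mosaicData)
data("Birthdays")

TopSept1979 <- Birthdays %>%
  # Keep September 9-11, 1979 (inclusive)
  filter(year == 1979,
         lubridate::month(date) == 9,
         lubridate::day(date) %in% 9:11) %>%
  # Total births per state over those three days
  group_by(state) %>%
  summarise(total = sum(births)) %>%
  # Largest totals first, then keep the top five
  arrange(desc(total)) %>%
  head(5)

TopSept1979
```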
Continue working on the activity; check in with your classmates.
Don’t leave anyone struggling alone!
This activity is all code, no interpretations.
There are many exercises to give you plenty of practice with these six important tasks!
You’ll finish the activity for Assignment 6.