Data Wrangling
Six Main Verbs
Announcements
MSCS Events
- Thursday 11:15am: Coffee Break!
- What events would you like? Let me know!
Looking Ahead
- Week 5: Data Wrangling
- Week 6: Data Wrangling
- Week 7: Mini Project/Midterm Review
- Week 8: Midterm/Fall Break!
Due next Week
- Assignment 4 (Spatial Viz) on Weds [via Moodle]
- September 2023 Reflection [write in Google Doc shared with you]
- Tidy Tuesday on Friday (at least 1 of TT1-TT5)
Reflection
Let’s take 5 minutes to start that reflection.
- Write under Sept 2023.
- See the prompts at the top of the document. Respond to any combo of them.
- Goal: You are sharing your perspective about your learning that may not be reflected in what you turn in.
- Shared only between you and Brianna (you can/should be vulnerable about struggles/barriers so I can support you)
Learning Goals
- Understand and be able to use the following verbs appropriately: select, mutate, filter, arrange, summarize, group_by
- Develop a working knowledge of dates and lubridate functions
Six Main Verbs
Verbs that change the variables (columns) but not the cases (rows)
Verbs that change the cases (rows) but not the variables (columns)
Grouped summaries
Six Main Verbs
Verbs that change the variables (columns) but not the cases (rows)
- select
- Action: Provides a subset of variables
- Inputs: data, variable names
- Example:
select(data, var1, var2, var3)
- mutate
- Action: creates new variables
- Inputs: data, new_variable_name = how_you_define_new_var
- Examples:
mutate(data, var2 = var^2)
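To make select and mutate concrete, here is a minimal sketch. It assumes the dplyr package is installed; the athletes data frame and its column names are made up for illustration and are not part of the class activity.

```r
library(dplyr)

# Hypothetical toy data (names and values invented for this sketch)
athletes <- data.frame(
  name   = c("Ana", "Ben", "Cy", "Dee"),
  sport  = c("soccer", "soccer", "tennis", "tennis"),
  height = c(170, 180, 175, 165),   # height in cm
  year   = c(1998, 2003, 2001, 1999)
)

# select: keep only the listed columns; the rows are untouched
heights <- select(athletes, name, height)

# mutate: add a new column defined from existing ones; the rows are untouched
with_m <- mutate(athletes, height_m = height / 100)
```

Both results still have 4 rows. Only the set of columns differs: heights has 2 columns and with_m has 5.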
Verbs that change the cases (rows) but not the variables (columns)
- filter
- Action: shows subset of rows
- Inputs: data, Boolean conditions based on variables
- Examples:
filter(data, year > 2000)
- arrange
- Action: sorts rows
- Inputs: data, variable names, desc(variable name) if by descending order
- Examples:
arrange(data, desc(n))
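The row verbs filter and arrange can be tried the same way. This is a small sketch assuming dplyr is installed; the athletes data frame is hypothetical and exists only for this illustration.

```r
library(dplyr)

# Hypothetical toy data (names and values invented for this sketch)
athletes <- data.frame(
  name   = c("Ana", "Ben", "Cy", "Dee"),
  sport  = c("soccer", "soccer", "tennis", "tennis"),
  height = c(170, 180, 175, 165),   # height in cm
  year   = c(1998, 2003, 2001, 1999)
)

# filter: keep only the rows where the condition is TRUE (here Ben and Cy)
recent <- filter(athletes, year > 2000)

# arrange: reorder the rows, here from tallest to shortest
tallest_first <- arrange(athletes, desc(height))
```

In both results the columns are the same as in athletes. filter drops rows, while arrange keeps every row and only changes their order.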
Grouped summaries
- summarize
- Action: collapses rows and calculates a summary
- Inputs: data, new_variable_name = expression_used_to_summarize
- Example:
summarize(data, avgHeight = mean(height))
- group_by
- Action: creates a grouping structure within data
- Inputs: data, names of variables to define grouping structure
- Example:
data %>% group_by(sport) %>% summarize(avgHeight = mean(height))
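The difference between summarize on its own and summarize after group_by is easiest to see side by side. The sketch below assumes dplyr is installed and uses a hypothetical athletes data frame invented for this example.

```r
library(dplyr)

# Hypothetical toy data (names and values invented for this sketch)
athletes <- data.frame(
  name   = c("Ana", "Ben", "Cy", "Dee"),
  sport  = c("soccer", "soccer", "tennis", "tennis"),
  height = c(170, 180, 175, 165)    # height in cm
)

# summarize alone: all rows collapse into a single row (avgHeight = 172.5)
overall <- summarize(athletes, avgHeight = mean(height))

# group_by then summarize: one row per sport (soccer 175, tennis 170)
by_sport <- athletes %>%
  group_by(sport) %>%
  summarize(avgHeight = mean(height))
```

group_by does not change the data by itself. It only records the grouping, and the summarize that follows then computes one summary row per group.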
WAIT! What is that?
%>% is called a pipe.
- It serves as a way to “pass” objects (usually datasets) on the left to a function on the right as the 1st input.
LEFT_OBJECT %>% RIGHT_FUNCTION()
is the same as RIGHT_FUNCTION(LEFT_OBJECT)
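As a quick check of that equivalence, the sketch below writes the same computation once with nested calls and once with pipes. It assumes dplyr is installed (loading dplyr makes %>% available); the athletes data frame is hypothetical.

```r
library(dplyr)

# Hypothetical toy data (names and values invented for this sketch)
athletes <- data.frame(
  sport  = c("soccer", "soccer", "tennis", "tennis"),
  height = c(170, 180, 175, 165)    # height in cm
)

# Nested calls: read from the inside out
nested <- summarize(group_by(athletes, sport), avgHeight = mean(height))

# Piped calls: read from left to right, one step at a time
piped <- athletes %>%
  group_by(sport) %>%
  summarize(avgHeight = mean(height))

# Compare the two results as plain data frames
same_result <- isTRUE(all.equal(as.data.frame(nested), as.data.frame(piped)))
```

The piped version does the same work as the nested one. Its advantage is readability: each step appears in the order it happens.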
Template File
Download a template .Rmd of this activity. Put the file in an Assignment_05 folder within your COMP_STAT_112 folder.
- This .Rmd only contains examples that we’ll work on in class and exercises you’ll finish for Assignment 5.
Rest of Class
Continue working on the activity; check in with your classmates.
Don’t leave anyone struggling alone!
After Class
This activity is all code, no interpretations.
There are 12 exercises to give you plenty of practice with these six important verbs!
You’ll finish the activity for Assignment 5 (due in 1.5 weeks!).