Reshaping Data

Brianna Heggeseth

Announcements

This week in MSCS

  • Wednesday (4:45-7pm): MSCS Block Party! Outside OLRI, near the tennis courts
  • Thursday 11:15am: Coffee Break!

Due this Week

  • Assignment 5 (Spatial Viz) on Tuesday [via Moodle]
  • Self-Reflection on Friday [individual Google Doc]
  • At least 1 Tidy Tuesday (TT) by Friday [via Moodle, TT4]
    • Choose 1 TT to iterate on by Friday [via Moodle, IV0]

Learning Goals

  • Understand the difference between wide and long data formats, and identify the case (unit of observation) for a given data set
  • Develop comfort using pivot_wider() and pivot_longer() from the tidyr package

Template File

Download a template .Rmd of this activity. Put the file in a Day_08 folder within your COMP_STAT_112 folder.

  • This .Rmd contains only the examples and the 3 exercises that we’ll work on in class and that you’ll finish for Assignment 7.

Describe to your Neighbor

Everyone: Look at this small data set.

Partner A: Describe the structure of the data set in words.

name sex total
Courtney F 257289
Courtney M 22619
Riley F 100881
Riley M 92789
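A minimal sketch of this small data set as an R data frame, with the values typed in from the table above. The name `babies_long` is a hypothetical choice, not something from the template .Rmd. The last line checks that no (name, sex) pair appears twice, which is one way to confirm that each row (the case) is a single name-sex combination.

```r
# Hypothetical object name; values copied from the table above.
# Long format: one row per name-sex combination
babies_long <- data.frame(
  name  = c("Courtney", "Courtney", "Riley", "Riley"),
  sex   = c("F", "M", "F", "M"),
  total = c(257289, 22619, 100881, 92789)
)

babies_long

# TRUE when every (name, sex) pair is unique, i.e. the pair identifies a row
anyDuplicated(babies_long[c("name", "sex")]) == 0
```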

Describe to your Neighbor

Partner A: Close your eyes.

Partner B: Describe how the structure of the data set changed. Think of describing the “steps” taken.

Partner A: Sketch what you think the new data set looks like.

name F M
Courtney 257289 22619
Riley 100881 92789
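One way to produce this wider table in R, sketched under the assumption that the tidyr package is installed (the object names are hypothetical). `pivot_wider()` takes the new column names from `sex` and fills those columns with the values from `total`, so each row becomes a single name.

```r
library(tidyr)

# Hypothetical long data: one row per name-sex combination
babies_long <- data.frame(
  name  = c("Courtney", "Courtney", "Riley", "Riley"),
  sex   = c("F", "M", "F", "M"),
  total = c(257289, 22619, 100881, 92789)
)

# Spread `total` into one column per value of `sex` (F and M);
# the case is now a single name
babies_wide <- pivot_wider(babies_long, names_from = sex, values_from = total)

babies_wide
```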

Describe to your Neighbor

Partner B: Close your eyes.

Partner A: Describe how the structure of the data set changed. Think of describing the “steps” taken.

Partner B: Sketch what you think the new data set looks like.

name F M ratio
Courtney 257289 22619 0.0879128
Riley 100881 92789 0.9197867

name ratio sex total
Courtney 0.0879128 F 257289
Courtney 0.0879128 M 22619
Riley 0.9197867 F 100881
Riley 0.9197867 M 92789
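A sketch of both steps, assuming dplyr and tidyr are installed and that ratio means the M count divided by the F count (this matches the values shown, e.g. 22619 / 257289 ≈ 0.0879). The object names are hypothetical. Because `ratio` is not listed in `cols`, `pivot_longer()` keeps it as an identifying column and repeats it on both rows for each name.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per name
babies_wide <- data.frame(
  name = c("Courtney", "Riley"),
  F    = c(257289, 100881),
  M    = c(22619, 92789)
)

babies_ratio_long <- babies_wide %>%
  # Step 1: one ratio per name (assumed to be M / F)
  mutate(ratio = M / F) %>%
  # Step 2: stack the F and M columns into `sex` (old column name)
  # and `total` (old cell value); columns not in `cols` are kept as-is
  pivot_longer(cols = c(F, M), names_to = "sex", values_to = "total")

babies_ratio_long
```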

Wider V. Longer Format

If we want to keep all of the values in the data set (without collapsing rows with summarize()) but have a different unit of observation (or case), we can:

  • Make the data wider by spreading values out across new variables (e.g., turn the total column into separate F and M columns holding the counts for binary “Female” and “Male” names, so each row is one name)
  • Make the data longer by combining values from different variables into 1 variable (e.g., take the F and M counts and stack them into one total column, with a sex column recording which count is which)
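A minimal sketch, assuming tidyr is installed, showing that widening and then lengthening the name data recovers the original rows (the object names are hypothetical). This illustrates the point above: reshaping changes the case but, unlike summarize(), neither direction drops or aggregates any values.

```r
library(tidyr)

# Hypothetical long data: one row per name-sex combination
babies_long <- data.frame(
  name  = c("Courtney", "Courtney", "Riley", "Riley"),
  sex   = c("F", "M", "F", "M"),
  total = c(257289, 22619, 100881, 92789)
)

# Wider: one row per name, with F and M columns
babies_wide <- pivot_wider(babies_long, names_from = sex, values_from = total)

# Longer again: stack F and M back into `sex` and `total`
babies_back <- pivot_longer(babies_wide, cols = c(F, M),
                            names_to = "sex", values_to = "total")

# All four totals survive, in the same order
identical(babies_back$total, babies_long$total)
```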

R Functions

pivot_wider()

  • Inputs: data, names_from = var_name, values_from = var_name

pivot_longer()

  • Inputs: data, cols = c(var_name1, var_name2), names_to = "string", values_to = "string"

R Functions

[Figure taken from the tidyr cheatsheet]

In Class

Go through the example code in the .Rmd file to make sure you understand how to reshape data to be wider and longer.

  • Work together on the 3 exercises to turn in for Assignment 7.

After Class

  • Work on the 3 exercises to turn in for Assignment 7.
  • Tidy Tuesday 4 is posted in Moodle.
  • Write a self-reflection in your individual Google Doc shared with you.
  • Submit Iterative Viz 0 (IV0) in Moodle.