Reshaping Data

Brianna Heggeseth

Announcements

Sit with someone new today! Introduce yourself.

Upcoming in MSCS

  • MSCS Coffee Break: Thursday at 11:15am (special invitation to First Years)

Due this Week

  • Assignment 4 (Spatial Viz) due tomorrow, Wednesday [via Moodle]

  • Assignment 5 (Six Main Verbs) due next Wednesday [via Moodle]

  • At least 1 Tidy Tuesday (TT) by next Friday [via Moodle, TT4 or TT5]

  • Thursday: Catch up Day!

Learning Goals

  • Understand the difference between wide and long data formats, and identify the case (unit of observation) for a given data set
  • Develop comfort using pivot_wider() and pivot_longer() from the tidyr package

Describe to your Neighbor

Everyone: Look at this small data set.

Partner A: Describe the structure of the data set in words.

name sex total
Courtney F 257289
Courtney M 22619
Riley F 100881
Riley M 92789
Sarah F 1073895
Sarah M 3320

Describe to your Neighbor

Partner A: Close your eyes.

Partner B: Describe how the structure of the data set changed. Think of describing the “steps” taken.

Partner A: Sketch what you think the new data set looks like.

name F M
Courtney 257289 22619
Riley 100881 92789
Sarah 1073895 3320
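In tidyr, this change is a single pivot_wider() call. The sketch below assumes tidyr is installed. The object names babynames_long and babynames_wide are hypothetical, and the values are copied from the two tables above.

```r
library(tidyr)

# Hypothetical object name; values copied from the long table above
babynames_long <- data.frame(
  name  = c("Courtney", "Courtney", "Riley", "Riley", "Sarah", "Sarah"),
  sex   = c("F", "M", "F", "M", "F", "M"),
  total = c(257289, 22619, 100881, 92789, 1073895, 3320),
  stringsAsFactors = FALSE
)

# One row per name: the values of `sex` (F, M) become new column names,
# and the values of `total` fill those new columns
babynames_wide <- pivot_wider(babynames_long, names_from = sex, values_from = total)
babynames_wide
```

The case changes from one name-sex combination per row to one name per row. No values are dropped.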

Describe to your Neighbor

Partner B: Close your eyes.

Partner A: Describe how the structure of the data set changed. Think of describing the “steps” taken.

Partner B: Sketch what you think the new data set looks like.

name F M ratio
Courtney 257289 22619 0.0879128
Riley 100881 92789 0.9197867
Sarah 1073895 3320 0.0030915
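The ratio values above are consistent with ratio = M / F. For example, 22619 / 257289 ≈ 0.0879. This step adds a variable and does not reshape anything, so it is a mutate() call. The sketch below assumes dplyr is installed, and the object names are hypothetical.

```r
library(dplyr)

# Hypothetical object name; values copied from the wide table above
babynames_wide <- data.frame(
  name = c("Courtney", "Riley", "Sarah"),
  F    = c(257289, 100881, 1073895),
  M    = c(22619, 92789, 3320),
  stringsAsFactors = FALSE
)

# Inside mutate(), F and M refer to the columns (not FALSE)
babynames_ratio <- babynames_wide %>%
  mutate(ratio = M / F)
babynames_ratio
```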

and then …

name ratio sex total
Courtney 0.0879128 F 257289
Courtney 0.0879128 M 22619
Riley 0.9197867 F 100881
Riley 0.9197867 M 92789
Sarah 0.0030915 F 1073895
Sarah 0.0030915 M 3320
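Going back to one row per name and sex uses pivot_longer() on the F and M columns. The ratio column is not pivoted, so its value repeats on both rows for each name. The sketch below assumes tidyr is installed. The object names are hypothetical, and the values are copied from the tables above.

```r
library(tidyr)

# Hypothetical object name; values copied from the table with the ratio column
babynames_ratio <- data.frame(
  name  = c("Courtney", "Riley", "Sarah"),
  F     = c(257289, 100881, 1073895),
  M     = c(22619, 92789, 3320),
  ratio = c(0.0879128, 0.9197867, 0.0030915),
  stringsAsFactors = FALSE
)

# Stack F and M: the old column names go into `sex`, their values into `total`.
# Columns not listed in `cols` (name, ratio) are kept and repeated on each row.
babynames_long <- pivot_longer(babynames_ratio, cols = c(F, M),
                               names_to = "sex", values_to = "total")
babynames_long
```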

Wider vs. Longer Format

If we want to retain all of the values in the data set (no summaries or combinations) but have a different unit of observation (or case), we can:

  • Make the data wider by spreading values out across new variables (e.g., spread the total counts into separate columns for binary “Male” and “Female” names)
  • Make the data longer by combining values from different variables into 1 variable (e.g., stack the separate “Male” and “Female” count columns into one total column)

R Functions

pivot_wider()

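  • Inputs: data, names_from = var_nameA, values_from = var_nameB (the values of var_nameA become the new column names, and the values of var_nameB fill those columns)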

pivot_longer()

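  • Inputs: data, cols = c(var_name1, var_name2), names_to = "newvarname_names", values_to = "newvarname_values" (cols lists the columns to stack; names_to names the new column that holds the old column names, and values_to names the new column that holds their values)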
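To connect the template arguments to the example, here is a sketch of a round trip on the small name/sex/total data set. First pivot_wider() runs with names_from = sex and values_from = total. Then pivot_longer() runs with cols = c(F, M). Reshaping keeps every value, so the round trip should recover the original columns. The sketch assumes tidyr is installed, and the object names are hypothetical.

```r
library(tidyr)

# Hypothetical object name; the long table from the start of this activity
original <- data.frame(
  name  = c("Courtney", "Courtney", "Riley", "Riley", "Sarah", "Sarah"),
  sex   = c("F", "M", "F", "M", "F", "M"),
  total = c(257289, 22619, 100881, 92789, 1073895, 3320),
  stringsAsFactors = FALSE
)

# pivot_wider(): names_from = var_nameA (sex), values_from = var_nameB (total)
wide <- pivot_wider(original, names_from = sex, values_from = total)

# pivot_longer(): cols = c(var_name1, var_name2); names_to / values_to name the new columns
back_to_long <- pivot_longer(wide, cols = c(F, M),
                             names_to = "sex", values_to = "total")

# No values were lost or summarized, so each column matches the original
identical(back_to_long$name, original$name)
identical(back_to_long$sex, original$sex)
identical(back_to_long$total, original$total)
```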

R Functions

[Figure: reshaping diagrams taken from the tidyr cheatsheet]

Template File

Download a template .Rmd of this activity. Put the file in an Assignment_06 folder within your COMP_STAT_112 folder.

  • This .Rmd contains the 3 exercises you’ll finish for Assignment 6. We’ll do 1 of them together in class.

In Class

  • Let’s try working on the first Reshaping exercise together.

  • Then you can choose to work on Spatial Viz, Six Main Verbs, or Reshaping.

After Class

  • Work on the 2 Spatial Viz exercises to turn in for Assignment 4.
  • Work on the 12 Six Main Verb exercises to turn in for Assignment 5.
  • Work on the 3 exercises to turn in for Assignment 6.
  • Tidy Tuesday 4 is posted in Moodle (due Friday).