Reshaping Data

Brianna Heggeseth

Announcements

Upcoming in MSCS

  • MSCS BIPOC Affinity group meeting Feb 22 7-8:30pm in Smail Gallery
  • Beyond Mac - Ava Cutler ’21; Monday, February 20th 4:45pm-5:45pm (OLRI 241)
  • MSCS Seminar - Jodin Morey; Tuesday, February 21st 11:45am-12:45pm (OLRI 250)

Due this Week

  • Assignment 5 (Six Main Verbs, Reshaping) next Wednesday [via Moodle]
  • At least 1 Tidy Tuesday (TT) by next Friday [via Moodle, TT4 or TT5]

Learning Goals

  • Understand the difference between wide and long data format and distinguish the case (unit of observation) for a given data set
  • Develop comfort in using pivot_wider and pivot_longer in the tidyr package

Template File

Download a template .Rmd of this activity. Put the file in an Assignment_05 folder within your COMP_STAT_112 folder.

  • This .Rmd contains examples that we’ll work on in class and 3 exercises you’ll finish for Assignment 5.

Describe to your Neighbor

Everyone: Look at this small data set.

Partner A: Describe the structure of the data set in words.

name      sex  total
Courtney  F    257289
Courtney  M    22619
Riley     F    100881
Riley     M    92789
Sarah     F    1073895
Sarah     M    3320

Describe to your Neighbor

Partner A: Close your eyes.

Partner B: Describe how the structure of the data set changed. Think of describing the “steps” taken.

Partner A: Sketch what you think the new data set looks like.

name      F        M
Courtney  257289   22619
Riley     100881   92789
Sarah     1073895  3320

Describe to your Neighbor

Partner B: Close your eyes.

Partner A: Describe how the structure of the data set changed. Think of describing the “steps” taken.

Partner B: Sketch what you think the new data set looks like.

name      F        M      ratio
Courtney  257289   22619  0.0879128
Riley     100881   92789  0.9197867
Sarah     1073895  3320   0.0030915

and then …

name      ratio      sex  total
Courtney  0.0879128  F    257289
Courtney  0.0879128  M    22619
Riley     0.9197867  F    100881
Riley     0.9197867  M    92789
Sarah     0.0030915  F    1073895
Sarah     0.0030915  M    3320
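
The sequence of tables in this activity can be reproduced with a short pipeline. This is only a sketch: the object names (names_long, names_wide, names_ratio, names_back) are made up for illustration, the data are typed in by hand from the first table, and the ratio is assumed to be M / F because that matches the values shown.

```r
library(dplyr)
library(tidyr)

# Starting point: one row per name-sex combination (typed in from the first table)
names_long <- tibble(
  name  = c("Courtney", "Courtney", "Riley", "Riley", "Sarah", "Sarah"),
  sex   = c("F", "M", "F", "M", "F", "M"),
  total = c(257289, 22619, 100881, 92789, 1073895, 3320)
)

# Step 1: make the data wider, so there is one row per name
names_wide <- names_long %>%
  pivot_wider(names_from = sex, values_from = total)

# Step 2: add a ratio column (assumed here to be M / F)
names_ratio <- names_wide %>%
  mutate(ratio = M / F)

# Step 3: make the data longer again; ratio is kept and repeated for each name
names_back <- names_ratio %>%
  pivot_longer(cols = c(F, M), names_to = "sex", values_to = "total")

names_back
```

Notice that the number of rows changes at each reshaping step (6, then 3, then 6 again) because the case changes from a name-sex combination to a name and back.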

Wider vs. Longer Format

If we want to retain all of the values in the data set (no summaries or combinations) but have a different unit of observation (or case), we can:

  • Make the data wider by spreading the values of one variable across several new variables (e.g. split the total column into separate F and M count columns, so there is one row per name)
  • Make the data longer by stacking the values from several variables into 1 variable (e.g. take the F and M count columns and combine them into one total column, with a sex column recording where each value came from)

R Functions

pivot_wider()

  • Inputs: data, names_from = var_nameA, values_from = var_nameB
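
As a minimal sketch of how these inputs map onto columns, the example below uses the name counts from the start of class, typed in by hand; the object names long_counts and wide_counts are made up for illustration.

```r
library(tidyr)

# Long data: each row is a name-sex combination
long_counts <- tibble(
  name  = c("Courtney", "Courtney", "Riley", "Riley", "Sarah", "Sarah"),
  sex   = c("F", "M", "F", "M", "F", "M"),
  total = c(257289, 22619, 100881, 92789, 1073895, 3320)
)

wide_counts <- pivot_wider(
  long_counts,
  names_from  = sex,   # the values of sex ("F", "M") become the new column names
  values_from = total  # the values of total fill in those new columns
)

wide_counts
```

Any variable not mentioned in names_from or values_from (here, name) identifies the rows of the wider data set.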

pivot_longer()

  • Inputs: data, cols = c(var_name1, var_name2), names_to = "newvarname_names", values_to = "newvarname_values"
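
A minimal sketch of the reverse direction, again with the name counts typed in by hand; the object names wide_names and long_names are made up for illustration.

```r
library(tidyr)

# Wide data: each row is a name, with one count column per sex
wide_names <- tibble(
  name = c("Courtney", "Riley", "Sarah"),
  F    = c(257289, 100881, 1073895),
  M    = c(22619, 92789, 3320)
)

long_names <- pivot_longer(
  wide_names,
  cols      = c(F, M),  # the columns to stack
  names_to  = "sex",    # new variable holding the old column names ("F", "M")
  values_to = "total"   # new variable holding the old cell values
)

long_names
```

Note that names_to and values_to are given in quotes because those variables do not exist yet, while the columns in cols already exist and can be written without quotes.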

R Functions

[Image taken from the tidyr cheatsheet]

In Class

Go through the example code in the Rmd file to make sure you understand how we reshape data to be wider and longer.

  • Work together on the 3 exercises to turn in for Assignment 5.

After Class

  • Work on the 3 exercises to turn in for Assignment 5.
  • Tidy Tuesday 4 is posted in Moodle (due Friday).