1  Introductions

Settling In

Welcome to Intermediate Data Science!

Sit with a group of 3-4 people that you do not know well.

Introduce yourself

  • Name, pronunciation tips, pronouns
  • Macalester connections (e.g., majors/minors/concentrations, clubs, teams, events regularly attended)
  • How are you feeling about the coming semester?
  • One thing from break that you are proud of!
  • If you could use data to investigate anything, what would it be and why?

Everything on the slides is in the online manual, which you’ll want to have open in class: https://bcheggeseth.github.io/212_spring_2025/

Plan for today

  • Introductions
  • What is this course about?
  • Get to know your classmates
  • Shaping our syllabus together
  • Reflection practice
  • Brief 112 review
  • Warming up our wrangling and visualization skills with Tidy Tuesday!



Introductions

Instructor

Who I am

Prof. Brianna Heggeseth (she/her)

[bree-AH-na] (Anna like in Frozen) [HEG-eh-seth]

https://bcheggeseth.github.io

Where I’ve been







Course Overview

  • Expanding your abilities for self-reflection in service of:
    • Your lifelong independent learning
    • Our course community
  • Expanding your data science toolbox:
    • Data Visualization
    • Data Wrangling
    • Data Acquisition
    • Data Storytelling

I’ve intentionally put reflection first and data science skills second. This ordering is not necessarily about importance: cultivating data science skills will come automatically in this course, while reflection and community-building won’t.

Navigate to the syllabus section of our course site.





Syllabus shaping: learning goals

Part 1: Reflect (~3 min)

Write a few sentences responding to the following questions:

  • What are your goals in taking this class?
  • Do you see your goals reflected in the course learning goals? If not, how would you amend the course goals so that they reflect yours?

Part 2: Share (~5 min)

At your tables, take turns sharing your responses to the above questions. As a group, summarize your discussion in this Google Doc.

Before we meet again, I will look over your comments in the Google Doc and add my own responses. I’ll address your comments in the next class.



Introduce each other

Now, take a moment to introduce the person to your left to the rest of the class. Share their name (take time to ensure you pronounce it correctly) and one thing you have learned about them so far in this class.



Reflection Practice (~5 min)

Free write on a piece of paper.

  • Consider the last time you were really excited to do something, or the last time you were working on something and lost track of time because you were so invested in it. What were you doing?

  • What do you think will be barriers for you in this class? What are your weaknesses/fears that you think might hold you back?

Anonymously share a summary of your answers on https://www.PollEv.com/briannahegge814.



Comp/Stat 112 Review

Let’s start with some basics: What is a data frame in R?

How could we find out?

What sources would be best to use in this class?

There are different levels of understanding:

  • high-level: necessary to implement common tasks
  • low-level: necessary to tackle new problems and come up with new solutions

We are going to strive for low-level foundational understanding in this class. You should start with a high-level intuitive understanding (Gen AI might be able to provide this) and then dig deeper to get the details (documentation, more advanced textbooks).

Our definition for “What is a data frame in R?”

A data frame in R is a named list whose elements all have the same length. Each element of the list is a column, and the shared length is the number of rows.
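
One way to check this definition yourself is to inspect a small data frame with base R. The sketch below assumes a made-up data frame called pets; the name and its values are hypothetical and only for illustration.

```r
# A hypothetical data frame, made up for illustration
pets <- data.frame(
  name = c("Ada", "Bo", "Cy"),
  age  = c(3, 7, 1)
)

typeof(pets)    # "list": the underlying storage type is a list
is.list(pets)   # TRUE
names(pets)     # the names of the list elements are the column names
lengths(pets)   # every element has the same length (the number of rows)

# Because a data frame is a list, list tools work on it
pets[["age"]]        # extract one element (column) as a vector
sapply(pets, class)  # apply a function to each element (column)
```

This is also why pets$age and pets[["age"]] return the same vector: both are ways of extracting a named element from a list.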


Data Wrangling Verbs

  • mutate(): creates/changes columns/elements in a data frame/tibble
  • select(): keeps subset of columns/elements in a data frame/tibble
  • filter(): keeps subsets of rows in a data frame/tibble
  • arrange(): sorts rows in a data frame/tibble
  • group_by(): internally groups rows in data frame/tibble by values in 1 or more columns/elements
  • summarize(): collapses/combines information across rows using functions such as n(), sum(), mean(), min(), max(), median(), sd()
  • count(): shortcut for group_by() %>% summarize(n = n())
  • left_join(): mutating join of two data frames/tibbles keeping all rows in left data frame
  • full_join(): mutating join of two data frames/tibbles keeping all rows in both data frames
  • inner_join(): mutating join of two data frames/tibbles keeping only rows that find a match in both data frames
  • semi_join(): filtering join of two data frames/tibbles keeping rows in left data frame that find match in right
  • anti_join(): filtering join of two data frames/tibbles keeping rows in left data frame that do not find match in right
  • pivot_wider(): rearrange values from two columns to many (one column becomes the names of new variables, one column becomes the values of the new variables)
  • pivot_longer(): rearrange values from many columns to two (the names of the columns go to one new variable, the values of the columns go to a second new variable)
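
As a quick refresher, the verbs above can be sketched on toy data. The example assumes the dplyr and tidyr packages are installed; the scores and majors tibbles and all of their values are hypothetical, made up only to show what each verb keeps or drops.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, made up for illustration
scores <- tibble(
  student = c("Ada", "Ada", "Bo", "Bo", "Cy"),
  quiz    = c("q1", "q2", "q1", "q2", "q1"),
  points  = c(8, 9, 6, 7, 10)
)
majors <- tibble(
  student = c("Ada", "Bo", "Di"),
  major   = c("Stat", "CS", "Math")
)

# mutate, filter, select, arrange
high_scores <- scores %>%
  mutate(percent = points / 10 * 100) %>%
  filter(percent >= 70) %>%
  select(student, quiz, percent) %>%
  arrange(desc(percent))

# group_by + summarize, and the count() shortcut
avg_points  <- scores %>%
  group_by(student) %>%
  summarize(n = n(), mean_points = mean(points))
quiz_counts <- scores %>% count(student)

# Mutating joins add the major column from majors
left  <- left_join(scores, majors, by = "student")   # 5 rows; Cy's major is NA
inner <- inner_join(scores, majors, by = "student")  # 4 rows; only Ada and Bo match
full  <- full_join(scores, majors, by = "student")   # 6 rows; adds a row for Di

# Filtering joins keep or drop rows of scores without adding columns
semi <- semi_join(scores, majors, by = "student")    # 4 rows; Ada and Bo
anti <- anti_join(scores, majors, by = "student")    # 1 row; Cy

# Pivoting: one row per student, then back to one row per student-quiz pair
wide <- scores %>%
  pivot_wider(names_from = quiz, values_from = points)
long <- wide %>%
  pivot_longer(cols = c(q1, q2), names_to = "quiz", values_to = "points")
```

Comparing inner with semi shows the difference between mutating and filtering joins: both keep the same four rows here, but only inner gains the major column.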



Tidy Tuesday!

For the remainder of the class period, we’ll work on the most recent Tidy Tuesday challenge as a way to review our data wrangling and visualization skills from 112.

You’ll continue working on this outside of class and turn in a visual for Homework 1.

Feel free to clarify anything about the course with me during this time!



After Class

Before the next class, please do the following:

  • Set up the software and systems we need following these instructions.
  • Update your Slack profile with your preferred name, pronouns, and name pronunciation. (To find your profile, click on your name under Direct Messages on the left menu, and click “Edit Profile”.)
  • Complete the pre-course survey.
  • Take a look at the Schedule page to see how to prepare for the next class.
  • Take a look at Homework 1. You can start working on part of it now (we’ll talk about GitHub next class).