1 Introductions

Settling In

Welcome to Intermediate Data Science!

Sit with a group of 3-4 people that you do not know well.

Introduce yourself

Name, pronunciation tips, pronouns
Macalester connections (e.g., majors/minors/concentrations, clubs, teams, events regularly attended)
How are you feeling about the coming semester?
One thing from break that you are proud of!
If you could use data to investigate anything, what would it be and why?

Everything on the slides is in the online manual, which you’ll want to have open in class: https://bcheggeseth.github.io/212_spring_2025/

Plan for today

Introductions
What is this course about?
Get to know your classmates
Shaping our syllabus together
Reflection practice
Brief 112 review
Warming up our wrangling and visualization skills with Tidy Tuesday!

Introductions

Instructor

Who I am

Prof. Brianna Heggeseth (she/her)

[bree-AH-na] (Anna like in Frozen) [HEG-eh-seth]

https://bcheggeseth.github.io

Where I’ve been

Course Overview

Expanding your abilities for self-reflection in service of:
- Your lifelong independent learning
- Our course community
Expanding your data science toolbox:
- Data Visualization
- Data Wrangling
- Data Acquisition
- Data Storytelling

I’ve intentionally put reflection first and data science skills second. The order is not necessarily one of importance: data science skills will develop naturally through the coursework, while reflection and community-building won’t happen unless we make time for them.

Navigate to the syllabus section of our course site.

Syllabus shaping: learning goals

Part 1: Reflect (~3 min)

Write a few sentences responding to the following questions:

What are your goals in taking this class?
Do you see your goals reflected in the course learning goals? If not, how would you amend the course goals so that they reflect yours?

Part 2: Share (~5 min)

At your tables, take turns sharing your responses to the above questions. As a group, summarize your discussion in this Google Doc.

Before we meet again, I will look over your comments in the Google Doc and add my own responses. I’ll address your comments in the next class.

Introduce each other

Now, take a moment to introduce the person to your left to the rest of the class. Share their name (take time to ensure you pronounce it correctly) and one thing you have learned about them so far in this class.

Reflection Practice (~5 min)

Free write on a piece of paper.

Consider the last time you were truly excited to do something, or the last time you lost track of time because you were so invested in what you were working on. What were you doing?
What do you think will be barriers for you in this class? What are your weaknesses/fears that you think might hold you back?

Anonymously share a summary of your answers on https://www.PollEv.com/briannahegge814.

Comp/Stat 112 Review

Let’s start with some basics: What is a data frame in R?

How could we find out?

What sources would be best to use in this class?

There are different levels of understanding:

High-level: necessary to implement common tasks
Low-level: necessary to tackle new problems and come up with new solutions

We are going to strive for low-level foundational understanding in this class. You should start with a high-level intuitive understanding (Gen AI might be able to provide this) and then dig deeper to get the details (documentation, more advanced textbooks).

Our definition of a data frame in R:

A data frame in R is a named list whose elements (the columns) all have the same length.
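
One way to check this definition yourself is to poke at a small data frame with base R. The sketch below is only an illustration: the `students` data frame and its values are made up, not course data.

```r
# A small data frame built from three equal-length vectors (made-up values).
students <- data.frame(
  name  = c("Ana", "Ben", "Chen"),
  year  = c(2, 3, 2),
  major = c("Stat", "CS", "Bio")
)

typeof(students)    # the underlying storage type is "list"
class(students)     # the class layered on top of the list is "data.frame"
names(students)     # one name per element (column)
lengths(students)   # every element has the same length (3)

# Because a data frame is a list, list tools work on it.
students[["year"]]        # extract one element (column) as a vector
sapply(students, class)   # apply a function to each element

# Elements whose lengths cannot be reconciled are rejected with an error.
bad <- try(data.frame(x = 1:3, y = 1:2), silent = TRUE)
inherits(bad, "try-error")
```

The last two lines show the "same length" part of the definition being enforced: `data.frame()` refuses to combine a length-3 and a length-2 vector.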

Data Wrangling Verbs

mutate(): creates/changes columns/elements in a data frame/tibble
select(): keeps subset of columns/elements in a data frame/tibble
filter(): keeps subsets of rows in a data frame/tibble
arrange(): sorts rows in a data frame/tibble
group_by(): internally groups rows in data frame/tibble by values in 1 or more columns/elements
summarize(): collapses/combines information across rows using functions such as n(), sum(), mean(), min(), max(), median(), sd()
count(): shortcut for group_by() %>% summarize(n = n())
left_join(): mutating join of two data frames/tibbles keeping all rows in left data frame
full_join(): mutating join of two data frames/tibbles keeping all rows in both data frames
inner_join(): mutating join of two data frames/tibbles keeping rows in left data frame that find match in right
semi_join(): filtering join of two data frames/tibbles keeping rows in left data frame that find match in right
anti_join(): filtering join of two data frames/tibbles keeping rows in left data frame that do not find match in right
pivot_wider(): rearrange values from two columns to many (one column becomes the names of new variables, one column becomes the values of the new variables)
pivot_longer(): rearrange values from many columns to two (the names of the columns go to one new variable, the values of the columns go to a second new variable)
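
As a quick refresher, the verbs above can be tried on a pair of tiny tables. This is a minimal sketch that assumes the dplyr and tidyr packages are installed; the `scores` and `majors` tibbles and all of their values are hypothetical, made up for illustration.

```r
library(dplyr)
library(tidyr)

# Made-up data for illustration only.
scores <- tibble(
  student = c("Ana", "Ana", "Ben", "Ben", "Chen"),
  quiz    = c("q1", "q2", "q1", "q2", "q1"),
  points  = c(8, 9, 6, 7, 10)
)
majors <- tibble(
  student = c("Ana", "Ben", "Dee"),
  major   = c("Stat", "CS", "Bio")
)

# mutate, filter, select, arrange
passing <- scores %>%
  mutate(percent = points / 10 * 100) %>%   # add a column
  filter(percent >= 70) %>%                 # keep a subset of rows
  select(student, quiz, percent) %>%        # keep a subset of columns
  arrange(desc(percent))                    # sort rows

# group_by + summarize, and the count() shortcut
by_student <- scores %>%
  group_by(student) %>%
  summarize(n = n(), mean_points = mean(points))
counted <- scores %>% count(student)        # same n column as by_student

# Mutating joins: columns from majors are added to scores
joined_left  <- left_join(scores, majors, by = "student")   # all rows of scores
joined_inner <- inner_join(scores, majors, by = "student")  # matched rows only
joined_full  <- full_join(scores, majors, by = "student")   # all rows of both

# Filtering joins: only the columns of scores are returned
kept    <- semi_join(scores, majors, by = "student")  # rows with a match
dropped <- anti_join(scores, majors, by = "student")  # rows without a match

# Pivots: one row per student, then back to one row per student-quiz pair
wide <- scores %>%
  pivot_wider(names_from = quiz, values_from = points)
long <- wide %>%
  pivot_longer(cols = c(q1, q2), names_to = "quiz", values_to = "points")
```

Comparing `nrow()` and `names()` across the join results is a good way to see the difference between mutating and filtering joins: `joined_inner` and `kept` contain the same rows, but only `joined_inner` gains the `major` column.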

Tidy Tuesday!

For the remainder of the class period, we’ll work on the most recent Tidy Tuesday challenge as a way to review our data wrangling and visualization skills from 112.

You’ll continue working on this outside of class and turn in a visual for Homework 1.

Feel free to clarify anything about the course with me during this time!

After Class

Before the next class, please do the following:

Set up the software and systems we need following these instructions.
Update your Slack profile with preferred name, pronouns, name pronunciation. (To find your profile, click on your name under Direct Messages on the left menu, and click “Edit Profile”.)
Complete the pre-course survey.
Take a look at the Schedule page to see how to prepare for the next class.
Take a look at Homework 1. You can start working on part of it now (we’ll talk about GitHub next class).