1 Introductions
Settling In
Welcome to Intermediate Data Science!
Plan for today
Introductions
Instructor
Who I am
Prof. Brianna Heggeseth (she/her)
[bree-AH-na] (Anna like in Frozen) [HEG-eh-seth]
Where I’ve been
Course Overview
- Expanding your abilities for self-reflection in service of:
- Your lifelong independent learning
- Our course community
- Expanding your data science toolbox:
- Data Visualization
- Data Wrangling
- Data Acquisition
- Data Storytelling
I’ve intentionally put reflection first and data science skills second, not necessarily in order of importance, but because cultivating data science skills will come automatically, while reflection and community-building won’t.
Navigate to the syllabus section of our course site.
Syllabus shaping: learning goals
Part 1: Reflect (~3 min)
Write a few sentences responding to the following questions:
- What are your goals in taking this class?
- Do you see your goals reflected in the course learning goals? If not, how would you like to see the course goals amended to see your goals reflected in them?
Part 2: Share (~5 min)
At your tables, take turns sharing your responses to the above questions. As a group, summarize your discussion in this Google Doc.
Before we meet again, I will look over your comments in the Google Doc and add my own responses. I’ll address your comments in the next class.
Introduce each other
Now, take a moment to introduce the person to your left to the rest of the class. Share their name (take time to ensure you pronounce it correctly) and one thing you have learned about them so far in this class.
Reflection Practice (~5 min)
Free write on a piece of paper.
Consider the last time you were so excited to do something. Or consider the last time you were working on something and you lost track of time because you were so invested in it. What were you doing?
What do you think will be barriers for you in this class? What are your weaknesses/fears that you think might hold you back?
Anonymously share a summary of your answers on https://www.PollEv.com/briannahegge814.
Comp/Stat 112 Review
Let’s start with some basics: What is a data frame in R?
How could we find out?
- Google Search
- Gen AI
- Intro Textbook
- R Manuals
- Advanced Textbook
- R Documentation for data.frame()
- R Documentation for tibble class
What sources would be best to use in this class?
There are different levels of understanding:
- high-level: necessary to implement common tasks
- low-level: necessary to tackle new problems and come up with new solutions
We are going to strive for low-level foundational understanding in this class. You should start with a high-level intuitive understanding (Gen AI might be able to provide this) and then dig deeper to get the details (documentation, more advanced textbooks).
Our definition: What is a data frame in R?
A data frame in R is a named list whose elements (the columns) all have the same length.
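One way to check this definition yourself is to poke at a small data frame in the console. The sketch below uses only base R; the object `df` and its columns are made up for illustration and are not part of the course materials.

```r
# A small, made-up data frame (names and values are illustrative only)
df <- data.frame(
  name  = c("Ada", "Grace", "Hadley"),
  score = c(90, 85, 77)
)

is.list(df)     # a data frame is built on top of a list
typeof(df)      # the underlying storage type is "list"
names(df)       # the list is named: one name per column
lengths(df)     # every element (column) has the same length
length(df)      # counts elements (columns), not rows
df[["score"]]   # list-style extraction returns the column as a vector

# Elements of mismatched length are rejected (3 values vs. 2 values)
bad <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) e
)
inherits(bad, "error")
```

Note that `length(df)` returns the number of columns, which is what you would expect if a data frame is a list of columns rather than a table of rows.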
Data Wrangling Verbs
- mutate(): creates/changes columns/elements in a data frame/tibble
- select(): keeps subset of columns/elements in a data frame/tibble
- filter(): keeps subsets of rows in a data frame/tibble
- arrange(): sorts rows in a data frame/tibble
- group_by(): internally groups rows in data frame/tibble by values in 1 or more columns/elements
- summarize(): collapses/combines information across rows using functions such as n(), sum(), mean(), min(), max(), median(), sd()
- count(): shortcut for group_by() %>% summarize(n = n())
- left_join(): mutating join of two data frames/tibbles keeping all rows in left data frame
- full_join(): mutating join of two data frames/tibbles keeping all rows in both data frames
- inner_join(): mutating join of two data frames/tibbles keeping rows in left data frame that find match in right
- semi_join(): filtering join of two data frames/tibbles keeping rows in left data frame that find match in right
- anti_join(): filtering join of two data frames/tibbles keeping rows in left data frame that do not find match in right
- pivot_wider(): rearrange values from two columns to many (one column becomes the names of new variables, one column becomes the values of the new variables)
- pivot_longer(): rearrange values from many columns to two (the names of the columns go to one new variable, the values of the columns go to a second new variable)
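As a refresher, here is a minimal sketch of the verbs above applied to two tiny tibbles. It assumes the dplyr and tidyr packages are installed; the tables `scores` and `majors` and all of their values are hypothetical, invented only for this example.

```r
library(dplyr)
library(tidyr)

# Made-up example data (all names and values are hypothetical)
scores <- tibble(
  student = c("A", "A", "B", "B", "C"),
  exam    = c("midterm", "final", "midterm", "final", "midterm"),
  points  = c(80, 90, 70, 85, 95)
)
majors <- tibble(
  student = c("A", "B", "D"),
  major   = c("Stat", "CS", "Math")
)

# Single-table verbs: add a column, keep some rows and columns, then sort
midterms <- scores %>%
  mutate(pct = points / 100) %>%
  filter(exam == "midterm") %>%
  select(student, pct) %>%
  arrange(desc(pct))

# Grouped summary: one row per student
by_student <- scores %>%
  group_by(student) %>%
  summarize(n = n(), mean_points = mean(points))

# count() as the shortcut for group_by() %>% summarize(n = n())
counts <- scores %>% count(student)

# Mutating joins add columns from `majors`
left  <- left_join(scores, majors, by = "student")   # keeps all rows of scores
full  <- full_join(scores, majors, by = "student")   # also keeps student D
inner <- inner_join(scores, majors, by = "student")  # only matched rows

# Filtering joins keep or drop rows of `scores` without adding columns
semi <- semi_join(scores, majors, by = "student")    # students with a major
anti <- anti_join(scores, majors, by = "student")    # students without one

# Pivots: exam names become columns, then go back to name/value pairs
wide <- scores %>%
  pivot_wider(names_from = exam, values_from = points)
long <- wide %>%
  pivot_longer(cols = c(midterm, final),
               names_to = "exam", values_to = "points")
```

Comparing `nrow()` across the five join results is a quick way to see the difference between them. Note also that `long` has one more row than `scores`, because pivoting wider and back makes student C's missing final exam explicit as an `NA`.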
Tidy Tuesday!
For the remainder of the class period, we’ll work on the most recent Tidy Tuesday challenge as a way to review our data wrangling and visualization skills from 112.
You’ll continue working on this outside of class and turn in a visual for Homework 1.
Feel free to clarify anything about the course with me during this time!
After Class
Before the next class, please do the following:
- Set up the software and systems we need following these instructions.
- Update your Slack profile with your preferred name, pronouns, and name pronunciation. (To find your profile, click on your name under Direct Messages on the left menu, and click “Edit Profile”.)
- Complete the pre-course survey.
- Take a look at the Schedule page to see how to prepare for the next class.
- Take a look at Homework 1. You can start working on part of it now (we’ll talk about GitHub next class).