Introduction to Data Science

Brianna Heggeseth

Introductions

As we gather, sit with people you don’t know well and introduce yourself (you choose what you share) to others at your table.


Here are some suggestions:

  • Your preferred name (+ pronunciation tips)
  • Aspects of who you are and have been (e.g. pronouns, geographical identity, cultural identity, hobbies/passions)
  • Aspects of who you’d like to be (e.g. personal/professional/academic goals)
  • How you are feeling about the new semester (!?!)

Be prepared to introduce one other person to the larger class

Big Data

The main components of big data are volume, velocity, variety, and veracity

Four Components of Big Data

Data Science in Liberal Arts

The liberal arts setting provides an opportunity to synthesize lenses for data developed in the social and hard sciences, humanities, and fine arts

  • Data Science applies these lenses to extract knowledge from data within a particular domain of inquiry and context, such as
    • educational policy making,
    • ecological modeling,
    • journalism,
    • computational linguistics, etc.

Data Science Skills

Data Science skills mapped to Job Titles

Data Science Jobs

  • Government agencies (e.g., NSA, CIA)
  • Science institutions (e.g., NASA, NIH)
  • Companies/divisions specializing in data analysis (e.g., IBM)
  • Retail companies that have huge amounts of data and analyze it to drive business decisions (e.g., Amazon, Netflix, Target, Etsy)
  • Other sectors: journalism, healthcare, biotech/genomics, NGOs, finance, insurance, gaming and hospitality, energy/utilities, manufacturing, pharmaceuticals

Data Science Projects

Data Journalism

Public Policy

More Examples

Data Science Projects

Who am I

Prof. Brianna Heggeseth

[bree-AH-na] [HEG-eh-seth]

bcheggeseth.github.io/

Who am I

Where I’ve Been

Introductions

Now, take a turn introducing another person in the class.

Course Details

Syllabus

  • Learning Goals
  • Community of Learners
  • Course Components
  • Communication
  • Environment You Deserve

Learning Goals

Overall Learning Goal

Gain confidence in carrying out the entire data science pipeline,

  • from research question formulation,
  • to data collection/scraping,
  • to wrangling,
  • to modeling,
  • to visualization,
  • to presentation and communication
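To make the middle stages of this pipeline concrete, here is a minimal sketch in Python. It is only an illustration: the records are made up, the function names (`wrangle`, `model`, `visualize`) are hypothetical, and the tools used in this course may differ.

```python
from statistics import mean

# Hypothetical raw records standing in for collected data (made up, not real data).
raw_records = [
    {"species": "A", "mass_g": "3700"},
    {"species": "A", "mass_g": ""},      # missing measurement
    {"species": "B", "mass_g": "5000"},
    {"species": "B", "mass_g": "5200"},
    {"species": "A", "mass_g": "3900"},
]

def wrangle(records):
    """Drop rows with a missing mass and convert the mass to a number."""
    return [
        {"species": r["species"], "mass_g": float(r["mass_g"])}
        for r in records
        if r["mass_g"] != ""
    ]

def model(rows):
    """Summarize the data with a very simple model: mean mass per species."""
    groups = {}
    for row in rows:
        groups.setdefault(row["species"], []).append(row["mass_g"])
    return {species: mean(values) for species, values in groups.items()}

def visualize(summary, grams_per_mark=500):
    """Build a text bar chart with one line per species."""
    return [
        f"{species}: {'#' * int(avg // grams_per_mark)} {avg:.0f} g"
        for species, avg in sorted(summary.items())
    ]

clean_rows = wrangle(raw_records)
summary = model(clean_rows)
chart = visualize(summary)
print("\n".join(chart))
```

Each stage takes the previous stage's output as its input, which is the sense in which the steps form a pipeline. Real projects loop back often, for example returning to wrangling after a plot reveals a problem in the data.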

One visual representation of the pipeline uses Legos. The first step is data collection, shown as a pile of Legos. The second step is data preparation, with the Legos separated by color. The third step is data visualization, with the Legos organized by color and type so you can more easily understand what’s happening. The fourth step is data analysis, in which Legos are put together into creations such as a roof or a lawn. The final step is data storytelling, in which someone is playing with a Lego house.

Cute fuzzy monsters putting rectangular data tables onto a conveyor belt. Along the conveyor belt line are different automated “stations” that update the data, reading “WRANGLE”, “VISUALIZE”, and “MODEL”. A monster at the end of the conveyor belt is carrying away a table that reads “Complete analysis.”

Art by Allison Horst

Learning Goals

By the end of the course, you’ll be able to:

  • Appreciate the role of data science in a wide range of disciplines
  • Identify, collect, and wrangle data from multiple sources
  • Visualize a variety of types of data
  • Find code online and adapt it to your given task
  • Using iterative refinement and teamwork, take a data science project from concept to reality
  • Communicate your results so that they’re reproducible and accessible for a broad audience

Community of Learners

Target Audience.

  • No matter your statistics and coding background (no experience to expert) or major and interests, this course is for YOU!

Learn by doing.

  • Learning by doing entails getting stuck, making mistakes, asking questions, and getting feedback.

Community of Learners

Collaboration.

  • Working effectively in a group setting is an essential life skill that requires practice and demonstrably improves your learning

Community building.

  • People learn best in community when they feel safe, seen, and cared about.

Course Components

Activities & Assignments

  • In-class activities (notes + exercises) → finish after class and turn in as assignments
  • Opportunity to practice skills and dig deeper

Tidy Tuesday & Iterative Viz

  • Regular visualization practice on new, real data
  • Opportunities to iterate based on feedback
  • Opportunity to engage with wider data science community

Course Components

Midterm Assessment

  • In-class assessment of basic visualization and wrangling skills
  • Important checkpoint before advanced tools + projects

Final Project & Presentation

  • Group data science project
  • Opportunity to showcase skills and learn new things on a real data set

Communication

Slack Channels: class-wide messaging platform for content-related questions

  • General channel: class-wide announcements
  • Content-specific class-wide channels: data-viz, troubleshooting, etc.
  • Section-specific channels: section-1, section-2, fyc-community
  • Study-group channel (optional): class-wide channel to seek classmates to work together outside of class

Email or DM in Slack: for anything personal in nature (e.g. illness, feeling overwhelmed, feedback, etc.)

Environment You Deserve

Macalester College values diversity and inclusion.

  • I am committed to a climate of mutual respect, free of discrimination based on race, ethnicity, gender identity, religion, sexual orientation, disability, and other identities, in and out of the classroom. This class strives to be a learning environment that is usable, equitable, inclusive, and welcoming.

  • To help support these goals, I expect you to follow the MSCS Community Guidelines.

  • These guidelines were created by the MSCS faculty and staff in our ongoing efforts to create a community that is more welcoming, supportive, and inclusive.

Environment You Deserve

  • Respect: regard the feelings, wishes, experiences, and traditions of others as individuals

  • Empathy: try to sense and understand others’ emotions and feelings

  • Start with Curiosity: don’t assume; instead, ask a question

  • Supportive Community: you are not learning in isolation but rather in a community whose members are ready to help each other

Let’s Get Started

Go to bcheggeseth.github.io/112_fall_2022/course-schedule.html

After Class

Continue working through the activity:

  • Try each line of code (copy and paste) in the console, then check the solutions in the online activity
  • Complete the Practice section at the end and turn it in on Moodle (Assignment 1) by next Tuesday evening