MSCS Happenings
For each problem I marked with an X,
Talk with others in the class; help each other understand the WHY.
Turn in to me by next class.
UPDATE: You should have been notified of a shared PDF with feedback.
Talk through some of the stumbling blocks with your classmates. Take notes for yourself.
By the end of THIS week, submit an updated version of the Midterm Part 2 to Moodle and write a reflection about the midterm in your reflection Google Doc for March.
My Deal: You may talk to others in the class (not preceptors, not people who have previously taken it) but you may not directly share code with each other. Instead, talk about the actions more conceptually and point each other to resources.
Exploratory Data Analysis (EDA), a name given to the process of
Another way to describe EDA:
See paper handout & online course website for more details.
Open 13-EDA on the course website for exercises.
I want you to work in pairs (groups of 3 if needed). List your partner(s) in your Rmd file.
Let’s practice these steps using data about flight delays from Kaggle.
library(tidyverse)  # loads readr, which provides read_csv()
airlines <- read_csv("https://bcheggeseth.github.io/112_spring_2023/data/airlines.csv")
airports <- read_csv("https://bcheggeseth.github.io/112_spring_2023/data/airports.csv")
flights <- read_csv("https://bcheggeseth.github.io/112_spring_2023/data/flights_jan_jul_sample2.csv")
head(airlines)
# A tibble: 6 × 2
IATA_CODE AIRLINE
<chr> <chr>
1 UA United Air Lines Inc.
2 AA American Airlines Inc.
3 US US Airways Inc.
4 F9 Frontier Airlines Inc.
5 B6 JetBlue Airways
6 OO Skywest Airlines Inc.
head(airports)
# A tibble: 6 × 7
IATA_CODE AIRPORT CITY STATE COUNTRY LATIT…¹ LONGI…²
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 ABE Lehigh Valley International Air… Alle… PA USA 40.7 -75.4
2 ABI Abilene Regional Airport Abil… TX USA 32.4 -99.7
3 ABQ Albuquerque International Sunpo… Albu… NM USA 35.0 -107.
4 ABR Aberdeen Regional Airport Aber… SD USA 45.4 -98.4
5 ABY Southwest Georgia Regional Airp… Alba… GA USA 31.5 -84.2
6 ACK Nantucket Memorial Airport Nant… MA USA 41.3 -70.1
# … with abbreviated variable names ¹LATITUDE, ²LONGITUDE
head(flights)
# A tibble: 6 × 31
YEAR MONTH DAY DAY_OF_WEEK AIRLINE FLIGHT…¹ TAIL_…² ORIGI…³ DESTI…⁴ SCHED…⁵
<dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr> <chr> <chr> <chr>
1 2015 1 1 4 AS 98 N407AS ANC SEA 0005
2 2015 1 1 4 AA 2336 N3KUAA LAX PBI 0010
3 2015 1 1 4 US 840 N171US SFO CLT 0020
4 2015 1 1 4 AA 258 N3HYAA LAX MIA 0020
5 2015 1 1 4 AS 135 N527AS SEA ANC 0025
6 2015 1 1 4 DL 806 N3730B SFO MSP 0025
# … with 21 more variables: DEPARTURE_TIME <chr>, DEPARTURE_DELAY <dbl>,
# TAXI_OUT <dbl>, WHEELS_OFF <chr>, SCHEDULED_TIME <dbl>, ELAPSED_TIME <dbl>,
# AIR_TIME <dbl>, DISTANCE <dbl>, WHEELS_ON <chr>, TAXI_IN <dbl>,
# SCHEDULED_ARRIVAL <chr>, ARRIVAL_TIME <chr>, ARRIVAL_DELAY <dbl>,
# DIVERTED <dbl>, CANCELLED <dbl>, CANCELLATION_REASON <chr>,
# AIR_SYSTEM_DELAY <dbl>, SECURITY_DELAY <dbl>, AIRLINE_DELAY <dbl>,
# LATE_AIRCRAFT_DELAY <dbl>, WEATHER_DELAY <dbl>, and abbreviated variable …
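One early EDA step is checking how the tables relate to each other. The sketch below rebuilds only the rows and fully named columns printed above, so it runs without downloading anything. It assumes that the AIRLINE column in flights holds the same carrier codes as IATA_CODE in airlines, which the printed rows suggest but do not prove. The object names (airlines_head, flights_head, and so on) are made up for this illustration.

```r
library(dplyr)
library(tibble)

# The six airlines rows printed above
airlines_head <- tribble(
  ~IATA_CODE, ~AIRLINE,
  "UA", "United Air Lines Inc.",
  "AA", "American Airlines Inc.",
  "US", "US Airways Inc.",
  "F9", "Frontier Airlines Inc.",
  "B6", "JetBlue Airways",
  "OO", "Skywest Airlines Inc."
)

# The first five columns of the six flights rows printed above
flights_head <- tribble(
  ~YEAR, ~MONTH, ~DAY, ~DAY_OF_WEEK, ~AIRLINE,
  2015, 1, 1, 4, "AS",
  2015, 1, 1, 4, "AA",
  2015, 1, 1, 4, "US",
  2015, 1, 1, 4, "AA",
  2015, 1, 1, 4, "AS",
  2015, 1, 1, 4, "DL"
)

# How many of these flights belong to each carrier code?
carrier_counts <- flights_head %>%
  count(AIRLINE, sort = TRUE)

# Attach the full carrier name. Rename first so the result does not
# end up with two columns that are both called AIRLINE.
# ASSUMPTION: flights AIRLINE codes correspond to airlines IATA_CODE.
airline_names <- airlines_head %>%
  rename(AIRLINE_NAME = AIRLINE)

flights_named <- flights_head %>%
  left_join(airline_names, by = c("AIRLINE" = "IATA_CODE"))

# Carrier codes that found no match in the six-row lookup
unmatched <- flights_named %>%
  filter(is.na(AIRLINE_NAME)) %>%
  distinct(AIRLINE)

print(carrier_counts)
print(flights_named)
print(unmatched)
```

Here AS and DL come back unmatched only because the lookup was cut down to the six rows shown by head(airlines). With the full airlines table, unmatched codes would be worth investigating before going further.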
Complete the one exercise for Assignment 8 (Data Import): find a new dataset, import it, and create a visual.
Finish these exercises for Assignment 8 (EDA)
Make sure you come up with a specific research question with your partner during class today.
Midterm Revisions Part 2 due Friday
IV1 due next week