MSCS Happenings
For each problem I marked with an X:
Talk with others in the class; help each other understand the WHY.
Turn in to me today or Friday.
Exploratory Data Analysis (EDA), a name given to the process of
Another way to describe EDA:
See paper handout & online course website for more details.
Open 13-EDA on the course website for exercises.
I want you to work in pairs (or groups of 3 if needed). List your partner(s) in your Rmd file.
Let’s practice these steps using data about flight delays from Kaggle.
library(tidyverse)  # loads readr, which provides read_csv()
# Import the three flight-delay tables from the course website
airlines <- read_csv("https://bcheggeseth.github.io/112_fall_2023/data/airlines.csv")
airports <- read_csv("https://bcheggeseth.github.io/112_fall_2023/data/airports.csv")
flights <- read_csv("https://bcheggeseth.github.io/112_fall_2023/data/flights_jan_jul_sample2.csv")
head(airlines)
# A tibble: 6 × 2
IATA_CODE AIRLINE
<chr> <chr>
1 UA United Air Lines Inc.
2 AA American Airlines Inc.
3 US US Airways Inc.
4 F9 Frontier Airlines Inc.
5 B6 JetBlue Airways
6 OO Skywest Airlines Inc.
head(airports)
# A tibble: 6 × 7
IATA_CODE AIRPORT CITY STATE COUNTRY LATITUDE LONGITUDE
<chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 ABE Lehigh Valley International … Alle… PA USA 40.7 -75.4
2 ABI Abilene Regional Airport Abil… TX USA 32.4 -99.7
3 ABQ Albuquerque International Su… Albu… NM USA 35.0 -107.
4 ABR Aberdeen Regional Airport Aber… SD USA 45.4 -98.4
5 ABY Southwest Georgia Regional A… Alba… GA USA 31.5 -84.2
6 ACK Nantucket Memorial Airport Nant… MA USA 41.3 -70.1
head(flights)
# A tibble: 6 × 31
YEAR MONTH DAY DAY_OF_WEEK AIRLINE FLIGHT_NUMBER TAIL_NUMBER ORIGIN_AIRPORT
<dbl> <dbl> <dbl> <dbl> <chr> <dbl> <chr> <chr>
1 2015 1 1 4 AS 98 N407AS ANC
2 2015 1 1 4 AA 2336 N3KUAA LAX
3 2015 1 1 4 US 840 N171US SFO
4 2015 1 1 4 AA 258 N3HYAA LAX
5 2015 1 1 4 AS 135 N527AS SEA
6 2015 1 1 4 DL 806 N3730B SFO
# ℹ 23 more variables: DESTINATION_AIRPORT <chr>, SCHEDULED_DEPARTURE <chr>,
# DEPARTURE_TIME <chr>, DEPARTURE_DELAY <dbl>, TAXI_OUT <dbl>,
# WHEELS_OFF <chr>, SCHEDULED_TIME <dbl>, ELAPSED_TIME <dbl>, AIR_TIME <dbl>,
# DISTANCE <dbl>, WHEELS_ON <chr>, TAXI_IN <dbl>, SCHEDULED_ARRIVAL <chr>,
# ARRIVAL_TIME <chr>, ARRIVAL_DELAY <dbl>, DIVERTED <dbl>, CANCELLED <dbl>,
# CANCELLATION_REASON <chr>, AIR_SYSTEM_DELAY <dbl>, SECURITY_DELAY <dbl>,
# AIRLINE_DELAY <dbl>, LATE_AIRCRAFT_DELAY <dbl>, WEATHER_DELAY <dbl>
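As one possible first EDA step with these tables, you could attach full airline names to each flight and compare arrival delays across airlines. This is a minimal sketch, assuming dplyr is installed. The two small tibbles are hypothetical stand-ins with made-up values so the example runs without downloading anything; with the real data, use `flights` and `airlines` instead. The column name `AIRLINE_NAME` is our own choice: the airlines table's `AIRLINE` column is renamed so it does not clash with the two-letter `AIRLINE` code in flights.

```r
library(dplyr)

# Hypothetical toy rows (made-up values); column names match the head() output above
airlines_toy <- tibble(
  IATA_CODE = c("AA", "UA"),
  AIRLINE   = c("American Airlines Inc.", "United Air Lines Inc.")
)
flights_toy <- tibble(
  AIRLINE       = c("AA", "AA", "UA", "UA", "UA"),
  ARRIVAL_DELAY = c(10, -5, 30, NA, 20)  # minutes; NA = no delay recorded
)

# Rename the full airline name so it does not collide with flights' AIRLINE code
airline_names <- airlines_toy %>%
  rename(AIRLINE_NAME = AIRLINE)

delay_by_airline <- flights_toy %>%
  # flights stores the two-letter code in AIRLINE; airlines stores it in IATA_CODE
  left_join(airline_names, by = c("AIRLINE" = "IATA_CODE")) %>%
  group_by(AIRLINE_NAME) %>%
  summarise(
    n_flights          = n(),
    n_missing_delay    = sum(is.na(ARRIVAL_DELAY)),  # how many flights lack a delay value
    mean_arrival_delay = mean(ARRIVAL_DELAY, na.rm = TRUE)
  ) %>%
  arrange(desc(mean_arrival_delay))

delay_by_airline
```

Counting missing values next to the mean is a deliberate choice: some flights may have no recorded arrival delay (for example, cancelled or diverted flights, which is worth checking in the real data), and `na.rm = TRUE` silently drops them from the average.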
Complete the one exercise for Assignment 8 (Data Import): find a new dataset, import it, and create a visualization
Finish these exercises for Assignment 8 (EDA)
Make sure you come up with a specific research question with your partner during class today.
Midterm Revisions due Friday
IV1 due next week