library(tidyverse)
# Import starbucks location data
starbucks <- read.csv("https://mac-stat.github.io/data/starbucks.csv")
Homework 3: More Data Viz
Spatial Viz
Exercise 1
In the in-class activity, we worked with the Starbucks data. Let’s consider state-level data and account for the population of each state in the map.
In the code below, we create a new dataset called starbucks_us_by_state that gives the number of Starbucks in each state.
library(openintro) # if needed, first run install.packages("openintro") in the Console
starbucks_us_by_state <- starbucks %>%
  filter(Country == "US") %>%
  count(State.Province) %>%
  mutate(state_name = str_to_lower(abbr2state(State.Province)))
In the code below, a new variable, starbucks_per_10000, is created that gives the number of Starbucks per 10,000 people. It is stored in the starbucks_with_2018_pop_est dataset. This is the dataset we will use for the spatial visualization.
census_pop_est_2018 <- read_csv("https://mac-stat.github.io/data/us_census_2018_state_pop_est.csv") %>%
  separate(state, into = c("dot", "state"), extra = "merge") %>%
  select(-dot) %>%
  mutate(state = str_to_lower(state))
starbucks_with_2018_pop_est <-
  starbucks_us_by_state %>%
  left_join(census_pop_est_2018,
    by = c("state_name" = "state")
  ) %>%
  mutate(starbucks_per_10000 = (n / est_pop_2018) * 10000)
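To see what the join and the rate calculation do, here is a minimal sketch on made-up data. The tables toy_counts and toy_pop and all of their values are hypothetical; only the column names and the formula mirror the code above.

```r
library(dplyr)

# Hypothetical toy data (made-up numbers, not real Starbucks or census values)
toy_counts <- tibble(
  state_name = c("state a", "state b", "state c"),
  n = c(50, 200, 10)
)
toy_pop <- tibble(
  state = c("state a", "state b"),
  est_pop_2018 = c(1000000, 8000000)
)

toy_rates <- toy_counts %>%
  # left_join() keeps every row of toy_counts, even when no population row matches
  left_join(toy_pop, by = c("state_name" = "state")) %>%
  # same formula as above: locations per 10,000 residents
  mutate(starbucks_per_10000 = (n / est_pop_2018) * 10000)

toy_rates
```

In this toy example, "state c" has no matching population row, so its rate is NA. It is worth checking the real joined dataset for NA values in the same way before mapping it.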
Part a
Create a choropleth state map that shows the number of Starbucks per 10,000 people on a map of the US.
- Use a new fill color palette for the states,
- add points for all Starbucks in the contiguous US,
- add an informative title for the plot, and
- include a caption that says who created the plot (you!).
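As a starting point, the sketch below shows one possible way to layer these pieces with ggplot2, using made-up data. The data frames toy_rates and toy_points, their column names, and their values are hypothetical; map_data() also assumes the maps package is installed. It is a sketch of the general pattern, not the expected solution.

```r
library(ggplot2)

# State outlines; map_data() needs the maps package to be installed
states_map <- map_data("state")

# Hypothetical toy values (made up for illustration only)
toy_rates <- data.frame(
  state_name = c("minnesota", "wisconsin", "iowa"),
  rate = c(0.3, 0.25, 0.2)
)
toy_points <- data.frame(
  lon = c(-93.17, -89.40),
  lat = c(44.94, 43.07)
)

p <- ggplot(toy_rates) +
  # map_id links each row to the region of the same name in states_map
  geom_map(map = states_map, aes(map_id = state_name, fill = rate)) +
  # one point per location, drawn on top of the filled states
  geom_point(data = toy_points, aes(x = lon, y = lat), size = 0.5, alpha = 0.5) +
  # geom_map() does not set the axis limits on its own
  expand_limits(x = states_map$long, y = states_map$lat) +
  scale_fill_viridis_c(name = "Per 10,000 people") +
  labs(title = "Toy example: rate by state", caption = "Plot created by: Your Name") +
  theme_void()

p
```

Only states that appear in the data are filled, which is why the lowercase state names need to match the region names in states_map.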
Part b
Make a conclusion about what you observe from that spatial visual.
Exercise 2
In this exercise, you are going to create a single leaflet map of some of your favorite places! The end result will be one map.
Part a
Create a dataset using the tibble() function that has 10-15 rows of your favorite places. The columns will be the name of the location, the latitude, the longitude, and a column that indicates if it is in your top 3 favorite locations or not. For an example of how to use tibble(), look at the favorite_stp dataset that is created manually below.
# Brianna's favorite St. Paul places - Used Google Maps to get coordinates
# https://support.google.com/maps/answer/18539?hl=en&co=GENIE.Platform%3DDesktop
favorite_stp <- tibble(
  place = c(
    "Macalester College", "Groveland Recreation Center",
    "Due Focacceria", "Shadow Falls Park", "Mattocks Park",
    "Carondelet Fields", "Pizza Luce", "Cold Front Ice Cream"
  ),
  long = c(
    -93.1712321, -93.1851310,
    -93.1775469, -93.1944518, -93.171057,
    -93.1582673, -93.1524256, -93.156652
  ),
  lat = c(
    44.9378965, 44.9351034, 44.9274973,
    44.9433359, 44.9284142, 44.9251236,
    44.9468848, 44.9266768
  ),
  favorite = c("yes", "yes", "yes", "no", "no", "no", "no", "no")
)
Part b
Create a map that uses circles to indicate your favorite places.
- Label them with the name of the place.
- Choose the base map you like best.
- Color your 3 favorite places differently than the ones that are not in your top 3.
- Add a legend that explains what the colors mean.
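The sketch below shows one possible way these pieces could fit together in leaflet, using made-up data. The data frame toy_places, its coordinates, the two colors, and the choice of base map are all assumptions for illustration; swap in your own dataset and preferences.

```r
library(leaflet)

# Hypothetical toy places (coordinates are rough and only for illustration)
toy_places <- data.frame(
  place = c("Place A", "Place B", "Place C"),
  long = c(-93.17, -93.18, -93.16),
  lat = c(44.94, 44.93, 44.95),
  favorite = c("yes", "no", "no")
)

# colorFactor() builds a function that maps each category to a color
pal <- colorFactor(palette = c("#999999", "#E69F00"), domain = c("no", "yes"))

m <- leaflet(toy_places) %>%
  # one of many available base maps
  addProviderTiles(providers$CartoDB.Positron) %>%
  # circles labeled with the place name and colored by the favorite column
  addCircles(
    lng = ~long, lat = ~lat, label = ~place,
    color = ~pal(favorite), radius = 50, opacity = 1
  ) %>%
  # the legend reuses the same palette function so the colors agree
  addLegend(
    position = "bottomright", pal = pal, values = ~favorite,
    title = "Top 3 favorite?"
  )

m
```

Passing the same pal to both addCircles() and addLegend() keeps the legend in sync with the circle colors.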
TidyTuesday
Tidy Tuesday is a weekly data project put on by some folks from the R Data Science community. Each week, a different data set is posted and people (around the world!) wrangle and visualize that data. According to the organizers, “The intent of Tidy Tuesday is to provide a safe and supportive forum for individuals to practice their wrangling and data visualization skills independent of drawing conclusions.” You will work with TidyTuesday data below, the goals being to:
- Practice generating questions. You have to decide what to ask and how to answer it with a graphic.
- Practice identifying what viz are useful for addressing your questions, and creating effective viz. I encourage you to be creative while also maintaining the integrity of the graph.
- Get a sense of the broader data science community. Check out what people share out on X / Twitter using the #TidyTuesday hashtag. Maybe even share your own #TidyTuesday work on social media. Recent Mac alum Erin Franke (@efranke7282) has an inspiring account! Scrolling through, you’ll notice the trajectory of her work, starting from COMP/STAT 112 to today. Very cool.
Exercise 3
Go to TidyTuesday. Pick a dataset that was posted in July, August, or September 2024. Here, include:
- A short (~2 sentence) written description of your data. This should include: the original data source (where did TidyTuesday get the data from?), units of observation (what are you analyzing?), and the data size (how many data points do you have? how many variables are measured on each data point?).
- Code to import and examine the basic properties of your chosen dataset. This code must support the facts you cited in your short written description.
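As a rough illustration of what "examine the basic properties" can look like, the sketch below uses the built-in mtcars dataset as a stand-in. The name my_data is hypothetical, and with real TidyTuesday data the first line would instead read the file linked from the TidyTuesday page for your chosen week.

```r
# Stand-in for a TidyTuesday import: mtcars is a built-in dataset,
# used here only so the example runs without a download
my_data <- mtcars

dim(my_data)   # number of rows (data points) and columns (variables)
names(my_data) # variable names
head(my_data)  # first few rows, to check the unit of observation
str(my_data)   # variable types
```

The output of dim() is what backs up the data size you report in your written description.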
Exercise 4
Directions:
In the 3 sections below (Viz 1, 2, 3), construct 3 separate graphs that tell a connected story about this data.
Before each viz, write:
- A simple but specific research question you’re trying to address with the viz.
- A 2-4 sentence summary of what you learn from the viz. This should connect back to your research question!
After each viz, write:
- Comment on at least 2 effective aspects of the viz (consider the effective visualization principles).
- Comment on at least 2 aspects of the visualization that could be improved. Perhaps these are aspects that you don’t know how to implement yet but wish you could.
Make sure each viz:
- has meaningful axis labels and legend titles
- has a figure caption (fig.cap)
- uses alt text (fig.alt)
- uses a more color-blind friendly color palette
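The sketch below shows one possible way to meet this checklist, using the mpg dataset that ships with ggplot2 as a stand-in for your data. The caption and alt text wording are placeholders, and the #| lines only take effect when they sit at the top of a code chunk in your Quarto document (here they use the hyphenated fig-cap and fig-alt spelling).

```r
#| fig-cap: "Toy example: highway mileage by engine size, colored by drive type."
#| fig-alt: "Scatterplot with engine displacement on the x-axis and highway miles per gallon on the y-axis; points are colored by drive type."
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  # viridis is one commonly used color-blind friendly palette
  scale_color_viridis_d() +
  # meaningful axis labels and legend title
  labs(
    x = "Engine displacement (liters)",
    y = "Highway miles per gallon",
    color = "Drive type"
  )

p
```

For a fill aesthetic, scale_fill_viridis_d() (discrete) or scale_fill_viridis_c() (continuous) plays the same role.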
Tips:
- Start with some questions in mind of what you want to learn.
- Start with a simple viz (viz 1), and build this up into something multivariate (viz 3).
- Reflect on each viz – what new questions do you have after checking out the viz? Let these questions guide your next viz. (e.g., recall how we worked through the MacNaturalGas data at the start of the Spatial Viz activity).
Viz 1
Viz 2
Viz 3
Wrapping up
Remember to submit your rendered .html file to Moodle. This must include all code you used to create the plots as well as the plots themselves. Make sure the option embed-resources: true is set at the top of the file so that the rendered HTML file is readable on its own.