Checkpoint 6
A template R Markdown file, checkpoint-6.Rmd, is included in the initial GitHub repo from GitHub Classroom. Start from that file.
For this checkpoint, I want you to work together with a partner.
GitHub Setup
To create a shared repository (for you, your partner, and me), go to https://classroom.github.com/a/Utgod8nv. Only one partner creates the team and names it spatial-Name1-Name2. The other partner then joins that team.
Data Context and Research Question
For the spatial mini-project, you will work with data about Ramsey County, the larger geopolitical district where Macalester College is located. In particular, I’ve gathered aggregate summaries of the people who live in Ramsey County, summarized for each census tract using data from the 2015-2019 American Community Survey run by the U.S. Census Bureau. A census tract is a statistical subdivision of a county that aims to have roughly 4,000 inhabitants; tracts are intended to be fairly homogeneous with respect to demographic and economic conditions. I used the tidycensus package to gather this data, and I’ve provided example code in the GitHub repo showing how I collected it.
Make sure to look at a Google Map of Ramsey County so you can familiarize yourself with the geography: https://goo.gl/maps/m9pXVtjisoZHTmFL7
In particular, I want you to focus on home values for this mini-project.
Data Context
To analyze today’s home values, you need some context on the history of real estate in Ramsey County. Below I have listed a variety of resources. You don’t need to limit yourself to these resources, and you don’t need to read all of them. Feel free to work on this after class.
History of Real Estate in Ramsey County before 1900
History of Redlining in Ramsey County
- https://mappingprejudice.umn.edu/
- https://welcomingthedearneighbor.org/maps-data/
- https://interfaithaction.org/wp-content/uploads/2019/10/Redlining.pdf
- https://www.youtube.com/watch?v=I9P7VAKiekU
- Write one paragraph that introduces and summarizes the history of real estate in Ramsey County, laying the groundwork for your future data analysis. Complete this after class.
ANSWER:
Develop Research Question
- Load in the data set and look at ramsey_data and the CodeBook. I’ve provided a small number of variables to consider (any variable whose description does not include “median” or “average” is a count or proportion; the E at the end of a variable name indicates an estimate). Develop a research question based on the available characteristics.
ANSWER:
Visualizations
- Create a visualization, first mapping out the census tract polygon geometries with geom_sf() and filling the polygons by the outcome, HouseValueE. Write a brief 1-2 sentence summary of what you learn.
ANSWER:
- Create visualizations that explore the relationship between the explanatory variables and the outcome of interest, HouseValueE, ignoring the spatial component. Write a brief 3-5 sentence summary of what you learn.
ANSWER:
Neighborhood Structure
- Create a neighborhood structure from the census tract boundaries. Start with the Queen definition of neighbors (two tracts are neighbors if they share at least one boundary point, even a single corner). Visualize that network on top of the map of tract polygons. Consider the advantages and disadvantages of defining neighbors this way, and write a few sentences about those pros and cons.
ANSWER:
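library(sf); library(spdep); library(ggplot2)
Queen <- poly2nb(ramsey_data, queen = TRUE) # neighbors share at least one boundary point
ramsey_centroids <- st_centroid(st_geometry(ramsey_data), of_largest_polygon = TRUE)
nb_Q_net <- nb2lines(nb = Queen, coords = ramsey_centroids, as_sf = TRUE)
# Visualize the network on top of the unfilled tract polygons
ggplot(ramsey_data) +
  geom_sf(fill = 'white', color = 'darkgrey') +
  geom_sf(data = nb_Q_net) + theme_classic()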
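To make the Queen definition concrete, here is a minimal base-R sketch on a hypothetical 3 x 3 grid of square "tracts" (not the Ramsey data). Queen neighbors share an edge or a corner, while Rook neighbors must share an edge; for polygons, poly2nb(..., queen = FALSE) gives the Rook version. The helper grid_weights() is illustrative only, not part of spdep.

```r
# Binary contiguity matrix for an nr x nc grid of square cells (cells numbered column-major).
# queen = TRUE counts corner contact as neighboring; queen = FALSE (Rook) requires a shared edge.
grid_weights <- function(nr, nc, queen = TRUE) {
  n <- nr * nc
  B <- matrix(0, n, n)
  cell <- function(i, j) (j - 1) * nr + i
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      for (di in -1:1) {
        for (dj in -1:1) {
          if (di == 0 && dj == 0) next            # a cell is not its own neighbor
          if (!queen && di != 0 && dj != 0) next  # Rook: skip corner-only (diagonal) contact
          ii <- i + di
          jj <- j + dj
          if (ii >= 1 && ii <= nr && jj >= 1 && jj <= nc) B[cell(i, j), cell(ii, jj)] <- 1
        }
      }
    }
  }
  B
}

queen_B <- grid_weights(3, 3, queen = TRUE)
rook_B  <- grid_weights(3, 3, queen = FALSE)
rowSums(queen_B)  # corner cells 3, edge cells 5, center cell 8
rowSums(rook_B)   # corner cells 2, edge cells 3, center cell 4
```

Cells that touch only at a corner are neighbors under Queen but not under Rook, which is one of the trade-offs to weigh for irregular tract shapes.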
- Based on your experience living in St. Paul, do you believe there are any barriers, physical or otherwise, that would make two “neighboring” census tracts (tracts that share a boundary) more different from each other than you would expect? Rely on your own experience to consider this.
ANSWER:
OLS Model
- Fit a linear model with lm() to address the research question above, and map the residuals by coloring the spatial polygons by their residuals. Comment on what you learn.
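The Moran’s I code in the next section reads the residuals from a column named lm_resid. A minimal sketch of storing them, using a made-up data frame with a hypothetical explanatory variable IncomeE in place of ramsey_data: na.action = na.exclude makes resid() return NA for rows with a missing outcome, so the residual vector stays aligned with the rows of the data.

```r
# Hypothetical toy data standing in for ramsey_data (all values are invented for illustration)
toy <- data.frame(
  HouseValueE = c(210, 250, NA, 320, 400, 275),
  IncomeE     = c(45, 60, 40, 80, 110, 65)
)

# na.exclude pads the residuals with NA where the outcome is missing,
# so they line up with the original rows
fit <- lm(HouseValueE ~ IncomeE, data = toy, na.action = na.exclude)
toy$lm_resid <- resid(fit)

toy
```

With the default na.omit, resid() would return only the complete-case residuals and the assignment would fail whenever an outcome is missing. Note that moran.test() uses na.action = na.fail by default, so tracts with a missing outcome would need to be dropped (and the neighbor structure rebuilt) before testing for spatial correlation.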
ANSWER:
Spatial Correlation
- Let’s calculate and visualize the spatial correlation of the residuals using Global and Local Moran’s I. Comment on the strength of spatial correlation based on the Global Moran’s I and whether it is discernibly different from what we would expect under independence (an expected value of -1/(n-1), which is close to 0 for n census tracts).
ANSWER:
library(spdep); library(ggplot2); library(dplyr)
W <- nb2listw(Queen, style = "W") # style = 'W' gives row-normalized weights
mp <- spdep::moran.plot(ramsey_data$lm_resid, W, plot = FALSE) # data frame: residual x and neighbor average wx
ggplot(mp, aes(x = x, y = wx)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  geom_hline(yintercept = mean(mp$wx), lty = 2) +
  geom_vline(xintercept = mean(mp$x), lty = 2) +
  theme_classic() +
  labs(x = 'Residuals', y = "Average Residual of Neighbors", title = "Spatial Autocorrelation of Residuals")
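Global Moran’s I is I = (n / S0) * (z'Wz) / (z'z), where z holds the centered values and S0 is the sum of all weights. With row-standardized weights S0 = n, so I equals the slope of the least-squares line in the Moran scatterplot. A minimal base-R check on a hypothetical chain of four tracts (1-2, 2-3, 3-4), not the Ramsey data; morans_i() is an illustrative helper, not an spdep function.

```r
# Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x)
morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * as.numeric(t(z) %*% W %*% z) / sum(z^2)
}

# Hypothetical chain of four tracts: 1-2, 2-3, 3-4 share boundaries
B_bin <- matrix(0, 4, 4)
B_bin[cbind(1:3, 2:4)] <- 1
B_bin <- B_bin + t(B_bin)          # binary weights (like style = "B")
W_row <- B_bin / rowSums(B_bin)    # row-standardized weights (like style = "W")

x <- c(1, 2, 3, 4)
I_binary <- morans_i(x, B_bin)     # 1/3
I_row    <- morans_i(x, W_row)     # 0.4

# Slope of the Moran scatterplot line: regress the spatial lag on the centered values
z <- x - mean(x)
lag_z <- as.vector(W_row %*% z)
slope <- unname(coef(lm(lag_z ~ z))[2])   # equals I_row
```

Binary and row-standardized weights give different values of I for the same data, so it is worth noting which weight style a reported statistic uses. Because the row-standardized lag of a constant is that constant, centering does not change the slope, so the same holds for the uncentered x and wx plotted above.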
local_moran <- spdep::localmoran(ramsey_data$lm_resid, W) # one row per tract: Ii, its expectation, variance, z-score, and p-value
ramsey_data %>%
  bind_cols(local_moran) %>%
  ggplot() +
  geom_sf(aes(fill = Ii)) +
  labs(fill = 'Local Moran I') +
  scale_fill_gradient2(mid = "white", high = "red", low = "blue") +
  theme_classic()
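Each tract’s local Moran’s I compares its centered value with the weighted average of its neighbors’ centered values: I_i = (z_i / m2) * sum_j w_ij z_j, with m2 = sum(z^2) / n. This is one standard form of the statistic; spdep’s localmoran() may scale the variance term slightly differently depending on its options. On the same hypothetical four-tract chain with row-standardized weights, the local values average to the global I. local_morans_i() is an illustrative helper.

```r
# Local Moran's I_i = (z_i / m2) * sum_j w_ij z_j, with z = x - mean(x) and m2 = sum(z^2) / n
local_morans_i <- function(x, W) {
  z <- x - mean(x)
  m2 <- sum(z^2) / length(x)
  (z / m2) * as.vector(W %*% z)
}

# Hypothetical chain of four tracts (1-2, 2-3, 3-4) with row-standardized weights
B_bin <- matrix(0, 4, 4)
B_bin[cbind(1:3, 2:4)] <- 1
B_bin <- B_bin + t(B_bin)
W_row <- B_bin / rowSums(B_bin)

Ii <- local_morans_i(c(1, 2, 3, 4), W_row)
Ii         # 0.6 0.2 0.2 0.6
mean(Ii)   # 0.4, the global Moran's I for the same data and weights
```

Large positive I_i values mark tracts that resemble their neighbors (both high or both low); negative values mark tracts that differ from their neighbors.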
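ramsey_data %>%
  bind_cols(local_moran) %>%
  bind_cols(attr(local_moran, 'quadr')) %>% # quadrant labels (columns mean, median, pysal)
  mutate(mean = if_else(`Pr(z != E(Ii))` < 0.005, as.character(mean), NA_character_)) %>% # keep labels only where p < 0.005
  ggplot() +
  geom_sf(aes(fill = mean)) +
  labs(fill = 'Local Moran I Hotspots') +
  scale_fill_manual(values = c('High-High' = 'red', 'Low-Low' = 'blue', 'High-Low' = 'pink', 'Low-High' = 'lightblue'), na.value = 'lightgrey') +
  theme_classic()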
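The hotspot map colors tracts by the quadrant of the Moran scatterplot they fall in. A minimal sketch of that classification on the hypothetical four-tract chain: a tract is "High" or "Low" depending on whether its value is above the mean, paired with whether the weighted average of its neighbors’ centered values is positive. spdep’s 'quadr' attribute follows this idea, though its exact centering of the lag may differ; classify_quadrant() is illustrative only.

```r
# Quadrant of the Moran scatterplot: own centered value vs. spatial lag of centered values
classify_quadrant <- function(x, W) {
  z <- x - mean(x)
  lag_z <- as.vector(W %*% z)
  ifelse(z > 0,
         ifelse(lag_z > 0, "High-High", "High-Low"),
         ifelse(lag_z > 0, "Low-High", "Low-Low"))
}

# Hypothetical chain of four tracts (1-2, 2-3, 3-4) with row-standardized weights
B_bin <- matrix(0, 4, 4)
B_bin[cbind(1:3, 2:4)] <- 1
B_bin <- B_bin + t(B_bin)
W_row <- B_bin / rowSums(B_bin)

quads <- classify_quadrant(c(1, 2, 3, 4), W_row)
quads   # "Low-Low" "Low-Low" "High-High" "High-High"
```

In the hotspot map, only tracts whose local p-value falls below the chosen cutoff keep their quadrant label; the rest are drawn in grey.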
W_B <- nb2listw(Queen, style = "B") # style = 'B' gives binary weights
spdep::moran.test(ramsey_data$lm_resid, W_B, alternative = "two.sided", randomisation = TRUE) # Global Moran's I; variance of I computed under the randomisation assumption (analytic, not a permutation test)
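moran.test() with randomisation = TRUE compares I to an analytic variance derived under the randomisation assumption. A Monte Carlo permutation test (spdep provides one as moran.mc()) instead shuffles the values across locations many times and asks how often a shuffled I is at least as large as the observed one. A minimal base-R sketch of that idea on a hypothetical 5 x 5 grid with a smooth gradient (strong positive spatial correlation by construction); rook_grid() and morans_i() are illustrative helpers, and this test is one-sided ("greater"), unlike the two-sided call above.

```r
# Binary Rook weights for an nr x nc grid of cells (cells numbered column-major)
rook_grid <- function(nr, nc) {
  B <- matrix(0, nr * nc, nr * nc)
  for (i in seq_len(nr)) {
    for (j in seq_len(nc)) {
      k <- (j - 1) * nr + i
      if (i < nr) { B[k, k + 1] <- 1; B[k + 1, k] <- 1 }     # cell below
      if (j < nc) { B[k, k + nr] <- 1; B[k + nr, k] <- 1 }   # cell to the right
    }
  }
  B
}

# Global Moran's I: (n / S0) * z'Wz / z'z, with z = x - mean(x)
morans_i <- function(x, W) {
  z <- x - mean(x)
  (length(x) / sum(W)) * as.numeric(t(z) %*% W %*% z) / sum(z^2)
}

set.seed(452)
B_grid <- rook_grid(5, 5)
x <- as.vector(outer(1:5, 1:5, "+"))   # value = row + column: a smooth spatial gradient

I_obs  <- morans_i(x, B_grid)                          # 0.75
I_perm <- replicate(999, morans_i(sample(x), B_grid))  # I after randomly relabeling locations
p_value <- (sum(I_perm >= I_obs) + 1) / (length(I_perm) + 1)
```

The p-value counts the observed arrangement together with the 999 shuffles, so the smallest possible value here is 1/1000.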