Checkpoint 7
Start from the template R Markdown file checkpoint-7.Rmd, which is in the initial GitHub repo from GitHub Classroom.
For this checkpoint, I want you to work together with a partner.
Data Context and Research Question
For the spatial mini project, you will work with data about Ramsey County, the larger geo-political district where Macalester College is located. In particular, I’ve gathered aggregate summaries of the people who live in Ramsey County, summarized for each census tract, based on data from the 2015-2019 American Community Survey run by the U.S. Census Bureau. A census tract is a statistical subdivision of a county that aims to have roughly 4,000 inhabitants. Tracts are intended to be fairly homogeneous with respect to demographic and economic conditions. I used the tidycensus package to gather this data, and I’ve provided example code in the GitHub repo showing how I collected it.
Look at a Google Map of Ramsey County to familiarize yourself with the geography: https://goo.gl/maps/m9pXVtjisoZHTmFL7.
In particular, I want you to focus on home values for this project.
Visualizations
- Create two visualizations that address your research question and motivate your model. Write a brief 3-5 sentence summary of what you learn.
ANSWER:
Neighborhood Structure
- Create a neighborhood structure of the census tract boundaries. Justify your choice of neighborhood structure. Write a few sentences about the pros and cons of that choice.
ANSWER:
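As a starting point for comparing neighborhood definitions, here is a minimal sketch on a toy 4 x 4 grid of squares rather than the Ramsey County tracts. The object names (toy_grid, nb_queen, nb_rook) are hypothetical; poly2nb() and card() come from the spdep package. It contrasts Queen contiguity (sharing any boundary point) with Rook contiguity (sharing a boundary segment).

```r
library(sf)     # simple-features polygons
library(spdep)  # poly2nb() builds neighbor lists from polygons

# Toy stand-in for the tract polygons: a 4 x 4 grid of unit squares (hypothetical data)
toy_grid <- st_sf(geometry = st_make_grid(
  st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 4, ymax = 4))),
  n = c(4, 4)
))

# Queen: polygons are neighbors if they share at least one boundary point
nb_queen <- poly2nb(toy_grid, queen = TRUE)

# Rook: polygons are neighbors only if they share a boundary segment
nb_rook <- poly2nb(toy_grid, queen = FALSE)

# Compare how many neighbors each polygon receives under the two definitions
table(card(nb_queen))
table(card(nb_rook))
```

On real tracts, the same two calls applied to your polygon data would show how much the choice changes each tract's neighbor count, which is one thing to weigh when you justify your structure.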
OLS Model
- Fit a linear model with lm() to address the research question above and map the residuals by coloring the spatial polygons by the residuals. Comment on what you learn.
ANSWER:
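The mechanics of fitting with lm() and attaching residuals to polygons can be sketched on simulated data. Everything here is an assumption for illustration: the toy grid, the variables x and y, and the names toy, mod_lm_toy, and p_resid are hypothetical and are not the variables in ramsey_data.

```r
library(sf)
library(ggplot2)

set.seed(1)

# Toy stand-in for the tract polygons: a 5 x 5 grid of unit squares (hypothetical data)
toy <- st_sf(geometry = st_make_grid(
  st_as_sfc(st_bbox(c(xmin = 0, ymin = 0, xmax = 5, ymax = 5))),
  n = c(5, 5)
))
toy$x <- rnorm(nrow(toy))                    # hypothetical predictor
toy$y <- 2 + 0.5 * toy$x + rnorm(nrow(toy))  # hypothetical outcome

# Fit the linear model on the attribute table (geometry dropped for the fit)
mod_lm_toy <- lm(y ~ x, data = st_drop_geometry(toy))

# Store residuals alongside the polygons so they can be mapped
toy$lm_resid <- resid(mod_lm_toy)

# Diverging palette centered at 0: red = under-predicted, blue = over-predicted
p_resid <- ggplot(toy) +
  geom_sf(aes(fill = lm_resid)) +
  scale_fill_gradient2(mid = "white", high = "red", low = "blue") +
  theme_classic()
# print(p_resid) draws the map
```

When you look at your own residual map, clusters of same-colored neighboring tracts suggest spatial structure that the linear model has not captured.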
Spatial Correlation
- Let’s calculate and visualize the spatial correlation of the residuals using Moran’s I. Comment on the strength of spatial correlation based on Moran’s I and whether it is discernibly different from 0 (independence).
ANSWER:
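Here is a minimal sketch of the Moran's I workflow on a toy lattice, assuming a Queen neighborhood and row-standardized weights. The names nb_toy, Ww_toy, and resid_toy are hypothetical, and the "residuals" are simulated independent noise rather than residuals from your model. cell2nb(), nb2listw(), moran.plot(), and moran.test() are spdep functions.

```r
library(spdep)

set.seed(2)

# Toy 6 x 6 lattice with Queen contiguity (hypothetical stand-in for the tracts)
nb_toy <- cell2nb(6, 6, type = "queen")
Ww_toy <- nb2listw(nb_toy, style = "W")  # row-standardized weights

# Stand-in for model residuals: independent noise, so no spatial correlation is expected
resid_toy <- rnorm(length(nb_toy))

# Moran scatterplot: each value against the weighted average of its neighbors
moran.plot(resid_toy, Ww_toy)

# Randomization test of Moran's I against its expected value under independence
mt <- moran.test(resid_toy, Ww_toy, alternative = "two.sided", randomisation = TRUE)
mt$estimate  # Moran's I statistic, its expectation, and its variance
mt$p.value
```

Under independence the expected value of Moran's I is -1/(n - 1), which is close to but not exactly 0, so compare your estimate to the reported expectation and use the p-value to judge whether the difference is discernible.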
Spatial Models
- Fit a Simultaneous Autoregressive (SAR) Model. Comment on what you learn.
ANSWER:
library(spatialreg) #install.packages('spatialreg')
# Convert neighborhood information to a weights list (style = "W" row-standardizes,
# so each row of weights sums to 1)
Ww <- spdep::nb2listw(?? , style = "W")
# Fit SAR Model
mod_sar <- spautolm(formula = ?? ~ ???, data = ramsey_data, listw = Ww, family = "SAR")
BIC(mod_sar)
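To see what a completed version of this template looks like, here is a sketch on simulated data with spatially correlated errors. It assumes a toy lattice instead of the tracts, and the names toy, Ww_toy, and mod_sar_toy, as well as the variables x and y, are hypothetical. listw2mat() is from spdep and spautolm() is from spatialreg.

```r
library(spdep)
library(spatialreg)

set.seed(3)

# Toy 6 x 6 lattice with row-standardized Queen weights (hypothetical data)
nb_toy <- cell2nb(6, 6, type = "queen")
Ww_toy <- nb2listw(nb_toy, style = "W")
n <- length(nb_toy)

# Simulate errors with SAR structure: e = (I - 0.6 W)^{-1} u, with u independent noise
W_mat <- listw2mat(Ww_toy)
e <- as.numeric(solve(diag(n) - 0.6 * W_mat, rnorm(n)))

toy <- data.frame(x = rnorm(n))
toy$y <- 1 + 2 * toy$x + e

# Fit the SAR model with the same call pattern as the template above
mod_sar_toy <- spautolm(formula = y ~ x, data = toy, listw = Ww_toy, family = "SAR")

summary(mod_sar_toy)  # coefficient table plus the estimated spatial parameter
mod_sar_toy$lambda    # estimated spatial dependence parameter
BIC(mod_sar_toy)
```

In your own fit, the estimated lambda and its test in the summary output indicate how much spatial dependence remains after accounting for the predictors.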
- Map and test the residuals of the SAR model to see if the resulting residuals are independent or spatially correlated. Comment on what you learn.
ANSWER:
ramsey_data$sar_resid <- resid(mod_sar)
# Map the SAR residuals: red = under-predicted, blue = over-predicted
ramsey_data %>%
  ggplot() +
  geom_sf(aes(fill = sar_resid)) +
  scale_fill_gradient2(mid = "white", high = "red", low = "blue") +
  theme_classic()
# Moran's I randomization test, using the row-standardized weights Ww defined above
spdep::moran.test(ramsey_data$sar_resid, Ww, alternative = "two.sided", randomisation = TRUE)
- Fit a Conditional Autoregressive (CAR) Model. Comment on what you learn.
ANSWER:
# Fit CAR Model
mod_car <- spautolm(formula = ?? ~ ???, data = ramsey_data, listw = Ww, family = "CAR")
BIC(mod_car)
- Map and test the residuals of the CAR model to see if the resulting residuals are independent or spatially correlated. Comment on what you learn.
ANSWER:
ramsey_data$car_resid <- resid(mod_car)
# Map the CAR residuals: red = under-predicted, blue = over-predicted
ramsey_data %>%
  ggplot() +
  geom_sf(aes(fill = car_resid)) +
  scale_fill_gradient2(mid = "white", high = "red", low = "blue") +
  theme_classic()
# Moran's I randomization test, using the row-standardized weights Ww defined above
spdep::moran.test(ramsey_data$car_resid, Ww, alternative = "two.sided", randomisation = TRUE)
- Choose the model that best fits the data in terms of BIC (lower is better) and leaves residuals with the least amount of spatial autocorrelation.
ANSWER:
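One way to organize this comparison is a small table with one row per model, shown here as a sketch on simulated data. The toy lattice, the variables x and y, and the names fits and comparison are hypothetical. The CAR fit reuses the row-standardized weights as in the template above, which may produce a warning about non-symmetric weights.

```r
library(spdep)
library(spatialreg)

set.seed(4)

# Toy 7 x 7 lattice with row-standardized Queen weights (hypothetical data)
nb_toy <- cell2nb(7, 7, type = "queen")
Ww_toy <- nb2listw(nb_toy, style = "W")
n <- length(nb_toy)

# Simulated outcome with spatially correlated errors
toy <- data.frame(x = rnorm(n))
toy$y <- 1 + 2 * toy$x +
  as.numeric(solve(diag(n) - 0.7 * listw2mat(Ww_toy), rnorm(n)))

# Fit the three candidate models on the same data
fits <- list(
  OLS = lm(y ~ x, data = toy),
  SAR = spautolm(y ~ x, data = toy, listw = Ww_toy, family = "SAR"),
  CAR = spautolm(y ~ x, data = toy, listw = Ww_toy, family = "CAR")
)

# Moran's I test on each model's residuals
moran_tests <- lapply(fits, function(m) {
  moran.test(as.numeric(resid(m)), Ww_toy, alternative = "two.sided")
})

# One row per model: BIC plus Moran's I of the residuals and its p-value
comparison <- data.frame(
  model   = names(fits),
  BIC     = sapply(fits, BIC),
  moran_I = sapply(moran_tests, function(t) t$estimate[["Moran I statistic"]]),
  moran_p = sapply(moran_tests, function(t) t$p.value)
)
comparison[order(comparison$BIC), ]
```

A table like this puts both criteria side by side, so you can check whether the model with the lowest BIC is also the one whose residuals show the least spatial autocorrelation.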