Checkpoint 7
The initial GitHub repo from GitHub Classroom includes a template RMarkdown file, README.Rmd, to start from.
For this checkpoint, I want you to work together with a partner.
GitHub Setup
To create a shared repository (for you, your partner, and me), go to https://classroom.github.com/a/2bksWUcq. Join your spatial team.
Data Context and Research Question
For the spatial mini project, you will work with data about Ramsey County, the larger geo-political district where Macalester College is located. In particular, I’ve gathered aggregate summaries of the people who live in Ramsey County, summarized for each census tract, based on data from the 2015-2019 American Community Survey run by the U.S. Census Bureau. A census tract is a statistical subdivision of a county that aims to have roughly 4,000 inhabitants; tracts are intended to be fairly homogeneous with respect to demographic and economic conditions. I used the tidycensus package to gather this data, and I’ve provided example code in the GitHub repo showing how I collected it.
Look at a Google Map of Ramsey County to familiarize yourself with the geography: https://goo.gl/maps/m9pXVtjisoZHTmFL7.
In particular, I want you to focus on home values for this project.
Refine Research Question
- Update or refine your research question from checkpoint-6. If you’d like to get other characteristics from tidycensus, you may. See the code in the SpatialCleaning.R file (you’ll need to get your own census API key).
ANSWER:
load('SpatialData.RData')
head(ramsey_data)
# View(CodeBook) to see original descriptions from tidycensus
Visualizations
- Create two visualizations that address your research question and motivate your model. Write a brief 3-5 sentence summary of what you learn.
ANSWER:
Neighborhood Structure
- Create a neighborhood structure from the census tract boundaries. Justify your choice of neighborhood structure, and write a few sentences about its pros and cons.
ANSWER:
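To make the choice of neighborhood structure concrete, here is a minimal sketch comparing queen and rook contiguity with `spdep::poly2nb()`. It uses a toy 3 x 3 grid of squares rather than the tract data, so the grid and all object names (`grid`, `nb_queen`, `nb_rook`) are illustrative assumptions. With the real data you would presumably pass `ramsey_data` to `poly2nb()` instead.

```r
library(sf)
library(spdep)

# Toy 3 x 3 grid of unit squares (a stand-in for census tract polygons)
bbox <- st_bbox(c(xmin = 0, ymin = 0, xmax = 3, ymax = 3))
grid <- st_sf(geometry = st_make_grid(st_as_sfc(bbox), n = c(3, 3)))

# Queen: polygons are neighbors if they share at least one boundary point
nb_queen <- poly2nb(grid, queen = TRUE)
# Rook: polygons are neighbors only if they share a boundary segment
nb_rook <- poly2nb(grid, queen = FALSE)

# Number of neighbors per polygon under each definition
queen_counts <- card(nb_queen)
rook_counts <- card(nb_rook)

print(table(queen_counts))
print(table(rook_counts))
```

Queen contiguity links polygons that touch only at a corner, so it gives more neighbors per polygon than rook contiguity. Which is more appropriate depends on whether you think corner-touching tracts should count as adjacent.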
OLS Model
- Fit a linear model with lm() to address the research question above and map the residuals by coloring the spatial polygons by the residuals. Comment on what you learn.
ANSWER:
lm_mod <- lm(?? ~ ??, data = ramsey_data)
BIC(lm_mod)
# Store the OLS residuals on the data so they can be mapped
ramsey_data$lm_resid <- resid(lm_mod)
ramsey_data %>%
  ggplot() +
  geom_sf(aes(fill = lm_resid)) +
  scale_fill_gradient2(mid = "white", high = "red", low = "blue") +
  theme_classic()
Spatial Correlation
- Let’s calculate and visualize the spatial correlation of the residuals using Moran’s I. Comment on the strength of spatial correlation based on Moran’s I and whether it is discernibly different from 0 (independence).
ANSWER:
Wb <- nb2listw(??, style = "B") # style = 'B' gives binary weights
spdep::moran.test(ramsey_data$lm_resid, Wb, alternative = "two.sided", randomisation = TRUE) # Using randomization test
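To see what `moran.test()` is measuring, the sketch below computes Moran's I by hand, using $I = \frac{n}{\sum_{i,j} w_{ij}} \cdot \frac{\sum_{i,j} w_{ij}(z_i - \bar z)(z_j - \bar z)}{\sum_i (z_i - \bar z)^2}$, and compares it to the value reported by `spdep`. The toy grid from `cell2nb()` and the simulated values `z` are assumptions for illustration only; they stand in for the tract neighborhood structure and the model residuals.

```r
library(spdep)

# Toy 5 x 5 grid of cells with rook neighbors (a stand-in for census tracts)
nb <- cell2nb(5, 5, type = "rook")
Wb <- nb2listw(nb, style = "B") # binary weights, as in the assignment code

# Hypothetical "residuals": a trend along one grid axis plus noise
set.seed(1)
z <- rep(1:5, times = 5) + rnorm(25, sd = 0.5)

# Moran's I by hand from the weights matrix and the centered values
W <- listw2mat(Wb)
zc <- z - mean(z)
moran_by_hand <- (length(z) / sum(W)) * as.numeric(t(zc) %*% W %*% zc) / sum(zc^2)

# Same statistic from spdep, with a randomization-based two-sided test
mt <- moran.test(z, Wb, alternative = "two.sided", randomisation = TRUE)

print(moran_by_hand)
print(mt$estimate)
print(mt$p.value)
```

Values of I near its expectation of -1/(n - 1) are consistent with independence. Values well above it indicate that neighboring units tend to have similar residuals.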
Spatial Models
- Fit a Simultaneous Autoregressive (SAR) Model. Comment on what you learn.
ANSWER:
library(spatialreg) #install.packages('spatialreg')
# Convert Neighborhood Information to List (with weighting so that rows sum to 1)
Ww <- nb2listw(??, style = "W")
# Fit SAR Model
mod_sar <- spautolm(formula = ?? ~ ???, data = ramsey_data, listw = Ww, family = "SAR")
BIC(mod_sar)
- Map and test the residuals of the SAR model to see if the resulting residuals are independent or spatially correlated. Comment on what you learn.
ANSWER:
# Store the SAR residuals on the data so they can be mapped
ramsey_data$sar_resid <- resid(mod_sar)
ramsey_data %>%
  ggplot() +
  geom_sf(aes(fill = sar_resid)) +
  scale_fill_gradient2(mid = "white", high = "red", low = "blue") +
  theme_classic()
spdep::moran.test(ramsey_data$sar_resid, Wb, alternative = "two.sided", randomisation = TRUE) # Using randomization test
- Fit a Conditional Autoregressive (CAR) Model. Comment on what you learn.
ANSWER:
# Fit CAR Model
mod_car <- spautolm(formula = ?? ~ ???, data = ramsey_data, listw = Ww, family = "CAR")
BIC(mod_car)
- Map and test the residuals of the CAR model to see if the resulting residuals are independent or spatially correlated. Comment on what you learn.
ANSWER:
# Store the CAR residuals on the data so they can be mapped
ramsey_data$car_resid <- resid(mod_car)
ramsey_data %>%
  ggplot() +
  geom_sf(aes(fill = car_resid)) +
  scale_fill_gradient2(mid = "white", high = "red", low = "blue") +
  theme_classic()
spdep::moran.test(ramsey_data$car_resid, Wb, alternative = "two.sided", randomisation = TRUE) # Using randomization test
- Choose the model that best fits the data in terms of BIC and leaves residuals with the least amount of spatial autocorrelation.
ANSWER:
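One way to organize the comparison is to collect BIC and the Moran's I p-value of the residuals for each model in a single table. The sketch below does this for an OLS and a SAR fit on simulated data. The toy grid, the simulated data frame `toy`, the helper `moran_p()`, and the true spatial parameter of 0.7 are all assumptions made up for illustration, not part of the assignment data.

```r
library(spdep)
library(spatialreg)

set.seed(2)

# Toy 10 x 10 grid standing in for the tracts
nb <- cell2nb(10, 10, type = "queen")
Ww <- nb2listw(nb, style = "W") # row-standardized weights for model fitting
Wb <- nb2listw(nb, style = "B") # binary weights for Moran's I
n <- length(nb)

# Simulate y = 1 + 2x + e, where e = (I - 0.7 W)^{-1} u has SAR structure
x <- rnorm(n)
e <- as.numeric(invIrW(Ww, 0.7) %*% rnorm(n))
toy <- data.frame(y = 1 + 2 * x + e, x = x)

lm_mod <- lm(y ~ x, data = toy)
mod_sar <- spautolm(y ~ x, data = toy, listw = Ww, family = "SAR")

# Hypothetical helper: two-sided Moran's I p-value for a residual vector
moran_p <- function(res) {
  moran.test(as.numeric(res), Wb, alternative = "two.sided", randomisation = TRUE)$p.value
}

comparison <- data.frame(
  model = c("lm", "SAR"),
  BIC = c(BIC(lm_mod), BIC(mod_sar)),
  moran_p = c(moran_p(resid(lm_mod)), moran_p(resid(mod_sar)))
)
print(comparison)
```

A lower BIC indicates a better trade-off between fit and complexity, and a large Moran's I p-value suggests little remaining spatial autocorrelation in the residuals. For the checkpoint, the same table could be extended with a row for the CAR model.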