Checkpoint 7

A template RMarkdown file, README.Rmd, is provided as a starting point in the initial GitHub repo created through GitHub Classroom.

For this checkpoint, I want you to work together with a partner.

GitHub Setup

To create a shared repository (for you, your partner, and me), go to https://classroom.github.com/a/2bksWUcq. Join your spatial team.

Data Context and Research Question

For the spatial mini project, you will work with data about Ramsey County, the larger geo-political district where Macalester College is located. In particular, I’ve gathered aggregate summaries of the people who live in Ramsey County, summarized for each census tract, based on data from the 2015-2019 American Community Survey run by the U.S. Census Bureau. A census tract is a statistical subdivision of a county that aims to have roughly 4,000 inhabitants. Tracts are intended to be fairly homogeneous with respect to demographic and economic conditions. I used the tidycensus package to gather these data, and I’ve provided example code in the GitHub repo showing how I collected them.

Look at a Google Map of Ramsey County (https://goo.gl/maps/m9pXVtjisoZHTmFL7) to familiarize yourself with the geography.

In particular, I want you to focus on home values for this project.

Refine Research Question

Update or refine your research question from checkpoint-6. If you’d like to get other characteristics from tidycensus, you may. See the code in the SpatialCleaning.R file (you’ll need to get your own census API key).

ANSWER:

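# Packages used by the code in this checkpoint
library(sf)       # spatial data frames; needed for geom_sf()
library(dplyr)    # provides the %>% pipe
library(ggplot2)  # plotting

load('SpatialData.RData')
head(ramsey_data)
# View(CodeBook) to see original descriptions from tidycensus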

Visualizations

Create two visualizations that address your research question and motivate your model. Write a brief 3-5 sentence summary of what you learn.

ANSWER:

Neighborhood Structure

Create a neighborhood structure of the census tract boundaries. Justify your choice of neighborhood structure. Write a few sentences about the pros and cons of that choice.
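
As a hint for comparing options, here is a minimal sketch of two common contiguity definitions, built with spdep::poly2nb() on a toy 3 x 3 grid of squares. The grid and the names toy_tracts, queen_nb, and rook_nb are illustrative assumptions, not part of the Ramsey data; other structures (for example, distance-based neighbors) may suit your question better.

```r
library(sf)
library(spdep)

# Toy 3 x 3 grid of unit squares standing in for census tracts
# (hypothetical data, NOT the Ramsey County tracts)
bbox <- st_bbox(c(xmin = 0, ymin = 0, xmax = 3, ymax = 3))
toy_grid <- st_make_grid(st_as_sfc(bbox), n = c(3, 3))
toy_tracts <- st_sf(id = 1:9, geometry = toy_grid)

# Queen contiguity: polygons sharing an edge OR a corner are neighbors
queen_nb <- poly2nb(toy_tracts, queen = TRUE)

# Rook contiguity: only polygons sharing an edge are neighbors
rook_nb <- poly2nb(toy_tracts, queen = FALSE)

# Number of neighbors per polygon under each definition
card(queen_nb)
card(rook_nb)
```

In this toy grid the center square has eight queen neighbors but only four rook neighbors, which shows how much the definition can change the structure that feeds into the later weights and models.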

ANSWER:

OLS Model

Fit a linear model with lm() to address the research question above and map the residuals by coloring the spatial polygons by the residuals. Comment on what you learn.

ANSWER:

lm_mod <- lm(?? ~ ??, data = ramsey_data)

BIC(lm_mod)

ramsey_data$lm_resid <- resid(lm_mod)

ramsey_data %>% ggplot() +
  geom_sf(aes(fill = lm_resid)) +
  scale_fill_gradient2(mid = "white", high = "red", low = "blue") + theme_classic()

Spatial Correlation

Let’s calculate and visualize the spatial correlation of the residuals using Moran’s I. Comment on the strength of spatial correlation based on Moran’s I and whether it is discernibly different from what we would expect under independence (approximately 0).
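
To make the statistic concrete, here is a small base R sketch that computes Moran’s I by hand for four hypothetical units arranged in a line, using binary weights. The values in x and the matrix W are made up for illustration; for the assignment itself, rely on spdep::moran.test().

```r
# Hypothetical values at four locations arranged in a line: 1 - 2 - 3 - 4
x <- c(1, 2, 3, 4)

# Binary weights: 1 if two locations are adjacent, 0 otherwise
W <- matrix(0, nrow = 4, ncol = 4)
for (i in 1:3) {
  W[i, i + 1] <- 1
  W[i + 1, i] <- 1
}

n <- length(x)
dev <- x - mean(x)   # deviations from the mean
S0 <- sum(W)         # sum of all weights

# Moran's I: weighted cross-products of deviations, scaled by the variance
morans_i <- (n / S0) * sum(W * outer(dev, dev)) / sum(dev^2)
morans_i

# Expected value of Moran's I under independence
expected_i <- -1 / (n - 1)
expected_i
```

Here neighboring values are similar, so the statistic is positive, while its expected value under independence is slightly negative and approaches 0 as the number of units grows.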

ANSWER:

Wb <- spdep::nb2listw(??, style = "B") # style = "B" gives binary weights
spdep::moran.test(ramsey_data$lm_resid, Wb, alternative = "two.sided", randomisation = TRUE)  # Using randomization test

Spatial Models

Fit a Simultaneous Autoregressive (SAR) Model. Comment on what you learn.
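
As a reminder of what is being fit, one common way to write the SAR error model that spautolm() estimates with family = "SAR" is sketched below. The notation is an assumption chosen for illustration and may differ from the course notes: W is the row-standardized weights matrix, lambda is the spatial dependence parameter, and epsilon is independent noise.

```latex
\[
\begin{aligned}
Y &= X\beta + u, \\
u &= \lambda W u + \varepsilon, \qquad \varepsilon \sim N(0, \sigma^2 I)
\end{aligned}
\]
```

When lambda is 0, the errors are independent and the model reduces to the ordinary linear model fit with lm().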

ANSWER:

library(spatialreg) #install.packages('spatialreg')
# Convert neighborhood information to a weights list (style = "W" makes rows sum to 1)
Ww <- spdep::nb2listw(??, style = "W")

# Fit SAR Model
mod_sar <- spautolm(formula = ?? ~ ???, data = ramsey_data, listw = Ww, family = "SAR")

BIC(mod_sar)

Map and test the residuals of the SAR model to see if the resulting residuals are independent or spatially correlated. Comment on what you learn.

ANSWER:

ramsey_data$sar_resid <- resid(mod_sar)

ramsey_data %>% ggplot() +
  geom_sf(aes(fill = sar_resid)) +
  scale_fill_gradient2(mid = "white", high = "red", low = "blue") + theme_classic()

spdep::moran.test(ramsey_data$sar_resid, Wb, alternative = "two.sided", randomisation = TRUE)  # Using randomization test

Fit a Conditional Autoregressive (CAR) Model. Comment on what you learn.

ANSWER:

# Fit CAR Model
mod_car <- spautolm(formula = ?? ~ ???, data = ramsey_data, listw = Ww, family = "CAR")

BIC(mod_car)

Map and test the residuals of the CAR model to see if the resulting residuals are independent or spatially correlated. Comment on what you learn.

ANSWER:

ramsey_data$car_resid <- resid(mod_car)

ramsey_data %>% ggplot() +
  geom_sf(aes(fill = car_resid)) +
  scale_fill_gradient2(mid = "white", high = "red", low = "blue") + theme_classic()

spdep::moran.test(ramsey_data$car_resid, Wb, alternative = "two.sided", randomisation = TRUE)  # Using randomization test

Choose the model that best fits the data in terms of BIC and leaves residuals with the least amount of spatial autocorrelation.
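
One way to line up the BIC values is to pass several fitted models to BIC() at once. The sketch below uses the built-in mtcars data and two hypothetical lm() fits purely as stand-ins; in your work, you would compare lm_mod, mod_sar, and mod_car instead.

```r
# Stand-in models on a built-in dataset (hypothetical, not the Ramsey models)
mod_small <- lm(mpg ~ wt, data = mtcars)
mod_large <- lm(mpg ~ wt + hp, data = mtcars)

# BIC() with several models returns a data frame with one row per model
bic_table <- BIC(mod_small, mod_large)

# Sort so that the model with the lowest (best) BIC comes first
bic_table[order(bic_table$BIC), ]
```

Lower BIC indicates a better balance of fit and complexity, but remember to weigh it alongside the Moran’s I results for each model’s residuals.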

ANSWER: