Checkpoint 6

The initial GitHub repo from GitHub Classroom includes a template R Markdown file, README.Rmd, to start from.

For this checkpoint, I want you to work together with a partner.

GitHub Setup

To create a shared repository (for you, your partner, and me), go to https://classroom.github.com/a/js0MCMGN. Only one partner should create the team; name it spatial-Name1-Name2. The other partner then joins that team.

Data Context and Research Question

For the spatial mini project, you will work with data about Ramsey County, the larger geopolitical district where Macalester College is located. In particular, I’ve gathered aggregate summaries of the people who live in Ramsey County, summarized for each census tract, based on data from the 2015-2019 American Community Survey run by the U.S. Census Bureau. A census tract is a statistical subdivision of a county that aims to have roughly 4,000 inhabitants; tracts are intended to be fairly homogeneous with respect to demographic and economic conditions. I used the tidycensus package to gather this data, and I’ve provided example code in the GitHub repo showing how I collected it.

Look at a Google Map of Ramsey County to familiarize yourself with the geography: https://goo.gl/maps/m9pXVtjisoZHTmFL7.

In particular, I want you to focus on home values for this project.

Data Context

To analyze the home values of today, you need some context on the history of real estate in Ramsey County. Below I have listed a variety of resources. You don’t need to limit yourself to these resources, and you don’t need to read all of them. Feel free to work on this after class.

History of Real Estate in Ramsey County before 1900

History of Redlining in Ramsey County

  1. Write one paragraph that introduces and summarizes the history of real estate in Ramsey County, laying the groundwork for your future data analysis.

ANSWER:

Develop Research Question

  1. Load the data set and look at ramsey_data and the CodeBook. I’ve provided you a small number of variables to consider (any variable that does not include median or average in the description is a count or proportion; the E at the end of a variable name indicates an estimate). Develop a research question based on the available characteristics.

ANSWER:

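# Packages used by the code below (skip if the template already loads them)
library(tidyverse) # ggplot2 and the %>% pipe
library(sf)        # spatial data frames and geometry tools
library(spdep)     # neighborhood structures and Moran's I
load('SpatialData.RData')
head(ramsey_data)
# View(CodeBook) to see original descriptions from tidycensus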

Visualizations

  1. Create a visualization, first mapping out the census tract polygon geometries with geom_sf() and filling the polygons by the outcome, HouseValueE. Write a brief 1-2 sentence summary of what you learn.

ANSWER:

  1. Create visualizations that explore the relationship between explanatory variables and the outcome of interest, HouseValueE, ignoring the spatial component. Write a brief 3-5 sentence summary of what you learn.

ANSWER:

Neighborhood Structure

  1. Create a neighborhood structure of the census tract boundaries. Start by using the Queen definition of neighbors. Visualize that network on top of the map of spatial polygons. Consider the advantages and disadvantages of defining neighbors in this way, and write a few sentences about those pros and cons.

ANSWER:

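# Queen contiguity: tracts are neighbors if they share at least one boundary point
Queen <- poly2nb(ramsey_data, queen = TRUE)

# One centroid per tract (from its largest polygon) to serve as a network node
ramsey_centroids <- st_centroid(st_geometry(ramsey_data), of_largest_polygon = TRUE)
# Line segments connecting the centroids of neighboring tracts
nb_Q_net <- nb2lines(nb = Queen, coords = ramsey_centroids, as_sf = TRUE)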


#Visualize network on the map (unfilled)
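
To make the Queen definition concrete, here is a minimal base R sketch on a hypothetical 3 x 3 grid of square tracts (not the Ramsey County data). Queen neighbors touch at an edge or a corner, while Rook neighbors must share an edge. The names cells, queen_adj, and rook_adj are made up for this illustration and do not come from spdep.

```r
# Toy 3 x 3 grid of square "tracts" (hypothetical; not the Ramsey County data)
cells <- expand.grid(row = 1:3, col = 1:3)

# Absolute row and column offsets between every pair of cells
d_row <- abs(outer(cells$row, cells$row, "-"))
d_col <- abs(outer(cells$col, cells$col, "-"))

# Queen: cells touch at an edge or a corner
queen_adj <- (pmax(d_row, d_col) == 1) * 1
# Rook: cells share an edge only
rook_adj <- (d_row + d_col == 1) * 1

# Number of neighbors for the center cell (cell 5) under each definition
n_queen_center <- unname(rowSums(queen_adj)[5])
n_rook_center <- unname(rowSums(rook_adj)[5])
```

On this grid the center cell has 8 Queen neighbors but only 4 Rook neighbors, so the Queen definition also links tracts that meet only at a corner.
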

  1. Based on your experience living in St. Paul, do you believe there are any barriers, physical or otherwise, that would lead two “neighboring” census tracts to be more different than you’d expect of two census tracts that share a boundary? Rely on your own experience to consider this.

ANSWER:

OLS Model

  1. Fit a linear model with lm() to address the research question above and map the residuals by coloring the spatial polygons by the residuals. Comment on what you learn.

ANSWER:

lm_mod <- lm(?? ~ ??, data = ramsey_data)

ramsey_data$lm_resid <- resid(lm_mod)

ramsey_data %>% ggplot() +
  geom_sf(aes(fill = lm_resid)) +
  scale_fill_gradient2(mid = "white", high = "red", low = "blue") + theme_classic()

Spatial Correlation

  1. Let’s calculate and visualize the spatial correlation of the residuals using Moran’s I. Comment on the strength of spatial correlation based on Moran’s I and whether it is discernibly different from 0 (independence).

ANSWER:

W <- nb2listw(Queen, style = "B") #style = 'B' gives binary weights
#W <- nb2listw(Queen, style = "W") #style = 'W' gives row-normalized weights


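# moran.plot() returns a data frame with the variable (x) and its spatial lag (wx).
# With style = "B", wx is the sum of the neighbors' residuals; with style = "W", it is their average.
mp <- spdep::moran.plot(ramsey_data$lm_resid, W, plot = FALSE)
ggplot(mp, aes(x = x, y = wx)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  geom_hline(yintercept = mean(mp$wx), lty = 2) +
  geom_vline(xintercept = mean(mp$x), lty = 2) +
  theme_classic() +
  xlab("Residuals") + ylab("Spatial Lag of Residuals")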


# randomisation = TRUE computes the variance of I under the randomisation assumption
spdep::moran.test(ramsey_data$lm_resid, W, alternative = "two.sided", randomisation = TRUE)
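
As a rough sketch of what Moran’s I measures, the statistic can be computed by hand in base R as I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z = x - mean(x) and S0 is the sum of all weights. The example below uses four hypothetical locations on a line (not the Ramsey County residuals), and morans_i() is a helper written for this illustration, not a spdep function. It also shows how binary and row-normalized weights can give different values. Under no spatial correlation, the expected value of I is -1 / (n - 1), which is close to 0 when there are many tracts.

```r
# Toy example (hypothetical values): four locations on a line,
# each one adjacent to the next
x <- c(1, 2, 3, 4)
n <- length(x)

# Binary adjacency matrix: 1 if two locations are neighbors (like style = "B")
B <- matrix(0, n, n)
for (i in 1:(n - 1)) {
  B[i, i + 1] <- 1
  B[i + 1, i] <- 1
}

# Row-normalized weights: each row sums to 1 (like style = "W")
W_row <- B / rowSums(B)

# Moran's I for values x and a weight matrix w
morans_i <- function(x, w) {
  z <- x - mean(x)
  (length(x) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

I_binary <- morans_i(x, B)      # 1/3 for this toy example
I_row <- morans_i(x, W_row)     # 0.4 for this toy example
expected_I <- -1 / (n - 1)      # expected value under no spatial correlation
```

Both values are positive because neighboring locations have similar values; moran.test() additionally reports whether the observed I is discernibly different from its expected value.
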