14 Intro to Three Types of Spatial Data

Learning Goals

  • Explain and detect three different types of spatial data (point process, areal data, point-referenced/geostatistical)
  • Formulate research questions for the three types of spatial data

Warm Up

  • Discuss with your classmates any time you’ve encountered point-referenced data.

  • Discuss with your classmates any time you’ve encountered point process data.

  • Discuss with your classmates any time you’ve encountered areal data.

  • Discuss: Which type of data would these be? What types of questions might you ask?

    • AirBnB listings in St. Paul, MN
    • Air pollution measurements and hospital admissions for cardiovascular disease in 8 cities
    • Loyal customer information for Target (what they buy, where they buy, how much)
    • Forest fires in the U.S. between 2000 and 2020

Group Work

Spatial Graphics

Download a template RMarkdown file to start from here.

Spatial Points

We are going to start by plotting the locations of all accredited colleges and universities in the U.S.

Go to https://nces.ed.gov/ipeds/datacenter/DataFiles.aspx. Click on HD2022 under Data File. It will download a zip file. Unzip this file and you’ll get hd2022.csv. Put that csv in the same location as this Rmd file.

library(readr)
library(dplyr)
library(ggplot2) # needed for ggplot() and geom_sf() below
library(sf) #install.packages('sf')

colleges <- read_csv('hd2022.csv') #read in data

Now, we’ll convert the data frame to a spatial data frame (using the sf package). We have to tell it the names of the variables that correspond to the longitude (x) and latitude (y) coordinates. Notice the printout. The number of features is the number of rows (colleges), and the number of fields is the number of variables. A defining characteristic of a spatial data frame is that it has a geometry (e.g., point, line, polygon, multipolygon). What is the geometry for this data set?

colleges <- sf::st_as_sf(colleges, coords = c('LONGITUD','LATITUDE'))
colleges
colleges$geometry
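If you want to try the conversion before downloading the IPEDS file, here is a minimal sketch on a made-up data frame. The object `toy` and its three rows are hypothetical stand-ins (only the column names mirror hd2022.csv), and the coordinates are rough values chosen for illustration.

```r
library(sf)

# Hypothetical toy data standing in for hd2022.csv (values are illustrative only)
toy <- data.frame(
  INSTNM   = c("College A", "College B", "College C"),
  STABBR   = c("MN", "MN", "WI"),
  LONGITUD = c(-93.17, -93.23, -89.41),
  LATITUDE = c(44.94, 44.97, 43.08)
)

# coords takes the x variable first (longitude), then the y variable (latitude)
toy_sf <- st_as_sf(toy, coords = c("LONGITUD", "LATITUDE"))

nrow(toy_sf)              # number of features (rows)
st_geometry_type(toy_sf)  # geometry type of each feature
st_crs(toy_sf)            # no CRS has been assigned at this point
```

The coordinate columns are moved into the geometry column, which is why the number of fields is smaller than the number of columns you started with.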

We can plot these points by passing the data set to ggplot() and using geom_sf(), which draws the geometry column in the appropriate form (point, line, polygon, etc.). Go ahead and plot the locations of the colleges and universities in MN.

colleges %>% 
  filter(STABBR == 'MN') %>%
  ??

Spatial Polygons

If we’d like the state boundaries, we’ll need to get those values as polygons. Thankfully, the maps package has all of that information for us. We’ll convert it to a spatial data frame using the sf package. Notice the geometry type and also note the CRS (Coordinate Reference System).

#install.packages('maps') if needed
states <- st_as_sf(maps::map("state", plot = FALSE, fill = TRUE))

head(states)

In order to plot the points of the colleges on the same plot as the polygons of state boundaries, we need to make sure we are using the same CRS. You might have noticed that the colleges data set didn’t have a CRS listed, so let’s set the CRS of colleges to be the same as the states.

st_crs(colleges) <- st_crs(states) # provide CRS, if it doesn't have an existing CRS

Add to your existing plot of college locations by adding + geom_sf(data = states %>% filter(ID == 'minnesota'), alpha=0). Notice how it knows what type of plot to make based on the geometry type.
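To see how layering works in isolation, here is a small sketch with made-up geometries. The objects `pts` and `box` are hypothetical: two invented points and a rectangle standing in for a state boundary, both assumed to be in longitude/latitude (EPSG:4326).

```r
library(sf)
library(ggplot2)

# Hypothetical points (not real colleges)
pts <- st_as_sf(
  data.frame(name = c("A", "B"), lon = c(-93.2, -94.5), lat = c(45.0, 46.3)),
  coords = c("lon", "lat"), crs = 4326
)

# Hypothetical rectangle standing in for a state polygon; the ring must close
# by repeating its first vertex
box <- st_sf(
  ID = "toy state",
  geometry = st_sfc(
    st_polygon(list(rbind(
      c(-97, 43.5), c(-89.5, 43.5), c(-89.5, 49), c(-97, 49), c(-97, 43.5)
    ))),
    crs = 4326
  )
)

# Each geom_sf() layer reads its own data and draws whatever geometry it finds
p <- ggplot() +
  geom_sf(data = box, alpha = 0) +  # polygon outline with a transparent fill
  geom_sf(data = pts)               # points drawn on top
p
```

Because both layers share a CRS, geom_sf() can place them on common axes; alpha = 0 makes the polygon fill fully transparent so only the outline shows.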

Spatial Lines

Now, let’s add more to our map. Let’s add the major U.S. and interstate highways. Go to https://catalog.data.gov/dataset/tiger-line-shapefile-2019-state-minnesota-primary-and-secondary-roads-state-based-shapefile. Click to download the Shapefile Zip File. Unzip the folder and move the whole folder (named ‘tl_2019_27_prisecroads’) to the same location as the Rmd you are working in. Notice the geometry type.

roads <- read_sf('tl_2019_27_prisecroads')
roads <- st_transform(roads, crs = st_crs(states)) # change CRS if it has an existing CRS
roads
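The two CRS operations used so far do different things: assigning with st_crs<- only labels coordinates that have no CRS, while st_transform() recomputes the coordinates in a new system. A minimal sketch on a single made-up point, assuming it is recorded in longitude/latitude (EPSG:4326) and using NAD83 / UTM zone 15N (EPSG:26915) purely as an example target:

```r
library(sf)

# One hypothetical point, created without a CRS
pt <- st_sfc(st_point(c(-93.17, 44.94)))

# Assigning a CRS labels the existing numbers; the coordinates do not change
st_crs(pt) <- 4326
st_coordinates(pt)

# Transforming reprojects: the same location, now expressed in meters
pt_utm <- st_transform(pt, 26915)
st_coordinates(pt_utm)
```

This is why colleges (no CRS) was given one by assignment, while roads (which already has a CRS) goes through st_transform().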

Now that we have this spatial data set loaded, let’s add to our previous plot by adding + geom_sf(data = roads %>% filter(RTTYP %in% c('U','I')), color = 'green'). We are first filtering the roads data set to include only U.S. highways (RTTYP 'U') and Interstate highways (RTTYP 'I'), and then coloring them green.

We’ve worked with spatial points, lines, and polygons. So far, we’ve focused only on locations. These plots do not encode any outcome data.

Mapping Data

  1. Think about how you might incorporate the CONTROL of the college (1: Public, 2: Non-Profit Private, 3: For-Profit Private). Try creating a graph with that outcome data visualized on the map.

  2. Let’s consider counties in MN. What if we wanted to aggregate the college information (# of colleges) to a county level?

#install.packages('maps') if needed
counties <- st_as_sf(maps::map("county", plot = FALSE, fill = TRUE))

head(counties)

countiesMN <- counties %>% filter(stringr::str_detect(ID,'minnesota'))

We want to count how many colleges are in each county. You can use st_intersects(X,Y) to check whether the geometry of X intersects the geometry of Y. It returns, for each feature of X, the indices of the features of Y that it intersects, so the length of each element is a count. In other words, for each county polygon we count how many MN college point locations fall inside it.

sf::sf_use_s2(FALSE) # avoids an error with spherical geometry.
countiesMN$NumColleges <- st_intersects(countiesMN, colleges %>% filter(STABBR == 'MN')) %>% lengths()
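The counting step can be checked on a toy example where the answer is known by inspection. This sketch uses two hypothetical unit squares and three made-up points in plain planar coordinates with no CRS, so the sf_use_s2() setting does not come into play here.

```r
library(sf)

# Helper (hypothetical): a unit square whose lower-left corner is (x0, 0)
unit_square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

# Two side-by-side squares standing in for counties
polys <- st_sf(
  ID = c("left", "right"),
  geometry = st_sfc(unit_square(0), unit_square(1))
)

# Three points standing in for colleges: two in the left square, one in the right
pts <- st_sf(geometry = st_sfc(
  st_point(c(0.2, 0.5)),
  st_point(c(0.7, 0.3)),
  st_point(c(1.5, 0.5))
))

# One list element per polygon, holding the row indices of the matching points
hits <- st_intersects(polys, pts)

# lengths() turns that list into a count per polygon
polys$NumPoints <- lengths(hits)
polys$NumPoints   # 2 1
```

The order of the arguments matters: putting the polygons first gives one count per polygon, which is what lets the result be stored as a new column of the polygon data.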

##Create a plot of counties and fill according to the number of colleges

countiesMN %>%
  ???