7.3 Working with Spatial Data in R

The technology available in R is rapidly evolving and improving. In this set of notes, I’ve highlighted some basics for working with spatial data in R, but I list some good resources below.

7.3.1 R Packages

The following R packages support spatial data classes (data sets that are indexed with geometries):

  • sf: generic support for working with spatial data
  • geojsonsf: read in geojson files

The following R packages contain cultural and physical boundaries, and raster maps:

  • maps: polygon maps of the world
  • USAboundaries: contemporary US state, county, and Congressional district boundaries, as well as zip code tabulation area centroids
  • rnaturalearth: hold and facilitate interaction with Natural Earth map data

The following R packages support geostatistical/point-referenced data analysis:

  • gstat: classical geostatistics
  • geoR: model-based geostatistics
  • RandomFields: stochastic processes
  • akima: interpolation

The following R packages support regional/areal data analysis:

  • spdep: spatial dependence
  • spgwr: geographically weighted regression

The following R packages support point patterns/processes data analysis:

  • spatstat: parametric modeling, diagnostics
  • splancs: non-parametric, space-time

7.3.2 Read in data to R

For each file format, we need use a different function to read in the data. See the examples below for reading in GeoJSON, csv, and shapefiles.

# Read in GeoJSON file
hex_spatial <- geojsonsf::geojson_sf("data/us_states_hexgrid.geojson") 

# Read in CSV File
pop_growth <- readr::read_csv('data/apportionment.csv') %>% janitor::clean_names()

# Read in Shapefiles
mn_cities <- sf::read_sf('data/shp_loc_pop_centers') #shp file/folder
mn_water <- sf::read_sf('data/shp_water_lakes_rivers') #shp file/folder

7.3.3 Data classes in R

When data is read it, an R data object is created of a default class. Notice the classes of the R objects we read in. Also, notice that an object may have multiple classes, which indicate the type of structure it has and how functions may interact with the object.

class(hex_spatial)
class(pop_growth)
class(mn_cities)
class(mn_water)

You also may encounter classes such as SpatialPoints, SpatialLines, and SpatialPolygons, or Spatial*DataFrame from the sp package. The community is moving away from using older sp classes to sf classes. It is useful for you to know that the older versions exist but stick with the sf classes.

  • sfc objects are modern, general versions of the spatial geometries from the sp package with a bbox, CRS, and many geometries available.
  • sf objects are data.frame-like objects with a geometry column of class sfc

7.3.4 Convert data class types

We can convert objects between these data classes with the following functions:

  • fortify(x): sp object x to data.frame

  • st_as_sf(x ): sp object x to sf

  • st_as_sf(x, coords = c("long", "lat")): data.frame x to sf as points

  • To convert a data.frame with columns of long, lat, and group containing polygon geometry information, you can use:

st_as_sf(x, coords = c("long", "lat")) %>%
group_by(group) %>%
  summarise(geometry = st_combine(geometry)) %>%
  st_cast("POLYGON")

7.3.5 Static Visualizations

In general, if you have geometries (points, polygons, lines, etc.) that you want to plot, you can use geom_sf() with ggplot(). See https://ggplot2.tidyverse.org/reference/ggsf.html for more details.

7.3.5.1 Plotting points

If you have x and y coordinates (longitude and latitude over a small region), we can use our typical plotting tools in ggplot2 package using the x and y coordinates as the x and y values in geom_point(). Then you can color the points according to a covariate or outcome variable.

If you have longitude and latitude over the globe or a larger region, you must project those coordinates onto a 2D surface. You can do this using the sf package and st_transform() after specifying the CRS (documentation: https://www.rdocumentation.org/packages/sf/versions/0.8-0/topics/st_transform). Then we can use geom_sf().

We’ll walk through create a map of MN with different layers of information (city point locations, county polygon boundaries, rivers as lines and polygons, and a raster elevation map). To add all of this information on one map, we need to ensure that the CRS is the same for all spatial datasets.

#check CRS
st_crs(mn_cities)

#check CRS
st_crs(mn_water)

#transform CRS of water to the same of the cities
mn_water <- mn_water %>%
  st_transform(crs = st_crs(mn_cities))
# Load country boundaries data as sf object
mn_counties <- USAboundaries::us_counties(resolution = "high", states = "Minnesota")

# Remove duplicate column names
names_counties <- names(mn_counties)
names(mn_counties)[names_counties == 'state_name'] <- c("state_name1", "state_name2")

# Check CRS
st_crs(mn_counties)

# Transform the CRS of county data to the more local CRS of the cities
mn_counties <- mn_counties %>%
  st_transform(crs = st_crs(mn_cities))

st_crs(mn_counties)
ggplot() + # plot frame
  geom_sf(data = mn_cities, size = 0.5) + # city point layer
  ggthemes::theme_map()
ggplot() + # plot frame
  geom_sf(data = mn_counties, fill = NA) + # county boundary layer
  geom_sf(data = mn_cities, size = 0.5) + # city point layer
  ggthemes::theme_map()
ggplot() +
  geom_sf(data = mn_counties, fill = 'wheat', color = "tan") + 
  geom_sf(data = mn_cities %>% filter(Population >= 10000), mapping = aes(color = Population,size = Population), alpha = 0.8)+ #cities layer
  scale_color_viridis_c() + #continuous (gradient) color scale
  labs(title = "Minnesota Cities with Population >= 10,000") + 
  ggthemes::theme_map() + theme(legend.position = "bottom")  #move legend

7.3.5.2 Plotting polygons

If you have areal data, you’ll need shapefiles with boundaries for those polygons. City, state, or federal governments often provide these. We can read them into R with st_read() in the sf package. Once we have that stored object, we can view shapefile metadata using the st_geometry_type(), st_crs(), and st_bbox(). These tell you about the type of geometry stored about the shapes, the CRS, and the bounding box that determines the study area of interest.

#read in shapefile information
shapefile <- st_read(shapefile) 

If we have data that are points that we will aggregate to a polygon level, then we could use code such as the one below to join together the summaries at an average longitude and latitude coordinate with the shapefiles by whether the longitude and latitude intersect with the polygon.

#join the shapefile and our data summaries with a common polygon I.D. variable
fulldat <- left_join(shapefile, dat) 
hex_growth <- hex_spatial %>% 
  mutate(name = str_replace(google_name,' \\(United States\\)',''),
         abbr = iso3166_2) %>%
  left_join(pop_growth, by = 'name')

If we have data that are points that we will aggregate to a polygon level, then we could use code such as the one below to join together the summaries at an average longitude and latitude coordinate with the shapefiles by whether the longitude and latitude intersect with the polygon.

# make our data frame a spatial data frame
dat <- st_as_sf(originaldatsum, coords = c('Longitude','Latitude'))
#copy the coordinate reference info from the shapefile onto our new data frame
st_crs(dat) <- st_crs(shapefile) 

#join the shapefile and our data summaries
fulldat <- st_join(shapefile, dat, join = st_intersects) 

If you are working with U.S. counties, states, or global countries, the shapefiles are already available in the map package. You’ll need to use the I.D. variable to join this data with your polygon-level data.

Once we have an sf object with attributes (variables) and geometry (polygons), we can use geom_sf(aes(fill = attribute)) to plot and color the polygons in the ggplot2 context with respect to an outcome variable.

hex_growth%>%  # start with sf object
  filter(year == 2020) %>% #filter to focus on data from 2020
  ggplot() +
  geom_sf(aes(fill = percent_change_in_resident_population)) + # plot the sf geometry (polygons) and fill color according to percent change in population
  geom_sf_text( aes(label = abbr), color = 'white') + # add text labels to the sf geometry regions using abbr for the text
  labs(fill = 'Population Change (%)') + # Change legend label
  ggthemes::theme_map() + theme(legend.position = 'bottom', legend.justification = 'right') # remove the background theme and move the legend to the bottom right 
#devtools::install_github("UrbanInstitute/urbnmapr")
library(urbnmapr) # projection with Alaska and Hawaii close to lower 48 states
get_urbn_map(map = "counties", sf = TRUE) %>%
    left_join(countydata) %>%
    ggplot() + geom_sf(aes(fill = horate), color = 'white',linewidth = 0.01) + 
    labs(fill = 'Homeownership Rate') + 
    scale_fill_gradient(high='white',low = 'darkblue',limits=c(0.25,0.85)) + 
    theme_void()


# Subset to M.N.
get_urbn_map(map = "counties", sf = TRUE) %>%
    left_join(countydata) %>%
    filter(stringr::str_detect(state_abbv,'MN')) %>%
    ggplot() + geom_sf(aes(fill = horate), color = 'white',linewidth = 0.05) + 
    labs(fill = 'Homeownership Rate') + 
    coord_sf(crs=26915) + 
  scale_fill_gradient(high='white',low = 'darkblue',limits=c(0.25,0.85)) + 
  theme_void()

7.3.5.3 Plotting Raster

To include raster images in the visualization, we need to obtain/load raster data. Below shows code to get the elevation raster image for MN.

Then we need to convert the raster to a data.frame to plot using geom_raster as an additional layer to ggplot().

elevation <- elevatr::get_elev_raster(mn_counties, z = 5, clip = 'bbox')
raster::crs(elevation) <- sf::st_crs(mn_counties)

# Convert to Data Frame for plotting
elev_df <- elevation %>% terra::as.data.frame(xy = TRUE)
names(elev_df) <-c('x','y','value')

ggplot() +
  geom_raster(data = elev_df, aes(x = x,y = y,fill = value)) + # adding the elevation as first (bottom) layer
  geom_sf(data = mn_counties, fill = NA, color = "black") + 
  geom_sf(data = mn_cities %>% filter(Population >= 10000), mapping = aes(color = Population,size = Population), alpha = 0.8)+ #cities layer
  scale_color_viridis_c() + #continuous (gradient) color scale
  scale_fill_gradient(low = 'darkgreen',high = 'white', guide = FALSE) + 
  labs(title = "Minnesota Cities with Population >= 10,000") + 
  ggthemes::theme_map() + theme(legend.position = "bottom")  #move legend

To demonstrate multiple layers on one visualization, let’s zoom into the Twin Cities and add waterways and rivers.

Seven_countyarea <- st_bbox(mn_counties %>% filter(name %in% c("Anoka", "Hennepin", "Ramsey", "Dakota", "Carver", "Washington", "Scott")))


elevation <- elevatr::get_elev_raster(mn_counties %>% st_crop(Seven_countyarea), z = 9, clip = 'bbox')
raster::crs(elevation) <- sf::st_crs(mn_counties)

#Convert to Data Frame for plotting
elev_df <- elevation %>% terra::as.data.frame(xy = TRUE)
names(elev_df) <-c('x','y','value')


ggplot() +
  geom_raster(data = elev_df, aes(x = x,y = y,fill = value)) + 
  geom_sf(data = mn_counties, fill = NA, color = "black") + # county boundary layer
  geom_sf(data = mn_water, fill = 'lightsteelblue1',color = 'lightsteelblue1') + # added a river/lake layer
  geom_sf(data = mn_cities %>% filter(Population >= 10000), mapping = aes(color = Population,size = Population)) + #cities layer
  coord_sf(xlim = Seven_countyarea[c(1,3)],ylim = Seven_countyarea[c(2,4)]) + # crop map to coordinates of seven county area
  scale_color_viridis_c(option = 'magma') + #continuous (gradient) color scale
  scale_fill_gradient(low = 'darkgreen',high = 'white') + #continuous (gradient) fill scale
  labs(title = "Twin Cities with Population >= 10,000") + 
  ggthemes::theme_map() + theme(legend.position = "none")  #remove legend