7.3 Working with Spatial Data in R
The technology available in R is rapidly evolving and improving. In this set of notes, I’ve highlighted some basics for working with spatial data in R, but I list some good resources below.
- https://cran.r-project.org/web/views/Spatial.html (last updated in 2023)
- https://r-spatial.org/book/ (last updated in 2023)
- https://r-spatial.github.io/sf/index.html (last updated in 2023)
- https://r.geocompx.org/ (last updated in 2023)
- http://www.nickeubank.com/gis-in-r/ (last updated in 2023)
- https://cengel.github.io/R-spatial/ (last updated in 2019)
7.3.1 R Packages
The following R packages support spatial data classes (data sets that are indexed with geometries):
sf
: generic support for working with spatial datageojsonsf
: read in geojson files
The following R packages contain cultural and physical boundaries, and raster maps:
maps
: polygon maps of the worldUSAboundaries
: contemporary US state, county, and Congressional district boundaries, as well as zip code tabulation area centroidsrnaturalearth
: hold and facilitate interaction with Natural Earth map data
The following R packages support geostatistical/point-referenced data analysis:
gstat
: classical geostatisticsgeoR
: model-based geostatisticsRandomFields
: stochastic processesakima
: interpolation
The following R packages support regional/areal data analysis:
spdep
: spatial dependencespgwr
: geographically weighted regression
The following R packages support point patterns/processes data analysis:
spatstat
: parametric modeling, diagnosticssplancs
: non-parametric, space-time
7.3.2 Read in data to R
For each file format, we need use a different function to read in the data. See the examples below for reading in GeoJSON, csv, and shapefiles.
# Read in GeoJSON file
hex_spatial <- geojsonsf::geojson_sf("data/us_states_hexgrid.geojson")
# Read in CSV File
pop_growth <- readr::read_csv('data/apportionment.csv') %>% janitor::clean_names()
# Read in Shapefiles
mn_cities <- sf::read_sf('data/shp_loc_pop_centers') #shp file/folder
mn_water <- sf::read_sf('data/shp_water_lakes_rivers') #shp file/folder
7.3.3 Data classes in R
When data is read it, an R data object is created of a default class. Notice the classes of the R objects we read in. Also, notice that an object may have multiple classes, which indicate the type of structure it has and how functions may interact with the object.
You also may encounter classes such as SpatialPoints
, SpatialLines
, and SpatialPolygons
, or Spatial*DataFrame
from the sp
package. The community is moving away from using older sp
classes to sf
classes. It is useful for you to know that the older versions exist but stick with the sf
classes.
sfc
objects are modern, general versions of the spatial geometries from thesp
package with a bbox, CRS, and many geometries available.sf
objects are data.frame-like objects with a geometry column of classsfc
7.3.4 Convert data class types
We can convert objects between these data classes with the following functions:
fortify(x)
:sp
object x todata.frame
st_as_sf(x )
:sp
object x tosf
st_as_sf(x, coords = c("long", "lat"))
:data.frame
x tosf
as pointsTo convert a
data.frame
with columns oflong
,lat
, andgroup
containing polygon geometry information, you can use:
st_as_sf(x, coords = c("long", "lat")) %>%
group_by(group) %>%
summarise(geometry = st_combine(geometry)) %>%
st_cast("POLYGON")
7.3.5 Static Visualizations
In general, if you have geometries (points, polygons, lines, etc.) that you want to plot, you can use geom_sf()
with ggplot()
. See https://ggplot2.tidyverse.org/reference/ggsf.html for more details.
7.3.5.1 Plotting points
If you have x and y coordinates (longitude and latitude over a small region), we can use our typical plotting tools in ggplot2
package using the x and y coordinates as the x and y values in geom_point()
. Then you can color the points according to a covariate or outcome variable.
If you have longitude and latitude over the globe or a larger region, you must project those coordinates onto a 2D surface. You can do this using the sf package and st_transform()
after specifying the CRS (documentation: https://www.rdocumentation.org/packages/sf/versions/0.8-0/topics/st_transform). Then we can use geom_sf()
.
We’ll walk through create a map of MN with different layers of information (city point locations, county polygon boundaries, rivers as lines and polygons, and a raster elevation map). To add all of this information on one map, we need to ensure that the CRS is the same for all spatial datasets.
#check CRS
st_crs(mn_cities)
#check CRS
st_crs(mn_water)
#transform CRS of water to the same of the cities
mn_water <- mn_water %>%
st_transform(crs = st_crs(mn_cities))
# Load country boundaries data as sf object
mn_counties <- USAboundaries::us_counties(resolution = "high", states = "Minnesota")
# Remove duplicate column names
names_counties <- names(mn_counties)
names(mn_counties)[names_counties == 'state_name'] <- c("state_name1", "state_name2")
# Check CRS
st_crs(mn_counties)
# Transform the CRS of county data to the more local CRS of the cities
mn_counties <- mn_counties %>%
st_transform(crs = st_crs(mn_cities))
st_crs(mn_counties)
ggplot() + # plot frame
geom_sf(data = mn_cities, size = 0.5) + # city point layer
ggthemes::theme_map()
ggplot() + # plot frame
geom_sf(data = mn_counties, fill = NA) + # county boundary layer
geom_sf(data = mn_cities, size = 0.5) + # city point layer
ggthemes::theme_map()
ggplot() +
geom_sf(data = mn_counties, fill = 'wheat', color = "tan") +
geom_sf(data = mn_cities %>% filter(Population >= 10000), mapping = aes(color = Population,size = Population), alpha = 0.8)+ #cities layer
scale_color_viridis_c() + #continuous (gradient) color scale
labs(title = "Minnesota Cities with Population >= 10,000") +
ggthemes::theme_map() + theme(legend.position = "bottom") #move legend
7.3.5.2 Plotting polygons
If you have areal data, you’ll need shapefiles with boundaries for those polygons. City, state, or federal governments often provide these. We can read them into R with st_read()
in the sf package. Once we have that stored object, we can view shapefile metadata using the st_geometry_type()
, st_crs()
, and st_bbox()
. These tell you about the type of geometry stored about the shapes, the CRS, and the bounding box that determines the study area of interest.
If we have data that are points that we will aggregate to a polygon level, then we could use code such as the one below to join together the summaries at an average longitude and latitude coordinate with the shapefiles by whether the longitude and latitude intersect with the polygon.
#join the shapefile and our data summaries with a common polygon I.D. variable
fulldat <- left_join(shapefile, dat)
hex_growth <- hex_spatial %>%
mutate(name = str_replace(google_name,' \\(United States\\)',''),
abbr = iso3166_2) %>%
left_join(pop_growth, by = 'name')
If we have data that are points that we will aggregate to a polygon level, then we could use code such as the one below to join together the summaries at an average longitude and latitude coordinate with the shapefiles by whether the longitude and latitude intersect with the polygon.
# make our data frame a spatial data frame
dat <- st_as_sf(originaldatsum, coords = c('Longitude','Latitude'))
#copy the coordinate reference info from the shapefile onto our new data frame
st_crs(dat) <- st_crs(shapefile)
#join the shapefile and our data summaries
fulldat <- st_join(shapefile, dat, join = st_intersects)
If you are working with U.S. counties, states, or global countries, the shapefiles are already available in the map package. You’ll need to use the I.D. variable to join this data with your polygon-level data.
Once we have an sf object with attributes (variables) and geometry (polygons), we can use geom_sf(aes(fill = attribute))
to plot and color the polygons in the ggplot2 context with respect to an outcome variable.
hex_growth%>% # start with sf object
filter(year == 2020) %>% #filter to focus on data from 2020
ggplot() +
geom_sf(aes(fill = percent_change_in_resident_population)) + # plot the sf geometry (polygons) and fill color according to percent change in population
geom_sf_text( aes(label = abbr), color = 'white') + # add text labels to the sf geometry regions using abbr for the text
labs(fill = 'Population Change (%)') + # Change legend label
ggthemes::theme_map() + theme(legend.position = 'bottom', legend.justification = 'right') # remove the background theme and move the legend to the bottom right
#devtools::install_github("UrbanInstitute/urbnmapr")
library(urbnmapr) # projection with Alaska and Hawaii close to lower 48 states
get_urbn_map(map = "counties", sf = TRUE) %>%
left_join(countydata) %>%
ggplot() + geom_sf(aes(fill = horate), color = 'white',linewidth = 0.01) +
labs(fill = 'Homeownership Rate') +
scale_fill_gradient(high='white',low = 'darkblue',limits=c(0.25,0.85)) +
theme_void()
# Subset to M.N.
get_urbn_map(map = "counties", sf = TRUE) %>%
left_join(countydata) %>%
filter(stringr::str_detect(state_abbv,'MN')) %>%
ggplot() + geom_sf(aes(fill = horate), color = 'white',linewidth = 0.05) +
labs(fill = 'Homeownership Rate') +
coord_sf(crs=26915) +
scale_fill_gradient(high='white',low = 'darkblue',limits=c(0.25,0.85)) +
theme_void()
7.3.5.3 Plotting Raster
To include raster images in the visualization, we need to obtain/load raster data. Below shows code to get the elevation raster image for MN.
Then we need to convert the raster to a data.frame to plot using geom_raster
as an additional layer to ggplot()
.
elevation <- elevatr::get_elev_raster(mn_counties, z = 5, clip = 'bbox')
raster::crs(elevation) <- sf::st_crs(mn_counties)
# Convert to Data Frame for plotting
elev_df <- elevation %>% terra::as.data.frame(xy = TRUE)
names(elev_df) <-c('x','y','value')
ggplot() +
geom_raster(data = elev_df, aes(x = x,y = y,fill = value)) + # adding the elevation as first (bottom) layer
geom_sf(data = mn_counties, fill = NA, color = "black") +
geom_sf(data = mn_cities %>% filter(Population >= 10000), mapping = aes(color = Population,size = Population), alpha = 0.8)+ #cities layer
scale_color_viridis_c() + #continuous (gradient) color scale
scale_fill_gradient(low = 'darkgreen',high = 'white', guide = FALSE) +
labs(title = "Minnesota Cities with Population >= 10,000") +
ggthemes::theme_map() + theme(legend.position = "bottom") #move legend
To demonstrate multiple layers on one visualization, let’s zoom into the Twin Cities and add waterways and rivers.
Seven_countyarea <- st_bbox(mn_counties %>% filter(name %in% c("Anoka", "Hennepin", "Ramsey", "Dakota", "Carver", "Washington", "Scott")))
elevation <- elevatr::get_elev_raster(mn_counties %>% st_crop(Seven_countyarea), z = 9, clip = 'bbox')
raster::crs(elevation) <- sf::st_crs(mn_counties)
#Convert to Data Frame for plotting
elev_df <- elevation %>% terra::as.data.frame(xy = TRUE)
names(elev_df) <-c('x','y','value')
ggplot() +
geom_raster(data = elev_df, aes(x = x,y = y,fill = value)) +
geom_sf(data = mn_counties, fill = NA, color = "black") + # county boundary layer
geom_sf(data = mn_water, fill = 'lightsteelblue1',color = 'lightsteelblue1') + # added a river/lake layer
geom_sf(data = mn_cities %>% filter(Population >= 10000), mapping = aes(color = Population,size = Population)) + #cities layer
coord_sf(xlim = Seven_countyarea[c(1,3)],ylim = Seven_countyarea[c(2,4)]) + # crop map to coordinates of seven county area
scale_color_viridis_c(option = 'magma') + #continuous (gradient) color scale
scale_fill_gradient(low = 'darkgreen',high = 'white') + #continuous (gradient) fill scale
labs(title = "Twin Cities with Population >= 10,000") +
ggthemes::theme_map() + theme(legend.position = "none") #remove legend
7.3.6 More R Resources
- sf package: https://r-spatial.github.io/sf/
- sf Cheat sheet: https://github.com/rstudio/cheatsheets/blob/main/sf.pdf