At the heart of every spatial dataset is a set of locations. One way to describe a location is in terms of coordinates and a coordinate reference system (CRS).
A CRS has three main components: an ellipsoid, a datum, and a projection. (The projection is only part of a projected CRS; a geographic CRS has no projection.)
Ellipsoid
The Earth is not a sphere.
It’s closer to a bumpy ellipsoid with a bulge at the equator.
The ellipsoid part of a CRS is a mathematical model giving a smooth approximation to Earth’s shape.
Common ellipsoid models: WGS84 and GRS80
Illustration of ellipsoid model (black) and Earth’s irregular surface (red), centered to have an overall best fit. Source: www.icsm.gov.au
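To make the "bulge at the equator" concrete, the sketch below computes the polar radius of the WGS84 ellipsoid from its two defining parameters and compares it with the equatorial radius. The parameter values are the standard published WGS84 constants; the variable names are only illustrative.

```r
# WGS84 defining parameters (standard published values)
a <- 6378137            # semi-major (equatorial) radius, in meters
inv_f <- 298.257223563  # inverse flattening (unitless)

# Flattening and semi-minor (polar) radius
f <- 1 / inv_f
b <- a * (1 - f)

# How much wider the ellipsoid is at the equator than at the poles
bulge_km <- (a - b) / 1000

round(b, 3)         # polar radius in meters
round(bulge_km, 1)  # equatorial bulge in kilometers
```

The difference is only about 0.3% of the radius, which is why the Earth looks spherical from space but still needs an ellipsoid model for accurate positioning.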
Datum
Where do we center the ellipsoid? The datum specifies how the ellipsoid is positioned relative to the Earth, including where its center sits.
For a given ellipsoid model, different datums are used to better fit different areas of the world.
For example, with the GRS80 ellipsoid, the NAD83 datum is a good fit in North America, while SIRGAS2000 is a better fit in South America.
The Global Positioning System (GPS) uses the WGS84 ellipsoid model and WGS84 datum. This provides an overall best fit to the Earth.
Illustration of ellipsoid model and Earth’s irregular surface for a datum that better fits the southern part (bottom right) of the Earth. Source: www.icsm.gov.au
Why do the ellipsoid and datum matter?
If you have longitude and latitude coordinates for a location, you need to know what datum and ellipsoid were used to define those positions in order to overlay those points correctly on a map.
Note: In practice, the horizontal distance between WGS84 and NAD83 coordinates is about 3-4 feet in the US, which may not be significant for most applications.
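As a minimal sketch of how this looks in R, the sf package lets you attach a CRS to coordinates and re-express them in another datum. The point coordinates here are hypothetical, and the EPSG codes 4326 (WGS84) and 4269 (NAD83) are the usual identifiers for these two geographic CRSs. Whether the transformed coordinates differ visibly depends on the PROJ version and transformation grids installed, so no particular shift is shown.

```r
library(sf)

# A hypothetical location given as (longitude, latitude) on WGS84 (EPSG:4326)
pt_wgs84 <- st_sfc(st_point(c(-93.17, 44.94)), crs = 4326)

# Printing the CRS shows its datum and ellipsoid
print(st_crs(pt_wgs84))

# Re-express the same location on NAD83 (EPSG:4269)
pt_nad83 <- st_transform(pt_wgs84, crs = 4269)

st_crs(pt_nad83)$epsg
st_coordinates(pt_nad83)
```

Two layers should be brought to the same CRS with `st_transform()` before they are overlaid; simply relabeling the CRS without transforming would leave the coordinates unchanged and potentially misplaced.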
Projection
Lastly, the Earth is a three-dimensional (3D) object, and most visualizations are on a two-dimensional (2D) surface. We must choose a projection method to represent points, lines, and regions on Earth on a 2D map with distance units (typically meters, international feet, or US survey feet). No matter which method is chosen, projecting a 3D surface onto a 2D one distorts angle, area, and/or distance.
The Mercator projection is useful for navigation because it represents north as up and south as down everywhere and preserves local directions and shapes.
Drawback: it inflates the size of regions far from the equator. Greenland, Antarctica, Canada, and Russia appear much bigger than they should. The illustration below compares country areas/shapes under the Mercator projection (light blue) with true areas/shapes (dark blue).
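The size of this inflation can be worked out directly. Assuming a spherical Mercator projection, lengths at latitude φ are stretched by a factor of 1/cos(φ), so areas are stretched by 1/cos²(φ). The short sketch below tabulates that factor; the function name and the chosen latitudes are illustrative only.

```r
# Area inflation factor of the (spherical) Mercator projection at a latitude.
# Linear scale is 1 / cos(latitude), so area scale is its square.
mercator_area_scale <- function(lat_deg) {
  lat_rad <- lat_deg * pi / 180
  1 / cos(lat_rad)^2
}

lats <- c(0, 30, 45, 60, 72)
inflation <- data.frame(
  latitude = lats,
  area_inflation = round(mercator_area_scale(lats), 2)
)
inflation
```

At 60 degrees latitude, regions are drawn at four times their true area relative to the equator, and the factor grows without bound toward the poles.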
# Assumes ggplot2 and sf are loaded and `world` is an sf object of country polygons.
ggplot(data = world) +
  geom_sf(color = "black", fill = "#bada55") +
  coord_sf(crs = "+proj=laea +lat_0=52 +lon_0=10 +x_0=4321000 +y_0=3210000 +ellps=GRS80 +units=m +no_defs") +
  labs(title = "Lambert Azimuthal Equal-Area Projection", subtitle = "Correctly represents area but not angles") +
  theme_bw()

ggplot(data = world) +
  geom_sf(color = "black", fill = "#bada55") +
  coord_sf(crs = "+proj=fouc") +
  labs(title = "Foucaut Projection", subtitle = "Correctly represents area, lots of shape distortion in high latitudes") +
  theme_bw()

ggplot(data = world) +
  geom_sf(color = "black", fill = "#bada55") +
  coord_sf(crs = "+proj=natearth2") +
  labs(title = "Natural Earth II Projection", subtitle = "Represents globe shape, distorted at high latitudes") +
  theme_bw()
Spatial data
With a CRS to collect and record location data in terms of longitude or easting (x) and latitude or northing (y) coordinates, we can now consider common models for storing spatial data on the computer. There are two main data models: vector and raster.
Vector
Vector data represents the world as a set of spatial geometries that are defined in terms of location coordinates (with a specified CRS) with non-spatial attributes or properties.
The three basic vector geometries are:
Points: Locations defined by (x, y) coordinates.
e.g., Cities
Lines: A set of ordered points connected by straight lines.
e.g., Roads, rivers
Polygons: A set of ordered points connected by straight lines, where the first and last points are the same.
e.g., Geopolitical boundaries, bodies of water
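The three geometry types can be built by hand with the sf package, which may help make the definitions concrete. This is a small sketch with made-up coordinates and feature names; note that the polygon's coordinate matrix repeats its first point at the end to close the ring.

```r
library(sf)

# Point: a single (x, y) coordinate pair
city <- st_point(c(-93.27, 44.98))

# Line: ordered points, one per row of the matrix
road <- st_linestring(rbind(c(-93.30, 44.90), c(-93.20, 45.00), c(-93.10, 45.00)))

# Polygon: ordered points where the first and last rows are identical
lake <- st_polygon(list(rbind(
  c(-93.40, 44.90), c(-93.30, 44.90), c(-93.30, 45.00),
  c(-93.40, 45.00), c(-93.40, 44.90)
)))

# Combine geometries with a CRS and attach non-spatial attributes
geoms <- st_sfc(city, road, lake, crs = 4326)
features <- st_sf(name = c("a city", "a road", "a lake"), geometry = geoms)

st_geometry_type(features)
```

In practice each layer usually holds a single geometry type; mixing them here is only for illustration.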
File formats:
Text files (e.g., .csv)
x and y columns: for coordinates (x for longitude and y for latitude)
group id column: needed for lines and polygons
additional columns: attributes related to the areas in each row (e.g., population sizes, demographic information)
Text files do not store the CRS.
Shapefiles (.shp)
Widely supported spatial vector file format. A shapefile is actually a set of files with the same base name; the CRS is stored in the accompanying .prj file.
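Because a text file carries no CRS, you have to supply it yourself when turning the table into a spatial object. The sketch below shows one way to do this with sf, using a tiny inline CSV in place of a real file; the column names and values are hypothetical, and EPSG:4326 is an assumption about how the coordinates were recorded.

```r
library(sf)

# Stand-in for read.csv("some_file.csv"): x = longitude, y = latitude
cities <- read.csv(text = "
name,x,y,population
City A,-93.27,44.98,430000
City B,-93.09,44.95,310000
")

# Declare which columns hold coordinates and which CRS they refer to
cities_sf <- st_as_sf(cities, coords = c("x", "y"), crs = 4326)

cities_sf
```

If the wrong CRS is declared at this step, every later overlay or transformation will be off, so it is worth confirming the CRS with whoever produced the data.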
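Raster
Raster data represents the world as a grid of equally sized cells (pixels), each storing a value.
High-resolution raster data involves a large number of small cells. This results in large files and objects, which can make computation and visualization quite slow.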
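A quick back-of-the-envelope calculation shows how fast raster size grows as cells shrink. The numbers below are assumptions chosen for illustration: a square 100 km by 100 km area, one value per cell, and 4 bytes per value with no compression.

```r
# Side length of a square study area, in meters (assumed)
extent_m <- 100000

# Candidate cell sizes, in meters
resolution_m <- c(1000, 100, 10)

# Number of cells = (cells per side)^2
n_cells <- (extent_m / resolution_m)^2

# Uncompressed size assuming 4 bytes per cell, in megabytes
size_mb <- n_cells * 4 / 1e6

raster_sizes <- data.frame(resolution_m, n_cells, size_mb)
raster_sizes
```

Each tenfold improvement in resolution multiplies the number of cells by one hundred, which is why a fine grid over a large area quickly becomes expensive to store and plot.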
File formats:
GeoTIFF (.tif or .tiff)
Most popular
NetCDF (.nc)
HDF (.hdf)
To work with raster data in R, you can use the raster, terra, or stars packages. If you are interested in learning more, check out https://r-spatial.github.io/stars/.
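As a small taste of what this looks like, the sketch below builds a toy raster in memory with terra rather than reading a file. It assumes the terra package is installed; the extent, grid size, and random cell values are made up for illustration.

```r
library(terra)

# A 10 x 10 grid covering 0-10 degrees longitude and latitude, on WGS84
r <- rast(nrows = 10, ncols = 10,
          xmin = 0, xmax = 10, ymin = 0, ymax = 10,
          crs = "EPSG:4326")

# Fill the cells with arbitrary values
set.seed(1)
values(r) <- runif(ncell(r))

ncell(r)  # total number of cells
res(r)    # cell size in x and y, in CRS units (degrees here)
```

For a real dataset, `rast()` can also be given a file path (for example, to a GeoTIFF) instead of grid dimensions.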