7.5 Point Referenced Data (optional)
A spatial stochastic process is a spatially indexed collection of random variables,
\[\{Y(s): s\in D \subset \mathbb{R}^d \}\]
where \(d\) will typically be 2 or 3.
A realization from a stochastic process is sometimes called a sample path. We typically observe a vector,
\[\mathbf{Y} \equiv Y(s_1),...,Y(s_n)\]
7.5.1 Gaussian Process
A Gaussian process (G.P.) has finite-dimensional distributions that are all multivariate normal, and we define it using a mean and covariance function,
\[\mu(s) = E(Y(s))\] \[C(s_1,s_2) = Cov(Y(s_1),Y(s_2))\]
As with time series and longitudinal data, we must consider the covariance of our data. Most covariance models that we will consider in this course require that our data come from a stationary, isotropic process such that the mean is constant across space, variance is constant across space, and the covariance is only a function of the distance between locations (disregarding the direction) such that
\[ E(Y(s)) = E(Y(s+h)) = \mu\] \[Cov(Y(s),Y(s+h)) = Cov(Y(0),Y(h)) = C(||h||)\]
7.5.2 Covariance Models
A covariance model that is often used in spatial data is called the Matern class of covariance functions,
\[C(t) = \frac{\sigma^2}{2^{\nu-1}\Gamma(\nu)}(t\rho)^\nu \mathcal{K}_\nu(d/\rho)\]
where \(\mathcal{K}_\nu\) is a modified Bessel function of order \(\nu\). \(\nu\) is a smoothness parameter and if we let \(\nu = 1/2\), we get the exponential covariance.
7.5.3 Variograms, Semivariograms
Beyond the standard definition of stationarity, there is another form of stationary (intrinsic stationarity), which is when
\[Var(Y(s+h) - Y(s))\text{ depends only on h}\]
When this is true, we call
\[\gamma(h) = \frac{1}{2}Var(Y(s+h) - Y(s))\]
the semivariogram and \(2\gamma(h)\) the variogram.
If a covariance function is stationary, it will be intrinsic stationary and
\[\gamma(h) = C(0) - C(h) = C(0) (1-\rho(h))\] where \(C(0)\) is the variance (referred to as the sill).
First, to estimate the semivariogram, fit a trend so that you have a stationary process in your residuals.
- Let \(H_1,...,H_k\) partition the space of possible lags, with \(h_u\) being a representative spatial lag/distance in \(H_u\). Then use your residuals to estimate the empirical semivariogram.
\[\hat{\gamma}(h_u) = \frac{1}{2\cdot |\{s_i-s_j \in H_u\}|}\sum_{\{s_i-s_j\in H_u\}}(e(s_i) - e(s_j))^2\]
- This is a non-parametric estimate of the semivariogram.
library(gstat)
coordinates(meuse) = ~x+y
estimatedVar <- variogram(log(zinc) ~ 1, data = meuse)
plot(estimatedVar)
- If we notice a particular pattern in the points, we could try and fit a curve based on correlation models.
Var.fit1 <- fit.variogram(estimatedVar, model = vgm(1, "Sph", 900, 1))
plot(estimatedVar,Var.fit1)
Var.fit2 <- fit.variogram(estimatedVar, model = vgm(1, "Exp", 900, 1))
plot(estimatedVar,Var.fit2)
Var.fit3 <- fit.variogram(estimatedVar, model = vgm(1, "Gau", 900, 1))
plot(estimatedVar,Var.fit3)
Var.fit4 <- fit.variogram(estimatedVar, model = vgm(1, "Mat", 900, 1))
plot(estimatedVar,Var.fit4)
g = gstat::gstat(formula = log(zinc) ~ 1, model = Var.fit4, data = meuse)
7.5.4 Kriging
Section is under construction.
data(meuse)
data(meuse.grid)
coordinates(meuse) = ~x+y
coordinates(meuse.grid) = ~x+y
zinc.idw <- idw(zinc~1, meuse, meuse.grid)
spplot(zinc.idw["var1.pred"], main = "zinc inverse distance weighted interpolations")
zinc.ord <- predict(g, meuse.grid)
spplot(zinc.ord["var1.pred"], main = "zinc ordinary kriging interpolations")
g = gstat::gstat(formula = log(zinc) ~ sqrt(dist), model = Var.fit4, data = meuse)
zinc.uni <- predict(g, meuse.grid)
spplot(zinc.uni["var1.pred"], main = "zinc universal kriging interpolations")