7.5 Point Referenced Data (optional)

A spatial stochastic process is a spatially indexed collection of random variables,

\[\{Y(s): s\in D \subset \mathbb{R}^d \}\]

where \(d\) will typically be 2 or 3.

A realization from a stochastic process is sometimes called a sample path. We typically observe a vector,

\[\mathbf{Y} \equiv Y(s_1),...,Y(s_n)\]

7.5.1 Gaussian Process

A Gaussian process (G.P.) has finite-dimensional distributions that are all multivariate normal, and we define it using a mean and covariance function,

\[\mu(s) = E(Y(s))\] \[C(s_1,s_2) = Cov(Y(s_1),Y(s_2))\]

As with time series and longitudinal data, we must consider the covariance of our data. Most covariance models that we will consider in this course require that our data come from a stationary, isotropic process such that the mean is constant across space, variance is constant across space, and the covariance is only a function of the distance between locations (disregarding the direction) such that

\[ E(Y(s)) = E(Y(s+h)) = \mu\] \[Cov(Y(s),Y(s+h)) = Cov(Y(0),Y(h)) = C(||h||)\]

7.5.2 Covariance Models

A covariance model that is often used in spatial data is called the Matern class of covariance functions,

\[C(t) = \frac{\sigma^2}{2^{\nu-1}\Gamma(\nu)}(t\rho)^\nu \mathcal{K}_\nu(d/\rho)\]

where \(\mathcal{K}_\nu\) is a modified Bessel function of order \(\nu\). \(\nu\) is a smoothness parameter and if we let \(\nu = 1/2\), we get the exponential covariance.

7.5.3 Variograms, Semivariograms

Beyond the standard definition of stationarity, there is another form of stationary (intrinsic stationarity), which is when

\[Var(Y(s+h) - Y(s))\text{ depends only on h}\]

When this is true, we call

\[\gamma(h) = \frac{1}{2}Var(Y(s+h) - Y(s))\]

the semivariogram and \(2\gamma(h)\) the variogram.

If a covariance function is stationary, it will be intrinsic stationary and

\[\gamma(h) = C(0) - C(h) = C(0) (1-\rho(h))\] where \(C(0)\) is the variance (referred to as the sill).

First, to estimate the semivariogram, fit a trend so that you have a stationary process in your residuals.

Let \(H_1,...,H_k\) partition the space of possible lags, with \(h_u\) being a representative spatial lag/distance in \(H_u\). Then use your residuals to estimate the empirical semivariogram.

\[\hat{\gamma}(h_u) = \frac{1}{2\cdot |\{s_i-s_j \in H_u\}|}\sum_{\{s_i-s_j\in H_u\}}(e(s_i) - e(s_j))^2\]

This is a non-parametric estimate of the semivariogram.

library(gstat)
coordinates(meuse) = ~x+y

estimatedVar <- variogram(log(zinc) ~ 1, data = meuse)
plot(estimatedVar)

If we notice a particular pattern in the points, we could try and fit a curve based on correlation models.

Var.fit1 <- fit.variogram(estimatedVar, model = vgm(1, "Sph", 900, 1))
plot(estimatedVar,Var.fit1)

Var.fit2 <- fit.variogram(estimatedVar, model = vgm(1, "Exp", 900, 1))
plot(estimatedVar,Var.fit2)

Var.fit3 <- fit.variogram(estimatedVar, model = vgm(1, "Gau", 900, 1))
plot(estimatedVar,Var.fit3)

Var.fit4 <- fit.variogram(estimatedVar, model = vgm(1, "Mat", 900, 1))
plot(estimatedVar,Var.fit4)


g = gstat::gstat(formula = log(zinc) ~ 1, model = Var.fit4, data = meuse)

7.5.4 Kriging

Section is under construction.

data(meuse)
data(meuse.grid)
coordinates(meuse) = ~x+y
coordinates(meuse.grid) = ~x+y


zinc.idw <- idw(zinc~1, meuse, meuse.grid) 
spplot(zinc.idw["var1.pred"], main = "zinc inverse distance weighted interpolations")

zinc.ord <- predict(g, meuse.grid)
spplot(zinc.ord["var1.pred"], main = "zinc ordinary kriging interpolations")

g = gstat::gstat(formula = log(zinc) ~ sqrt(dist), model = Var.fit4, data = meuse)
zinc.uni <- predict(g, meuse.grid)
spplot(zinc.uni["var1.pred"], main = "zinc universal kriging interpolations")