7.4 Point Processes (optional)

A point pattern/process data set gives the locations of objects/events occurring in a study region. These points could represent trees, animal nests, earthquake epicenters, crimes, cases of influenza, galaxies, etc. We assume that a point’s occurrence or non-occurrence at a location is random.

The observed points may have extra information called marks attached to them. These marks represent an attribute or characteristic of the point. These could be categorical or continuous.

7.4.1 Poisson Point Processes

The underlying probability model we assume for the number of events in a small area is a Poisson model with parameter \(\lambda(s)\), the intensity per unit area. This intensity may be constant (uniform) across locations or vary from location to location (inhomogeneous).

For a homogeneous Poisson process, we model the number of points in any region \(A\), \(N(A)\), to be Poisson with

\[E(N(A)) = \lambda\cdot\text{Area}(A)\]

such that \(\lambda\) is constant across space. Given \(N(A) = n\), the \(n\) events form an iid sample from the uniform distribution on \(A\). Under this model, for any two disjoint regions \(A\) and \(B\), the random variables \(N(A)\) and \(N(B)\) are independent.

If we observe \(n\) events in a region \(A\), the estimator \(\hat{\lambda} = n/\text{Area}(A)\) is unbiased if the model is correct.
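
As a quick check, the sketch below (assuming the spatstat package is available) simulates a CSR pattern and computes \(\hat{\lambda}\) both by hand and with spatstat’s intensity function.

library(spatstat)
X <- rpoispp(lambda = 50) #CSR on the unit square, true lambda = 50
npoints(X)/area(Window(X)) #by-hand estimate n/Area(A)
intensity(X) #spatstat's built-in equivalent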

This homogeneous Poisson model is known as complete spatial randomness (CSR). If points deviate from it, we might be able to detect this with a hypothesis test. We’ll return to this when we discuss distances to neighbors.

To simulate data from a CSR process in the unit square \([0,1]\times[0,1]\), we could

  • Generate the number of events from a Poisson distribution with \(\lambda\)
n <- rpois(1, lambda = 50)
  • Fix n and generate locations from the uniform
x <- runif(n,0,1)
y <- runif(n,0,1)
plot(x,y)
  • Alternatively, generate the data directly with rpoispp from spatstat.
sim1 <- rpoispp(lambda = 50)
plot(sim1)

If the intensity is not constant across space (inhomogeneous Poisson process), then we define intensity as

\[\lambda(s) = \lim_{|ds|\rightarrow 0} \frac{E(N(ds))}{|ds|}\]

where \(ds\) is a small area element and \(N(A)\) has a Poisson distribution with mean

\[\mu(A) = \int_A \lambda(s)ds\]

When working with point process data, we generally want to estimate and/or model \(\lambda(s)\) in a region \(A\) and determine if \(\lambda(s) = \lambda\) for \(s\in A\).
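
For intuition, here is a small sketch that simulates an inhomogeneous Poisson process by passing a function for \(\lambda(s)\) to rpoispp; the intensity surface \(\lambda(x,y) = 400x\) is an arbitrary illustrative choice.

lam <- function(x, y) { 400 * x } #intensity increases from west to east
sim2 <- rpoispp(lam, lmax = 400) #lmax bounds the intensity for simulation
plot(sim2)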

7.4.2 Non-Parametric Intensity Estimation

7.4.2.1 Quadrat Estimation

One way to estimate the intensity \(\lambda(s)\) is to divide the study area into sub-regions (also known as quadrats). The estimated density is then computed for each quadrat by dividing the number of points in the quadrat by the quadrat’s area. Quadrats can take many shapes, such as hexagons or triangles, though square quadrats are most common.

If the points have uniform/constant intensity (CSR), the quadrat counts should be Poisson random numbers with constant mean. Using a \(\chi^2\) goodness-of-fit test statistic, we can test \(H_0:\) CSR:

plot(quadratcount(bei, nx=4, ny=4))
(T <- quadrat.test(bei, nx=4, ny=4))
plot(bei)
plot(T, add=TRUE)

The choice of quadrat numbers and shape can influence the estimated density and must be chosen carefully. If the quadrats are too small, you risk having many quadrats with no points, which may prove uninformative (and can cause issues if you are trying to run a \(\chi^2\) test). If very large quadrat sizes are used, you risk missing subtle changes in spatial density.
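
To see this sensitivity, the sketch below compares a coarse and a fine grid on the same data (the 2x2 and 16x8 grids are arbitrary choices).

par(mfrow=c(1,2))
plot(quadratcount(bei, nx=2, ny=2), main="Coarse")
plot(quadratcount(bei, nx=16, ny=8), main="Fine")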

If the density is not uniform across the space, you should wonder why. Is there a covariate (a characteristic of the space that could be collected at every point in space) that could help explain the difference in the intensity? For example, perhaps the elevation of an area impacts the intensity of trees that thrive there. Or there could be a west/east or north/south pattern such that the longitude or latitude impacts the intensity.

If there is a covariate, we can convert the covariate across the continuous spatial field into discretized areas. We can then plot the relationship between the estimated point density within the quadrat and the covariate regions to assess any dependence between variables. If there is a clear relationship, the covariate levels can define new sub-regions within which the density can be computed. This is the idea of normalizing data by some underlying covariate.

The quadrat analysis approach has its advantages in that it is easy to compute and interpret; however, it does suffer from the modifiable areal unit problem (MAUP) as the relationship observed can change depending on the size and shape of the quadrats chosen. Another density-based approach that will be explored next (less susceptible to the MAUP) is the kernel density estimation process.

7.4.2.2 Kernel Density Estimation

The kernel density approach is an extension of the quadrat method. Like the quadrat density, the kernel approach computes a localized density for subsets of the study area. Still, unlike its quadrat density counterpart, the sub-regions overlap, providing a moving sub-region window. A kernel defines this moving window. The kernel density approach generates a grid of density values whose cell size is smaller than the kernel window’s. Each cell is assigned the density value computed for the kernel window centered on that cell.

A kernel not only defines the shape and size of the window, but it can also weight the points following a well-defined kernel function. The simplest function is a basic kernel where each point in the window is assigned equal weight (uniform kernel). Some popular kernel functions assign weights to points inversely proportional to their distances to the kernel window center. A few such kernel functions follow a Gaussian or cubic distribution function. These functions tend to produce a smoother density map.

\[\hat{\lambda}_{\mathbf{H}}(\mathbf{s}) = \frac{1}{n}\sum^n_{i=1}K_{\mathbf{H}}(\mathbf{s} - \mathbf{s}_i)\]

where \(\mathbf{s}_1,...,\mathbf{s}_n\) are the locations of the observed points (typically specified by longitude and latitude),

\[K_{\mathbf{H}}(\mathbf{x}) = |\mathbf{H}|^{-1/2}K(\mathbf{H}^{-1/2}\mathbf{x})\]

\(\mathbf{H}\) is the bandwidth matrix, and \(K\) is the kernel function, typically assumed multivariate Gaussian, though it could be any symmetric function that decays to zero away from the center. (As written, \(\hat{\lambda}_{\mathbf{H}}\) integrates to one; multiplying by \(n\) rescales it from a density to an intensity.)
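
In practice, the choice of bandwidth matters more than the kernel shape. A brief sketch (the bandwidths 25 and 100 are arbitrary; bw.diggle is one data-driven selector in spatstat):

plot(density(bei, sigma=25), main="sigma = 25")
plot(density(bei, sigma=100), main="sigma = 100")
bw.diggle(bei) #data-driven bandwidth selection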

To incorporate covariates into kernel density estimation, we can estimate a normalized density as a function of a covariate; we notate this as \(\rho(Z(s))\) where \(Z(s)\) is a continuous spatial covariate. There are three ways to estimate this: ratio, re-weight, or transform. We will not delve into the differences between these methods but note that there is more than one way to estimate \(\rho\). This is a non-parametric way to estimate the intensity.
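
As a sketch, rhohat exposes these options through its method argument; the calls below use the bei data and its elevation covariate, which are introduced in the next subsection.

rh_ratio <- rhohat(bei, bei.extra$elev, method="ratio")
rh_trans <- rhohat(bei, bei.extra$elev, method="transform")
plot(rh_ratio)
plot(rh_trans)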

7.4.2.3 Data Example and R Code

Below is a point pattern giving the locations of 3605 trees in a tropical rain forest.

library(spatstat)
plot(bei)

Below are two pixel images of covariates, the elevation and slope (gradient) of the elevation in the study region.

plot(bei.extra$elev)
plot(bei.extra$grad)

Let’s plot the points on top of the covariates to see if we can see any relationships.

plot(bei.extra$elev, main = "")
plot(bei, add = TRUE, cex = 0.3, pch = 16, cols = "white")
plot(bei.extra$grad, main = "")
plot(bei, add = TRUE, cex = 0.3, pch = 16, cols = "white")

We could convert the image above of the elevation to a tessellation, count the number of points in each region using quadratcount, and plot the quadrat counts.

elev <- bei.extra$elev
Z <- cut(elev, 4, labels=c("Low", "Med-Low", "Med-High", "High"))
textureplot(Z, main = "")
Y <- tess(image = Z)
qc <- quadratcount(bei, tess = Y)
plot(qc)

intensity(qc)

Using a non-parametric kernel density estimation, we can estimate the intensity as a function of the elevation.

rh <- rhohat(bei, elev)
plot(rh)

Then predict the intensity based on this function.

prh <- predict(rh)
plot(prh, main = "")
plot(bei, add = TRUE, cols = "white", cex = .2, pch = 16)

Contrast this to a simple kernel density estimate without a covariate:

dbei <- density(bei)
plot(dbei, main = "")
plot(bei, add = TRUE, cols = "white", cex = .2, pch = 16)
par(mfrow=c(1,2))
plot(prh, main = "With Covariate")
plot(dbei, main = "Without Covariate")

7.4.3 Parametric Intensity Estimation

Beyond kernel density estimates, we may want to model the intensity with a parametric model. We’ll use a log link to connect the linear predictor to our intensity.

We assume \(\log(\lambda(s)) = \beta_0\) for a uniform Poisson process.

ppm(bei ~ 1)
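
Since this model says \(\log(\lambda(s)) = \beta_0\), exponentiating the fitted intercept should recover the simple estimate \(n/\text{Area}(A)\); a quick check:

fit0 <- ppm(bei ~ 1)
exp(coef(fit0)) #fitted constant intensity
npoints(bei)/area(Window(bei)) #matches n/Area(A)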

We may think that the x coordinate linearly impacts the intensity, \(\log(\lambda(s)) = \beta_0 + \beta_1 x\).

ppm(bei ~ x)

We may think that the x and y coordinates linearly impact the intensity, \(\log(\lambda(s)) = \beta_0 + \beta_1 x + \beta_2 y\).

ppm(bei ~ x + y)

We could use a variety of models based solely on the x and y coordinates of their location:

(mod <- ppm(bei ~ polynom(x,y,2))) #quadratic relationship
(mod1 <- ppm(bei ~ I( x > 50))) #threshold
require(splines)
(mod2 <- ppm(bei ~ bs(x) + bs(y))) #B-spline

plot(predict(mod))
plot(predict(mod1))
plot(predict(mod2))

We can compare models using a likelihood ratio test.

hom <- ppm(bei ~ 1)

anova(hom, mod, test = 'Chi')
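
We could also compare the fitted Poisson process models with AIC (smaller is better):

AIC(hom)
AIC(mod)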

We should also examine the residuals to see whether there is a pattern where our model errs in predicting the intensity.

diagnose.ppm(mod, which="smooth")

As we saw with the kernel density estimation, the elevation plays a role in the intensity of trees. Let’s add that to our model,

\[\log(\lambda(s)) = \beta_0 + \beta_1 Elevation(s)\]

(mod2 <- ppm(bei ~ elev))
diagnose.ppm(mod2, which="smooth")
plot(effectfun(mod2))

But the relationship may not be linear, so let’s try a quadratic relationship with elevation,

\[\log(\lambda(s)) = \beta_0 + \beta_1 Elevation(s)+ \beta_2 Elevation^2(s)\]

(mod2 <- ppm(bei ~ polynom(elev,2)))
diagnose.ppm(mod2, which="smooth")
plot(effectfun(mod2))

7.4.4 Detecting Interaction Effects

Classical tests about the interaction effects between points are based on derived distances.

  • Pairwise distances: \(d_{ij} = ||s_i - s_j||\) (pairdist)
pairdist(bei)[1:10,1:10] #distance matrix for each pair of points
  • Nearest neighbor distances: \(t_i = \min_{j\not= i} d_{ij}\) (nndist)
mean(nndist(bei)) #average first nearest neighbor distance
mean(nndist(bei,k=2)) #average second nearest neighbor distance

ANN <- apply(nndist(bei, k=1:100), 2, mean) #Mean for 1,...,100 nearest neighbors
plot(ANN ~ I(1:100), type="b", main=NULL, las=1)
  • Empty space distances: \(d(u) = \min_{i}||u- s_i||\), the distance from a fixed reference location \(u\) in the window to the nearest data point (distmap)
plot(distmap(bei)) #map of the distances to the nearest observed point

7.4.4.1 F function

Consider the empty-space distance.

  • Fix the observation window \(A\). The distribution of \(d(u)\), the minimum distance from \(u\) to an observed point, depends on \(A\) and is hard to derive in closed form.

Consider the CDF,

\[F_u(r) = P(d(u) \leq r)\] \[= P(\text{ at least one point within radius }r\text{ of }u)\] \[= 1 - P(\text{ no points within radius }r\text{ of }u)\]

For a homogeneous Poisson process on \(R^2\),

\[F(r) = 1 - e^{-\lambda \pi r^2}\]

  • Suppose the process is defined on \(R^2\) but we only observe it on \(A\). The observed distances are biased relative to the “true” distances, which results in “edge effects”.

  • If you define a set of locations \(u_1,...,u_m\) for estimating \(F(r)\) under an assumption of stationarity, the estimator

\[\hat{F}(r) = \frac{1}{m} \sum^m_{j=1}I(d(u_j)\leq r)\]

is biased due to the edge effects.

  • There are a variety of corrections.
    • One example is to not consider \(u\)’s within distance \(r\) of the boundary (border correction).
    • Another example is considering it a censored data problem and using a spatial variant of the Kaplan-Meier correction.

Compare \(\hat{F}(r)\) to the \(F(r)\) under a homogeneous Poisson process.

plot(Fest(bei)) #km is kaplan meier correction, bord is the border correction (reduced sample)

In this plot, we observe fewer short empty-space distances than expected under a completely random Poisson point process (\(F_{pois}\)), and thus many longer distances to observed points. In other words, more reference locations are far from the nearest observed point, meaning the points are more clustered than expected from a uniform distribution across the space.

7.4.4.2 G function

Working instead with the nearest-neighbor distances from points themselves (\(t_i = \min_{j\not= i} d_{ij}\)), we define

\[G(r) = P(t_i \leq r)\] for an arbitrary observed point \(s_i\). We can estimate it using our observed \(t_i\)’s.

\[\hat{G}(r) = \frac{1}{n}\sum^n_{i=1}I(t_i \leq r)\]

As before, we may or may not make an edge correction.

For a homogeneous Poisson process on \(R^2\),

\[G(r) = 1 - e^{-\lambda \pi r^2}\]

Compare \(\hat{G}(r)\) to the \(G(r)\) under a homogeneous Poisson process.

plot(Gest(bei)) 

Here we see more short nearest-neighbor distances than expected under a completely random Poisson point process. Thus, the points are more clustered than expected from a uniform distribution across the space.

7.4.4.3 K function

Another approach is to consider Ripley’s K function, which summarizes the spacing between points. It takes the mean number of additional points within distance \(r\) of a typical point and divides it by the intensity. In other words, it is concerned with the number of events falling within a circle of radius \(r\) of an event.

\[K(r) = \frac{1}{\lambda}E(\text{ points within distance r of an arbitrary point})\]

Under CSR (uniform intensity), we’d expect \(K(r) = \pi r^2\) because that is the area of the circle. Clustering beyond what you’d expect corresponds to \(K(r) > \pi r^2\), and spatial repulsion corresponds to \(K(r) < \pi r^2\).

Below, we have an estimate of the K function (black solid) along with the predicted K function if \(H_0\) were true (red dashed). For this set of trees, the points are more clustered than you’d expect for a homogeneous Poisson process.

plot(Kest(bei))

7.4.4.4 L function

It is hard to see differences between the estimated \(K\) and expected functions. An alternative is to transform the values so that the expected values are horizontal. The transformation is calculated as follows:

\[L(r) = \sqrt{K(r)/\pi} - r\]

Under CSR, \(L(r) = 0\) for all \(r\), so the expected curve is a horizontal line at zero.

plot(Kest(bei), sqrt(./pi) - r ~ r)

7.4.4.5 Pair Correlation Function

The second-order intensity function is

\[\lambda_2(s_1,s_2) = \lim_{|ds_1|,|ds_2| \rightarrow 0} \frac{E[N(ds_1)N(ds_2)]}{|ds_1||ds_2|}\]

where \(s_1 \not= s_2\). If \(\lambda(s)\) is constant and

\[\lambda_2(s_1,s_2) = \lambda_2(s_1 - s_2) =\lambda_2(s_2 - s_1)\] then we call \(N\) second-order stationary.

We can define \(g\) as the pair correlation function (PCF),

\[g(s_1,s_2) = \frac{\lambda_2(s_1,s_2)}{\lambda(s_1)\lambda(s_2)}\]

In R, if we assume stationarity and isotropy (direction of distance is not important), we can estimate \(g(r)\) where \(r = ||s_1-s_2||\). If a process has an isotropic pair correlation function, then our K function can be defined as

\[K(r) = 2\pi \int^r_0 ug(u)du, r>0\]

plot(pcf(bei))

If \(g(r) = 1\), then we have a CSR process. If \(g(r) > 1\), it implies a cluster process, and if \(g(r) < 1\), it implies an inhibition process. In particular, if \(g(r) = 0\) for \(r < r_0\), it implies a hard core process (no point pairs within distance \(r_0\)).

7.4.4.6 Envelopes

We want to test \(H_0\): CSR (constant intensity) to understand whether points are closer or further away from each other than we’d expect with a Poisson process.

We can compare the estimated function to the theoretical one for any of these functions (F, G, K, L, or the PCF), but that doesn’t tell us how far our estimate might be from the truth under random variability. So we can simulate under our null hypothesis (CSR), say \(B\) times, and estimate the function for each simulation. Then we create a single global test:

\[D_i = \max_r |\hat{H}_i(r) - H(r)|,\quad i=1,...,B\]

where \(H(r)\) is the theoretical value under CSR. Then we define \(D_{crit}\) as the \(k\)th largest among the \(D_i\). The “global” envelopes are then

\[L(r) = H(r) - D_{crit}\] \[U(r) = H(r) + D_{crit}\]

Then the test that rejects \(H_0\) whenever \(\hat{H}(r)\) (the estimate from our observations) wanders outside \((L(r), U(r))\) has size \(\alpha = k/(B+1)\).

#See Example Code Below: Runs Slowly
plot(envelope(bei, Fest, global=TRUE))
#plot(envelope(bei, Gest, global=TRUE))
#plot(envelope(bei, Kest, global=TRUE))
#plot(envelope(bei, Lest, global=TRUE))
#plot(envelope(bei, pcf, global=TRUE))
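
In envelope, the arguments nsim (\(B\)) and nrank (\(k\)) control the size of the global test; for example, nsim = 19 with nrank = 1 gives \(\alpha = 1/20 = 0.05\).

plot(envelope(bei, Kest, nsim=19, nrank=1, global=TRUE))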

7.4.5 Cluster Poisson Processes

A Poisson cluster process is defined by

  1. Parent events form a Poisson process with intensity \(\lambda\).

  2. Each parent produces a random number \(M\) of children, iid for each parent according to a discrete distribution \(p_m\).

  3. The positions of the children relative to parents are iid according to a bivariate pdf.

By convention, the point pattern consists of only the children, not the parents.
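
To make the three steps concrete, below is a hand-rolled sketch on the unit square that uses Gaussian displacements in step 3 (the choice made by the Thomas process); spatstat’s rMatClust and rThomas automate this.

set.seed(1)
n_parents <- rpois(1, lambda=10) #1. parents are Poisson
px <- runif(n_parents); py <- runif(n_parents)
m <- rpois(n_parents, lambda=5) #2. iid child counts per parent
cx <- rnorm(sum(m), mean=rep(px, m), sd=0.02) #3. children scattered
cy <- rnorm(sum(m), mean=rep(py, m), sd=0.02) #   around their parents
plot(cx, cy) #the pattern keeps only the children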

7.4.5.1 Matern Process

  • Homogeneous Poisson parent process with intensity \(\lambda\).

  • Poisson distributed number of children, mean = \(\mu\)

  • Children uniformly distributed on a disc of radius \(r\) centered at the parent location

Let’s simulate some data from this process.

win <- owin(c(0, 100), c(0, 100))
clust1 <- rMatClust(50/10000, r = 4, mu = 5, win = win) #first argument is the parent intensity
plot(clust1)
quadrat.test(clust1)

plot(envelope(clust1, Fest, global=TRUE))
#plot(envelope(clust1, Gest, global=TRUE))
#plot(envelope(clust1, Kest, global=TRUE))
#plot(envelope(clust1, Lest, global=TRUE))

If we increase the radius of the disc on which the children are placed, the clustering becomes hard to detect.

clust2 <- rMatClust(50/10000, r = 40, mu = 5, win = win)
plot(clust2)
quadrat.test(clust2)

plot(envelope(clust2, Fest, global=TRUE))
#plot(envelope(clust2, Gest, global=TRUE))
#plot(envelope(clust2, Kest, global=TRUE))
#plot(envelope(clust2, Lest, global=TRUE))

7.4.5.2 Thomas Process

  • Homogeneous Poisson parent process

  • Poisson distributed number of children

  • Locations of children according to an isotropic bivariate normal distribution with variance \(\sigma^2\)

clust3 <- rThomas(50/10000, scale = 2, mu = 5, win = win)
plot(clust3)

quadrat.test(clust3)

plot(envelope(clust3, Fest, global=TRUE))
plot(envelope(clust3, Gest, global=TRUE))
plot(envelope(clust3, Kest, global=TRUE))
plot(envelope(clust3, Lest, global=TRUE))

7.4.5.3 Fitting Cluster Poisson Process Models

If you believe the data are clustered, you can fit a model that accounts for trends in \(\lambda(s)\) as well as clustering in the point generation process. Two potential models you can fit are the Thomas and the Matern cluster models. Below, method = 'palm' is used for the AIC calls because AIC is only defined for kppm models fit by a likelihood-based method.

plot(redwood)
summary(kppm(redwood, ~1, "Thomas"))
AIC(kppm(redwood, ~1, "Thomas", method='palm'))


summary(kppm(redwood, ~x, "Thomas"))
AIC(kppm(redwood, ~x, "Thomas", method='palm'))


summary(kppm(redwood, ~1, "MatClust"))
AIC(kppm(redwood, ~1, "MatClust",method='palm'))

summary(kppm(redwood, ~x, "MatClust"))
AIC(kppm(redwood, ~x, "MatClust",method='palm'))

We can then interpret the estimates for that process to tell us about the clustering (see the sketch after the list below).

INTERPRETATIONS

  • theta: parameters for intensity trend
  • kappa: average number of clusters per unit area
  • scale: the cluster scale; for the Thomas model, the standard deviation of the distance of a point from its cluster center (for the Matern model, the disc radius)
  • mu: mean number of points per cluster
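
As a sketch, these estimates can be pulled out of a fitted kppm object with the parameters helper (the palm-method fit from above is reused here):

fit <- kppm(redwood, ~1, "Thomas", method='palm')
parameters(fit) #trend, kappa, scale, mu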

7.4.6 Inhibition Poisson Processes

If points “repel” each other, we need to account for that also.

7.4.6.1 Simple Sequential Inhibition

Each new point is generated uniformly in the window (the space of interest) and independently of preceding points. If the new point lies closer than \(r\) units to an existing point, it is rejected and another random point is generated. The process terminates when no further points can be added.

plot(rSSI(r = 0.05, n = 200))

7.4.6.2 Matern I Inhibition Model

Matern’s Model I first generates a homogeneous Poisson process Y (intensity = \(\rho\)). Any pairs of points that lie closer than a distance \(r\) from each other are deleted. Thus, pairs of close neighbors annihilate each other.

The probability an arbitrary point survives is \(e^{-\pi\rho r^2}\) so the intensity of the final point pattern is \(\lambda = \rho e^{-\pi\rho r^2}\).
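
We can check this formula by simulation. The sketch below uses spatstat’s rMaternI with arbitrary values \(\rho = 100\) and \(r = 0.05\) on the unit square.

rho <- 100
r <- 0.05
X <- rMaternI(kappa=rho, r=r)
intensity(X) #empirical intensity of the surviving points
rho * exp(-pi * rho * r^2) #theoretical intensity after thinning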

7.4.7 Other Point Process Models

  • Markov Point Processes: a large class of models that allow for interaction between points (attraction or repulsion)

  • Hard Core Gibbs Point Processes: a subclass of Markov Point Processes in which no two points can lie closer than a hard-core distance; with no interaction, a Gibbs process reduces to a Poisson point process

  • Strauss Processes: a subclass of Markov Point Processes that allow for repulsion between pairs of points (see the sketch after this list)

  • Cox Processes: a doubly stochastic Poisson process in which the intensity is a random process and conditional on the intensity, the events are an inhomogeneous Poisson process.
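
As a sketch of the Strauss model mentioned above, ppm can fit one directly given an assumed interaction radius (0.1 here, for spatstat’s cells data); a fitted interaction parameter \(\gamma < 1\) indicates repulsion.

fitS <- ppm(cells ~ 1, Strauss(r=0.1))
fitS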

For more information on how to fit spatial point processes in R, see http://www3.uji.es/~mateu/badturn.pdf.