6 ACF and Random Walk

Learning Goals

Explain and implement estimation of the autocovariance and autocorrelation functions (ACVF, ACF) under the assumption of stationarity.
Understand the derivations of variance and covariance for the Random Walk.
Generate data from a random walk in R and estimate variance and covariance.

Group Activity

Download a template RMarkdown file to start from here.

Data Motivation: Marine Diversity during the Phanerozoic

In a 2004 paper, Cornette and Lieberman suggested that marine diversity during the Phanerozoic eon (roughly the last 540 million years) follows a random walk.

Data of interest:

The total number of marine genera during a period of time (new - extinct)

For most of the Phanerozoic, both CO2 and the number of genera roughly followed a random walk, up until the last 50-75 million years. This suggests a substantial change during the late Mesozoic and Cenozoic eras.

Source: https://www.pnas.org/content/101/1/187

Random Walk

Let’s get a sense of what they mean by a random walk model.

Imagine that the total number of genera at time \(t\) is equal to the total number of genera in the last time period \(t-1\) plus some random change (due to new species coming into existence and other species dying out).

\[Y_t = Y_{t-1} + W_t\quad\quad W_t \stackrel{iid}{\sim} N(0, \sigma_w^2)\]

Generate a random walk of length 250 (that is, 250 time points). I’ve laid out some of the structure you need.

y <- rep(NA, 250) #Pre-allocate memory: generate a vector of missing values
y[1] <- rnorm(1) #initialization of life (little bang?)

for(i in 2:250){
  
  
}

plot(y, type='l')
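One possible way to fill in the loop is sketched below. It assumes \(\sigma_w = 1\) (the default standard deviation of rnorm(), matching the initialization above); the seed and the name n are arbitrary choices made here for reproducibility and are not part of the template.

```r
set.seed(1)                      # arbitrary seed, only so the sketch is reproducible
n <- 250                         # length of the random walk
y <- rep(NA_real_, n)            # pre-allocate a vector of missing values
y[1] <- rnorm(1)                 # Y_1 = W_1

for (i in 2:n) {
  y[i] <- y[i - 1] + rnorm(1)    # Y_t = Y_{t-1} + W_t, assuming sigma_w = 1
}

plot(y, type = "l", xlab = "Time", ylab = "y")
```

Re-running without set.seed() gives a different path each time, which is worth doing a few times before describing the plot.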

Describe the plot of the random walk. What characteristics does it have?

ANSWER:

Estimate the correlation at each lag (the difference between two time points), assuming the series is stationary.
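A minimal sketch of this step uses acf() from base R's stats package, which computes the sample autocorrelation under the stationarity assumption. The series is regenerated here with cumsum() so the block runs on its own; the seed, \(\sigma_w = 1\), and lag.max = 50 are assumptions made for illustration.

```r
set.seed(1)                      # arbitrary seed for reproducibility
y <- cumsum(rnorm(250))          # a random walk of length 250, assuming sigma_w = 1

# Sample ACF up to lag 50; this also draws the correlogram
rho <- acf(y, lag.max = 50, main = "Sample ACF of a simulated random walk")

head(rho$acf)                    # estimated correlations at lags 0, 1, 2, ...
```

The returned object stores the estimates in rho$acf and the corresponding lags in rho$lag, which is handy if you want to build your own plot.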

Comment on the estimated correlation function. What do you notice?

ANSWER:

Based on these two plots (plot of the series and the plot of acf), do you think a random walk is stationary? Is the mean constant? Is there constant variance over time? Is the covariance only a function of the lag?

ANSWER:

Generate 500 random walks, each of length 250 (250 time points). I’ve laid out some of the structure you need.

y <- matrix(NA, nrow = 500, ncol = 250) #generate a matrix of missing values

for(l in 1:500){
  y[l,1] <- 
  for(i in 2:250){
    y[l,i] <- 
    
  }
}
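The nested loop could be completed along the following lines. This is a sketch, assuming \(\sigma_w = 1\) and an arbitrary seed; the names n_series and n_time are introduced here for readability and do not appear in the template.

```r
set.seed(1)                      # arbitrary seed for reproducibility
n_series <- 500                  # number of realizations (rows)
n_time <- 250                    # length of each random walk (columns)
y <- matrix(NA_real_, nrow = n_series, ncol = n_time)

for (l in 1:n_series) {
  y[l, 1] <- rnorm(1)                    # each walk starts at Y_1 = W_1
  for (i in 2:n_time) {
    y[l, i] <- y[l, i - 1] + rnorm(1)    # Y_t = Y_{t-1} + W_t, assuming sigma_w = 1
  }
}

# Overlay the first 10 realizations to see how the paths spread out
matplot(t(y[1:10, ]), type = "l", lty = 1, xlab = "Time", ylab = "y")
```

Each row of y is one realization and each column is one time point, which is the layout the later calls to apply(), cov(), and cor() rely on.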

Calculate the mean and variance at each time point, averaging over the 500 series (500 realizations), using the function apply(). Run ?apply in the console to see its documentation. Plot the mean over time and plot the variance over time. With this additional information, do you think that a random walk is stationary?
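As a sketch of how apply() could be used here: with realizations in rows and time points in columns, MARGIN = 2 applies a function to each column, that is, across the 500 series at a fixed time. The block regenerates the walks with replicate() and cumsum() so it runs on its own; the seed, \(\sigma_w = 1\), and the names mean_t and var_t are assumptions made for illustration.

```r
set.seed(1)                                   # arbitrary seed for reproducibility
# 500 random walks of length 250 as rows, assuming sigma_w = 1
y <- t(replicate(500, cumsum(rnorm(250))))

mean_t <- apply(y, 2, mean)                   # sample mean at each time point
var_t  <- apply(y, 2, var)                    # sample variance at each time point

plot(mean_t, type = "l", xlab = "Time", ylab = "Mean across realizations")
plot(var_t,  type = "l", xlab = "Time", ylab = "Variance across realizations")
```

Comparing the vertical scales of the two plots is a useful starting point for the stationarity question.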

ANSWER:

Calculate the covariance and correlation matrices for a random walk based on these 500 realizations. Then plot their entries against the lag. With this additional information, do you think that a random walk is stationary?

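COV <- as.vector(cov(y)) #250 x 250 covariance matrix between time points, flattened to a vector
COR <- as.vector(cor(y)) #matching correlation matrix, flattened in the same order

lags <- as.vector(as.matrix(dist(1:250))) #lag |s - t| for every pair of time points, in the same order

plot(lags, COV, pch='.')

plot(lags, COR,  pch='.')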

ANSWER:

Now use probability theory to confirm what you think. We let

\[Y_t = Y_{t-1} + W_t\quad\quad W_t \stackrel{iid}{\sim} N(0, \sigma_w^2)\]

If we plug \(Y_{t-1} = Y_{t-2} + W_{t-1}\) into the equation, we get

\[Y_t = Y_{t-2} + W_{t-1} + W_t\]

If we continue substituting in this way, all the way back to \(Y_1 = W_1\), we get

\[Y_t = \sum_{i = 1}^t W_i \quad\quad W_i \stackrel{iid}{\sim} N(0, \sigma_w^2)\]
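The equivalence between the recursion and the cumulative sum can be checked numerically. The sketch below draws one set of noise terms (assuming \(\sigma_w = 1\) and an arbitrary seed), builds the walk both ways, and compares them; the names w, y_loop, and y_sum are introduced here for illustration.

```r
set.seed(1)                      # arbitrary seed for reproducibility
n <- 250
w <- rnorm(n)                    # W_1, ..., W_n, assuming sigma_w = 1

# Recursive definition: Y_1 = W_1, then Y_t = Y_{t-1} + W_t
y_loop <- numeric(n)
y_loop[1] <- w[1]
for (i in 2:n) {
  y_loop[i] <- y_loop[i - 1] + w[i]
}

# Closed form: Y_t = W_1 + ... + W_t
y_sum <- cumsum(w)

all.equal(y_loop, y_sum)         # compares the two up to floating-point tolerance
```

The two constructions should agree up to floating-point rounding, so cumsum() is a convenient shortcut for simulating a random walk.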

Now, let’s find the first two moments:

\[E(Y_t) = \]

\[Var(Y_t) = \]

\[Cov(Y_t, Y_{t-h}) = \]
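One way these moments could be worked out is sketched here, assuming the representation \(Y_t = \sum_{i=1}^t W_i\) above and taking \(0 \le h < t\). It uses only linearity of expectation and the independence of the \(W_i\).

\[E(Y_t) = \sum_{i=1}^t E(W_i) = 0\]

\[Var(Y_t) = \sum_{i=1}^t Var(W_i) = t\sigma_w^2\]

\[Cov(Y_t, Y_{t-h}) = Cov\left(\sum_{i=1}^{t} W_i, \sum_{j=1}^{t-h} W_j\right) = \sum_{j=1}^{t-h} Var(W_j) = (t-h)\sigma_w^2\]

In the covariance, the cross terms \(Cov(W_i, W_j)\) with \(i \neq j\) vanish by independence, so only the \(t-h\) shared terms remain. Under these assumptions the mean is constant, but the variance grows with \(t\) and the covariance depends on \(t\) as well as on the lag \(h\).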