6 ACF and Random Walk
Learning Goals
- Explain and implement covariance and correlation model estimation by assuming stationarity (ACVF, ACF).
- Understand the derivations of variance and covariance for the Random Walk.
- Generate data from a random walk in R and estimate variance and covariance.
Group Activity
Download a template RMarkdown file to start from here.
Data Motivation: Marine Diversity during the Phanerozoic
In a 2004 paper, Cornette and Lieberman suggested that marine diversity during the Phanerozoic eon (the last 540 million years) follows a random walk.
Data of interest:
- The total number of marine genera during a period of time (new - extinct)
For most of the Phanerozoic, both CO2 and the number of genera roughly followed a random walk, up until the last 50-75 million years. This suggests a major change in the late Mesozoic and Cenozoic eras.
Random Walk
Let’s get a sense of what they are referring to when they talk about a random walk model.
Imagine that the total number of genera at time \(t\) is equal to the total number of genera in the last time period \(t-1\) plus some random change (due to new species coming into existence and other species dying out).
\[Y_t = Y_{t-1} + W_t\quad\quad W_t \stackrel{iid}{\sim} N(0, \sigma_w^2)\]
- Generate a random walk of length 250 (250 time points). I’ve laid out some of the structure you need.
y <- rep(NA, 250) #Pre-allocate memory: generate a vector of missing values
y[1] <- rnorm(1) #initialization of life (little bang?)
for(i in 2:250){
}
plot(y, type='l')
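One way the loop body might be filled in is sketched below. It assumes \(\sigma_w = 1\) (the default standard deviation of rnorm()) and takes \(Y_1 = W_1\); the seed and the names n, w and y_alt are illustrative additions, not part of the template.

```r
set.seed(1)                  # assumption: arbitrary seed, only for reproducibility
n <- 250
w <- rnorm(n)                # white noise W_t, assuming sigma_w = 1
y <- rep(NA, n)              # pre-allocate a vector of missing values
y[1] <- w[1]                 # Y_1 = W_1
for (i in 2:n) {
  y[i] <- y[i - 1] + w[i]    # Y_t = Y_{t-1} + W_t
}
plot(y, type = 'l')

# The same path can be written as the running sum of the noise
y_alt <- cumsum(w)
```

Drawing all the noise first makes it easy to see that the recursion and the cumulative sum describe the same series.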
Describe the plot of the random walk. What characteristics does it have?
ANSWER:
- Estimate the correlation for each lag (difference in times), assuming it is stationary.
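A minimal sketch of this step, using the base R function acf(), might look like the following. The random walk is regenerated here so the code runs on its own; the seed, lag.max = 40 and \(\sigma_w = 1\) are assumptions chosen for illustration.

```r
set.seed(1)                      # assumption: arbitrary seed for reproducibility
y <- cumsum(rnorm(250))          # one random walk, assuming sigma_w = 1

# Sample ACF, computed as if the series were stationary; also draws the plot
est <- acf(y, lag.max = 40)
head(est$acf[, 1, 1])            # first entry is lag 0

# Sample autocovariance (ACVF) for the same lags, without plotting
acvf <- acf(y, lag.max = 40, type = "covariance", plot = FALSE)
```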
Comment on the estimated correlation function. What do you notice?
ANSWER:
- Based on these two plots (the plot of the series and the plot of the ACF), do you think a random walk is stationary? Is the mean constant? Is the variance constant over time? Is the covariance only a function of the lag?
ANSWER:
- Generate 500 random walks, each of length 250 (250 time points). I’ve laid out some of the structure you need.
y <- matrix(rep(NA, 500*250),nrow=500, ncol=250) #generate a matrix of missing values
for(l in 1:500){
y[l,1] <-
for(i in 2:250){
y[l,i] <-
}
}
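The two blanks above could be completed along the following lines. This is a sketch that assumes \(\sigma_w = 1\) and starts every series at \(Y_1 = W_1\); the seed and the names n_series and n_time are illustrative. Each row of y is one realization and each column is one time point.

```r
set.seed(1)                               # assumption: arbitrary seed
n_series <- 500
n_time <- 250
y <- matrix(NA, nrow = n_series, ncol = n_time)  # matrix of missing values
for (l in 1:n_series) {
  y[l, 1] <- rnorm(1)                     # Y_1 = W_1 for series l
  for (i in 2:n_time) {
    y[l, i] <- y[l, i - 1] + rnorm(1)     # Y_t = Y_{t-1} + W_t
  }
}

# Overlay the first 10 realizations to see how the paths spread out
matplot(t(y[1:10, ]), type = 'l', lty = 1, xlab = 'time', ylab = 'y')
```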
- Calculate the mean and variance at each time point, averaging over the 500 series (500 realizations), using the function apply(). Run ?apply in the console to see the documentation. Plot the mean over time and plot the variance over time. With this additional information, do you think that a random walk is stationary?
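As a rough sketch of this step, apply() with MARGIN = 2 applies a function to each column, which here means each time point. The matrix is regenerated with replicate() and cumsum() so the code runs on its own; the seed, \(\sigma_w = 1\) and the names mean_t and var_t are assumptions for illustration.

```r
set.seed(1)                                    # assumption: arbitrary seed
# 500 random walks of length 250: rows are realizations, columns are times
y <- t(replicate(500, cumsum(rnorm(250))))

mean_t <- apply(y, 2, mean)   # mean across the 500 series at each time point
var_t  <- apply(y, 2, var)    # variance across the 500 series at each time point

plot(mean_t, type = 'l', xlab = 'time', ylab = 'estimated mean')
plot(var_t, type = 'l', xlab = 'time', ylab = 'estimated variance')
```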
ANSWER:
- Calculate the covariance and correlation matrices for a random walk based on these 500 realizations. Then plot their entries against the lag. With this additional information, do you think that a random walk is stationary?
COV <- as.vector(cov(y))
COR <- as.vector(cor(y))
lags <- as.vector(as.matrix(dist(1:250)))
plot(lags, COV, pch='.')
plot(lags, COR, pch='.')
ANSWER:
- Now use probability theory to confirm what you think. We let
\[Y_t = Y_{t-1} + W_t\quad\quad W_t \stackrel{iid}{\sim} N(0, \sigma_w^2)\]
If we plug in \(Y_{t-1} = Y_{t-2} + W_{t-1}\) into the equation, we get
\[Y_t = Y_{t-2} + W_{t-1} + W_t\]
If we continue plugging in (taking \(Y_0 = 0\) as the starting value), we get
\[Y_t = \sum_{i = 1}^t W_i \quad\quad W_i \stackrel{iid}{\sim} N(0, \sigma_w^2)\]
Now, let’s find the first two moments:
\[E(Y_t) = \]
\[Var(Y_t) = \]
\[Cov(Y_t, Y_{t-h}) = \]
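A possible way to complete these three derivations is sketched below, assuming \(Y_0 = 0\) (so that \(Y_t = \sum_{i=1}^t W_i\)) and \(0 \le h < t\). It uses only linearity of expectation and the independence of the \(W_i\), which gives \(Cov(W_i, W_j) = \sigma_w^2\) when \(i = j\) and \(0\) otherwise.

\[E(Y_t) = \sum_{i = 1}^t E(W_i) = 0\]
\[Var(Y_t) = \sum_{i = 1}^t Var(W_i) = t\sigma_w^2\]
\[Cov(Y_t, Y_{t-h}) = \sum_{i = 1}^{t}\sum_{j = 1}^{t-h} Cov(W_i, W_j) = (t-h)\sigma_w^2\]

Under these assumptions the mean is constant, but the variance grows with \(t\) and the covariance depends on \(t\) as well as on the lag \(h\), so the random walk is not stationary.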