3 Random Processes

Today’s Learning Goals

  • Understand mathematical notation used to describe the first and second moments of a sequence of random variables (random process).
  • Construct a covariance matrix from a given autocovariance function.
  • Generate correlated data from a given covariance matrix.


Slides from today are available here.




Group Activity

Download a template RMarkdown file to start from here.

Covariance Matrix

  1. Create a 3x3 symmetric covariance matrix that is “stationary” (constant variance over time, and covariance that is only a function of distance in time). One possible fill-in is sketched after the template below.
# Fill in the ? to satisfy the requirements above
rowOne   <- c(?, ?, ?)
rowTwo   <- c(?, ?, ?)
rowThree <- c(?, ?, ?)

D <- c(rowOne, rowTwo, rowThree)

(Sigma <- matrix(D, byrow = TRUE, nrow = 3, ncol = 3))
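One possible fill-in (an illustration only; many other choices work): constant variance of 4 on the diagonal, with covariance that depends only on the lag, e.g. 3.6 at lag 1 and 3.24 at lag 2 (a 0.9 decay per lag, matching the process used later).

# One possible stationary choice (assumed values, not the only valid answer)
rowOne   <- c(4, 3.6, 3.24)
rowTwo   <- c(3.6, 4, 3.6)
rowThree <- c(3.24, 3.6, 4)

D <- c(rowOne, rowTwo, rowThree)

(Sigma <- matrix(D, byrow = TRUE, nrow = 3, ncol = 3))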
  2. Make sure that the covariance matrix is positive semi-definite by trying the Cholesky decomposition. If you get an error, go back to #1 and make sure that your covariance matrix satisfies the stated conditions (and the general conditions for variance and covariance).
# chol() gives the upper-triangular matrix R such that Sigma = t(R) %*% R
L <- t(chol(Sigma)) # we want the lower-triangular matrix L such that Sigma = L %*% t(L)
L

L %*% t(L) # double-check that this gives Sigma back!

Generating Correlated Data

  3. Now generate three random values that are correlated, according to your specified covariance matrix from above. Can you tell that these three values are correlated? What might you need to do to convince yourself that these are correlated in the way you specified above?

ANSWER:

z <- rnorm(3) # three independent standard normal draws
L %*% z # matrix multiplication gives three correlated values
# Note: a single draw of three values cannot show the correlation; see the Challenge below.
  4. Now, let’s generate a random process of length 100 where the constant variance is 4 and the correlation decays such that at lag 1 it is 0.9, at lag 2 it is \(0.9^2\), at lag 3 it is \(0.9^3\), etc.

If we assume that the covariance only depends on the distance in time indices (lags), then we can estimate the covariance as a function of that distance in time indices (lags). Does the estimated covariance look like what you’d expect? Explain what features you were expecting.

ANSWER:

t <- 1:100
D <- as.matrix(dist(t)) # 100x100 matrix of absolute differences in time index (the lag between every pair of points)

COR <- 0.9^D # correlation matrix: 0.9 raised to the lag

COV <- COR * 4 # covariance matrix (multiply by the constant variance of 4)

L <- t(chol(COV)) # Cholesky decomposition (lower-triangular factor)

z <- rnorm(100)
x <- L %*% z # generate 100 correlated values

acf(x, type = 'covariance') # estimated covariance as a function of lag; should start near 4 and decay by roughly a factor of 0.9 per lag

acf(x) # estimated correlation as a function of lag; should decay from 1 roughly geometrically
  5. Repeat the process of generating a random process of 100 observations, but change the correlation at a distance of 1 index (= 1 lag) to something other than 0.9 and see how that impacts the estimated covariance and correlation. What changes?

ANSWER:
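
A minimal sketch, assuming we simply swap the lag-1 correlation from 0.9 to, say, 0.5 and rerun the same steps:

rho <- 0.5 # assumed alternative lag-1 correlation; any value strictly between -1 and 1 can be tried
t <- 1:100
D <- as.matrix(dist(t))
COV <- 4 * rho^D # covariance now decays more quickly with lag

L <- t(chol(COV))
x <- L %*% rnorm(100)

acf(x, type = 'covariance') # covariance still starts near 4 at lag 0 but drops off faster
acf(x) # estimated correlation falls toward 0 within just a few lags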

  6. Then repeat the process of creating a random process and increase the length of the series. How does that impact the estimated covariance and correlation functions?

ANSWER:
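
A minimal sketch, assuming we keep the variance of 4 and the 0.9 decay but lengthen the series to, say, 1000 observations:

n <- 1000 # assumed longer series length
t <- 1:n
D <- as.matrix(dist(t))
COV <- 4 * 0.9^D

L <- t(chol(COV))
x <- L %*% rnorm(n)

acf(x, type = 'covariance')
acf(x) # with a longer series, the estimated functions are typically less noisy and closer to the true decay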

  7. Is there any particular material from the probability review section that you’d like us to spend more time on in class?

ANSWER:

Challenge

  1. Return to #3 and now try to write R code to convince yourself that the 3 data points you generate using that process are actually correlated. In other words, write R code to estimate the covariance and correlation of data generated in this way. Yes, this is purposefully vague. Consider what is needed in order to estimate a covariance.

ANSWER:
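
One possible approach (a sketch, assuming Sigma is the 3x3 covariance matrix from #1): a single draw of three values cannot reveal correlation, so generate many independent replications of the three-value draw and compute the sample covariance and correlation across replications.

L3 <- t(chol(Sigma)) # re-derive the 3x3 lower-triangular factor (L was overwritten above)
nrep <- 10000 # assumed number of independent replications

X <- replicate(nrep, as.numeric(L3 %*% rnorm(3))) # 3 x nrep matrix; each column is one draw of 3 values

cov(t(X)) # sample covariance across replications; should be close to Sigma
cor(t(X)) # sample correlation; should match the correlations implied by Sigma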