3 Random Processes

Today’s Learning Goals

  • Understand mathematical notation used to describe the first and second moments of a sequence of random variables (random process).
  • Construct a covariance matrix from a given autocovariance function.
  • Generate correlated data from a given covariance matrix.




Group Activity

Download a template RMarkdown file to start from here.

Covariance Matrix

  1. Create a 3x3 symmetric covariance matrix that is “stationary” (the variance is constant over time and the covariance is a function only of the distance in time).
#Fill in the ? to satisfy the requirements above
rowOne <- c(?,?,?)
rowTwo <- c(?,?,?)
rowThree <- c(?,?,?)

D <- c(rowOne, rowTwo, rowThree)

(Sigma <- matrix(D, byrow=TRUE, nrow=3, ncol=3))
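To make the “stationary” requirement concrete, here is a minimal sketch of the structure such a matrix has, built with base R's `toeplitz()`. The name `acvf_example` and its values are arbitrary illustrations (an assumption, not the intended answer to the exercise); the point is only that every diagonal of the matrix is constant.

```r
# Hypothetical autocovariance values at lags 0, 1, 2 (illustrative only)
acvf_example <- c(2, 1, 0.5)

# toeplitz() puts acvf_example[|i - j| + 1] in position (i, j), so the
# variance (lag 0) is constant along the main diagonal and each covariance
# depends only on the lag |i - j|
Sigma_example <- toeplitz(acvf_example)
Sigma_example

isSymmetric(Sigma_example)            # symmetric by construction
length(unique(diag(Sigma_example)))   # a single value: constant variance
```

Any values you choose for your own matrix should follow the same pattern, although not every such pattern is a valid covariance matrix.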
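  2. Check that the covariance matrix is positive definite by trying the Cholesky decomposition. Note that chol() stops with an error whenever the matrix is not positive definite, which includes matrices that are positive semi-definite but singular. If you get an error, go back to #1 and make sure that your covariance matrix satisfies the stated conditions (and the general conditions for variance and covariance).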
#chol() gives you upper triangular matrix R such that Sigma = t(R) %*% R
L <- t(chol(Sigma)) # we want lower triangular matrix L such that Sigma = L %*% t(L)
L 

L %*% t(L) #double check it gives you Sigma back!
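As a rough illustration of what the error case looks like, the sketch below builds a matrix that is symmetric with constant diagonals but is still not a valid covariance matrix. The matrix `bad` and its values are hypothetical, chosen only so that one eigenvalue is negative; `tryCatch()` is used so the error is captured rather than stopping the script.

```r
# Hypothetical matrix: symmetric with constant diagonals, but the lag-2
# covariance (-0.9) is incompatible with the lag-1 covariance (0.9)
bad <- matrix(c( 1.0, 0.9, -0.9,
                 0.9, 1.0,  0.9,
                -0.9, 0.9,  1.0), byrow = TRUE, nrow = 3)

# A valid covariance matrix has no negative eigenvalues; this one does
eigen(bad, symmetric = TRUE)$values

# chol() fails, so chol_result holds the error message instead of a matrix
chol_result <- tryCatch(chol(bad), error = function(e) conditionMessage(e))
chol_result
```

Checking the eigenvalues is a useful complement to `chol()` because it shows how far the matrix is from being valid, not just that it fails.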

Generating Correlated Data

  3. Now generate three random values that are correlated according to the covariance matrix you specified above. Can you tell that these three values are correlated? What might you need to do to convince yourself that they are correlated in the way you specified?

ANSWER:

z <- rnorm(3)
L %*% z # matrix multiplication
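A short derivation shows why multiplying by L produces the desired covariance. It assumes that the entries of \(z\) are independent with variance 1, which is what rnorm(3) generates, so that \(\operatorname{Cov}(z) = I\), and that \(\Sigma = L L^{\top}\) as in the Cholesky step:

\[
\operatorname{Cov}(Lz) = L \, \operatorname{Cov}(z) \, L^{\top} = L I L^{\top} = L L^{\top} = \Sigma .
\]

This describes the distribution the three values are drawn from; a single draw of three numbers is not enough on its own to reveal that covariance.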
  4. Now, let’s generate a random process of length 100 where the constant variance is 4 and the correlation decays such that with lag 1 it is 0.9, with lag 2 it is \(0.9^2\), with lag 3 it is \(0.9^3\), etc.

If we assume that the covariance only depends on the distance in time indices (lags), then we can estimate the covariance as a function of that distance in time indices (lags). Does the estimated covariance look like what you’d expect? Explain what features you were expecting.

ANSWER:

times <- 1:100 #time indices (named times to avoid masking the function t())
D <- as.matrix(dist(times)) #100x100 matrix of lags |i - j| between every pair of time indices

COR <- 0.9^D #correlation

COV <- COR*4 #covariance (multiply by constant variance)

L <- t(chol(COV)) #Cholesky decomposition

z <- rnorm(100)
x <- L %*% z #generate 100 correlated values

acf(x, type = 'covariance') #estimated covariance based on distance in time index

acf(x) #estimated correlation based on distance in time index
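To judge whether the estimate looks as expected, it can help to put the estimated autocovariance next to the one that was specified, \(4 \times 0.9^h\) at lag \(h\). The following is a minimal, self-contained sketch: the names `COV_check`, `x_check`, `est`, and `theory`, the seed, and the choice of 20 lags are all assumptions made for illustration.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

n <- 100
lags <- as.matrix(dist(1:n))       # |i - j| for every pair of time indices
COV_check <- 4 * 0.9^lags          # covariance specified in the activity
x_check <- t(chol(COV_check)) %*% rnorm(n)

# plot = FALSE returns the estimates without drawing the plot
est <- acf(x_check, type = "covariance", lag.max = 20, plot = FALSE)
theory <- 4 * 0.9^(0:20)           # specified autocovariance at lags 0 to 20

# side by side: lag, estimated autocovariance, specified autocovariance
round(cbind(lag = 0:20, estimated = drop(est$acf), specified = theory), 2)
```

With only one series of length 100, the estimated values will generally differ from the specified ones, and the estimates at larger lags are based on fewer pairs of observations.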
  5. Repeat the process of generating a random process of 100 observations, but change the correlation at a distance of 1 index (= 1 lag) to something other than 0.9 and see how that impacts the estimated covariance and correlation. What changes?

ANSWER:

  6. Then repeat the process of creating a random process and increase the length of the series. How does that impact the estimated covariance and correlation functions?

ANSWER:

  7. Is there any particular material from the probability review section that you’d like us to spend more time on in class? Discuss with your table. Come up with a list of topics and add it to the #prob-theory Slack channel.

Challenge

  1. Return to #3 and now try to write R code to convince yourself that the 3 data points you generate using that process are actually correlated. In other words, write R code to estimate the covariance and correlation of data generated in this way. Yes, this is purposefully vague. Consider what is needed in order to estimate covariance.

ANSWER:
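One possible approach, offered as a sketch rather than the intended answer: estimating a covariance requires many independent replicates, so generate the three values many times and compare the sample covariance with the matrix you specified. The matrix `Sigma_demo`, the seed, and the number of replicates `n_rep` are hypothetical choices for illustration.

```r
set.seed(2)  # arbitrary seed for reproducibility

# Hypothetical stationary covariance matrix (illustrative values only)
Sigma_demo <- toeplitz(c(2, 1, 0.5))
L_demo <- t(chol(Sigma_demo))

# Each column of Z is one independent draw of z, so each column of X is
# one replicate of the three correlated values
n_rep <- 10000
Z <- matrix(rnorm(3 * n_rep), nrow = 3)
X <- L_demo %*% Z

# cov() and cor() expect replicates in rows, so transpose first
round(cov(t(X)), 2)   # compare with Sigma_demo
round(cor(t(X)), 2)   # compare with cov2cor(Sigma_demo)
```

The sample estimates should get closer to the specified matrix as `n_rep` grows, which is a different kind of replication from the single long series used with acf().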