Homework 2

You can download a template file to start from here.

Submission:

Generating Correlated Data and Estimating Covariance and Correlation

Imagine that we generate a random process where the next observation is equal to 0.90 times the previous value plus some independent noise. Start at 0. Write R code (using only base R and tidyverse package functions) to generate a random process of 500 observations, plot the realized process, \(x_t\), as a function of the index \(t\) using a line plot, and then estimate the covariance and correlation functions as functions of the lag \(h\), assuming the process is stationary.

library(tidyverse)
set.seed(452)
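
As a starting point, the recursion described above can be sketched as follows. This is a minimal sketch, not a reference solution: it assumes the noise is standard normal, that "start at 0" means the first stored observation is 0, and that the autocovariance estimator divides by \(n\). The helper names `simulate_ar1` and `sample_acvf` are hypothetical and used only for illustration.

```r
# Hypothetical helper: simulate x_t = phi * x_{t-1} + w_t with x_1 = 0.
# Assumption: w_t is independent N(0, sd^2) noise.
simulate_ar1 <- function(n, phi, sd = 1) {
  x <- numeric(n)                       # all zeros, so the process starts at 0
  noise <- rnorm(n, mean = 0, sd = sd)
  for (t in 2:n) {
    x[t] <- phi * x[t - 1] + noise[t]
  }
  x
}

# Hypothetical helper: sample autocovariance at lags 0..max_lag,
# assuming stationarity (one common mean, covariance depends only on lag).
sample_acvf <- function(x, max_lag = 20) {
  n <- length(x)
  x_bar <- mean(x)
  sapply(0:max_lag, function(h) {
    sum((x[1:(n - h)] - x_bar) * (x[(1 + h):n] - x_bar)) / n
  })
}

set.seed(452)
x <- simulate_ar1(500, phi = 0.9)

# Line plot of the realized process against its index
plot(seq_along(x), x, type = "l", xlab = "t", ylab = "x_t")

acvf_hat <- sample_acvf(x)          # estimated covariance function
acf_hat  <- acvf_hat / acvf_hat[1]  # estimated correlation function (lag 0 is 1)
```

Changing `phi` to a different coefficient reuses the same two helpers; the plot could equally be drawn with `ggplot2::geom_line()`.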

Imagine that we generate a random process where the next observation is equal to -0.30 times the previous value plus some independent noise. Start at 0. Write R code (using only base R and tidyverse package functions) to generate a random process of 500 observations, plot the realized process, \(x_t\), as a function of the index \(t\) using a line plot, and then estimate the covariance and correlation functions as functions of the lag \(h\), assuming the process is stationary.

Create a 10x10 covariance matrix where the variance is constant at 1 and the correlation at every nonzero lag is 0.7. Use the Cholesky decomposition method to generate 500 realizations of 10 observations of a random process with that covariance structure. Estimate the covariance and correlation matrices based on the 500 series WITHOUT assuming the process is stationary. To do this, organize the 500 series of 10 observations into a 500x10 matrix Y and use cov(Y) and cor(Y).
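
The Cholesky approach described here can be sketched roughly as below. The sketch assumes the underlying independent draws are standard normal; the variable names (`Sigma`, `L`, `Z`) are illustrative choices, not part of the assignment. Note that base R's `chol()` returns the upper-triangular factor, so it is transposed to get the lower-triangular one.

```r
set.seed(452)

p        <- 10    # observations per series
n_series <- 500   # number of realizations
sigma2   <- 1     # constant variance
rho      <- 0.7   # correlation at every nonzero lag

# Covariance matrix: sigma2 on the diagonal, rho * sigma2 elsewhere
Sigma <- matrix(rho * sigma2, nrow = p, ncol = p)
diag(Sigma) <- sigma2

# chol() gives upper-triangular R with t(R) %*% R == Sigma,
# so L = t(R) is lower-triangular with L %*% t(L) == Sigma
L <- t(chol(Sigma))

# Assumption: independent N(0, 1) draws, one column per realization
Z <- matrix(rnorm(p * n_series), nrow = p, ncol = n_series)

# Each column of L %*% Z has covariance Sigma; transpose to 500 x 10
Y <- t(L %*% Z)

cov_hat <- cov(Y)  # 10 x 10 estimated covariance matrix
cor_hat <- cor(Y)  # 10 x 10 estimated correlation matrix
```

Because `cov(Y)` treats each of the 10 columns as its own variable and averages across the 500 rows, no stationarity assumption is needed for this estimate.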

Create a 10x10 covariance matrix where the variance is constant at 0.25 and the correlation at every nonzero lag is 0.7. Use the Cholesky decomposition method to generate 500 realizations of 10 observations of a random process with that covariance structure. Estimate the covariance and correlation matrices based on the 500 series WITHOUT assuming the process is stationary. To do this, organize the 500 series of 10 observations into a 500x10 matrix Y and use cov(Y) and cor(Y).

In all of the exercises above, you generated 500 realizations of a random process. What would change in the estimation if you used 50 realizations? What would change if you used 5000 realizations? Why? Write your answer in 100-150 words. Feel free to use R code to supplement your written answer.
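
One possible way to support a written answer with code is to repeat an estimate at each sample size and compare how far it lands from the truth. The sketch below does this for a 10x10 covariance matrix with variance 1 and correlation 0.7; it assumes normal draws, and `max_abs_error` is a hypothetical helper name. Results from a single run are noisy, so the numbers should be read as indicative only.

```r
set.seed(452)

# True covariance: variance 1, correlation 0.7 at every nonzero lag
Sigma <- matrix(0.7, nrow = 10, ncol = 10)
diag(Sigma) <- 1
L <- t(chol(Sigma))  # lower-triangular Cholesky factor

# Hypothetical helper: largest absolute entrywise error of cov(Y)
# when Y holds n_series realizations (rows) of 10 observations
max_abs_error <- function(n_series) {
  Z <- matrix(rnorm(10 * n_series), nrow = 10)  # assumed N(0, 1) noise
  Y <- t(L %*% Z)
  max(abs(cov(Y) - Sigma))
}

errors <- sapply(c(50, 500, 5000), max_abs_error)
names(errors) <- c("n = 50", "n = 500", "n = 5000")
errors
```

The standard error of a sample covariance shrinks roughly in proportion to \(1/\sqrt{n}\), so the error typically falls as the number of realizations grows, although any single run can deviate from that pattern.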

In 100-150 words, summarize the differences between the methods used above to generate correlated data, and the differences in the assumptions and estimation procedures for the covariance and correlation of a random process based on realizations from that process. This is an opportunity to stop and notice the big picture before we get further into the details.