5.5 Moving Average Models

In contrast to the autoregressive model, we now consider a model in which the current outcome is a function of current and past noise (rather than past outcomes). This is a subtle difference; you’ll see that a moving average (MA) model is very different from an AR model, but we’ll also show how the two are connected.

Note: This is different from the moving average filter that we used to estimate the trend.

5.5.1 MA(1) Model

A moving average process of order 1 or MA(1) model is a weighted sum of the current random error and the most recent past error, and can be written as

\[Y_t = \delta + W_t + \theta_1W_{t-1}\]

where \(\{W_t\}\) is independent Gaussian white noise, \(W_t \stackrel{iid}{\sim} N(0,\sigma^2_w)\).

Like AR models, we often let \(\delta = 0\).

Properties

Unlike AR(1) processes, MA(1) processes are always weakly stationary. The mean is constant, \(E(Y_t) = \delta\), since \(E(W_t) = 0\) for all \(t\), and the variance is also constant (not a function of time),

\[Var(Y_t) = Var(W_t + \theta_1W_{t-1}) = \sigma^2_w (1 + \theta_1^2)\]
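
As a quick numerical check (not part of the derivation above), we can simulate a long MA(1) series and compare its sample variance to \(\sigma^2_w(1+\theta_1^2)\); the choices \(\theta_1 = 0.5\) and \(\sigma^2_w = 1\) below are arbitrary.

# Sanity check: sample variance of a simulated MA(1) vs sigma_w^2 * (1 + theta1^2)
w <- rnorm(100000)                  # white noise with sigma_w = 1
y <- w[-1] + 0.5 * w[-length(w)]    # Y_t = W_t + 0.5 * W_(t-1)
var(y)                              # should be close to 1 * (1 + 0.5^2) = 1.25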

Let’s look at the autocorrelation function of a MA(1) process.

To derive the autocorrelation function, let’s start with the covariance at lag 1. Plug in the model for \(Y_t\) and \(Y_{t-1}\) and use the properties to simplify the expression,

\[\Sigma_Y(1) = Cov(Y_t, Y_{t-1}) = Cov(W_t + \theta_1W_{t-1}, W_{t-1} + \theta_1W_{t-2})\] \[= Cov(W_t,W_{t-1}) + Cov(W_t, \theta_1W_{t-2}) + Cov(\theta_1W_{t-1}, W_{t-1}) + Cov(\theta_1W_{t-1},\theta_1W_{t-2})\] \[ = 0 + 0 + \theta_1Cov(W_{t-1}, W_{t-1}) + 0\] \[= \theta_1Var(W_{t-1}) \] \[ = \theta_1Var(W_{t}) = \theta_1\sigma^2_w\]

because the \(W_t\)’s are independent of each other.

For larger lags \(k>1\),

\[\Sigma_Y(k) = Cov(Y_t, Y_{t-k}) = Cov(W_t + \theta_1W_{t-1}, W_{t-k} + \theta_1W_{t-k-1}) \] \[= Cov(W_t,W_{t-k}) + Cov(W_t, \theta_1W_{t-k-1}) + Cov(\theta_1W_{t-1}, W_{t-k}) + Cov(\theta_1W_{t-1},\theta_1W_{t-k-1})\]

\[ = 0\text{ if }k>1\] because \(W_t\)’s are independent of each other.

Now, the autocorrelation is the covariance divided by the variance (which we found above),

\[\rho_1 = \frac{Cov(Y_t,Y_{t-1})}{Var(Y_t)} = \frac{\Sigma_Y(1)}{Var(Y_t)} = \frac{\theta_1\sigma^2_w}{\sigma^2_w(1+\theta_1^2)} = \frac{\theta_1}{(1+\theta_1^2)}\] and

\[\rho_k = \frac{Cov(Y_t,Y_{t-k})}{Var(Y_t)} = \frac{\Sigma_Y(k)}{Var(Y_t)} = 0\text{ if } k>1\]

So the autocorrelation function is non-zero at lag 1 for an MA(1) process and zero otherwise. Keep this in mind as we look at sample ACF functions.
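
As a quick check of this result (a sketch using R’s built-in ARMAacf() function, with \(\theta_1 = 0.5\) chosen arbitrarily), the theoretical ACF of an MA(1) is non-zero only at lag 1:

# Theoretical ACF of an MA(1) with theta1 = 0.5:
# rho_1 = 0.5 / (1 + 0.5^2) = 0.4, and zero for lags > 1
ARMAacf(ma = 0.5, lag.max = 5)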

Sample ACF for MA(1): Zero for lags > 1

Simulated Data Example

# Simulate an MA(1) process: x_t = e_t + theta1 * e_(t-1)

# Create the vector x
x <- numeric(1000)
theta1 <- 0.5

# Simulate the white noise errors
e <- rnorm(1000)

x[1] <- e[1]

# Fill the vector x
for (i in 2:length(x)) {
  x[i] <- e[i] + theta1 * e[i - 1]
}

x <- ts(x)
plot(x)

acf(x)

Notice how the autocorrelation is 1 at lag 0 and around 0.4 at lag 1, which matches the theoretical value \(\theta_1/(1+\theta_1^2) = 0.5/1.25 = 0.4\). The autocorrelation estimates are between the blue lines for the other lags, so they are practically zero.
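
We can also compare the sample lag-1 autocorrelation directly to the theoretical value (a quick check, not part of the original example):

# Sample estimate of rho_1 (acf stores lag 0 in the first position)
acf(x, plot = FALSE)$acf[2]
# Theoretical rho_1 for theta1 = 0.5
0.5 / (1 + 0.5^2)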

Invertibility

No restrictions on \(\theta_1\) are needed for an MA(1) process to be stationary. However, imposing restrictions on \(\theta_1\) is generally desirable to ensure the MA process is invertible.

For an MA process to be invertible, we must be able to write it as an AR(\(\infty\)) process that converges. We’ll talk more about this soon.

We want to restrict ourselves to only invertible processes because of the non-uniqueness of the ACF. Let’s imagine these two processes,

\[A:\quad Y_t = W_t + \theta_1W_{t-1}\] \[B:\quad Y_t = W_t + \frac{1}{\theta_1}W_{t-1}\]

Let’s show that they have the same ACF. We can use our derivations about the MA(1) process. For process B,

\[\rho_1 = \frac{1/\theta_1}{(1+1/\theta_1^2)} = \frac{1}{\theta_1(1+1/\theta_1^2)} = \frac{1}{\theta_1(\frac{\theta_1^2 + 1}{\theta_1^2})}= \frac{1}{\theta_1}\frac{\theta_1^2}{\theta_1^2 + 1} = \frac{\theta_1}{\theta_1^2 + 1}\]

which is the same autocorrelation as process A.
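
A quick numerical confirmation (a sketch using ARMAacf(), with \(\theta_1 = 0.5\) so that process B has coefficient \(1/\theta_1 = 2\)):

# Processes A and B have the same theoretical ACF
ARMAacf(ma = 0.5, lag.max = 3)   # process A: rho_1 = 0.4
ARMAacf(ma = 2, lag.max = 3)     # process B: rho_1 = 2 / (1 + 4) = 0.4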

Now, let’s invert process A by rewriting it for \(W_t\) as a function of \(Y_t\),

\[W_t = Y_t - \theta_1W_{t-1}\] and now let’s plug in the model for \(W_{t-1}\),

\[W_t = Y_t - \theta_1(Y_{t-1} - \theta_1W_{t-2}) = Y_t - \theta_1Y_{t-1} + \theta_1^2W_{t-2}\]

and if you keep going, you get an infinite sum,

\[W_t = Y_t - \theta_1Y_{t-1} + \theta_1^2Y_{t-2} - \theta_1^3Y_{t-3} + \cdots\]

or equivalently, an autoregressive model of order \(\infty\),

\[Y_t = W_t + \theta_1Y_{t-1} - \theta_1^2Y_{t-2} + \theta_1^3Y_{t-3} - \cdots\] This infinite sum only converges to a finite value when \(|\theta_1| < 1\), so the AR(\(\infty\)) representation is valid only in that case.

If you invert the second process (process B) in the same way, the weights involve powers of \(1/\theta_1\),

\[W_t = Y_t - \frac{1}{\theta_1}Y_{t-1} + \frac{1}{\theta_1^2}Y_{t-2} - \cdots\]

and this infinite sum does not converge when \(|\theta_1| < 1\), since \(|1/\theta_1| > 1\).

Thus, when \(|\theta_1| < 1\), the first process is invertible and the second is not. When we can invert the process, notice that we have written an MA(1) process as an autoregressive process of infinite order.
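
To see the convergence issue numerically (an illustrative sketch with \(\theta_1 = 0.5\)), we can look at the AR(\(\infty\)) weights \((-\theta_1)^j\) for process A and \((-1/\theta_1)^j\) for process B:

# AR(infinity) weights for the two processes
theta1 <- 0.5
j <- 1:10
(-theta1)^j        # process A: weights shrink toward zero (invertible)
(-1 / theta1)^j    # process B: weights grow without bound (not invertible)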

5.5.2 MA(q) Model

A moving average process of order q or MA(q) model is a weighted sum of the current random error and the \(q\) most recent past errors, and can be written as

\[Y_t = \delta + W_t + \theta_1W_{t-1}+ \theta_2W_{t-2}+ \cdots + \theta_qW_{t-q}\] where \(\{W_t\}\) is independent Gaussian white noise, \(W_t \stackrel{iid}{\sim} N(0,\sigma^2_w)\). Similar to AR models, we will often let \(\delta=0\).

Properties

As with MA(1), we see the variance is constant (and not a function of time),

\[Var(Y_t) = Var(W_t + \theta_1W_{t-1}+ \theta_2W_{t-2}+ \cdots + \theta_qW_{t-q}) = \sigma^2_w (1 + \sum^q_{i=1}\theta_i^2)\]

More generally, for an MA(q) process, the autocorrelation is non-zero for the first \(q\) lags and zero for lags \(> q\).

We can show this for an MA(2) process and higher-order models using the same techniques as before.
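
As with the MA(1), we can check this numerically with ARMAacf() (a quick sketch, using the same \(\theta_1 = 0.5\) and \(\theta_2 = -0.2\) as the simulation below):

# Theoretical ACF of an MA(2): non-zero at lags 1 and 2, zero for lags > 2
ARMAacf(ma = c(0.5, -0.2), lag.max = 5)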

Sample ACF for MA(q): Zero for lags > q

Simulated Data Example

# Simulate an MA(2) process: x_t = e_t + theta1 * e_(t-1) + theta2 * e_(t-2)

# Create the vector x
x <- numeric(1000)
theta1 <- 0.5
theta2 <- -0.2

# Simulate the white noise errors
e <- rnorm(1000)

x[1] <- e[1]
x[2] <- e[2]

# Fill the vector x
for (i in 3:length(x)) {
  x[i] <- e[i] + theta1 * e[i - 1] + theta2 * e[i - 2]
}

x <- ts(x)
plot(x)

acf(x)

Notice that at lag 0 the autocorrelation is 1, at lags 1 and 2 the autocorrelation is non-zero (as expected for an MA(2) model), and at all other lags the autocorrelation is practically zero (within the blue lines).
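
As an alternative to the hand-coded loop (a sketch, not the approach used above), the built-in arima.sim() function simulates the same MA(2) process directly:

# Simulate an MA(2) with arima.sim and inspect its sample ACF
x2 <- arima.sim(model = list(ma = c(0.5, -0.2)), n = 1000)
acf(x2)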

Invertibility

To check the invertibility of an MA(q) process, we need to learn about the backward shift operator, denoted \(B\), which is defined as

\[B^jY_t = Y_{t-j}\] The backshift operator is notation that allows us to simplify how we write down the MA(q) model.

The MA(q) model can be written as

\[Y_t = (\theta_0 + \theta_1B + \cdots + \theta_qB^q)W_t \] \[= \theta(B)W_t \]

where \(\theta_0 = 1\) (matching the model form above) and \(\theta(B)\) is a polynomial of order \(q\) in \(B\). It can be shown that an MA(q) process is invertible if the roots of the equation,

\(\theta(B) = (\theta_0 + \theta_1B + \cdots + \theta_qB^q) = 0\)

all lie outside the unit circle, where \(B\) is regarded as a complex variable and not an operator.

Remember: the roots are the values of \(B\) in which \(\theta(B) = 0\).

For example, an MA(1) process has polynomial \(\theta(B) = 1+\theta_1B\) with root \(B = -1/\theta_1\). This root is a real number, and it is outside the unit circle (\(|B|>1\)) as long as \(|\theta_1|<1\). Rarely will you have to check this by hand; the software we will use restricts the values of \(\theta_j\) so that the process is invertible.
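
We can check the root condition numerically with polyroot() (an illustrative sketch for an MA(1) with \(\theta_1 = 0.5\)):

# Roots of theta(B) = 1 + 0.5 B
root <- polyroot(c(1, 0.5))
root           # -2 + 0i, i.e., B = -1/theta1
Mod(root) > 1  # TRUE: root lies outside the unit circle, so the process is invertible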