5.7 ARMA Models

You might wonder why it is necessary to know the theoretical properties of the AR and MA models (expected value, variance, covariance, correlation).

This probability theory is important because it will help you choose a model for time series data when you need to decide among:

  • Autoregressive model of order \(p\) AR(\(p\)),
  • Moving Average model of order \(q\) MA(\(q\)),
  • a combination of those two models, called an ARMA(\(p,q\)) model, and
  • the values of \(p\) and/or \(q\).

An ARMA(\(p,q\)) model is written as

\[Y_t = \delta + \phi_1Y_{t-1}+ \phi_2Y_{t-2}+\cdots + \phi_pY_{t-p} +W_t +\theta_1W_{t-1}+ \theta_2W_{t-2}+\cdots + \theta_qW_{t-q}\] where the white noise \(W_t \stackrel{iid}{\sim} N(0,\sigma^2_w)\).

Equivalently, using the backshift operator, an ARMA model can be written as

\[\phi(B)Y_t = \delta + \theta(B)W_t\] where

\[\phi(B) =1 - \phi_1B- \phi_2B^2 - \cdots - \phi_pB^p\] and

\[\theta(B) = 1+ \theta_1B+ \theta_2B^2 + \cdots + \theta_qB^q\]

In order for the ARMA(\(p\),\(q\)) model

  • to be a weakly stationary process, the roots of \(\phi(B)\) must be outside the unit circle
  • to be invertible, the roots of \(\theta(B)\) must be outside the unit circle

In practice, the software will restrict the model fitting to weakly stationary and invertible ARMA models. But it is important to understand this restriction if you encounter any R errors.
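As an illustration, you can check both conditions with base R's polyroot(); the coefficients below are made up for the sketch, not from any fitted model.

ar_coefs <- c(.7, -.3) # hypothetical phi's
ma_coefs <- c(.5) # hypothetical theta's
Mod(polyroot(c(1, -ar_coefs))) # moduli of the roots of phi(B); all > 1 means weakly stationary
Mod(polyroot(c(1, ma_coefs))) # moduli of the roots of theta(B); all > 1 means invertible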

This combined model is useful because you can often adequately model a random process with an ARMA model using fewer parameters (\(\phi\)’s and \(\theta\)’s) than a pure AR or pure MA model would require.
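One quick way to see this is with the built-in ARMAtoMA() function; the coefficients below are illustrative. An ARMA(1,1) with just two parameters is equivalent to an MA model of infinite order, so a pure MA approximation would need many \(\theta\)’s.

ARMAtoMA(ar = .6, ma = .4, lag.max = 10) # psi-weights of the equivalent infinite-order MA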

Before we go any further, we need one more tool called the partial autocorrelation function.

5.7.0.1 Partial Autocorrelation

The partial autocorrelation function (PACF) measures the linear correlation between the series \(\{Y_t\}\) and a lagged version of itself \(\{Y_{t-k}\}\) with the linear dependence on the intermediate values \(Y_{t-1},...,Y_{t-(k-1)}\) removed. In other words, the PACF is the correlation between two observations \(k\) lags apart after accounting for the shorter lags.

\[h_k = \begin{cases} \widehat{Cor}(Y_{t},Y_{t-1})\text{ if } k = 1\\ \widehat{Cor}(Y_{t},Y_{t-k} | Y_{t-1},...,Y_{t-(k-1)})\text{ if } k >1 \end{cases}\]

Note that the partial autocorrelation at lag 1 will be the same as the autocorrelation at lag 1. For the other lags, the partial autocorrelation measures the dependence after accounting for the relationships with observations closer in time.

Connection to Stat 155: When we interpret the slope coefficients keeping all other variables fixed, we are interpreting a relationship accounting for those other variables.
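To make that connection concrete, here is a sketch with simulated data (not from the text): the lag-2 sample PACF is essentially the correlation between the residuals from regressing \(Y_t\) on \(Y_{t-1}\) and the residuals from regressing \(Y_{t-2}\) on \(Y_{t-1}\).

y <- arima.sim(n = 2000, list(ar = .6)) # simulated AR(1) series
y0 <- y[3:2000]; y1 <- y[2:1999]; y2 <- y[1:1998] # copies aligned at lags 0, 1, and 2
cor(resid(lm(y0 ~ y1)), resid(lm(y2 ~ y1))) # correlation after removing the lag-1 dependence
pacf(y, plot = FALSE)$acf[2] # nearly the same value: the lag-2 sample PACF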

Similar to the ACF plots, the dashed horizontal lines indicate plausible values if the random process were actually independent white noise.

Let’s look at simulated independent Gaussian white noise.

x <- ts(rnorm(1000))
plot(x)

Then, the ACF and the PACF (Partial ACF) functions.

library(astsa) # acf2() is in the astsa package
acf2(x)

The partial autocorrelation is important for detecting the value of \(p\) in an AR(\(p\)) model because it has a distinctive pattern.

For an AR model, the theoretical PACF should be zero past the order of the model. Put another way, the number of non-zero partial autocorrelations gives the order of the AR model, the value of \(p\).
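As a quick theoretical check (with illustrative AR(2) coefficients, not data), the built-in ARMAacf() function returns a PACF that is exactly zero past lag 2.

round(ARMAacf(ar = c(.7, -.3), lag.max = 8, pacf = TRUE), 3) # theoretical PACF; zero beyond lag 2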


Simulated Data Examples

Let’s see this in action.

Sample ACF for AR(p): Decays to zero

Sample PACF for AR(p): Zero for lags > p

Play around with the values of the \(\phi\)’s and the standard deviation to get a feel for an AR model and its associated ACF and PACF.

x <- arima.sim(n = 1000, list(ar = c(.7, -.3, .5)), sd = 2) #AR(3) model
plot(x)
acf2(x)
x <- arima.sim(n = 1000, list(ar = c(.8, -.4)), sd = 1) #AR(2) model
plot(x)
acf2(x)

Sample ACF for MA(q): Zero for lags > q

Sample PACF for MA(q): Decays to zero

Play around with the values of the \(\theta\)’s and the standard deviation to get a feel for an MA model and its associated ACF and PACF.

x <- arima.sim(n = 1000, list(ma = c(.7, -.2, .8)), sd = 1) #MA(3) model
plot(x)
acf2(x)
x <- arima.sim(n = 1000, list(ma = c(.8, -.4)), sd = 1) #MA(2) model
plot(x)
acf2(x)
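
Sample ACF for ARMA(p,q): Decays to zero

Sample PACF for ARMA(p,q): Decays to zero

To round out the simulated examples, here is a sketch of an ARMA(1,1) with illustrative coefficients; both the ACF and the PACF should taper toward zero rather than cutting off.

x <- arima.sim(n = 1000, list(ar = .6, ma = .4), sd = 1) #ARMA(1,1) model
plot(x)
acf2(x)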

5.7.1 Model Selection

We have now learned about AR(\(p\)), MA(\(q\)), and ARMA(\(p\),\(q\)) models. When we have observed data, we first need to estimate and remove the trend and seasonality and then choose a stationary model to account for the dependence in the errors.

Below we have a few guidelines and tools to help you work through the modeling process.

5.7.1.1 Time Series Plot

  • Look for possible long-range trends, seasonality of different orders, outliers, constant or non-constant variance.

  • Model and remove the trend.

  • Model and remove the seasonality (a minimal sketch of removing both trend and seasonality follows this list).

  • Investigate the outliers for possible human errors or find explanations for their extreme values.

  • If the variability around the trend/seasonality is non-constant, you could transform the series with a logarithm or square root or use a more complex model such as an ARCH model.
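
As referenced above, here is a minimal sketch on a made-up monthly series (purely illustrative): estimate a linear trend and monthly seasonal means with lm(), then work with the residuals.

y <- ts(0.05 * (1:120) + rep(sin(2 * pi * (1:12) / 12), 10) + rnorm(120), frequency = 12) # made-up monthly series with trend + seasonality
trend <- as.numeric(time(y)) # time index for a linear trend
season <- factor(cycle(y)) # month indicator for seasonal means
fit <- lm(y ~ trend + season) # estimate the trend and seasonality
e <- ts(resid(fit), frequency = 12) # detrended, deseasonalized errors
plot(e) # these residuals are what we model with AR/MA/ARMA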

5.7.1.2 ACF and PACF Plots

  • AR(\(p\)) models have a theoretical PACF with non-zero values at lags less than or equal to \(p\) and zero values elsewhere. The ACF should decay or taper to zero in some fashion.

  • MA(\(q\)) models have theoretical ACF with non-zero values at lags less than or equal to \(q\). The PACF should decay or taper to zero in some fashion.

  • ARMA models (\(p\) and \(q\) > 0) have ACFs and PACFs that taper down to 0. These are the trickiest to determine because the order will not be obvious. Try fitting a model with low orders for one or both \(p\) and \(q\) and compare models.

  • If the ACF and PACF do not decay to zero but instead have values that stay close to 1 over many lags, the series is non-stationary (non-constant mean), and differencing or estimating and removing the trend will be needed. Try a first difference and look at the ACF and PACF of the differenced data (see the sketch after this list).

  • If all of the autocorrelations are non-significant (within the horizontal lines), then the series is white noise, and you don’t need to model anything. (Great!)
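
For the non-stationary case above, here is a sketch with a simulated random walk (illustrative only): its autocorrelations stay near 1 across many lags, while the ACF and PACF of the first difference look like white noise.

x <- ts(cumsum(rnorm(500))) # simulated random walk (non-constant mean)
acf2(x) # autocorrelations stay close to 1 over many lags
acf2(diff(x)) # first difference: no remaining autocorrelation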

5.7.1.3 Diagnostics

Once we have a potential model that we are considering, we want to make sure that it fits the data well. Below are a few guidelines to check before you determine your final model.

  1. Look at the significance of the coefficients. Are they significantly different from zero? If so, great. If not, they may not be necessary, and you should consider changing the order of the model.

  2. Look at the ACF of the residuals. Ideally, our residuals should look like white noise.

  3. Look at a time series plot of the residuals and check for any non-constant variance. If you have non-constant variance in the residuals, consider taking the original data’s natural log and analyzing the log instead of the raw data.

  4. Look at the p-values for the Ljung-Box test (see the Real Data Example for example R code and output). For this test, \(H_0:\) the data are independent, and \(H_A:\) the data exhibit serial correlation (are not independent). The test statistic is

\[Q_h = n(n+2)\sum^h_{k=1}\frac{r_k^2}{n-k}\]

where \(h\) is the number of lags being tested simultaneously. If \(H_0\) is true, \(Q_h\) follows a Chi-squared distribution with df = \(h\) for very large \(n\). So you’d like to see large p-values for each lag \(h\) to not reject the \(H_0\).
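
As a sketch (simulated data and a hypothetical fitted model, not the Real Data Example), the test is available through Box.test() in base R.

y <- arima.sim(n = 500, list(ar = .6, ma = .4)) # simulated ARMA(1,1) series
fit <- arima(y, order = c(1, 0, 1)) # fit the matching model
Box.test(resid(fit), lag = 10, type = "Ljung-Box", fitdf = 2) # fitdf = p + q when testing residuals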

5.7.1.4 Criteria to Choose

Which model to choose?

  • Choose the ‘best’ model with the fewest parameters

  • Pick the model with the lowest standard errors for predictions in the future

  • Compare models with respect to the MSE, AIC, and BIC (the lower the better for all)

  • If two models seem very similar when converted to an infinite order MA, they may be nearly the same. So it may not matter too much which ARMA model you end up with.
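
As a sketch (simulated data and illustrative orders), candidate models fit with arima() can be compared using AIC() and BIC().

y <- arima.sim(n = 500, list(ar = c(.5, .3))) # simulated AR(2) series
fit1 <- arima(y, order = c(1, 0, 0)) # candidate AR(1)
fit2 <- arima(y, order = c(2, 0, 0)) # candidate AR(2)
c(AIC(fit1), AIC(fit2)) # lower is better
c(BIC(fit1), BIC(fit2)) # lower is better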