5.10 Forecasting

Typically, when we are working with time series, we want to make predictions for the near future based on recently observed data.

Let’s think about an example. Imagine we had data \(y_1,...,y_n\) and we estimated the trend and seasonality such that \(e_t = y_t - \underbrace{\hat{f}(t|y_1,...,y_n)}_\text{trend} - \underbrace{\hat{S}(t|y_1,...,y_n)}_\text{seasonality}\). Thus, if we knew or had a prediction of \(e_t\), we could get a prediction of \(y_t = \underbrace{\hat{f}(t|y_1,...,y_n)}_\text{trend} + \underbrace{\hat{S}(t|y_1,...,y_n)}_\text{seasonality} + e_t\).
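As a minimal sketch of that decomposition in R (assuming a data frame like the birthTS used later in this section, with columns Value, Time, and Month; the quadratic trend here is just a placeholder for whatever trend model you actually fit):

# Estimate the trend (placeholder: quadratic in Time) and seasonality (Month indicators) together
fit <- lm(Value ~ poly(Time, 2) + factor(Month), data = birthTS)

# The residuals play the role of e_t = y_t - estimated trend - estimated seasonality
e <- resid(fit)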

Let’s say we modeled \(e_t\) with an AR(2) process, then

\[e_t = \phi_1e_{t-1} + \phi_2e_{t-2} + W_t\quad\quad W_t\stackrel{iid}{\sim} N(0,\sigma^2_w)\]

Since we have data for \(y_1,...,y_n\), we have data for \(e_1,...,e_n\) to use to estimate \(\phi_1\), \(\phi_2\), and \(\sigma^2_w\).
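For example (a sketch, assuming e holds the residual series \(e_1,...,e_n\) as in the snippet above), the AR(2) coefficients and white noise variance could be estimated with arima():

ar2fit <- arima(e, order = c(2, 0, 0), include.mean = FALSE) # AR(2); no mean term since e_t is a residual series
ar2fit$coef   # estimates of phi_1 and phi_2
ar2fit$sigma2 # estimate of sigma^2_w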

To get a one-step ahead prediction based on the \(n\) observed data points, \(\hat{e}_{n+1}^n\), we could plug into our model equation,

\[e_{n+1} = \phi_1e_{n} + \phi_2e_{n-1} + W_{n+1}\quad\quad W_t\stackrel{iid}{\sim} N(0,\sigma^2_w)\] We do have the values of \(e_n\) and \(e_{n-1}\) plus estimates of \(\phi_1\) and \(\phi_2\). But we don’t know \(W_{n+1}\). However, we know that our white noise is, on average, 0. So, our prediction one step ahead based on \(n\) observed data points is

\[\hat{e}^n_{n+1} = \hat{\phi_1}e_{n} + \hat{\phi_2}e_{n-1}\]

What about two steps ahead? If we plug it into our model,

\[e_{n+2} = \phi_1e_{n+1} + \phi_2e_{n} + W_{n+2}\quad\quad W_t\stackrel{iid}{\sim} N(0,\sigma^2_w)\]

we see that we do have the value of \(e_n\) plus estimates of \(\phi_1\) and \(\phi_2\). But we don’t know \(e_{n+1}\) and \(W_{n+2}\). Similar to before, we can assume the white noise is, on average, 0, and we can plug in our one-step ahead prediction \(\hat{e}^n_{n+1}\) for \(e_{n+1}\). So, our two-step ahead prediction based on \(n\) data points is

\[\hat{e}^n_{n+2} = \hat{\phi_1}\hat{e}_{n+1}^n + \hat{\phi_2}e_{n}\]

We could continue this process,

\[\hat{e}^n_{n+m} = \hat{\phi_1}\hat{e}_{n+m-1}^n + \hat{\phi_2}\hat{e}_{n+m-2}^n\]

for \(m>2\). In general, to do forecasting with an ARMA model, for \(1\leq j\leq n\) we use the residuals for \(W_j\) and set \(W_j = 0\) for \(j>n\). Similarly, for \(1\leq j\leq n\), we use the observed data for \(e_j\) and plug in the forecasts \(\hat{e}_j^n\) for \(j>n\).
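Here is a minimal sketch of that recursion in R, assuming e holds the residuals \(e_1,...,e_n\) and ar2fit is the AR(2) fit from the sketch above (both names are from the earlier illustrations, not fixed objects in this section):

phi1hat <- ar2fit$coef["ar1"] # estimated phi_1
phi2hat <- ar2fit$coef["ar2"] # estimated phi_2

m <- 10                  # number of steps ahead to forecast
n <- length(e)
ehat <- c(e, rep(NA, m)) # observed residuals followed by slots for the forecasts

# Future white noise terms are replaced by their mean of 0; e's beyond time n
# are replaced by the forecasts computed in earlier iterations
for (j in (n + 1):(n + m)) {
  ehat[j] <- phi1hat * ehat[j - 1] + phi2hat * ehat[j - 2]
}

ehat[(n + 1):(n + m)] # the forecasts for times n+1, ..., n+m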

5.10.1 Prediction Intervals

We know that our predictions will be off from the truth. But how far off?

Before we can estimate the standard deviation of our errors, we need to know one more thing about ARMA models.

5.10.1.1 ARMA to MA(\(\infty\))

A stationary ARMA model can be written as an infinite MA process (similar to how we wrote an AR model as an infinite MA process),

\[Y_t = \sum^\infty_{j=0} \Psi_j W_{t-j}\]

where \(\Psi_0 = 1\) and \(\sum^\infty_{j=1}|\Psi_j|<\infty\).

The R command ARMAtoMA(ar = 0.6, ma = 0, 12) gives the first 12 values of \(\Psi_j\) (that is, \(\Psi_1,\dots,\Psi_{12}\); the leading \(\Psi_0 = 1\) is not included) for an AR(1) model with \(\phi_1 = 0.6\).

ARMAtoMA(ar = 0.6, ma = 0, 12)

The R command ARMAtoMA(ar = 0.6, ma = 0.1, 12) gives the first 12 values of \(\Psi_j\) for an ARMA(1,1) model with \(\phi_1 = 0.6\) and \(\theta_1 = 0.1\).

ARMAtoMA(ar = 0.6, ma = 0.1, 12)
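As a quick sanity check, for an AR(1) model the MA(\(\infty\)) weights are \(\Psi_j = \phi_1^j\), so the first command above should simply return powers of 0.6:

all.equal(ARMAtoMA(ar = 0.6, ma = 0, 12), 0.6^(1:12)) # TRUE: Psi_j = phi_1^j for an AR(1)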

5.10.1.2 Standard Errors of ARMA Errors

In order to create a prediction interval for \(\hat{y}^n_{n+m}\), we need to know how big the error, \(\hat{y}^n_{n+m}-y_{n+m}\), may be.

It can be shown that

\[Var(\hat{y}^n_{n+m}-y_{n+m}) = \sigma^2_w \sum^{m-1}_{j=0} \Psi^2_j\] and thus, the standard errors are

\[SE(\hat{y}^n_{n+m}-y_{n+m}) = \sqrt{\hat{\sigma}^2_w \sum^{m-1}_{j=0} \hat{\Psi}^2_j}\]

where \(\hat{\sigma}^2_w\) and \(\hat{\Psi}_j\) are given by the estimation process.

Let’s see this play out with our birth data. We can estimate the ARMA model with sarima() and pull out the estimates of \(\sigma^2_w\) and the \(\Psi_j\)’s. Compare our hand calculations to the standard errors from sarima.for().

sarima(birthTS$ResidualTS, p = 1, d = 0, q = 1)

psi = c(1, ARMAtoMA(ar = 0.8727, ma = -0.2469, 9)) # Psi_0 = 1 followed by Psi_1,...,Psi_9 from the estimated coefficients
sigma2 = 43.5 # estimated white noise variance from the sarima() output

sarima.for(birthTS$ResidualTS, n.ahead = 10, p = 1, d = 0, q = 1)$se
sqrt(sigma2*cumsum(psi^2)) # The SEs for the predictions are the same
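Since sarima.for() returns the point forecasts in $pred and their standard errors in $se, we could also form approximate 95% prediction intervals by hand; a quick sketch:

fc <- sarima.for(birthTS$ResidualTS, n.ahead = 10, p = 1, d = 0, q = 1)
cbind(lower = fc$pred - qnorm(0.975) * fc$se,
      pred  = fc$pred,
      upper = fc$pred + qnorm(0.975) * fc$se) # approximate 95% prediction intervals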

But if we incorporate the differencing for the trend/seasonality into the model, we’ll get forecasts in the original units of \(y_t\) instead of just forecasts of the errors \(e_t\).

sarima.for(birth, n.ahead = 24, p = 0, d = 1, q = 1, P = 0, D = 1, Q = 1, S = 12)

We could also include explanatory variables in this model to directly estimate the trend and the seasonality with xreg and newxreg. You’ll use model.matrix()[,-1] to get the model matrix of explanatory variables to pass to xreg, and create a version of that matrix with future values for newxreg.

Note about B-Splines: To use a spline to predict into the future, you must adjust the Boundary.knots so that they extend past the times you want to predict, in both xreg and newxreg.

Below I create a matrix of explanatory variables called X that includes the B-spline (note the Boundary.knots extend to cover the period I want to predict) and indicators for Month. I do this with model.matrix()[,-1], removing the first column for the intercept. Then I create a matrix of those explanatory variables for times in the future, NewX. Based on this data set, I took the last observed time and added monthly increments of 1/12 for the next two years (24 months) to get the future times.

# One model for trend (B-spline - note how the Boundary.knots extend beyond the observed and prediction times) and seasonality (indicators for Month)
# bs() comes from the splines package
TrendSeason <- birthTS %>%
  lm(Value ~ bs(Time, degree = 2, knots = c(1965, 1972), Boundary.knots = c(1948, 1981)) + factor(Month), data = .)
X = model.matrix(TrendSeason)[,-1] # model matrix of explanatory variables, dropping the intercept column

birthTS[nrow(birthTS),c('Time','Month')] #Last time observation

newdat = data.frame(Month = c(2:12, 1:12, 1), Time = 1979 + (1:24)/12) # Month and Time for the next 24 months

NewX = model.matrix(~ bs(Time, degree = 2, knots = c(1965, 1972), Boundary.knots = c(1948, 1981)) + factor(Month), data = newdat)[,-1] # same explanatory variables evaluated at the future times

sarima.for(birthTS$Value, n.ahead = 24, p = 0, d = 0, q = 1, P = 0, D = 0, Q = 1, S = 12, xreg = X, newxreg = NewX)