5.8 Real Data Example

In order to fit these models to real data, we first must get a detrended stationary series (removing seasonality as well). This can be done with estimation and removing or through differencing.

For the birth data, we tried a variety of ways of removing the trend and seasonality with differencing in the Model Components chapter. While the moving average filter worked well. Let’s try a fully parametric approach by using a spline to model the trend and indicators for the seasonality,

birthTS <- data.frame(Year = time(birth), #time() works on ts objects
  Month = cycle(birth), #cycle() works on ts objects
  Value = birth)
birthTS <- birthTS %>%
  mutate(Time = Year + (Month-1)/12)

TrendSeason <- birthTS %>%
  lm(Value ~ bs(Time, degree = 2, knots = c(1965, 1972), Boundary.knots = c(1948,1981)) + factor(Month), data = .)

birthTS <- birthTS %>%
  mutate(ResidualTS = Value - predict(TrendSeason))

The next step is to consider the ACF and PACF to see if you can recognize any clear patterns in dropping or decaying (MA or AR).

acf2(birthTS$ResidualTS) 

We note the ACF is decaying slowly to zero, and the PACF has 1 clear non-zero lag. These two graphs suggest an AR(1) model. Let’s fit an AR(1) model and a few other candidate models with sarima() in the astsa package.

mod.fit1 <- sarima(birthTS$ResidualTS, p = 1,d = 0,q = 0) #AR(1)
mod.fit1$fit
mod.fit2 <- sarima(birthTS$ResidualTS, p = 1,d = 0,q = 1) #ARMA(1,1)
mod.fit2$fit
mod.fit3 <- sarima(birthTS$ResidualTS, p = 2,d = 0,q = 0) #AR(2)
mod.fit3$fit
#Choose a model with lowest BIC
mod.fit1$BIC
mod.fit2$BIC
mod.fit3$BIC

sarima.for(birthTS$ResidualTS,10, p = 1,d = 0,q = 1) #forecasts the residuals 10 time units in the future for the ARMA(1,1) model (we'll get to forecasting in a second)

Now, you’ll notice that we went through the following steps:

  1. Model Trend and Remove
  2. Model Seasonality and Remove
  3. Model Errors

The forecasts look funny because they are only forecasts for errors. If we have taken a parametric approach to estimate the trend and the seasonality, we can incorporate the modeling and removal process in the sarima() function.

First, we need to create a matrix of the variables for the trend and seasonality. We can easily do that with model.matrix() and remove the intercept column with [,-1]. Then we pass that matrix to sarima() with the xreg= argument. This will be useful for us when we do forcasting.

X <- model.matrix(TrendSeason)[,-1] #remove the intercept

GoodMod <- sarima(birthTS$Value, p = 1,d = 0, q = 1, xreg = X)

Now, you should notice two things from this output:

  1. There is a slightly higher ACF of residuals at lag 1.0 (at 12 months or 1 year – which is the period of the seasonality)

  2. The p-values for the Ljung-Box statistic are all < 0.05. That is not ideal because it suggests that the residuals of the full model are not independent white noise. There is still dependence that we still need to account for with the model.

We’ll come back to fix this. Before we do that, let’s talk about the function being called sarima() instead of arma(). What does the i' stand for, and what does thes` stand for? Read the next section to find out!