4.1 Trend

Let’s start by focusing on the trend component. Our goal here is to estimate the overall outcome pattern over time.

We’ll plot the residuals (what is leftover) over time to see if we’ve successfully estimated and removed the trend. We’ll see if the average residual is fairly constant across time. If not, we must try a different approach to model/remove the trend.

4.1.1 Parametric Approaches

4.1.1.1 Global Transformations

If the trend, \(f(x_t)\), is linearly increasing or decreasing in time (with no seasonality), then we could use a linear regression model to estimate the trend with the following model,

\[Y_t = \beta_0 + \beta_1x_t + \epsilon_t\] where \(x_t\) is some measure of time (e.g., age, time from baseline, etc.).

If the overall mean trend is quadratic, we could include a \(x_t^2\) term in the regression model,

\[Y_t = \beta_0 + \beta_1x_t+ \beta_2x_t^2 + \epsilon_t\]

In fact, any standard regression technique for non-linear relationships (polynomials, splines, etc.) could also be used here to model the trend. As discussed in Chapter 1, we can use ordinary least squares (OLS), lm() in R, to get an unbiased estimate of these coefficients in linear regression. But remember, the standard errors and subsequent inference (p-values, confidence intervals) may not be valid due to correlated noise.

Let’s look at some example data and see how we can model the trend.

Data Example 1

Below are the quarterly earnings per share of the company Johnson and Johnson from 1960-1980. The first observation is the 1st quarter of 1960 (\(t=1\)) and the second is the 2nd quarter of 1960 (\(t=2\))

library(astsa) # R package for time series
jj
class(jj) # ts stands for time series R class
plot(jj) # plot() is smart enough to make the right x-axis because jj is ts object

We see that the earnings are overall increasing over time, but a bit faster than a linear relationship (there is some curvature in the trend). Let’s transform this dataset into a data frame to use ggplot2, an alternative graphing package in R.

library(ggplot2)
library(dplyr)

jjTS <- data.frame(
  Value = as.numeric(jj), 
  Time = time(jj), # time() works on ts objects
  Quarter = cycle(jj)) # cycle() works on ts objects

quadTrendMod <- lm(Value ~ poly(Time, degree = 2, raw = TRUE), data = jjTS) # poly() fits a polynomial trend model

jjTS <- jjTS %>%
  mutate(Trend = predict(quadTrendMod)) # Add the estimated trend to the dataset

jjTS %>%
  ggplot(aes(x = Time, y = Value)) +
  geom_point() +
  geom_line() +
  geom_line(aes(x = Time, y = Trend), color = 'blue', size = 1.5) +
  theme_classic()

Data Example 2

Below are the monthly counts (in thousands) of live births in the United States from 1948 to 1979. The first measurement is from January 1948 (\(t=1\)), then February 1948 (\(t=2\)), etc.

birth
class(birth) # ts stands for time series R class
plot(birth) # plot() is smart enough to make the right x-axis because birth is ts object

Looking at the plot (time-ordered points are connected with a line segment), we note a positive linear relationship from about 1948 to 1960 and a somewhat negative relationship between 1960 to 1968, with another slight increase and then a drop around 1972. This is not a polynomial trend over time. We’ll need another approach!

Let’s also transform this dataset into a data frame to use ggplot2.

birthTS <- data.frame(
  Year = floor(as.numeric(time(birth))), # time() works on ts objects
  Month = as.numeric(cycle(birth)), # cycle() works on ts objects
  Value = as.numeric(birth))

birthTS <- birthTS %>%
  mutate(DecimalTime = Year + (Month-1)/12)

birthTS %>% 
  ggplot(aes(x = DecimalTime, y = Value)) + 
  geom_line() + 
  theme_classic()

4.1.1.2 Splines

Let’s try a regression basis spline, which is a smooth piecewise polynomial. This means we break up the x-axis into intervals using knots or breakpoints and model a different polynomial in each interval. In particular, let’s fit a quadratic polynomial with knots/breaks at 1965 and 1972.

library(splines)
  
TrendSpline <- lm(Value ~ bs(DecimalTime, degree = 2, knots = c(1965,1972)), data = birthTS)
summary(TrendSpline)

birthTS <- birthTS %>%
  mutate(SplineTrend = predict(TrendSpline))

birthTS %>%
  ggplot(aes(x = DecimalTime, y = Value)) +
  geom_point() +
  geom_line() +
  geom_line(aes(x = DecimalTime, y = SplineTrend), color = 'blue', size = 1.5) + 
  theme_classic()

Then let’s see what is left over by plotting the residuals after removing the estimated trend. Is the mean constant? If so, we can ensure we’ve estimated and removed the long-range trend.

# Plotting residuals
birthTS %>%
  ggplot(aes(x = DecimalTime, y = resid(TrendSpline))) + 
  geom_line() + 
  geom_smooth(se = FALSE) + 
  theme_classic()

These parametric approaches give us an estimated function to plug in times and get a predicted trend value, but they might not quite pick up the full trend.

What are alternative methods for estimating non-linear trends?

4.1.2 Nonparametric Approaches

Nonparametric methods for estimating a trend use algorithms (steps of a process) rather than parametric functions (linear, polynomial, splines) with a finite number of parameters to estimate.

4.1.2.1 Local Regression

Have you ever wondered how geom_smooth() estimates that blue smooth curve?

If you look at the documentation for geom_smooth(), you’ll see that it uses the loess() function for smaller data sets. This trend estimation method is locally estimated scatterplot smoothing, also called local regression.

The loess algorithm is to predict the outcome (estimate the trend) at a point on the x-axis, \(x_0\), following the steps below:

  • Distance: calculate the distance between \(x_0\) and each observed points \(x\), \(d(x,x_0)=|x-x_0|\).
  • Find Local Area: keep the closest \(s\) proportion of observed data points (\(s\) determined by span parameter)
  • Fit Model: fit a linear or quadratic regression model to the local area, weighting points close to \(x_0\) higher (weighting determined by a chosen weighting kernel function)
  • Predict: use that local model to predict the outcome at \(x_0\)

That process is repeated for a sequence of values along the x-axis to make a plot like the one below (blue: loess, purple: spline).

birthTS %>%
  ggplot(aes(x = DecimalTime, y = Value)) +
  geom_point() +
  geom_line() +
  geom_smooth(se = FALSE, color = 'blue') + # uses loess
  geom_line(aes(x = DecimalTime, y = SplineTrend), color = 'purple', size = 1.5) + 
  theme_classic()

We can also get the estimated trend with the loess() function.

# Running loess separately
LocalReg <- loess(Value ~ DecimalTime, data = birthTS, span = .75)
birthTS <- birthTS %>%
  mutate(LoessTrend = predict(LocalReg))

birthTS %>%
  ggplot(aes(x = DecimalTime, y = Value)) +
  geom_point() +
  geom_line() +
  geom_line(aes(x = DecimalTime, y = LoessTrend), color = 'blue', size = 1.5) + 
  geom_line(aes(x = DecimalTime, y = SplineTrend), color = 'purple', size = 1.5) + 
  theme_classic()

# Plotting residuals
birthTS %>%
  ggplot(aes(x = DecimalTime, y = resid(LocalReg))) + 
  geom_line() + 
  geom_smooth(se = FALSE) + 
  theme_classic()

4.1.2.2 Moving Average Filter

Local regression is an extension of a simple idea called a moving average filter.

Imagine a window that only shows you what is “local” or right in front of you, and we can adjust the window width to include more or less data. In the illustration below, we have drawn a 2-year window around January 1960. When considering information around Jan 1960, the window limits our view to only those one year before and one year after (from Jan 1959-Jan 1961).

For our circumstances, the “local” window will include an odd number of observations (number of points = \(1+2a\) for a given \(a\) value). Among that small local group of observations, we take the average to estimate the mean at the center of those points (rather than fit a linear regression model),

\[\hat{f}(x_t) = (1+2a)^{-1}\sum^a_{k=-a}y_{t+k}\]

Then, we move the window and calculate the mean within that window. Continue until you have a smooth curve estimate for each point in the time range of interest.

Box <- data.frame(x = 1965, y = 325, w = 2) #2-year window around 1980
Mean <- birthTS %>%
  filter(DecimalTime < 1965 + 1 & DecimalTime > 1965 - 1) %>%
  summarize(M = mean(Value)) %>% #Mean value of points in the window
  mutate(x = 1965)

birthTS %>% 
  ggplot(aes(x = DecimalTime, y = Value)) + 
  geom_line() + 
  geom_smooth(se = FALSE) + 
  geom_tile(data = Box, aes(x = x,y=y,width=w),height = 100, fill=alpha("grey",0), colour = 'yellow',size = 1) +
  geom_point(data = Mean, aes(x = x, y = M), colour = 'yellow', size = 2)+
  theme_classic()

This method works really well for windows of odd-numbered length (equal number of points around the center), but it can be adjusted if you desire an even-number length window by adding 1/2 of the two extreme lags so that the middle value lines up with time \(t\).

For the above monthly data example, we might want a 12-point moving average, so we would have

\[\hat{f}(x_t) = \frac{0.5y_{t-6}+y_{t-5}+y_{t-4}+y_{t-3}+y_{t-2}+y_{t-1}+y_{t}+y_{t+1}+y_{t+2}+y_{t+3}+y_{t+4}+y_{t+5}+0.5y_{t+6} }{12}\]

fltr <- c(0.5, rep(1, 11), 0.5)/12 # weights for the weighted average above
birthTrend <- stats::filter(birth, filter = fltr, method = 'convo', sides = 2) #use the ts object rather than the data frame
plot(birthTrend)

Now if we increase the window’s width (24-point moving window), you should see a smoother curve.

\[\hat{f}(x_t) = \frac{0.5y_{t-12}+ y_{t-11}+...+y_{t-1}+y_{t}+y_{t+1}+...+y_{t+11}+0.5y_{t+12} }{24}\]

fltr <- c(0.5, rep(1, 23), 0.5)/24 # weights for moving average
birthTrend <- stats::filter(birth, filter = fltr, method = 'convo', sides = 2) # use the ts object rather than the data frame
plot(birthTrend)

What if we did a 5-point moving average instead of a 12 or 24-point moving average?

fltr <- rep(1, 5)/5
birthTrend2 <- stats::filter(birth, filter = fltr, method = 'convo', sides = 2)
plot(birthTrend2)

We see the seasonality (repeating cycles) because the 5-point window averages about half of the year’s data for each estimate. Thus the highs and lows don’t balance each other out.

You must be mindful of the window width for the moving average, depending on the context of the data and the existing cycles in the data.

Let’s see what is leftover. Is the mean of the residuals constant across time (indicating that we have fully removed the trend)?

birthTS <- birthTS %>% 
  mutate(Trend = birthTrend) %>%
  mutate(Residual = Value - Trend)

birthTS %>%
  ggplot(aes(x = DecimalTime, y = Residual)) + 
  geom_point() +
  geom_smooth(se = FALSE) + 
  theme_classic()

If we connect the points, we might see a cyclic pattern…

birthTS %>%
  ggplot(aes(x = DecimalTime, y = Residual)) + 
  geom_line() +
  geom_smooth(se = FALSE) + 
  theme_classic()

We still have to deal with seasonality (we’ll get there), but the moving average does a fairly good job at removing the trend because we see, that the trend of the residuals is 0 on average!

The moving average filter did a decent job of estimating the trend because of its flexibility, but we can’t write down a function statement for that trend. This is the downside of nonparametric methods.

4.1.2.3 Differencing to Remove Trend

If we don’t necessarily care about estimating the trend for prediction, we could use differencing to explicitly remove a trend.

We’ll talk about first and second-order differences, which are applicable if we have a linear or quadratic trend.

Linear Trend - First Difference

If we have a linear trend such that \(Y_t = b_0 + b_1 t + \epsilon_t\), the difference between neighboring outcomes (1 time unit apart) will essentially remove that trend.

Take the outcome at time \(t\) and an outcome that lags behind one time unit:

\[Y_t - Y_{t-1} = (b_0 + b_1 t + \epsilon_t) - (b_0 + b_1 (t-1) + \epsilon_{t-1})\] \[= b_0 + b_1 t + \epsilon_t - b_0 - b_1t + b_1 - \epsilon_{t-1}\] \[= b_1 + \epsilon_t - \epsilon_{t-1}\] which is constant with respect to time, \(t\).

See below for a simulated example.

t <- 1:36
y <- ts(2 + 5*t + rnorm(36,sd = 10), frequency=12)
plot(y)

plot(diff(y, lag = 1, differences = 1))

Quadratic Trend - Second Order Differences

If we have a quadratic relationship such that \(Y_t = b_0 + b_1 t+ b_2 t^2 + \epsilon_t\), then taking the second order difference between neighboring outcomes will remove that trend.

Take the outcome at time \(t\) and an outcome that lags behind one time unit (first difference): \[(Y_t - Y_{t-1}) = (b_0 + b_1 t+ b_2 t^2 + \epsilon_t) - (b_0 + b_1 (t-1)+ b_2 (t-1)^2 + \epsilon_{t-1})\] \[= b_0 + b_1 t + b_2 t^2 + \epsilon_t - b_0 - b_1 x_t + b_1 - b_2 t^2 + 2b_2t - b_2^2 - \epsilon_{t-1}\] \[= b_1- b_2^2 + 2b_2t + \epsilon_t - \epsilon_{t-1}\] and thus, the second-order difference is the differences in the neighboring first differences:

\[(Y_t - Y_{t-1}) - (Y_{t-1} - Y_{t-2}) = (b_1- b_2^2 + 2b_2t + \epsilon_t - \epsilon_{t-1}) - (b_1- b_2^2 + 2b_2(t-1) + \epsilon_{t-1} - \epsilon_{t-2})\] \[= 2b_2 + \epsilon_t - 2\epsilon_{t-1} + \epsilon_{t-2}\] which is constant with respect to time, \(t\).

See below for an example of simulated data.

t <- 1:36
y <- ts(2 + 1.5*t + .3*t^2 + rnorm(36,sd = 10), frequency=12)
plot(y)

plot(diff(y, lag = 1, differences = 1)) #first difference
plot(diff(y, lag = 1, differences = 2)) #second order difference

Let’s see what we get when we use differencing on the real birth count data. It looks like the first differences resulted in data with no overall trend despite the complex trend over time!

birthTS %>%
  mutate(Diff = c(NA, diff(Value, lag = 1, differences = 1))) %>%
  ggplot(aes(x = DecimalTime, y = Diff)) + 
  geom_point() + 
  geom_line() + 
  geom_hline(yintercept = 0) + 
  geom_smooth(se = FALSE) + 
  theme_classic()

4.1.3 In Practice: Estimate vs. Remove

We have a few methods to estimate the trend,

  1. Parametric Regression: Global Transformations (polynomials)
  2. Parametric Regression: Spline basis
  3. Nonparametric: Local Regression
  4. Nonparametric: Moving Average Filter

Among the estimation methods, two are parametric in that we estimate slope parameters or coefficients (regression techniques), and two are nonparametric in that we do not have any parameters to estimate but rather focus on the whole function.

Things to consider when choosing an estimation method:

  • How well does it approximate the true underlying trend? Does it systematically under or overestimate the trend for certain periods or are the residuals on average zero across time?
  • Do you want to use past trends to predict the future? If so, you may want to consider a parametric approach.

We have two approaches to removing the trend,

  1. Estimate, then Residuals. Estimate the trend first, then subtract trend from observations to get residuals, \(y_t - \hat{f}(x_t)\).
  2. Difference. Skip the estimating and try first or second-order differencing of neighboring observations.

Things to consider when choosing an approach for removing the trend:

  • How well does it remove the underlying trend? Are the residuals (what is leftover) close to zero on average, or are there still long-range patterns in the data?
  • Do you want to report on the estimated trend? Then estimate first.
  • If your main goal is prediction, try differencing.