Chapter 4 Model Components
As we introduced at the beginning of the course, most of the models we use for data can be written in terms of two model components, a trend plus noise (or error),
\[Y_t = \underbrace{f(x_t)}_\text{trend} + \underbrace{\epsilon_t}_\text{noise}\]
The trend generally gives the average outcome as a function of \(x_t\), which could represent time/space and/or explanatory variables measured across time/space.
- In Stat 155 (Introduction to Statistical Modeling), we modeled the trend with linear combinations of explanatory variables, \(f(\mathbf{x}) = \beta_0+\beta_1x_1+\beta_2x_2+\cdots+\beta_px_p\), and assumed the noise was independent.
- In Stat 253 (Statistical Machine Learning), we learned parametric and nonparametric tools to model the trend (polynomials, splines, local regression, KNN, trees/forests, etc.) for prediction.
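As a refresher on the Stat 155 approach, the sketch below (a hypothetical simulation, with NumPy's least-squares routine standing in for fitting a linear model) generates data with a linear trend plus independent noise and recovers the coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: n observations with a linear trend f(x) = 2 + 0.5x
# plus independent noise epsilon.
n = 100
x = np.linspace(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=n)

# Design matrix with an intercept column: f(x) = beta0 + beta1 * x
X = np.column_stack([np.ones(n), x])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)  # estimates of (beta0, beta1), near the true (2.0, 0.5)
```

Because the noise here really is independent, ordinary least squares is appropriate; the rest of this chapter is about what changes when it is not.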
We’ll build on your existing knowledge and learn a few more tools for modeling the trend and other components that arise with correlated data. We’ll use the notation of time series to discuss model components, but these ideas apply to longitudinal and spatial data.
Typically, we write a time series model with three components (not just two) and model them separately:
\[Y_t = \underbrace{f(x_t)}_\text{trend} + \underbrace{s(x_t)}_\text{seasonality} + \underbrace{\epsilon_t}_\text{noise}\]
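To make the three-component decomposition concrete, here is a small simulation (all values hypothetical): a monthly series built from a linear trend \(f(t)\), an annual seasonal cycle \(s(t)\), and independent noise.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical monthly series over 10 years: Y_t = f(t) + s(t) + epsilon_t
t = np.arange(120)                            # months 0..119
trend = 10 + 0.05 * t                         # f(t): slow linear increase
seasonality = 3 * np.sin(2 * np.pi * t / 12)  # s(t): 12-month cycle
noise = rng.normal(scale=1.0, size=t.size)    # epsilon_t: independent noise
y = trend + seasonality + noise

# Averaging over whole years removes the seasonal cycle (it sums to zero
# within each year), leaving approximately trend plus averaged noise.
annual_means = y.reshape(10, 12).mean(axis=1)
print(annual_means)
```

The annual means increase roughly linearly, previewing a common strategy: aggregate or model away one component to see the others more clearly.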
Let’s define these model components and the types of questions that are driving the modeling.
Trend: This is the long-range average outcome pattern over time. We often ask ourselves, “Is the data generally increasing or decreasing over time? How is it changing? Linearly? Exponentially?”
Seasonality: This refers to cyclical patterns related to calendar time, such as seasons, quarters, months, days of the week, or time of day. We might wonder, “Are there daily cycles, weekly cycles, monthly cycles, annual cycles, and/or multi-year cycles?” (e.g., the amount of sunshine has both a daily and an annual cycle, and the climate has multi-year El Niño and La Niña cycles on top of the annual seasonality of temperature)
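A common first estimate of a seasonal component is the average outcome within each calendar period (each month, each day of the week, etc.). A minimal sketch with hypothetical detrended monthly data:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical detrended monthly series: an annual cycle plus noise
t = np.arange(240)                         # 20 years of months
y = 3 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=1.0, size=t.size)

# Estimate s(t) by averaging all observations that share a calendar month
month = t % 12
s_hat = np.array([y[month == m].mean() for m in range(12)])
print(s_hat.round(2))  # traces out the underlying annual cycle
```

With 20 observations per month, these averages sit close to the true seasonal curve; estimating seasonality this way assumes the trend has already been removed.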
Noise: This is the high-frequency variability in the outcome not attributed to trend or seasonality. In other words, noise is what is left over. We might break this noise up into two components:
- serial correlation: the size and direction of the noise today are likely to be similar tomorrow
- independent measurement error: due to natural variation in the measurement device
We often ask, “Are there structures/patterns in the noise such that we could use probability to model the serial correlation? Is there a certain range of time or space within which we have dependence? Is the magnitude of the variability constant across time?”
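One simple probability model for serially correlated noise is an autoregressive (AR(1)) process, a tool we will develop later. The sketch below (all values hypothetical) simulates AR(1) noise plus independent measurement error and checks that neighboring values are indeed correlated:

```python
import numpy as np

rng = np.random.default_rng(2)

n = 2000
phi = 0.8          # serial correlation: part of today's noise carries over
ar = np.zeros(n)
for i in range(1, n):
    ar[i] = phi * ar[i - 1] + rng.normal(scale=1.0)

# Add independent measurement error on top of the serially correlated part
noise = ar + rng.normal(scale=0.5, size=n)

# Lag-1 correlation: nearby values of the noise are similar
lag1 = np.corrcoef(noise[:-1], noise[1:])[0, 1]
print(round(lag1, 2))  # clearly positive, attenuated by the measurement error
```

If the noise were purely independent measurement error, this lag-1 correlation would be near zero; a clearly positive value is the signature of serial correlation.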