6.1 Sources of Variation
We now consider why the values of a characteristic that we might observe vary, both from unit to unit and over time.
Biological variation: It is well known that biological entities differ. Although living things of the same type tend to be similar in their characteristics, they are not identical (except perhaps in the case of genetically identical clones). Thus, even if we focus on rats of the same genetic strain, age, and gender, we expect the weights we might observe to vary because of inherent, natural biological variation. This variation arises from genetic, environmental, and behavioral factors. It is at the unit level, so we see it as between-unit variation.
Variation due to condition or time: Entities change over time and under different conditions. Suppose we follow rats over time or under various dietary conditions. In that case, we expect the weights we might observe for the same rat to differ from one time or condition to the next. This variation is at the observation level, so we see it as within-unit variation.
Measurement error: We have discussed rat weight as though once we have a rat in hand, we may know its weight exactly. However, a scale must be used. Ideally, a scale should register the true weight of an item each time it is weighed, but because such devices are imperfect, measurements on the same item may vary from one weighing to the next. The amount by which a measurement differs from the truth may be considered an error, i.e., a deviation up or down from the true value that would be observed with a perfect device. A fair or unbiased device does not systematically register high or low; rather, the errors may go in either direction with no pattern. This variation is typically at the observation level, so we see it as within-unit variation.
There are still further sources of variation that we could consider. They could be at the unit level or the observation level. For now, the important message is that, in considering statistical models, it is critical to be aware of the different sources of variation that cause observations to vary.
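One way to write down the three sources described above is the following additive sketch. The notation is an assumption introduced here for illustration and is not taken from the text: i indexes units (rats), j indexes observations on unit i, b_i is the unit-level biological deviation, c_ij is the deviation due to condition or time, and e_ij is the measurement error. An unbiased device corresponds to e_ij having mean zero. The variance line additionally assumes that the three terms are mutually independent, which need not hold in practice.

```latex
% Hypothetical notation: i = unit (rat), j = observation on unit i.
% b_i    : between-unit (biological) deviation, variance \sigma_b^2
% c_{ij} : within-unit deviation due to condition or time, variance \sigma_c^2
% e_{ij} : within-unit measurement error, mean 0, variance \sigma_e^2
\begin{align*}
Y_{ij} &= \mu + b_i + c_{ij} + e_{ij}, \\
\operatorname{Var}(Y_{ij}) &= \sigma_b^2 + \sigma_c^2 + \sigma_e^2 .
\end{align*}
```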
With cross-sectional data (units sampled from across the population, each observed at a single point in time, with no repeated measures), we
- cannot distinguish between-unit variation from within-unit variation
- use explanatory variables to try to explain any sources of variation (biological and other)
- use model error as a catch-all to account for any leftover variation (measurement or other)
With longitudinal data, since we have repeated measurements on units, we
- separate variation within units (measurement and other) from variation between units (biological and other)
- use time-varying explanatory variables to try to explain within-unit variation
- use time-invariant explanatory variables to try to explain between-unit variation
- use probability models to account for leftover within-unit or between-unit variation
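The contrast between the two designs can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not a recommended analysis: the function names, the mean weight of 300, and the standard deviations of 20 (between units) and 5 (within units) are all hypothetical, the data are balanced, and the within-unit term lumps together condition, time, and measurement error. It uses simple method-of-moments estimates rather than a fitted probability model.

```python
import random
import statistics


def simulate_weights(n_units=200, n_obs=5, mu=300.0,
                     sd_between=20.0, sd_within=5.0, seed=0):
    """Simulate repeated weights: one inner list per unit (hypothetical values)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_units):
        # Unit-level deviation: shared by every observation on this unit.
        unit_mean = mu + rng.gauss(0.0, sd_between)
        # Observation-level deviation: drawn fresh for each measurement.
        data.append([unit_mean + rng.gauss(0.0, sd_within)
                     for _ in range(n_obs)])
    return data


def variance_components(data):
    """Method-of-moments estimates for balanced data (same n_obs per unit)."""
    n_obs = len(data[0])
    # Within-unit variance: average of the per-unit sample variances.
    within = statistics.mean([statistics.variance(unit) for unit in data])
    # The variance of the unit means equals the between-unit variance plus
    # within / n_obs, so subtract that term to isolate the between part.
    unit_means = [statistics.mean(unit) for unit in data]
    between = statistics.variance(unit_means) - within / n_obs
    return within, between


data = simulate_weights()

# Longitudinal view: repeated measures let us separate the two components.
within, between = variance_components(data)

# Cross-sectional view: keep one observation per unit. Only the total
# variance is estimable; the two components are confounded.
cross_section = [unit[0] for unit in data]
total = statistics.variance(cross_section)

print(f"within-unit variance estimate:  {within:.1f}")
print(f"between-unit variance estimate: {between:.1f}")
print(f"cross-sectional total variance: {total:.1f}")
```

With these assumed settings, the within-unit and between-unit estimates should land near 25 and 400 respectively, while the cross-sectional variance only estimates their sum and offers no way to split it.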