Chapter 1 Introduction

Greetings!

This course is about three types of correlated data. In most of the statistical methods and models you’ve learned in past classes, you assumed that observations were drawn independently from a population through random sampling or generated independently by a random process. For this to be true, the observed value for one randomly chosen unit or subject cannot influence, or be systematically related to, the observed value for another unit or subject.

There are many circumstances in which the independence assumption is not valid or realistic.

  • If you collect data on biological siblings, children from the same family share genetics and a home environment, so they will tend to be more similar to each other than randomly selected children.

  • Observations in educational data collected in schools are not independent. Students in the same classroom will tend to be more similar in their learning than students from different classrooms because they have a common teacher and curriculum.

  • Data collected on the same individuals over time is correlated; repeated measurements on one individual will tend to be more similar to each other than to measurements on other individuals.

One of the learning goals of this course is to understand the consequences of incorrectly making the independence assumption in a statistical model and the potential impact it has on our conclusions.
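To make this concrete, here is a minimal simulation sketch in Python (the language and every setting below are assumptions chosen for illustration, not part of the course materials). It generates data for 20 hypothetical clusters of 10 units each, where units in a cluster share a random effect, and compares the standard error of the sample mean computed as if all 200 values were independent with the actual variability of the mean across repeated samples. Under these assumed settings the true standard error is sqrt(1/20 + 1/200), about 0.23, while the naive formula gives a value near 0.10.

```python
import random
import statistics


def simulate_clustered_sample(n_clusters, cluster_size, sd_between, sd_within, rng):
    """Draw one sample in which units in the same cluster share a random effect."""
    values = []
    for _ in range(n_clusters):
        # Shared by every unit in this cluster; this is what induces correlation.
        cluster_effect = rng.gauss(0.0, sd_between)
        for _ in range(cluster_size):
            values.append(cluster_effect + rng.gauss(0.0, sd_within))
    return values


def naive_standard_error(values):
    """Standard error of the mean computed as if all values were independent."""
    return statistics.stdev(values) / len(values) ** 0.5


# Illustrative (assumed) settings: 20 clusters of 10 units, equal variance
# between and within clusters, 2000 repeated samples, fixed seed.
rng = random.Random(1)
sample_means = []
naive_ses = []
for _ in range(2000):
    sample = simulate_clustered_sample(20, 10, 1.0, 1.0, rng)
    sample_means.append(statistics.mean(sample))
    naive_ses.append(naive_standard_error(sample))

# How much the sample mean actually varies from sample to sample.
empirical_se = statistics.stdev(sample_means)
# What the independence-based formula reports, on average.
average_naive_se = statistics.mean(naive_ses)

print(f"empirical SE of the mean: {empirical_se:.3f}")
print(f"average naive SE:         {average_naive_se:.3f}")
```

In this sketch the naive standard error is less than half of the empirical one, so confidence intervals built from it would be too narrow and tests would reject too often. The size of the gap depends on the assumed cluster size and on how much of the variance is shared within clusters.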

We’ll also learn about appropriate statistical models and methods for analyzing data generated with natural dependencies and correlation.