6.3 Wide V. Long Format

In R, longitudinal data could be formatted in two ways.

Wide Format: When the observations times are the same for every unit (balanced data), then every unit has a vector of observations of length \(m\) and we can organize the unit’s data into one row per unit, resulting in a data set with \(n\) rows and \(m\) columns that correspond to outcome values (plus other columns that will correspond to an identifier variable and other explanatory variables). See the simulated example below. The first column is an id column to identify the units from each other, the next five columns correspond to the \(m=5\) observations over time and the next five columns correspond to a time-varying explanatory variable.

n = 10
m = 5

(wideD = data.frame(id = 1:n, y.1 = rnorm(n), y.2 = rnorm(n), y.3 = rnorm(n), y.4 = rnorm(n), y.5 = rnorm(n),x.1 = rnorm(n), x.2 = rnorm(n), x.3 = rnorm(n), x.4 = rnorm(n), x.5 = rnorm(n))) #Example with 5 repeated observations for 10 units
##    id     y.1    y.2     y.3    y.4
## 1   1 -0.5022 -0.484 -0.0437  0.758
## 2   2  1.0911 -0.469 -1.9647  0.360
## 3   3  1.5260  0.178 -0.6704  0.888
## 4   4  0.0872 -1.028  0.3555 -2.244
## 5   5  1.6115  1.719 -0.2880 -1.171
## 6   6 -0.8922 -0.153  0.3542 -0.270
## 7   7 -1.2615  0.604 -2.0888  1.043
## 8   8 -1.1297  0.330 -1.6458 -0.720
## 9   9  1.2011 -0.527  2.2073 -0.408
## 10 10 -0.1410 -1.238 -0.3731  0.712
##       y.5    x.1     x.2    x.3     x.4
## 1  -1.445  1.532 -0.0167  0.903  1.3133
## 2  -0.468  0.696 -0.9360 -0.681  0.6839
## 3   0.436  1.483 -0.7859 -0.122 -0.9699
## 4  -1.545 -0.119  0.3324 -0.453 -0.3007
## 5   1.726  0.142  0.8997  1.579 -1.5298
## 6   1.159  0.808  1.3064  2.495 -0.0425
## 7  -0.769  0.596 -0.3372  0.517 -0.3297
## 8   2.813  1.062  1.2227  0.775 -0.2276
## 9  -0.112 -0.143  1.2835 -0.185  1.0611
## 10 -0.956  0.979  0.7801  0.172  0.6207
##       x.5
## 1   0.429
## 2  -1.038
## 3  -0.400
## 4  -0.293
## 5   0.213
## 6  -1.330
## 7   1.377
## 8   2.142
## 9  -0.518
## 10  0.621

Long Format: In most analysis, we’ll need the data in long format. When observations times are not the same for every unit, then we have to use long format. Imagine stacking the each unit’s observed outcome vectors on top of each other. Similarly, we want to stack the observed vectors of any other explanatory variable we might have on the individuals. In contrast to wide format, it is absolutely necessary to have a variable to identify the unit as well as the observation time.

See below for R code to convert a data set in wide format to one in long format. Hopefully the variable names given in a way that clearly specifies the variable name and the time order of the values such as y.1,y.2,...y.5,x.1,x.2,...,x.5.

In the tidyr package, gather() takes the wide data and you provide it the names of two variables that you want to create. The first (called Tmp here) is the variable name that will contain the variable names of the columns you want to gather. The second (called Value) is the variable name that will collect the values from those columns. See below.

require(tidyr)
gather(wideD, Tmp, Value, y.1:x.5)
##     id Tmp   Value
## 1    1 y.1 -0.5022
## 2    2 y.1  1.0911
## 3    3 y.1  1.5260
## 4    4 y.1  0.0872
## 5    5 y.1  1.6115
## 6    6 y.1 -0.8922
## 7    7 y.1 -1.2615
## 8    8 y.1 -1.1297
## 9    9 y.1  1.2011
## 10  10 y.1 -0.1410
## 11   1 y.2 -0.4842
## 12   2 y.2 -0.4691
## 13   3 y.2  0.1783
## 14   4 y.2 -1.0278
## 15   5 y.2  1.7189
## 16   6 y.2 -0.1533
## 17   7 y.2  0.6043
## 18   8 y.2  0.3300
## 19   9 y.2 -0.5272
## 20  10 y.2 -1.2381
## 21   1 y.3 -0.0437
## 22   2 y.3 -1.9647
## 23   3 y.3 -0.6704
## 24   4 y.3  0.3555
## 25   5 y.3 -0.2880
## 26   6 y.3  0.3542
## 27   7 y.3 -2.0888
## 28   8 y.3 -1.6458
## 29   9 y.3  2.2073
## 30  10 y.3 -0.3731
## 31   1 y.4  0.7578
## 32   2 y.4  0.3600
## 33   3 y.4  0.8882
## 34   4 y.4 -2.2439
## 35   5 y.4 -1.1712
## 36   6 y.4 -0.2701
## 37   7 y.4  1.0433
## 38   8 y.4 -0.7198
## 39   9 y.4 -0.4083
## 40  10 y.4  0.7120
## 41   1 y.5 -1.4448
## 42   2 y.5 -0.4682
## 43   3 y.5  0.4358
## 44   4 y.5 -1.5454
## 45   5 y.5  1.7256
## 46   6 y.5  1.1586
## 47   7 y.5 -0.7691
## 48   8 y.5  2.8134
## 49   9 y.5 -0.1120
## 50  10 y.5 -0.9563
## 51   1 x.1  1.5318
## 52   2 x.1  0.6961
## 53   3 x.1  1.4828
## 54   4 x.1 -0.1187
## 55   5 x.1  0.1416
## 56   6 x.1  0.8082
## 57   7 x.1  0.5960
## 58   8 x.1  1.0624
## 59   9 x.1 -0.1427
## 60  10 x.1  0.9792
## 61   1 x.2 -0.0167
## 62   2 x.2 -0.9360
## 63   3 x.2 -0.7859
## 64   4 x.2  0.3324
## 65   5 x.2  0.8997
## 66   6 x.2  1.3064
## 67   7 x.2 -0.3372
## 68   8 x.2  1.2227
## 69   9 x.2  1.2835
## 70  10 x.2  0.7801
## 71   1 x.3  0.9025
## 72   2 x.3 -0.6810
## 73   3 x.3 -0.1220
## 74   4 x.3 -0.4535
## 75   5 x.3  1.5794
## 76   6 x.3  2.4954
## 77   7 x.3  0.5168
## 78   8 x.3  0.7751
## 79   9 x.3 -0.1853
## 80  10 x.3  0.1718
## 81   1 x.4  1.3133
## 82   2 x.4  0.6839
## 83   3 x.4 -0.9699
## 84   4 x.4 -0.3007
## 85   5 x.4 -1.5298
## 86   6 x.4 -0.0425
## 87   7 x.4 -0.3297
## 88   8 x.4 -0.2276
## 89   9 x.4  1.0611
## 90  10 x.4  0.6207
## 91   1 x.5  0.4293
## 92   2 x.5 -1.0376
## 93   3 x.5 -0.3995
## 94   4 x.5 -0.2932
## 95   5 x.5  0.2131
## 96   6 x.5 -1.3296
## 97   7 x.5  1.3766
## 98   8 x.5  2.1422
## 99   9 x.5 -0.5184
## 100 10 x.5  0.6215

Then, we want to separate the Tmp variable into two variables because they contain information about both the characteristic and time. We can use separate() to separate Tmp into Var and Time and it will automatically detect the . as a separator.

gather(wideD, Tmp, Value, y.1:x.5) %>%
  separate(Tmp,into=c('Var','Time'), remove=TRUE) 
##     id Var Time   Value
## 1    1   y    1 -0.5022
## 2    2   y    1  1.0911
## 3    3   y    1  1.5260
## 4    4   y    1  0.0872
## 5    5   y    1  1.6115
## 6    6   y    1 -0.8922
## 7    7   y    1 -1.2615
## 8    8   y    1 -1.1297
## 9    9   y    1  1.2011
## 10  10   y    1 -0.1410
## 11   1   y    2 -0.4842
## 12   2   y    2 -0.4691
## 13   3   y    2  0.1783
## 14   4   y    2 -1.0278
## 15   5   y    2  1.7189
## 16   6   y    2 -0.1533
## 17   7   y    2  0.6043
## 18   8   y    2  0.3300
## 19   9   y    2 -0.5272
## 20  10   y    2 -1.2381
## 21   1   y    3 -0.0437
## 22   2   y    3 -1.9647
## 23   3   y    3 -0.6704
## 24   4   y    3  0.3555
## 25   5   y    3 -0.2880
## 26   6   y    3  0.3542
## 27   7   y    3 -2.0888
## 28   8   y    3 -1.6458
## 29   9   y    3  2.2073
## 30  10   y    3 -0.3731
## 31   1   y    4  0.7578
## 32   2   y    4  0.3600
## 33   3   y    4  0.8882
## 34   4   y    4 -2.2439
## 35   5   y    4 -1.1712
## 36   6   y    4 -0.2701
## 37   7   y    4  1.0433
## 38   8   y    4 -0.7198
## 39   9   y    4 -0.4083
## 40  10   y    4  0.7120
## 41   1   y    5 -1.4448
## 42   2   y    5 -0.4682
## 43   3   y    5  0.4358
## 44   4   y    5 -1.5454
## 45   5   y    5  1.7256
## 46   6   y    5  1.1586
## 47   7   y    5 -0.7691
## 48   8   y    5  2.8134
## 49   9   y    5 -0.1120
## 50  10   y    5 -0.9563
## 51   1   x    1  1.5318
## 52   2   x    1  0.6961
## 53   3   x    1  1.4828
## 54   4   x    1 -0.1187
## 55   5   x    1  0.1416
## 56   6   x    1  0.8082
## 57   7   x    1  0.5960
## 58   8   x    1  1.0624
## 59   9   x    1 -0.1427
## 60  10   x    1  0.9792
## 61   1   x    2 -0.0167
## 62   2   x    2 -0.9360
## 63   3   x    2 -0.7859
## 64   4   x    2  0.3324
## 65   5   x    2  0.8997
## 66   6   x    2  1.3064
## 67   7   x    2 -0.3372
## 68   8   x    2  1.2227
## 69   9   x    2  1.2835
## 70  10   x    2  0.7801
## 71   1   x    3  0.9025
## 72   2   x    3 -0.6810
## 73   3   x    3 -0.1220
## 74   4   x    3 -0.4535
## 75   5   x    3  1.5794
## 76   6   x    3  2.4954
## 77   7   x    3  0.5168
## 78   8   x    3  0.7751
## 79   9   x    3 -0.1853
## 80  10   x    3  0.1718
## 81   1   x    4  1.3133
## 82   2   x    4  0.6839
## 83   3   x    4 -0.9699
## 84   4   x    4 -0.3007
## 85   5   x    4 -1.5298
## 86   6   x    4 -0.0425
## 87   7   x    4 -0.3297
## 88   8   x    4 -0.2276
## 89   9   x    4  1.0611
## 90  10   x    4  0.6207
## 91   1   x    5  0.4293
## 92   2   x    5 -1.0376
## 93   3   x    5 -0.3995
## 94   4   x    5 -0.2932
## 95   5   x    5  0.2131
## 96   6   x    5 -1.3296
## 97   7   x    5  1.3766
## 98   8   x    5  2.1422
## 99   9   x    5 -0.5184
## 100 10   x    5  0.6215

Lastly, we want to have one variable called x and one variable called y, and we can get that by spreading the variable Var into two columns with the values that come from Value.

gather(wideD, Tmp, Value, y.1:x.5) %>%
  separate(Tmp,into=c('Var','Time'), remove=TRUE) %>%
  spread(Var, Value)
##    id Time       x       y
## 1   1    1  1.5318 -0.5022
## 2   1    2 -0.0167 -0.4842
## 3   1    3  0.9025 -0.0437
## 4   1    4  1.3133  0.7578
## 5   1    5  0.4293 -1.4448
## 6   2    1  0.6961  1.0911
## 7   2    2 -0.9360 -0.4691
## 8   2    3 -0.6810 -1.9647
## 9   2    4  0.6839  0.3600
## 10  2    5 -1.0376 -0.4682
## 11  3    1  1.4828  1.5260
## 12  3    2 -0.7859  0.1783
## 13  3    3 -0.1220 -0.6704
## 14  3    4 -0.9699  0.8882
## 15  3    5 -0.3995  0.4358
## 16  4    1 -0.1187  0.0872
## 17  4    2  0.3324 -1.0278
## 18  4    3 -0.4535  0.3555
## 19  4    4 -0.3007 -2.2439
## 20  4    5 -0.2932 -1.5454
## 21  5    1  0.1416  1.6115
## 22  5    2  0.8997  1.7189
## 23  5    3  1.5794 -0.2880
## 24  5    4 -1.5298 -1.1712
## 25  5    5  0.2131  1.7256
## 26  6    1  0.8082 -0.8922
## 27  6    2  1.3064 -0.1533
## 28  6    3  2.4954  0.3542
## 29  6    4 -0.0425 -0.2701
## 30  6    5 -1.3296  1.1586
## 31  7    1  0.5960 -1.2615
## 32  7    2 -0.3372  0.6043
## 33  7    3  0.5168 -2.0888
## 34  7    4 -0.3297  1.0433
## 35  7    5  1.3766 -0.7691
## 36  8    1  1.0624 -1.1297
## 37  8    2  1.2227  0.3300
## 38  8    3  0.7751 -1.6458
## 39  8    4 -0.2276 -0.7198
## 40  8    5  2.1422  2.8134
## 41  9    1 -0.1427  1.2011
## 42  9    2  1.2835 -0.5272
## 43  9    3 -0.1853  2.2073
## 44  9    4  1.0611 -0.4083
## 45  9    5 -0.5184 -0.1120
## 46 10    1  0.9792 -0.1410
## 47 10    2  0.7801 -1.2381
## 48 10    3  0.1718 -0.3731
## 49 10    4  0.6207  0.7120
## 50 10    5  0.6215 -0.9563