15  Reshape

using DataFrames

15.1 Long to wide: unstack()

Example 1:

dat_long = DataFrame(ID = repeat([1:3;], 3),
                     Fruit = ["mango", "mango", "mango", 
                              "banana", "banana", "banana", 
                              "tangerine", "tangerine", "tangerine"],
                     Score = [1.1, 2.1, 3.1, 1.2, 2.2, 3.2, 1.3, 2.3, 3.3])
9×3 DataFrame
Row ID Fruit Score
Int64 String Float64
1 1 mango 1.1
2 2 mango 2.1
3 3 mango 3.1
4 1 banana 1.2
5 2 banana 2.2
6 3 banana 3.2
7 1 tangerine 1.3
8 2 tangerine 2.3
9 3 tangerine 3.3

Reshape a DataFrame from wide to long format using unstack():

Usage: unstack(df, colkey, value)

unstack(dat_long, :Fruit, :Score)
3×4 DataFrame
Row ID mango banana tangerine
Int64 Float64? Float64? Float64?
1 1 1.1 1.2 1.3
2 2 2.1 2.2 2.3
3 3 3.1 3.2 3.3

Example 2:

dat_long = DataFrame(
    Account_ID = [8001, 8002, 8003, 8004, 8001, 8002, 8003, 8004, 
            8001, 8002, 8003, 8004, 8001, 8002, 8003, 8004], 
    Age = [67.8017038366664, 42.9198507293701, 46.2301756642422, 
            39.665983196671, 67.8017038366664, 42.9198507293701, 
            46.2301756642422, 39.665983196671, 67.8017038366664, 
            42.9198507293701, 46.2301756642422, 39.665983196671, 
            67.8017038366664, 42.9198507293701, 46.2301756642422, 
            39.665983196671], 
    Admission = ["ED", "Planned", "Planned", "ED", "ED", "Planned", 
            "Planned", "ED", "ED", "Planned", "Planned", "ED", "ED", "Planned", 
            "Planned", "ED"], 
    Lab_key = ["RBC", "RBC", "RBC", "RBC", "WBC", "WBC", "WBC", "WBC", 
            "Hematocrit", "Hematocrit", "Hematocrit", "Hematocrit", 
            "Hemoglobin", "Hemoglobin", "Hemoglobin", "Hemoglobin"], 
    Lab_value = [4.63449321082268, 3.34968550627897, 4.27037213597765, 
            4.93897736897793, 8374.22887757195, 7612.37380499927, 
            8759.27855519425, 6972.28096216548, 36.272693147236, 
            40.5716317809522, 39.9888624177955, 39.8786884058422, 
            12.6188444991545, 12.1739747363806, 15.1293426442183, 
            14.8885696185238]
)
16×5 DataFrame
Row Account_ID Age Admission Lab_key Lab_value
Int64 Float64 String String Float64
1 8001 67.8017 ED RBC 4.63449
2 8002 42.9199 Planned RBC 3.34969
3 8003 46.2302 Planned RBC 4.27037
4 8004 39.666 ED RBC 4.93898
5 8001 67.8017 ED WBC 8374.23
6 8002 42.9199 Planned WBC 7612.37
7 8003 46.2302 Planned WBC 8759.28
8 8004 39.666 ED WBC 6972.28
9 8001 67.8017 ED Hematocrit 36.2727
10 8002 42.9199 Planned Hematocrit 40.5716
11 8003 46.2302 Planned Hematocrit 39.9889
12 8004 39.666 ED Hematocrit 39.8787
13 8001 67.8017 ED Hemoglobin 12.6188
14 8002 42.9199 Planned Hemoglobin 12.174
15 8003 46.2302 Planned Hemoglobin 15.1293
16 8004 39.666 ED Hemoglobin 14.8886

Usage: unstack(df, rowky, colkey, value)

unstack(dat_long, [:Account_ID, :Age, :Admission], :Lab_key, :Lab_value)
4×7 DataFrame
Row Account_ID Age Admission RBC WBC Hematocrit Hemoglobin
Int64 Float64 String Float64? Float64? Float64? Float64?
1 8001 67.8017 ED 4.63449 8374.23 36.2727 12.6188
2 8002 42.9199 Planned 3.34969 7612.37 40.5716 12.174
3 8003 46.2302 Planned 4.27037 8759.28 39.9889 15.1293
4 8004 39.666 ED 4.93898 6972.28 39.8787 14.8886

15.2 Wide to long: stack()

Example 1:

dat_wide = DataFrame(ID = [1, 2, 3],
                     mango = [1.1, 2.1, 3.1],
                     banana = [1.2, 2.2, 3.2],
                     tangerine = [1.3, 2.3, 3.3])
3×4 DataFrame
Row ID mango banana tangerine
Int64 Float64 Float64 Float64
1 1 1.1 1.2 1.3
2 2 2.1 2.2 2.3
3 3 3.1 3.2 3.3

Usage: stack(df, [measure_vars], [id_vars])

If no measure_vars are defined, defaults to all floating point columns.

If no id_vars are defined, defaults to all columns not in measure_vars

dat_wide2long = stack(dat_wide)
9×3 DataFrame
Row ID variable value
Int64 String Float64
1 1 mango 1.1
2 2 mango 2.1
3 3 mango 3.1
4 1 banana 1.2
5 2 banana 2.2
6 3 banana 3.2
7 1 tangerine 1.3
8 2 tangerine 2.3
9 3 tangerine 3.3
stack(dat_wide, 2:4, 1)
9×3 DataFrame
Row ID variable value
Int64 String Float64
1 1 mango 1.1
2 2 mango 2.1
3 3 mango 3.1
4 1 banana 1.2
5 2 banana 2.2
6 3 banana 3.2
7 1 tangerine 1.3
8 2 tangerine 2.3
9 3 tangerine 3.3
stack(dat_wide, [:mango, :banana, :tangerine], :ID)
9×3 DataFrame
Row ID variable value
Int64 String Float64
1 1 mango 1.1
2 2 mango 2.1
3 3 mango 3.1
4 1 banana 1.2
5 2 banana 2.2
6 3 banana 3.2
7 1 tangerine 1.3
8 2 tangerine 2.3
9 3 tangerine 3.3

Example 2:

dat_wide = DataFrame(
    Account_ID = [8001, 8002, 8003, 8004], 
    Age = [67.8017038366664, 42.9198507293701, 46.2301756642422, 
        39.665983196671], 
    RBC = [4.63449321082268, 3.34968550627897, 4.27037213597765, 
        4.93897736897793], 
    WBC = [8374.22887757195, 7612.37380499927, 8759.27855519425, 
        6972.28096216548], 
    Hematocrit = [36.272693147236, 40.5716317809522, 39.9888624177955, 
        39.8786884058422], 
    Hemoglobin = [12.618444991545, 12.1739747363806, 15.1293426442183, 
        14.8885696185238], 
    Admission = ["ED", "Planned", "Planned", "Planned"]
)
4×7 DataFrame
Row Account_ID Age RBC WBC Hematocrit Hemoglobin Admission
Int64 Float64 Float64 Float64 Float64 Float64 String
1 8001 67.8017 4.63449 8374.23 36.2727 12.6184 ED
2 8002 42.9199 3.34969 7612.37 40.5716 12.174 Planned
3 8003 46.2302 4.27037 8759.28 39.9889 15.1293 Planned
4 8004 39.666 4.93898 6972.28 39.8787 14.8886 Planned

Usage: stack(df, measures_vars, id_vars; variable_name, value_name)

stack(dat_wide,
      [:RBC, :WBC, :Hematocrit, :Hemoglobin],
      [:Account_ID, :Age, :Admission],
      variable_name = :Lab, value_name = :Value)
16×5 DataFrame
Row Account_ID Age Admission Lab Value
Int64 Float64 String String Float64
1 8001 67.8017 ED RBC 4.63449
2 8002 42.9199 Planned RBC 3.34969
3 8003 46.2302 Planned RBC 4.27037
4 8004 39.666 Planned RBC 4.93898
5 8001 67.8017 ED WBC 8374.23
6 8002 42.9199 Planned WBC 7612.37
7 8003 46.2302 Planned WBC 8759.28
8 8004 39.666 Planned WBC 6972.28
9 8001 67.8017 ED Hematocrit 36.2727
10 8002 42.9199 Planned Hematocrit 40.5716
11 8003 46.2302 Planned Hematocrit 39.9889
12 8004 39.666 Planned Hematocrit 39.8787
13 8001 67.8017 ED Hemoglobin 12.6184
14 8002 42.9199 Planned Hemoglobin 12.174
15 8003 46.2302 Planned Hemoglobin 15.1293
16 8004 39.666 Planned Hemoglobin 14.8886