23 Reshape

import polars as pl

23.1 Long to wide

Example 1:

dat_long = pl.DataFrame({"ID":list(range(1,4))*3,
                         "Fruit":["mango", "mango", "mango", 
                                  "banana", "banana", "banana", 
                                  "tangerine", "tangerine", "tangerine"],
                         "Score":[1.1, 2.1, 3.1, 1.2, 2.2, 3.2, 1.3, 2.3, 3.3]})
dat_long

shape: (9, 3)

ID	Fruit	Score
i64	str	f64
1	"mango"	1.1
2	"mango"	2.1
3	"mango"	3.1
1	"banana"	1.2
2	"banana"	2.2
3	"banana"	3.2
1	"tangerine"	1.3
2	"tangerine"	2.3
3	"tangerine"	3.3

dat_long.pivot(index = "ID", columns = "Fruit", values = "Score")

shape: (3, 4)

ID	mango	banana	tangerine
i64	f64	f64	f64
1	1.1	1.2	1.3
2	2.1	2.2	2.3
3	3.1	3.2	3.3

Example 2:

dat_long = pl.DataFrame({
    "Account_ID": [8001, 8002, 8003, 8004, 8001, 8002, 8003, 8004, 
            8001, 8002, 8003, 8004, 8001, 8002, 8003, 8004], 
    "Age": [67.8017038366664, 42.9198507293701, 46.2301756642422, 
            39.665983196671, 67.8017038366664, 42.9198507293701, 
            46.2301756642422, 39.665983196671, 67.8017038366664, 
            42.9198507293701, 46.2301756642422, 39.665983196671, 
            67.8017038366664, 42.9198507293701, 46.2301756642422, 
            39.665983196671], 
    "Admission": ["ED", "Planned", "Planned", "ED", "ED", "Planned", 
            "Planned", "ED", "ED", "Planned", "Planned", "ED", "ED", 
            "Planned", "Planned", "ED"], 
    "Lab_key": ["RBC", "RBC", "RBC", "RBC", "WBC", "WBC", "WBC", "WBC", 
            "Hematocrit", "Hematocrit", "Hematocrit", "Hematocrit", 
            "Hemoglobin", "Hemoglobin", "Hemoglobin", "Hemoglobin"], 
    "Lab_value": [4.63449321082268, 3.34968550627897, 4.27037213597765, 
            4.93897736897793, 8374.22887757195, 7612.37380499927, 
            8759.27855519425, 6972.28096216548, 36.272693147236, 
            40.5716317809522, 39.9888624177955, 39.8786884058422, 
            12.6188444991545, 12.1739747363806, 15.1293426442183, 
            14.8885696185238]
})
dat_long

shape: (16, 5)

Account_ID	Age	Admission	Lab_key	Lab_value
i64	f64	str	str	f64
8001	67.801704	"ED"	"RBC"	4.634493
8002	42.919851	"Planned"	"RBC"	3.349686
8003	46.230176	"Planned"	"RBC"	4.270372
8004	39.665983	"ED"	"RBC"	4.938977
8001	67.801704	"ED"	"WBC"	8374.228878
8002	42.919851	"Planned"	"WBC"	7612.373805
8003	46.230176	"Planned"	"WBC"	8759.278555
8004	39.665983	"ED"	"WBC"	6972.280962
8001	67.801704	"ED"	"Hematocrit"	36.272693
8002	42.919851	"Planned"	"Hematocrit"	40.571632
8003	46.230176	"Planned"	"Hematocrit"	39.988862
8004	39.665983	"ED"	"Hematocrit"	39.878688
8001	67.801704	"ED"	"Hemoglobin"	12.618844
8002	42.919851	"Planned"	"Hemoglobin"	12.173975
8003	46.230176	"Planned"	"Hemoglobin"	15.129343
8004	39.665983	"ED"	"Hemoglobin"	14.88857

Usage: df.pivot(value, index, columns)

Unfortunately, including Age in the index currently leads to an error (not executed)

dat_long.pivot(values = "Lab_value",
               index = ["Account_ID", "Age", "Admission"], 
               columns = "Lab_key")

Works if Age is cast to UTF-8 (and back to Float64)

dat_long = dat_long.with_column(
    pl.col("Age").cast(pl.Utf8)
)
dat_wide = dat_long.pivot(values = "Lab_value",
               index = ["Account_ID", "Age", "Admission"], 
               columns = "Lab_key")
dat_wide = dat_wide.with_column(
    pl.col("Age").cast(pl.Float64)
)
dat_wide

shape: (4, 7)

Account_ID	Age	Admission	RBC	WBC	Hematocrit	Hemoglobin
i64	f64	str	f64	f64	f64	f64
8001	67.801704	"ED"	4.634493	8374.228878	36.272693	12.618844
8002	42.919851	"Planned"	3.349686	7612.373805	40.571632	12.173975
8003	46.230176	"Planned"	4.270372	8759.278555	39.988862	15.129343
8004	39.665983	"ED"	4.938977	6972.280962	39.878688	14.88857

Excluding Age, the pivot works:

dat_long.pivot(values = "Lab_value",
               index = ["Account_ID", "Admission"], 
               columns = "Lab_key")

shape: (4, 6)

Account_ID	Admission	RBC	WBC	Hematocrit	Hemoglobin
i64	str	f64	f64	f64	f64
8001	"ED"	4.634493	8374.228878	36.272693	12.618844
8002	"Planned"	3.349686	7612.373805	40.571632	12.173975
8003	"Planned"	4.270372	8759.278555	39.988862	15.129343
8004	"ED"	4.938977	6972.280962	39.878688	14.88857

23.2 Wide to Long