using DataFrames
16 Joins
= DataFrame(PID = collect(1:9),
a = ["UCSF", "HUP", "Stanford",
Hospital "Stanford", "UCSF", "HUP",
"HUP", "Stanford", "UCSF"],
= [22, 34, 41, 19, 53, 21, 63, 22, 19],
Age = [1, 1, 0, 1, 0, 0, 1, 0, 0]) Sex
9×4 DataFrame
Row | PID | Hospital | Age | Sex |
---|---|---|---|---|
Int64 | String | Int64 | Int64 | |
1 | 1 | UCSF | 22 | 1 |
2 | 2 | HUP | 34 | 1 |
3 | 3 | Stanford | 41 | 0 |
4 | 4 | Stanford | 19 | 1 |
5 | 5 | UCSF | 53 | 0 |
6 | 6 | HUP | 21 | 0 |
7 | 7 | HUP | 63 | 1 |
8 | 8 | Stanford | 22 | 0 |
9 | 9 | UCSF | 19 | 0 |
= DataFrame(PID = collect(6:12),
b = [153, 89, 112, 228, 91, 190, 101],
V1 = ["Neurology", "Radiology",
Department "Emergency", "Cardiology",
"Surgery", "Neurology", "Psychiatry"])
7×3 DataFrame
Row | PID | V1 | Department |
---|---|---|---|
Int64 | Int64 | String | |
1 | 6 | 153 | Neurology |
2 | 7 | 89 | Radiology |
3 | 8 | 112 | Emergency |
4 | 9 | 228 | Cardiology |
5 | 10 | 91 | Surgery |
6 | 11 | 190 | Neurology |
7 | 12 | 101 | Psychiatry |
16.1 Inner join
= innerjoin(a, b, on = :PID) ab_inner
4×6 DataFrame
Row | PID | Hospital | Age | Sex | V1 | Department |
---|---|---|---|---|---|---|
Int64 | String | Int64 | Int64 | Int64 | String | |
1 | 6 | HUP | 21 | 0 | 153 | Neurology |
2 | 7 | HUP | 63 | 1 | 89 | Radiology |
3 | 8 | Stanford | 22 | 0 | 112 | Emergency |
4 | 9 | UCSF | 19 | 0 | 228 | Cardiology |
Note that the resulting table only contains cases found in both data frames (i.e. IDs 6 through 9)
16.2 Outer join
= outerjoin(a, b, on = :PID) ab_outer
12×6 DataFrame
Row | PID | Hospital | Age | Sex | V1 | Department |
---|---|---|---|---|---|---|
Int64 | String? | Int64? | Int64? | Int64? | String? | |
1 | 6 | HUP | 21 | 0 | 153 | Neurology |
2 | 7 | HUP | 63 | 1 | 89 | Radiology |
3 | 8 | Stanford | 22 | 0 | 112 | Emergency |
4 | 9 | UCSF | 19 | 0 | 228 | Cardiology |
5 | 1 | UCSF | 22 | 1 | missing | missing |
6 | 2 | HUP | 34 | 1 | missing | missing |
7 | 3 | Stanford | 41 | 0 | missing | missing |
8 | 4 | Stanford | 19 | 1 | missing | missing |
9 | 5 | UCSF | 53 | 0 | missing | missing |
10 | 10 | missing | missing | missing | 91 | Surgery |
11 | 11 | missing | missing | missing | 190 | Neurology |
12 | 12 | missing | missing | missing | 101 | Psychiatry |
Note that the resulting data frame contains all IDs found in either input data frame and missing values are represented with missing
16.3 Left outer join
= leftjoin(a, b, on = :PID) ab_leftouter
9×6 DataFrame
Row | PID | Hospital | Age | Sex | V1 | Department |
---|---|---|---|---|---|---|
Int64 | String | Int64 | Int64 | Int64? | String? | |
1 | 6 | HUP | 21 | 0 | 153 | Neurology |
2 | 7 | HUP | 63 | 1 | 89 | Radiology |
3 | 8 | Stanford | 22 | 0 | 112 | Emergency |
4 | 9 | UCSF | 19 | 0 | 228 | Cardiology |
5 | 1 | UCSF | 22 | 1 | missing | missing |
6 | 2 | HUP | 34 | 1 | missing | missing |
7 | 3 | Stanford | 41 | 0 | missing | missing |
8 | 4 | Stanford | 19 | 1 | missing | missing |
9 | 5 | UCSF | 53 | 0 | missing | missing |
Note how the resulting data frame contains all IDs present in the left input DataFrame only.
16.4 Right outer join
= rightjoin(a, b, on = :PID) ab_rightouter
7×6 DataFrame
Row | PID | Hospital | Age | Sex | V1 | Department |
---|---|---|---|---|---|---|
Int64 | String? | Int64? | Int64? | Int64 | String | |
1 | 6 | HUP | 21 | 0 | 153 | Neurology |
2 | 7 | HUP | 63 | 1 | 89 | Radiology |
3 | 8 | Stanford | 22 | 0 | 112 | Emergency |
4 | 9 | UCSF | 19 | 0 | 228 | Cardiology |
5 | 10 | missing | missing | missing | 91 | Surgery |
6 | 11 | missing | missing | missing | 190 | Neurology |
7 | 12 | missing | missing | missing | 101 | Psychiatry |
Note how the resulting data frame contains all IDs present in the right input DataFrame only.