b = pd.DataFrame({"PID": list(range(6, 13)),"V1": [153, 89, 112, 228, 91, 190, 101],"Department": ["Neurology", "Radiology","Emergency", "Cardiology","Surgery", "Neurology", "Psychiatry"]})# Optional, but supposed to help in large datasetsb.set_index("PID")
V1
Department
PID
6
153
Neurology
7
89
Radiology
8
112
Emergency
9
228
Cardiology
10
91
Surgery
11
190
Neurology
12
101
Psychiatry
15.1 Inner join
how argument default to “left”:
pd.merge(a, b, on ="PID", how ="inner")
PID
Hospital
Age
Sex
V1
Department
0
6
HUP
21
0
153
Neurology
1
7
HUP
63
1
89
Radiology
2
8
Stanford
22
0
112
Emergency
3
9
UCSF
19
0
228
Cardiology
Since we have set an index on both DataFrames, we can skip ‘on’:
pd.merge(a, b, how ="inner")
PID
Hospital
Age
Sex
V1
Department
0
6
HUP
21
0
153
Neurology
1
7
HUP
63
1
89
Radiology
2
8
Stanford
22
0
112
Emergency
3
9
UCSF
19
0
228
Cardiology
Note: The pd.join syntax is simialr to polars, but unlike polars, it only works correctly under specific conditions and will often give wrong results with no error or warning - avoid.
15.2 Outer join
pd.merge(a, b, on ="PID", how ="outer")
PID
Hospital
Age
Sex
V1
Department
0
1
UCSF
22.0
1.0
NaN
NaN
1
2
HUP
34.0
1.0
NaN
NaN
2
3
Stanford
41.0
0.0
NaN
NaN
3
4
Stanford
19.0
1.0
NaN
NaN
4
5
UCSF
53.0
0.0
NaN
NaN
5
6
HUP
21.0
0.0
153.0
Neurology
6
7
HUP
63.0
1.0
89.0
Radiology
7
8
Stanford
22.0
0.0
112.0
Emergency
8
9
UCSF
19.0
0.0
228.0
Cardiology
9
10
NaN
NaN
NaN
91.0
Surgery
10
11
NaN
NaN
NaN
190.0
Neurology
11
12
NaN
NaN
NaN
101.0
Psychiatry
15.3 Left outer join
pd.merge(a, b, on ="PID", how ="left")
PID
Hospital
Age
Sex
V1
Department
0
1
UCSF
22
1
NaN
NaN
1
2
HUP
34
1
NaN
NaN
2
3
Stanford
41
0
NaN
NaN
3
4
Stanford
19
1
NaN
NaN
4
5
UCSF
53
0
NaN
NaN
5
6
HUP
21
0
153.0
Neurology
6
7
HUP
63
1
89.0
Radiology
7
8
Stanford
22
0
112.0
Emergency
8
9
UCSF
19
0
228.0
Cardiology
15.4 Right outer join
There is no how = "right" option, so switch places of DataFrames: