ids =tuple([21, 22, 23, 24, 25])con.execute(f""" CREATE TABLE synth1 AS SELECT * FROM read_csv_auto('/Users/egenn/icloud/Data/synth1.csv') """)con.execute("SELECT * FROM synth1").df()
ID
Age
Group
0
1
49.001420
D
1
2
28.266542
B
2
3
31.025146
A
3
4
25.554986
C
4
5
36.689864
B
...
...
...
...
495
496
36.633847
A
496
497
34.457815
C
497
498
49.096331
A
498
499
31.708578
A
499
500
53.850253
A
500 rows × 3 columns
26.6.2 Read only specific IDs
ids =tuple([21, 22, 23, 24, 25])con.execute(f""" CREATE TABLE synth1f AS SELECT * FROM read_csv_auto('/Users/egenn/icloud/Data/synth1.csv') WHERE ID in {ids}; """)con.execute("SELECT * FROM synth1f").df()
ID
Age
Group
0
21
43.644608
A
1
22
43.836510
A
2
23
51.134057
C
3
24
52.115098
B
4
25
36.516745
C
26.7 Using Queries with the Python API
ids =tuple([21, 22, 23, 24, 25])sq = duckdb.query(f""" SELECT * FROM read_csv_auto('/Users/egenn/icloud/Data/synth1.csv') WHERE ID in {ids}; """)sq
---------------------
--- Relation Tree ---
---------------------
Subquery
---------------------
-- Result Columns --
---------------------
- ID (INTEGER)
- Age (DOUBLE)
- Group (VARCHAR)
---------------------
-- Result Preview --
---------------------
ID Age Group
INTEGER DOUBLE VARCHAR
[ Rows: 5]
21 43.6446078379911 A
22 43.8365100056002 A
23 51.1340571729247 C
24 52.1150978723465 B
25 36.516745383723 C
You can execute the query at any time:
sq.execute().df()
ID
Age
Group
0
21
43.644608
A
1
22
43.836510
A
2
23
51.134057
C
3
24
52.115098
B
4
25
36.516745
C
26.8 Compose queries
How to specify path, select, and filter programmatically
26.8.1 Select
distinct ="DISTINCT"# set to "" to not filter unique rowssep =","path ="/Users/egenn/icloud/Data/synth1.csv"cols =tuple(["ID", "Age"])cols ='"'+'", "'.join(cols) +'"'ignore_errors =True# cols = "*" # to select all programmaticallysq = duckdb.query(f""" SELECT {distinct + cols} FROM read_csv_auto('{path}', sep = '{sep}', ignore_errors = {ignore_errors}); """)sq.execute().df()
ID
Age
0
1
49.001420
1
2
28.266542
2
3
31.025146
3
4
25.554986
4
5
36.689864
...
...
...
495
411
45.938594
496
416
30.819571
497
209
33.385111
498
384
42.311528
499
435
31.042009
500 rows × 2 columns
26.8.2 Filter
Suppose you have a list of IDs in a list, convert to tuple first
ids =list(range(80, 91))filter_col ="ID"sq = sq.filter(f"{filter_col} in {tuple(ids)}")sq.execute().df()