14 data.table Data I/O
14.1 Read delimited data with fread()
data.table’s “fast read” function fread() offers a replacement for the base R read.csv() function. By default, it creates a data.table object, which is likely preferred in most scenarios, but it can create a data.frame instead by setting data.table = FALSE.
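A minimal sketch of the two return types, assuming the data.table package is installed. It writes a small hypothetical CSV to a temporary file (via tempfile()) so it runs without network access:

```r
library(data.table)

# Write a small hypothetical CSV to a temporary file so the example is self-contained
path <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,0.5", "2,0.75", "3,1"), path)

dt <- fread(path)                      # default: returns a data.table
df <- fread(path, data.table = FALSE)  # returns a plain data.frame instead

class(dt)  # "data.table" "data.frame"
class(df)  # "data.frame"
```

Because a data.table also inherits from data.frame, code written for data.frames generally still works on the default output.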
It is very useful because it offers:
- very fast, parallelized reading of large delimited files, e.g. with millions of rows
- direct reading of compressed (e.g. zipped or gzipped) files
- smart automatic detection of the delimiter when sep = "auto" (the default)
- multiple other conveniences
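The delimiter detection can be checked with a small sketch, again using hypothetical data written to temporary files: the same table is saved once comma-separated and once tab-separated, and fread() reads both back without being told the separator.

```r
library(data.table)

# The same hypothetical table saved with two different delimiters
csv_path <- tempfile(fileext = ".csv")
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("id,name,score", "1,alpha,0.5", "2,beta,0.75"), csv_path)
writeLines(c("id\tname\tscore", "1\talpha\t0.5", "2\tbeta\t0.75"), tsv_path)

# sep = "auto" (the default) detects ',' and '\t' on its own
a <- fread(csv_path)
b <- fread(tsv_path)

names(a)  # "id" "name" "score"
names(b)  # "id" "name" "score"
```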
As an example, you can read a CSV file directly from a URL:
library(data.table)
dat <- fread("https://archive.ics.uci.edu/ml/machine-learning-databases/00519/heart_failure_clinical_records_dataset.csv")
dat
age anaemia creatinine_phosphokinase diabetes ejection_fraction
<num> <int> <int> <int> <int>
1: 75 0 582 0 20
2: 55 0 7861 0 38
3: 65 0 146 0 20
4: 50 1 111 0 20
5: 65 1 160 1 20
---
295: 62 0 61 1 38
296: 55 0 1820 0 38
297: 45 0 2060 1 60
298: 45 0 2413 0 38
299: 50 0 196 0 45
high_blood_pressure platelets serum_creatinine serum_sodium sex smoking
<int> <num> <num> <int> <int> <int>
1: 1 265000 1.9 130 1 0
2: 0 263358 1.1 136 1 0
3: 0 162000 1.3 129 1 1
4: 0 210000 1.9 137 1 0
5: 0 327000 2.7 116 0 0
---
295: 1 155000 1.1 143 1 1
296: 0 270000 1.2 139 0 0
297: 0 742000 0.8 138 0 0
298: 0 140000 1.4 140 1 1
299: 0 395000 1.6 136 1 1
time DEATH_EVENT
<int> <int>
1: 4 1
2: 6 1
3: 7 1
4: 7 1
5: 8 1
---
295: 270 0
296: 271 0
297: 278 0
298: 280 0
299: 285 0
14.1.1 See also
14.2 Write delimited data with fwrite()
fwrite() similarly provides a faster, parallelized, and more flexible replacement for write.csv():
fwrite(dat, "/path/to/file.csv")
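A short sketch of fwrite() with its default comma separator and with a tab separator (sep = "\t"), followed by a round trip through fread(). The table and the temporary paths are hypothetical stand-ins for dat and "/path/to/file.csv":

```r
library(data.table)

dt <- data.table(id = 1:3, name = c("alpha", "beta", "gamma"), score = c(0.5, 0.75, 1))

# Comma is the default separator
csv_path <- tempfile(fileext = ".csv")
fwrite(dt, csv_path)

# A tab-separated file only needs a different `sep`
tsv_path <- tempfile(fileext = ".tsv")
fwrite(dt, tsv_path, sep = "\t")
readLines(tsv_path, n = 1)  # "id\tname\tscore"

# Reading the file back with fread() recovers the same columns
dt2 <- fread(tsv_path)
```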
14.3 Save a data.table to RDS file
Like any other R object, a data.table can be saved to disk using saveRDS(). Suppose you have read data in with fread() or coerced a dataset using as.data.table(), and then done some cleaning, type conversions, data transformations, etc. Saving the result with saveRDS() is the preferred way to preserve your work, so that you can reload it at any time with readRDS().
saveRDS(dat, "/path/to/data.rds")
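One reason to prefer RDS after type conversions is that it stores the object as is. The sketch below, assuming a small hypothetical table with a factor column, contrasts the two routes: readRDS() restores the factor, whereas writing a CSV and reading it back with fread() re-infers column types, so the factor returns as character (fread() defaults to stringsAsFactors = FALSE).

```r
library(data.table)

dt <- data.table(
  id    = 1:3,
  group = factor(c("a", "b", "a")),  # factor column created during "cleaning"
  value = c(1.5, 2.25, 3)
)

# Round trip through RDS: the object comes back as it was saved
rds_path <- tempfile(fileext = ".rds")
saveRDS(dt, rds_path)
dt_rds <- readRDS(rds_path)
class(dt_rds$group)  # "factor"

# Round trip through CSV: column types are re-inferred on reading
csv_path <- tempfile(fileext = ".csv")
fwrite(dt, csv_path)
dt_csv <- fread(csv_path)
class(dt_csv$group)  # "character"
```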