14  data.table Data I/O

14.1 Read delimited data with fread()

data.table’s “fast read” function fread() offers a replacement for the base read.csv() command. By default, it creates a data.table object, which is likely preferred in most scenarios, but it can create a data.frame by setting data.table = FALSE.

It is particularly useful because it offers:

  • very fast, parallelized reading of large delimited files, e.g. with millions of rows
  • direct reading of compressed files, e.g. .gz
  • automatic detection of the delimiter when sep = "auto" (the default)
  • many other conveniences
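
As a minimal sketch of the delimiter detection and the data.table = FALSE option described above, the following reads small inline inputs through fread()'s text argument. The column names and values are made up for illustration.

```r
library(data.table)

# Semicolon-delimited input passed inline as a character vector of lines.
# No `sep` is given, so fread() uses its default sep = "auto".
semi <- fread(text = c("id;score", "1;0.5", "2;0.7"))

# Tab-delimited input, again relying on automatic delimiter detection
tabbed <- fread(text = c("id\tscore", "1\t0.5", "2\t0.7"))

# Ask for a plain data.frame instead of a data.table
df <- fread(text = c("id,score", "1,0.5", "2,0.7"), data.table = FALSE)

class(semi)  # includes "data.table"
class(df)    # "data.frame" only
```

If detection ever picks the wrong delimiter, you can pass sep explicitly, e.g. sep = ";".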

As an example, you can read a CSV file directly from a URL:

dat <- fread("https://archive.ics.uci.edu/ml/machine-learning-databases/00519/heart_failure_clinical_records_dataset.csv")
dat
       age anaemia creatinine_phosphokinase diabetes ejection_fraction
     <num>   <int>                    <int>    <int>             <int>
  1:    75       0                      582        0                20
  2:    55       0                     7861        0                38
  3:    65       0                      146        0                20
  4:    50       1                      111        0                20
  5:    65       1                      160        1                20
 ---                                                                  
295:    62       0                       61        1                38
296:    55       0                     1820        0                38
297:    45       0                     2060        1                60
298:    45       0                     2413        0                38
299:    50       0                      196        0                45
     high_blood_pressure platelets serum_creatinine serum_sodium   sex smoking
                   <int>     <num>            <num>        <int> <int>   <int>
  1:                   1    265000              1.9          130     1       0
  2:                   0    263358              1.1          136     1       0
  3:                   0    162000              1.3          129     1       1
  4:                   0    210000              1.9          137     1       0
  5:                   0    327000              2.7          116     0       0
 ---                                                                          
295:                   1    155000              1.1          143     1       1
296:                   0    270000              1.2          139     0       0
297:                   0    742000              0.8          138     0       0
298:                   0    140000              1.4          140     1       1
299:                   0    395000              1.6          136     1       1
      time DEATH_EVENT
     <int>       <int>
  1:     4           1
  2:     6           1
  3:     7           1
  4:     7           1
  5:     8           1
 ---                  
295:   270           0
296:   271           0
297:   278           0
298:   280           0
299:   285           0

14.1.1 See also

Convenience features of fread

14.2 Write delimited data with fwrite()

fwrite() similarly provides a faster, parallelized, and more flexible replacement for write.csv():

fwrite(dat, "/path/to/file.csv")
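
The following is a small round-trip sketch rather than a recommended workflow: it writes a made-up table to temporary files, once with the default comma separator and once with sep = "\t", and reads both back with fread(). The table scores and the temporary paths are illustrative assumptions.

```r
library(data.table)

# Small illustrative table (hypothetical data)
scores <- data.table(
  id = 1:3,
  group = c("a", "b", "a"),
  score = c(0.5, 0.7, 0.9)
)

# Write to temporary files so the example does not touch real paths
csv_path <- tempfile(fileext = ".csv")
fwrite(scores, csv_path)

# Same data as tab-separated values
tsv_path <- tempfile(fileext = ".tsv")
fwrite(scores, tsv_path, sep = "\t")

# Read both files back; no `sep` is needed because fread() detects it
from_csv <- fread(csv_path)
from_tsv <- fread(tsv_path)
```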

14.3 Save a data.table to RDS file

As with any R object, you can save a data.table to disk using saveRDS(). Suppose you have read data in with fread() or coerced a dataset using as.data.table(), then done some cleaning up, type conversions, data transformations, etc. Saving to an RDS file is the preferred way to store your work, so that you can reload it at any time with readRDS().

saveRDS(dat, "/path/to/data.rds")
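
To reload the saved object, use readRDS(). The sketch below uses a made-up table and a temporary file. The setDT() call after loading is an assumption about good practice rather than a requirement: it is commonly suggested so that the reloaded data.table is ready for adding columns by reference.

```r
library(data.table)

# Small illustrative table (hypothetical data)
original <- data.table(id = 1:3, when = as.Date("2024-01-01") + 0:2)

# Save to a temporary RDS file and load it back
rds_path <- tempfile(fileext = ".rds")
saveRDS(original, rds_path)
restored <- readRDS(rds_path)

# Optional: re-initialize data.table internals on the reloaded object
setDT(restored)

# Unlike a CSV round trip, column classes such as Date are kept as saved
class(restored$when)
```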