8 Data Input/Output

8.1 Julia IO

JuliaIO is a GitHub organization that hosts a collection of packages for reading and writing different file formats in Julia.

Part of JuliaIO, FileIO.jl is a package that “aims to provide a common framework for detecting file formats and dispatching to appropriate readers/writers.”
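As a rough illustration of this dispatch, the sketch below saves and loads a file through FileIO's generic save() and load() functions. It assumes that the JLD2 package is installed to handle the .jld2 extension; the file name and the stored values are made up for the example.

```julia
using FileIO  # generic save/load; assumes JLD2.jl is installed as the backend

# Hypothetical file in a temporary directory
path = joinpath(mktempdir(), "example.jld2")

# FileIO picks the writer from the .jld2 extension
save(path, Dict("x" => [1, 2, 3], "label" => "demo"))

# load() detects the format and returns a Dict of the stored objects
data = load(path)
data["x"]
```

The same two calls should work for any format that has a registered backend package installed, which is the point of the common framework.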

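The following DataFrame is used in the examples below:

using DataFrames
df = DataFrame(a = randn(20), b = randn(20));
# Insert the squares of a as the second column (the name :a2 is chosen here)
insertcols!(df, 2, :a2 => df.a .^ 2);
# Append the squares of b as the last column (the name :b2 is chosen here)
insertcols!(df, :b2 => df.b .^ 2);
first(df, 3)
3×4 DataFrame
 Row │ a           a2           b          b2
     │ Float64     Float64      Float64    Float64
─────┼─────────────────────────────────────────────
   1 │   0.317969     0.101104    1.03639   1.0741
   2 │  0.0558648   0.00312088   -1.93684  3.75135
   3 │ -0.0300227  0.000901361  -0.579862  0.33624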

8.2 CSV

CSV support in Julia is provided by CSV.jl, a high-performance package for reading and writing delimited text data.

See the CSV.jl documentation for the full list of options.

using CSV

8.2.1 Read CSV

To read a CSV file as a DataFrame, pipe CSV.File() to DataFrame():

iris = CSV.File(expanduser("~/icloud/Data/iris.csv")) |> DataFrame
150×5 DataFrame
 Row │ Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
     │ Float64       Float64      Float64       Float64      String15
─────┼────────────────────────────────────────────────────────────────
   1 │          5.1          3.5           1.4          0.2  setosa
   2 │          4.9          3.0           1.4          0.2  setosa
   3 │          4.7          3.2           1.3          0.2  setosa
   4 │          4.6          3.1           1.5          0.2  setosa
   5 │          5.0          3.6           1.4          0.2  setosa
   6 │          5.4          3.9           1.7          0.4  setosa
   7 │          4.6          3.4           1.4          0.3  setosa
   8 │          5.0          3.4           1.5          0.2  setosa
   9 │          4.4          2.9           1.4          0.2  setosa
  10 │          4.9          3.1           1.5          0.1  setosa
  11 │          5.4          3.7           1.5          0.2  setosa
  12 │          4.8          3.4           1.6          0.2  setosa
  13 │          4.8          3.0           1.4          0.1  setosa
  ⋮  │           ⋮            ⋮             ⋮            ⋮   ⋮
 139 │          6.0          3.0           4.8          1.8  virginica
 140 │          6.9          3.1           5.4          2.1  virginica
 141 │          6.7          3.1           5.6          2.4  virginica
 142 │          6.9          3.1           5.1          2.3  virginica
 143 │          5.8          2.7           5.1          1.9  virginica
 144 │          6.8          3.2           5.9          2.3  virginica
 145 │          6.7          3.3           5.7          2.5  virginica
 146 │          6.7          3.0           5.2          2.3  virginica
 147 │          6.3          2.5           5.0          1.9  virginica
 148 │          6.5          3.0           5.2          2.0  virginica
 149 │          6.2          3.4           5.4          2.3  virginica
 150 │          5.9          3.0           5.1          1.8  virginica
                                                      125 rows omitted

This is equivalent to calling the DataFrame constructor directly:

iris = DataFrame(CSV.File(expanduser("~/icloud/Data/iris.csv")))

Alternatively, use CSV.read() and pass the sink type as the second argument:

iris = CSV.read(expanduser("~/icloud/Data/iris.csv"), DataFrame)

Show the first 5 rows of the DataFrame:

first(iris, 5)
5×5 DataFrame
 Row │ Sepal.Length  Sepal.Width  Petal.Length  Petal.Width  Species
     │ Float64       Float64      Float64       Float64      String15
─────┼────────────────────────────────────────────────────────────────
   1 │          5.1          3.5           1.4          0.2  setosa
   2 │          4.9          3.0           1.4          0.2  setosa
   3 │          4.7          3.2           1.3          0.2  setosa
   4 │          4.6          3.1           1.5          0.2  setosa
   5 │          5.0          3.6           1.4          0.2  setosa
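CSV.read() also accepts keyword arguments that control parsing. The sketch below reads from an in-memory string rather than a file so that it is self-contained; the data, column names, and the "NA" missing-value marker are invented for illustration.

```julia
using CSV, DataFrames

# Semicolon-delimited text with "NA" marking missing values (made-up data)
raw = """
id;height;group
1;1.72;a
2;NA;b
3;1.65;a
"""

# delim sets the field separator; missingstring is the token parsed as missing
people = CSV.read(IOBuffer(raw), DataFrame; delim = ';', missingstring = "NA")

# select keeps only the named columns
heights = CSV.read(IOBuffer(raw), DataFrame; delim = ';', missingstring = "NA",
                   select = [:id, :height])
```

Here the height column should be parsed as Union{Missing, Float64}, because one of its entries matches the missing-value marker.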

8.2.2 Write CSV

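To write a DataFrame to a CSV file, pass the output path and the table to CSV.write():

df = DataFrame(a = randn(20), b = randn(20));
# Insert the squares of a as the second column (the name :a2 is chosen here)
insertcols!(df, 2, :a2 => df.a .^ 2);
# Append the squares of b as the last column (the name :b2 is chosen here)
insertcols!(df, :b2 => df.b .^ 2);
first(df, 3)
3×4 DataFrame
 Row │ a         a2        b           b2
     │ Float64   Float64   Float64     Float64
─────┼──────────────────────────────────────────
   1 │ 0.451832  0.204152     1.65199    2.72906
   2 │  1.04743   1.09711  -0.0433866  0.0018824
   3 │ -1.48649   2.20966   -0.529627   0.280505

CSV.write(expanduser("~/icloud/Data/df.csv"), df)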
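A quick way to check a written file is to read it back and compare. This is a minimal round-trip sketch that uses a temporary directory and a small stand-in table instead of the path and data above; the names df_out and df_in are illustrative.

```julia
using CSV, DataFrames

# Small stand-in table; the values are arbitrary
df_out = DataFrame(a = [1.5, 2.5, 3.5], b = ["x", "y", "z"])

# Write to a temporary location; CSV.write returns the path it wrote to
path = joinpath(mktempdir(), "df.csv")
CSV.write(path, df_out)

# Read the file back and compare the numeric column
df_in = CSV.read(path, DataFrame)
df_in.a == df_out.a
```

Column types are inferred again on reading, so they may not match the original exactly (for example, short strings may come back as a fixed-width string type).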

8.3 Serialization

The standard library module Serialization provides serialize() and deserialize() for writing arbitrary Julia objects to a file and reading them back. Use it for short-term, preferably local, I/O: the format is not guaranteed to be readable across systems or Julia versions.

using Serialization
serialize(expanduser("~/icloud/Data/Julia/df"), df)
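Reading the object back is the mirror operation. The sketch below serializes a small dictionary to a temporary file and restores it with deserialize(); the object and file name are made up for the example.

```julia
using Serialization

# Arbitrary object to store
obj = Dict("weights" => [0.1, 0.2, 0.7], "name" => "demo")

# Hypothetical file in a temporary directory
path = joinpath(mktempdir(), "obj.jls")
serialize(path, obj)

# deserialize() reconstructs the object from the file
restored = deserialize(path)
restored == obj
```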

8.4 JLD2

JLD2.jl is a pure-Julia package that saves and loads Julia data structures in a format comprising a subset of HDF5, without depending on the HDF5 C library.

using JLD2

To save and load data using JLD2, use the @save and @load macros.

You can save multiple Julia objects to a single file.

8.4.1 Save JLD

df1 = DataFrame(a = 1:5, b = randn(5))
df2 = DataFrame(c = 6:10, d = randn(5))
@save expanduser("~/icloud/Data/Julia/dfs.jld") df1 df2

8.4.2 Load JLD

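List the variables to load after the file name. @load can load all stored variables without naming them only when the file name is a string literal, not an expression such as a call to expanduser():

@load expanduser("~/icloud/Data/Julia/dfs.jld") df1 df2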
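JLD2 also offers a function-based interface, which can be easier to use inside functions than the macros. The sketch below assumes a recent JLD2 version that provides jldsave() and exports load(); the variable names and file name are illustrative.

```julia
using JLD2

x = collect(1:5)
y = Dict("alpha" => 0.05)

# Hypothetical file in a temporary directory
path = joinpath(mktempdir(), "objs.jld2")

# Keyword names become the names under which the objects are stored
jldsave(path; x, y)

# Load a single object by name, or everything as a Dict
x2 = load(path, "x")
all_objs = load(path)
```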

8.5 HDF5

HDF5 support in Julia is provided by HDF5.jl.
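A minimal sketch of writing and reading an array with HDF5.jl, assuming the package is installed. The file name and the dataset path "mygroup/A" are made up for the example.

```julia
using HDF5

A = reshape(collect(1.0:6.0), 2, 3)

# Hypothetical file in a temporary directory
path = joinpath(mktempdir(), "example.h5")

# Write the array to a named dataset inside a group
h5write(path, "mygroup/A", A)

# Read it back in one call
A2 = h5read(path, "mygroup/A")

# Or open the file and read within a do-block, which closes the file afterwards
A3 = h5open(path, "r") do file
    read(file, "mygroup/A")
end
```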

8.6 Arrow

Support for the Apache Arrow format is provided by Arrow.jl.
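As a small sketch, assuming Arrow.jl and DataFrames.jl are installed, a table can be written with Arrow.write() and read back with Arrow.Table(). The data and file name are invented for illustration.

```julia
using Arrow, DataFrames

df_out = DataFrame(id = [1, 2, 3], score = [0.5, 0.75, 1.0])

# Hypothetical file in a temporary directory
path = joinpath(mktempdir(), "scores.arrow")
Arrow.write(path, df_out)

# Arrow.Table reads the file; its columns are backed by the Arrow data
tbl = Arrow.Table(path)

# copycols = true copies the columns into regular Julia vectors
df_in = DataFrame(tbl; copycols = true)
```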

8.7 RData

Support for reading R’s .RData and .rda formats is provided by RData.jl.

To write an .RData file, the recommended approach is to call R from within Julia using RCall.jl.
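The two packages can be combined in a round trip. The sketch below assumes that R is installed and that RCall.jl, RData.jl, and DataFrames.jl are available; the object name df_out and the file name are made up for the example.

```julia
using RCall, RData, DataFrames

df_out = DataFrame(x = [1.0, 2.0, 3.0], g = ["a", "b", "a"])

# Hypothetical file in a temporary directory
path = joinpath(mktempdir(), "df_out.RData")

# Copy the DataFrame into the R session under the same name
@rput df_out

# Call R's save(); $path interpolates the Julia variable into the R code
R"save(df_out, file = $path)"

# RData.load returns a dictionary mapping R object names to Julia values
objs = RData.load(path)
df_in = objs["df_out"]
```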

8.8 BSON

Support for BSON files is provided by BSON.jl.
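A minimal sketch with BSON.jl, assuming the package is installed: bson() writes a dictionary with Symbol keys and BSON.load() reads it back. The keys, values, and file name are illustrative.

```julia
using BSON

# Hypothetical file in a temporary directory
path = joinpath(mktempdir(), "example.bson")

# Write a dictionary; the keys must be Symbols
bson(path, Dict(:a => [1, 2, 3], :name => "demo"))

# Read the file back as a dictionary keyed by Symbol
data = BSON.load(path)
data[:a]
```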

8.9 MAT

Support for reading and writing MATLAB .mat files is provided by MAT.jl.
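As a short sketch, assuming MAT.jl is installed, matwrite() stores a dictionary of variables and matread() returns all variables in a file as a dictionary. The variable name "A" and the file name are made up for the example.

```julia
using MAT

# Hypothetical file in a temporary directory
path = joinpath(mktempdir(), "example.mat")

# Keys are the MATLAB variable names
matwrite(path, Dict("A" => [1.0 2.0; 3.0 4.0]))

# Read every variable in the file into a Dict
vars = matread(path)
vars["A"]
```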