using DataFrames
8 Data Input/Output
8.1 Julia IO
JuliaIO is a GitHub organization that hosts a collection of packages for reading and writing different file formats in Julia.
One of these, FileIO.jl, "aims to provide a common framework for detecting file formats and dispatching to appropriate readers/writers."
df = DataFrame(a = randn(20), b = randn(20));
insertcols!(df, 2, :a² => df.a .^2);
insertcols!(df, :b² => df.b .^2);
first(df, 3)
Row | a | a² | b | b² |
---|---|---|---|---|
 | Float64 | Float64 | Float64 | Float64 |
1 | 0.317969 | 0.101104 | 1.03639 | 1.0741 |
2 | 0.0558648 | 0.00312088 | -1.93684 | 3.75135 |
3 | -0.0300227 | 0.000901361 | -0.579862 | 0.33624 |
8.2 CSV
CSV support in Julia is provided by CSV.jl, a high-performance package for reading and writing CSV data.
using CSV
8.2.1 Read CSV
To read a CSV file as a DataFrame, pipe CSV.File() to DataFrame():
iris = CSV.File(expanduser("~/icloud/Data/iris.csv")) |> DataFrame
Row | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|---|
 | Float64 | Float64 | Float64 | Float64 | String15 |
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
8 | 5.0 | 3.4 | 1.5 | 0.2 | setosa |
9 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
10 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
11 | 5.4 | 3.7 | 1.5 | 0.2 | setosa |
12 | 4.8 | 3.4 | 1.6 | 0.2 | setosa |
13 | 4.8 | 3.0 | 1.4 | 0.1 | setosa |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
139 | 6.0 | 3.0 | 4.8 | 1.8 | virginica |
140 | 6.9 | 3.1 | 5.4 | 2.1 | virginica |
141 | 6.7 | 3.1 | 5.6 | 2.4 | virginica |
142 | 6.9 | 3.1 | 5.1 | 2.3 | virginica |
143 | 5.8 | 2.7 | 5.1 | 1.9 | virginica |
144 | 6.8 | 3.2 | 5.9 | 2.3 | virginica |
145 | 6.7 | 3.3 | 5.7 | 2.5 | virginica |
146 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
147 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
148 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
149 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
150 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |
This is equivalent to:
iris = DataFrame(CSV.File(expanduser("~/icloud/Data/iris.csv")))
Alternatively, use CSV.read() with the sink type (here DataFrame) as the second argument:
iris = CSV.read(expanduser("~/icloud/Data/iris.csv"), DataFrame)
Show the first 5 rows of the DataFrame:
first(iris, 5)
Row | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|---|
 | Float64 | Float64 | Float64 | Float64 | String15 |
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
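The calls above read from a file path. CSV.read also accepts an IO source, which lets you try parsing options without a file on disk. The sketch below assumes a small hypothetical semicolon-delimited dataset in which missing values are written as `NA`. `delim` and `missingstring` are CSV.jl keyword arguments, and the variable name `scores` is illustrative only.

```julia
using CSV, DataFrames

# Hypothetical semicolon-delimited data; "NA" marks a missing value
data = """
id;score;label
1;2.5;a
2;NA;b
3;4.0;c
"""

# Read from an in-memory buffer instead of a file path
scores = CSV.read(IOBuffer(data), DataFrame; delim = ';', missingstring = "NA")

# Because of the NA entry, the score column admits missing values
eltype(scores.score)
```

The same keyword arguments work unchanged when the source is a file path.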
8.2.2 Write CSV
CSV.write(expanduser("~/icloud/Data/df.csv"), df)
df = DataFrame(a = randn(20), b = randn(20));
insertcols!(df, 2, :a² => df.a .^2);
insertcols!(df, :b² => df.b .^2);
first(df, 3)
Row | a | a² | b | b² |
---|---|---|---|---|
 | Float64 | Float64 | Float64 | Float64 |
1 | 0.451832 | 0.204152 | 1.65199 | 2.72906 |
2 | 1.04743 | 1.09711 | -0.0433866 | 0.0018824 |
3 | -1.48649 | 2.20966 | -0.529627 | 0.280505 |
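To check that a written file reads back as expected, the sketch below writes a small DataFrame to a temporary file and reads it again with CSV.read. It uses tempname() instead of the ~/icloud path so that it runs on any machine. The table and its column names are hypothetical.

```julia
using CSV, DataFrames

# Small example table: a is floating-point, b is integer
out = DataFrame(a = randn(5), b = 1:5)

# Write to a temporary file instead of a real data folder
path = tempname() * ".csv"
CSV.write(path, out)

# Read it back for comparison with the original
back = CSV.read(path, DataFrame)

rm(path)   # clean up the temporary file
```

Floats are compared with isapprox in a check like this. That avoids relying on an exact text round-trip of every digit.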
8.3 Serialization
The standard library module Serialization provides serialize() and deserialize() to write arbitrary Julia data to a file and read it back. Use it only for short-term, preferably local, storage. The format is not guaranteed to be readable by a different Julia version or by a Julia instance with a different system image.
using Serialization
serialize(expanduser("~/icloud/Data/Julia/df"), df)
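The matching call is deserialize(). The minimal round-trip sketch below uses a hypothetical Dict and a temporary file, so it runs with the standard library alone.

```julia
using Serialization

# Any Julia value can be serialized; here a Dict of mixed values
data = Dict("x" => [1, 2, 3], "meta" => (source = "example", version = 1))

path = tempname()
serialize(path, data)          # write the value to a file

restored = deserialize(path)   # read it back in the same Julia session
rm(path)
```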
8.4 JLD2
JLD2.jl reads and writes Julia structures using a subset of HDF5 written in pure Julia.
using JLD2
To save and load data with JLD2, use the @save and @load macros. Multiple Julia objects can be saved to a single file.
8.4.1 Save JLD
df1 = DataFrame(a = 1:5, b = randn(5))
df2 = DataFrame(c = 6:10, d = randn(5))
@save expanduser("~/icloud/Data/Julia/dfs.jld") df1 df2
8.4.2 Load JLD
# Variable names are required when the filename is not a string literal
@load expanduser("~/icloud/Data/Julia/dfs.jld") df1 df2
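Here is a self-contained variant of the save and load steps above. It writes to a temporary file instead of the iCloud folder and assumes JLD2.jl and DataFrames.jl are installed. The `.jld2` extension is an arbitrary choice here, and the names `saved1`/`saved2` are illustrative.

```julia
using JLD2, DataFrames

df1 = DataFrame(a = 1:5, b = randn(5))
df2 = DataFrame(c = 6:10, d = randn(5))

path = tempname() * ".jld2"
@save path df1 df2             # both objects go into one file

# Keep references to the originals, then load the stored copies
saved1, saved2 = df1, df2
@load path df1 df2             # rebinds df1 and df2 to the loaded data
rm(path)
```

After @load, df1 and df2 refer to new objects read from disk that compare equal to the originals.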
8.5 HDF5
HDF5 support in Julia is provided by HDF5.jl.
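The sketch below writes and reads datasets with HDF5.jl and assumes the package is installed. The dataset names `A` and `v` and the temporary file are hypothetical. Opening the file with a do-block closes it automatically, and h5read reads a single dataset by name.

```julia
using HDF5

A = rand(3, 4)
path = tempname() * ".h5"

# Open for writing ("w" truncates any existing file) and store two datasets
h5open(path, "w") do file
    file["A"] = A
    file["v"] = collect(1:5)
end

# Read one dataset back by name
B = h5read(path, "A")

# List the dataset names stored at the top level of the file
names_in_file = h5open(keys, path, "r")
rm(path)
```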
8.6 Arrow
Apache Arrow format support is provided by Arrow.jl.
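This is a round-trip sketch with Arrow.jl, assuming Arrow.jl and DataFrames.jl are installed and using a hypothetical two-column table. Arrow.write accepts any Tables.jl-compatible table, such as a DataFrame, and Arrow.Table reads the file back. Wrapping the result in DataFrame() copies the columns by default.

```julia
using Arrow, DataFrames

small = DataFrame(a = 1:3, b = ["x", "y", "z"])
path = tempname() * ".arrow"

Arrow.write(path, small)       # write the table to an .arrow file
tbl = Arrow.Table(path)        # read the columns back
back = DataFrame(tbl)          # copy into a DataFrame
```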
8.7 RData
Support for reading R's .RData and .rda formats is provided by RData.jl.
To write an .RData file, it is recommended to use RCall.jl to call R from within Julia.
8.8 BSON
Support for BSON files is provided by BSON.jl.
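The sketch below uses BSON.jl and assumes the package is installed. BSON.bson() writes a Dict with Symbol keys to a file, and BSON.load() returns a Dict{Symbol, Any}. The keys and values are hypothetical.

```julia
using BSON

data = Dict(:x => [1, 2, 3], :name => "example")
path = tempname() * ".bson"

BSON.bson(path, data)          # write the Dict to a BSON file
loaded = BSON.load(path)       # Dict{Symbol, Any}
rm(path)
```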
8.9 MAT
Support for reading and writing MATLAB .mat files is provided by MAT.jl.
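The sketch below uses MAT.jl and assumes the package is installed. matwrite() takes a Dict whose String keys become MATLAB variable names, and matread() returns a Dict{String, Any}. The variable names are hypothetical.

```julia
using MAT

vars = Dict{String, Any}("A" => rand(2, 3), "label" => "example")
path = tempname() * ".mat"

matwrite(path, vars)           # keys become MATLAB variable names
loaded = matread(path)         # Dict{String, Any}
rm(path)
```

MATLAB has no one-dimensional arrays, so a matrix such as the 2×3 `A` here is the simplest value to round-trip unchanged.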