13  CategoricalArrays

To represent categorical variables in Julia, we can use the CategoricalArray type from CategoricalArrays.jl.

using CategoricalArrays

13.1 Create CategoricalArray with categorical()

x = ["a", "c", "d", "b", "a", "a", "d", "c"]
8-element Vector{String}:
 "a"
 "c"
 "d"
 "b"
 "a"
 "a"
 "d"
 "c"
xc = categorical(x)
8-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "a"
 "c"
 "d"
 "b"
 "a"
 "a"
 "d"
 "c"

The same can be achieved by calling the CategoricalArray constructor directly:

xc = CategoricalArray(x)
8-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "a"
 "c"
 "d"
 "b"
 "a"
 "a"
 "d"
 "c"

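categorical() also accepts keyword arguments that control how the array is built. The following is a minimal sketch, assuming a recent release of CategoricalArrays.jl in which the levels, ordered and compress keywords are available; the sizes vector is made up for illustration.

```julia
using CategoricalArrays

# Hypothetical data, not part of the example above
sizes = ["small", "large", "medium", "small"]

# Default: levels are the sorted unique values and the array is unordered
sc = categorical(sizes)
levels(sc)       # alphabetical order: "large", "medium", "small"
isordered(sc)    # false

# Explicit level order, flagged as ordered (an ordinal variable)
so = categorical(sizes; levels=["small", "medium", "large"], ordered=true)
levels(so)
isordered(so)

# compress=true picks the smallest unsigned integer type that fits the levels
scomp = categorical(sizes; compress=true)
eltype(scomp.refs)
```

With only three levels, the compressed array should use UInt8 reference codes instead of the default UInt32.
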
13.2 The underlying UInt32 vector

A CategoricalArray stores each element as a UInt32 reference code that indexes into the array's set of levels.

You can access the underlying integers:

xc.refs
8-element Vector{UInt32}:
 0x00000001
 0x00000003
 0x00000004
 0x00000002
 0x00000001
 0x00000001
 0x00000004
 0x00000003

Convert them to Int32 by broadcasting % (i.e. rem()) with the target type:

xc.refs .% Int32
8-element Vector{Int32}:
 1
 3
 4
 2
 1
 1
 4
 3

or using convert():

convert(Array{Int32}, xc.refs)
8-element Vector{Int32}:
 1
 3
 4
 2
 1
 1
 4
 3
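
To make the relationship between reference codes and levels concrete, the sketch below rebuilds the original values by indexing the levels with the codes. It also uses levelcode(), which, as far as I know, is the exported way to get a value's position among the levels without touching the refs field directly.

```julia
using CategoricalArrays

x = ["a", "c", "d", "b", "a", "a", "d", "c"]
xc = categorical(x)

# Indexing the levels by the reference codes reconstructs the values
rebuilt = levels(xc)[xc.refs]
rebuilt == x

# levelcode() gives each element's position in levels(xc) as a signed integer
codes = levelcode.(xc)
codes == xc.refs
```

Both comparisons should evaluate to true, since the reference code of each element is its level's position in levels(xc).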

13.3 Get levels of a CategoricalArray with levels()

levels(xc)
4-element Vector{String}:
 "a"
 "b"
 "c"
 "d"

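Note that levels() reports every level the array knows about, not only the values currently present. A small sketch of the difference, using a made-up vector yc and droplevels!() to discard unused levels:

```julia
using CategoricalArrays

# Hypothetical data for illustration
yc = categorical(["a", "b", "c", "a"])
sub = yc[[1, 2, 4]]                    # leaves out the only "c" element

levels_before = copy(levels(sub))      # still includes the unused level "c"
present = unique(sub)                  # only the values that actually occur

droplevels!(sub)                       # remove levels with no elements
levels_after = levels(sub)
```

Here levels_before should contain "a", "b" and "c", while present and levels_after contain only "a" and "b".
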
13.4 Set new level labels with recode() & recode!()

recode!(xc, 
        "a" => "alpha", 
        "b" => "beta", 
        "c" => "gamma", 
        "d" => "delta")
8-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "alpha"
 "gamma"
 "delta"
 "beta"
 "alpha"
 "alpha"
 "delta"
 "gamma"

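recode!() modifies its argument in place, whereas recode() returns a new array and leaves the original untouched. The sketch below starts again from the original x so that it runs on its own; the labels "early" and "other" are made up for illustration.

```julia
using CategoricalArrays

x = ["a", "c", "d", "b", "a", "a", "d", "c"]
xc = categorical(x)

# recode() returns a new array; values without a matching pair are kept as-is
greek = recode(xc, "a" => "alpha", "b" => "beta")

# A vector on the left maps several values to one label, and a positional
# default (here "other") replaces every value that no pair matches
grouped = recode(xc, "other", ["a", "b"] => "early")

levels(xc)    # unchanged: "a", "b", "c", "d"
```

The default argument is handy for collapsing rare categories into a single catch-all level.
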
13.5 Reorder levels with levels() & levels!()

levels() in Julia vs. R

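In Julia, levels() only returns the levels of a CategoricalArray, and levels!() reorders them without changing the values. This differs from R, where assigning to levels() recodes, i.e. changes the level labels; in Julia, relabeling is done with recode() and recode!().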
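
As a minimal sketch of what reordering does and does not change, the example below uses a made-up vector v. It also calls ordered!(), which is not covered above: it marks the level order as meaningful so that elements can be compared with <.

```julia
using CategoricalArrays

# Hypothetical data for illustration
v = categorical(["low", "high", "medium", "low"])
levels(v)                                # alphabetical: "high", "low", "medium"

levels!(v, ["low", "medium", "high"])    # reorder the levels; values stay the same
codes = levelcode.(v)                    # positions now follow the new order

ordered!(v, true)                        # declare the order meaningful
is_less = v[1] < v[2]                    # compares "low" and "high" by level position

sorted = sort(v)                         # sorts by level order, not alphabetically
```

After reordering, the elements still read "low", "high", "medium", "low"; only their underlying codes and the result of sorting and comparison change.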

xc
8-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "alpha"
 "gamma"
 "delta"
 "beta"
 "alpha"
 "alpha"
 "delta"
 "gamma"
levels(xc)
4-element Vector{String}:
 "alpha"
 "beta"
 "gamma"
 "delta"
levels!(xc, ["delta", "gamma", "beta", "alpha"])
8-element CategoricalArrays.CategoricalArray{String,1,UInt32}:
 "alpha"
 "gamma"
 "delta"
 "beta"
 "alpha"
 "alpha"
 "delta"
 "gamma"
levels(xc)
4-element Vector{String}:
 "delta"
 "gamma"
 "beta"
 "alpha"