17 Aggregate
using CSV, DataFrames, StatsBase
iris = CSV.read("/Users/egenn/icloud/Data/iris.csv", DataFrame)
150×5 DataFrame
125 rows omitted
Row | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
---|---|---|---|---|---|
 | Float64 | Float64 | Float64 | Float64 | String15 |
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
8 | 5.0 | 3.4 | 1.5 | 0.2 | setosa |
9 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
10 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
11 | 5.4 | 3.7 | 1.5 | 0.2 | setosa |
12 | 4.8 | 3.4 | 1.6 | 0.2 | setosa |
13 | 4.8 | 3.0 | 1.4 | 0.1 | setosa |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
139 | 6.0 | 3.0 | 4.8 | 1.8 | virginica |
140 | 6.9 | 3.1 | 5.4 | 2.1 | virginica |
141 | 6.7 | 3.1 | 5.6 | 2.4 | virginica |
142 | 6.9 | 3.1 | 5.1 | 2.3 | virginica |
143 | 5.8 | 2.7 | 5.1 | 1.9 | virginica |
144 | 6.8 | 3.2 | 5.9 | 2.3 | virginica |
145 | 6.7 | 3.3 | 5.7 | 2.5 | virginica |
146 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
147 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
148 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
149 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
150 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |
Clean up column names
rename!(iris, replace.(names(iris), "." => "_"))
150×5 DataFrame
125 rows omitted
Row | Sepal_Length | Sepal_Width | Petal_Length | Petal_Width | Species |
---|---|---|---|---|---|
 | Float64 | Float64 | Float64 | Float64 | String15 |
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
8 | 5.0 | 3.4 | 1.5 | 0.2 | setosa |
9 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
10 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
11 | 5.4 | 3.7 | 1.5 | 0.2 | setosa |
12 | 4.8 | 3.4 | 1.6 | 0.2 | setosa |
13 | 4.8 | 3.0 | 1.4 | 0.1 | setosa |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
139 | 6.0 | 3.0 | 4.8 | 1.8 | virginica |
140 | 6.9 | 3.1 | 5.4 | 2.1 | virginica |
141 | 6.7 | 3.1 | 5.6 | 2.4 | virginica |
142 | 6.9 | 3.1 | 5.1 | 2.3 | virginica |
143 | 5.8 | 2.7 | 5.1 | 1.9 | virginica |
144 | 6.8 | 3.2 | 5.9 | 2.3 | virginica |
145 | 6.7 | 3.3 | 5.7 | 2.5 | virginica |
146 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
147 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
148 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
149 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
150 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |
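The renaming step above can be broken into its parts. The following is a minimal sketch on a small made-up table (the `df` name and its values are assumptions for illustration, not the iris data): `names` returns the column names as strings, `replace.` broadcasts the substitution over each name, and `rename!` applies the new names in place.

```julia
using DataFrames

# Toy table standing in for iris; the values are made up for illustration
df = DataFrame("Sepal.Length" => [5.1, 4.9], "Sepal.Width" => [3.5, 3.0])

# names(df) returns a Vector{String}; the dot in `replace.` broadcasts
# the "." => "_" substitution over every column name
new_names = replace.(names(df), "." => "_")

# rename! modifies df in place (the non-mutating variant is rename)
rename!(df, new_names)

names(df)
```

Names without a dot can be accessed with the `df.Sepal_Length` syntax, which is presumably the motivation for the cleanup.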
17.1 Create a grouped DataFrame
groupby(iris, :Species)
GroupedDataFrame with 3 groups based on key: Species
First Group (50 rows): Species = "setosa"
25 rows omitted
Row | Sepal_Length | Sepal_Width | Petal_Length | Petal_Width | Species |
---|---|---|---|---|---|
 | Float64 | Float64 | Float64 | Float64 | String15 |
1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
7 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
8 | 5.0 | 3.4 | 1.5 | 0.2 | setosa |
9 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
10 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
11 | 5.4 | 3.7 | 1.5 | 0.2 | setosa |
12 | 4.8 | 3.4 | 1.6 | 0.2 | setosa |
13 | 4.8 | 3.0 | 1.4 | 0.1 | setosa |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
39 | 4.4 | 3.0 | 1.3 | 0.2 | setosa |
40 | 5.1 | 3.4 | 1.5 | 0.2 | setosa |
41 | 5.0 | 3.5 | 1.3 | 0.3 | setosa |
42 | 4.5 | 2.3 | 1.3 | 0.3 | setosa |
43 | 4.4 | 3.2 | 1.3 | 0.2 | setosa |
44 | 5.0 | 3.5 | 1.6 | 0.6 | setosa |
45 | 5.1 | 3.8 | 1.9 | 0.4 | setosa |
46 | 4.8 | 3.0 | 1.4 | 0.3 | setosa |
47 | 5.1 | 3.8 | 1.6 | 0.2 | setosa |
48 | 4.6 | 3.2 | 1.4 | 0.2 | setosa |
49 | 5.3 | 3.7 | 1.5 | 0.2 | setosa |
50 | 5.0 | 3.3 | 1.4 | 0.2 | setosa |
⋮
Last Group (50 rows): Species = "virginica"
25 rows omitted
Row | Sepal_Length | Sepal_Width | Petal_Length | Petal_Width | Species |
---|---|---|---|---|---|
 | Float64 | Float64 | Float64 | Float64 | String15 |
1 | 6.3 | 3.3 | 6.0 | 2.5 | virginica |
2 | 5.8 | 2.7 | 5.1 | 1.9 | virginica |
3 | 7.1 | 3.0 | 5.9 | 2.1 | virginica |
4 | 6.3 | 2.9 | 5.6 | 1.8 | virginica |
5 | 6.5 | 3.0 | 5.8 | 2.2 | virginica |
6 | 7.6 | 3.0 | 6.6 | 2.1 | virginica |
7 | 4.9 | 2.5 | 4.5 | 1.7 | virginica |
8 | 7.3 | 2.9 | 6.3 | 1.8 | virginica |
9 | 6.7 | 2.5 | 5.8 | 1.8 | virginica |
10 | 7.2 | 3.6 | 6.1 | 2.5 | virginica |
11 | 6.5 | 3.2 | 5.1 | 2.0 | virginica |
12 | 6.4 | 2.7 | 5.3 | 1.9 | virginica |
13 | 6.8 | 3.0 | 5.5 | 2.1 | virginica |
⋮ | ⋮ | ⋮ | ⋮ | ⋮ | ⋮ |
39 | 6.0 | 3.0 | 4.8 | 1.8 | virginica |
40 | 6.9 | 3.1 | 5.4 | 2.1 | virginica |
41 | 6.7 | 3.1 | 5.6 | 2.4 | virginica |
42 | 6.9 | 3.1 | 5.1 | 2.3 | virginica |
43 | 5.8 | 2.7 | 5.1 | 1.9 | virginica |
44 | 6.8 | 3.2 | 5.9 | 2.3 | virginica |
45 | 6.7 | 3.3 | 5.7 | 2.5 | virginica |
46 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
47 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
48 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
49 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
50 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |
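A `GroupedDataFrame` can also be inspected directly before any aggregation. The sketch below uses a small made-up table (the `df` and `gdf` names and the values are assumptions for illustration) to show how one might count the groups, look up a group by its key, and iterate over key and group pairs.

```julia
using DataFrames

# Toy table standing in for iris; the values are made up for illustration
df = DataFrame(Species = ["setosa", "setosa", "virginica"],
               Sepal_Length = [5.1, 4.9, 6.3])

gdf = groupby(df, :Species)

# Number of groups
length(gdf)

# Look up a single group by the value of its grouping key;
# the result is a SubDataFrame (a view into df)
virginica = gdf[(Species = "virginica",)]

# Iterate over key => group pairs and report the size of each group
for (key, sub) in pairs(gdf)
    println(key.Species, ": ", nrow(sub), " rows")
end
```

Because each group is a view, no rows are copied when the grouping is created.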
17.2 Apply a function to a grouped DataFrame
combine(groupby(iris, :Species), :Sepal_Length => mean)
3×2 DataFrame
Row | Species | Sepal_Length_mean |
---|---|---|
 | String15 | Float64 |
1 | setosa | 5.006 |
2 | versicolor | 5.936 |
3 | virginica | 6.588 |
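The same `combine` call accepts several operations at once. The following sketch, again on a small made-up table (the `df`, `gdf`, and `stats` names, the values, and the output column names `n` and `Sepal_Length_avg` are assumptions for illustration), counts the rows per group, renames an output column using the `source => function => target` form, and broadcasts one function over multiple columns.

```julia
using DataFrames, Statistics

# Toy table standing in for iris; the values are made up for illustration
df = DataFrame(Species = ["setosa", "setosa", "virginica", "virginica"],
               Sepal_Length = [5.0, 5.5, 6.0, 6.5],
               Petal_Length = [1.4, 1.6, 5.0, 5.4])

gdf = groupby(df, :Species)

stats = combine(gdf,
    # nrow => :n counts the rows in each group
    nrow => :n,
    # source column => function => output column name
    :Sepal_Length => mean => :Sepal_Length_avg,
    # .=> pairs each listed column with the same function;
    # output names default to <column>_<function>
    [:Sepal_Length, :Petal_Length] .=> maximum)
```

When no target name is given, the output column is named after the source column and the function, which is where `Sepal_Length_mean` in the output above comes from.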