import dask.dataframe as dd
import pandas as pd
import numpy as np
20 Aggregate
df = pd.DataFrame([('bird', 'Falconiformes', 389.0),
                   ('bird', 'Psittaciformes', 24.0),
                   ('mammal', 'Carnivora', 80.2),
                   ('mammal', 'Primates', np.nan),
                   ('mammal', 'Carnivora', 58)],
                  index=['falcon', 'parrot', 'lion', 'monkey', 'leopard'],
                  columns=('class', 'order', 'max_speed'))
df = dd.from_pandas(df, npartitions=1)
df
Dask DataFrame Structure:
| | class | order | max_speed |
|---|---|---|---|
| npartitions=1 | | | |
| falcon | object | object | float64 |
| parrot | ... | ... | ... |
Dask Name: from_pandas, 1 graph layer
20.0.1 groupby()
groupby(): group rows by a categorical column
grouped = df.groupby('class')
grouped2 = df.groupby(['class', 'order'])
grouped.size().compute()
class
bird 2
mammal 3
dtype: int64
grouped.mean().compute()
| class | max_speed |
|---|---|
| bird | 206.5 |
| mammal | 69.1 |
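The mammal mean of 69.1 is (80.2 + 58) / 2: the missing speed for the monkey is skipped, not treated as zero. As a rough sketch of the split-apply-combine idea behind this result, here is the same calculation in plain Python. It is not how Dask implements groupby; the names rows, groups, and means are made up for the illustration.

```python
import math
from collections import defaultdict

# (class, max_speed) pairs copied from the example frame
rows = [
    ("bird", 389.0),
    ("bird", 24.0),
    ("mammal", 80.2),
    ("mammal", math.nan),
    ("mammal", 58.0),
]

# Split: collect the non-missing values under each group key
groups = defaultdict(list)
for key, speed in rows:
    if not math.isnan(speed):
        groups[key].append(speed)

# Apply and combine: one mean per group
means = {key: sum(vals) / len(vals) for key, vals in groups.items()}
print(means)
```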
grouped2.mean().compute()
| class | order | max_speed |
|---|---|---|
| bird | Falconiformes | 389.0 |
| | Psittaciformes | 24.0 |
| mammal | Carnivora | 69.1 |
| | Primates | NaN |
or in a single step:
df.groupby('class').mean().compute()
| class | max_speed |
|---|---|
| bird | 206.5 |
| mammal | 69.1 |