plydata.helper_verbs.summarize_all

class plydata.helper_verbs.summarize_all(*args, **kwargs)

Summarise all non-grouping columns
Parameters

data : dataframe, optional
    Useful when not using the >> operator.

functions : callable() or tuple or dict or str
    Functions to alter the columns:

    - function (any callable) - The function is applied to each column and the result columns replace the original columns.
    - tuple of functions - Each function is applied to all of the columns and the name (__name__) of the function is postfixed to the resulting column names.
    - dict of the form {'name': function} - Allows you to apply one or more functions and also control the postfix to the name.
    - str - You can use this to access the aggregation functions provided in summarize:

          # Those that accept a single argument.
          'min'
          'max'
          'sum'
          'cumsum'
          'mean'
          'median'
          'std'
          'first'
          'last'
          'n_distinct'
          'n_unique'

args : tuple
    Arguments to the functions. The arguments are passed to all functions.

kwargs : dict
    Keyword arguments to the functions. The keyword arguments are passed to all functions.
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from plydata import *
>>> df = pd.DataFrame({
...     'alpha': list('aaabbb'),
...     'beta': list('babruq'),
...     'theta': list('cdecde'),
...     'x': [1, 2, 3, 4, 5, 6],
...     'y': [6, 5, 4, 3, 2, 1],
...     'z': [7, 9, 11, 8, 10, 12]
... })
A single summarizing function
>>> df >> select('x', 'z') >> summarize_all('mean')
     x    z
0  3.5  9.5
More than one summarizing function (as a tuple).
>>> df >> select('x', 'z') >> summarize_all(('mean', np.std))
   x_mean  z_mean     x_std     z_std
0     3.5     9.5  1.707825  1.707825
You can use a dictionary to change the postfixes of the column names.
>>> (df
...  >> select('x', 'z')
...  >> summarize_all(dict(MEAN='mean', STD=np.std)))
   x_MEAN  z_MEAN     x_STD     z_STD
0     3.5     9.5  1.707825  1.707825
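The naming pattern visible in the three examples above can be modelled in plain Python. This is only a sketch of the observed behaviour, not plydata's implementation: `result_names` is a hypothetical helper, and the local `std` function is a stand-in for `np.std` (only its `__name__` matters here).

```python
def result_names(columns, functions):
    """Model the output column names for a given `functions` argument.

    Hypothetical helper for illustration only; it is not part of plydata.
    """
    # A single callable or a single string: the results replace the
    # original columns, so the names are unchanged.
    if callable(functions) or isinstance(functions, str):
        return list(columns)

    if isinstance(functions, dict):
        # dict: the keys are used as postfixes.
        postfixes = list(functions)
    else:
        # tuple: strings name themselves, callables contribute __name__.
        postfixes = [f if isinstance(f, str) else f.__name__ for f in functions]

    # Names are ordered function by function, matching the example
    # output (x_mean, z_mean, x_std, z_std).
    return [f"{col}_{post}" for post in postfixes for col in columns]


def std(values):
    """Stand-in for np.std; never called in this sketch."""
    raise NotImplementedError


print(result_names(['x', 'z'], 'mean'))
print(result_names(['x', 'z'], ('mean', std)))
print(result_names(['x', 'z'], dict(MEAN='mean', STD=std)))
```

The dict form is the one to reach for when a function has no useful `__name__` or when two functions would otherwise produce the same postfix.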
Group by
>>> (df
...  >> group_by('alpha')
...  >> select('x', 'z')
...  >> summarize_all(('mean', np.std)))
  alpha  x_mean  z_mean     x_std     z_std
0     a     2.0     9.0  0.816497  1.632993
1     b     5.0    10.0  0.816497  1.632993
Passing additional arguments
>>> (df
...  >> group_by('alpha')
...  >> select('x', 'z')
...  >> summarize_all(np.std, ddof=1))
  alpha    x    z
0     a  1.0  2.0
1     b  1.0  2.0
The arguments are passed to all functions, so in most such cases it may only be possible to summarise with one function.
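The restriction can be seen in a small plain-Python model. Everything here is an assumption made for illustration: `apply_all`, `mean` and `std` are hypothetical stand-ins, not plydata or NumPy functions. The model forwards the same extra arguments to every function, so a function that does not accept `ddof` fails.

```python
from functools import partial


def mean(values):
    """Arithmetic mean; accepts no extra arguments."""
    return sum(values) / len(values)


def std(values, ddof=0):
    """Standard deviation with a delta-degrees-of-freedom correction."""
    centre = mean(values)
    return (sum((v - centre) ** 2 for v in values) / (len(values) - ddof)) ** 0.5


def apply_all(functions, values, *args, **kwargs):
    """Call every function with the same extra arguments (hypothetical model)."""
    return {name: func(values, *args, **kwargs) for name, func in functions.items()}


x = [1, 2, 3]

# One function that understands ddof: works.
print(apply_all({'std': std}, x, ddof=1))

# Two functions, one of which does not accept ddof: fails.
try:
    apply_all({'mean': mean, 'std': std}, x, ddof=1)
except TypeError as err:
    print('mean() rejects ddof:', err)

# In this model, binding the argument to the one function that needs it
# avoids the problem. Whether plydata accepts partial objects in the same
# way has not been verified here.
print(apply_all({'mean': mean, 'std': partial(std, ddof=1)}, x))
```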
The grouping columns are never summarised.
>>> (df
...  >> select('x', 'y', 'z')
...  >> define(parity='x%2')
...  >> group_by('parity')
...  >> summarize_all('mean'))
   parity    x    y         z
0       1  3.0  4.0  9.333333
1       0  4.0  3.0  9.666667
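To make the handling of the grouping column concrete, the example above can be reproduced with a plain-Python model. This is a sketch under stated assumptions: `summarize_groups` and `mean` are hypothetical helpers, not plydata functions, and the model lists groups in order of first appearance, which happens to match the output shown above.

```python
def mean(values):
    """Arithmetic mean as a float."""
    return sum(values) / len(values)


def summarize_groups(rows, group_col, func):
    """Apply func to every column except group_col, once per group.

    Hypothetical model for illustration; not plydata's implementation.
    """
    # Collect the rows of each group, keeping first-appearance order.
    groups = {}
    for row in rows:
        groups.setdefault(row[group_col], []).append(row)

    result = []
    for key, members in groups.items():
        # The grouping column is carried through as a label, not summarised.
        summary = {group_col: key}
        for col in members[0]:
            if col != group_col:
                summary[col] = func([m[col] for m in members])
        result.append(summary)
    return result


x = [1, 2, 3, 4, 5, 6]
y = [6, 5, 4, 3, 2, 1]
z = [7, 9, 11, 8, 10, 12]
rows = [{'parity': xi % 2, 'x': xi, 'y': yi, 'z': zi}
        for xi, yi, zi in zip(x, y, z)]

for summary in summarize_groups(rows, 'parity', mean):
    print(summary)
```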