plydata.helper_verbs.summarize_all

class plydata.helper_verbs.summarize_all(*args, **kwargs)

Summarise all non-grouping columns

Parameters
data : dataframe, optional

Useful when not using the >> operator.

functions : callable or tuple or dict or str

Functions to alter the columns:

  • function (any callable) - Function is applied to the column and the result columns replace the original columns.

  • tuple of functions - Each function is applied to all of the columns, and the name (__name__) of the function is postfixed to the resulting column names.

  • dict of the form {'name': function} - Allows you to apply one or more functions and also control the postfix to the name.

  • str - You can use this to access the aggregation functions provided in summarize:

    # Those that accept a single argument.
    'min'
    'max'
    'sum'
    'cumsum'
    'mean'
    'median'
    'std'
    'first'
    'last'
    'n_distinct'
    'n_unique'
    
args : tuple

Arguments to the functions. The arguments are passed to all functions.

kwargs : dict

Keyword arguments to the functions. The keyword arguments are passed to all functions.
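
The naming rules described for functions can be sketched in plain Python. The helper result_names below is hypothetical and is not part of plydata; it only mirrors the postfixes and the column order that appear in the examples on this page (all columns for the first function, then all columns for the next).

```python
def result_names(columns, functions):
    """Hypothetical helper: predict output column names from the rules above."""
    def label(func):
        # A string names itself; a callable contributes its __name__.
        return func if isinstance(func, str) else func.__name__

    if isinstance(functions, dict):
        # dict form: the keys are the postfixes.
        postfixes = list(functions)
    elif isinstance(functions, (tuple, list)):
        # tuple form: each function's name is the postfix.
        postfixes = [label(func) for func in functions]
    else:
        # Single function or string: results replace the original columns.
        return list(columns)

    return [f"{col}_{postfix}" for postfix in postfixes for col in columns]


single = result_names(['x', 'z'], 'mean')
as_tuple = result_names(['x', 'z'], ('mean', max))
as_dict = result_names(['x', 'z'], {'MEAN': 'mean', 'MAX': max})
print(single)
print(as_tuple)
print(as_dict)
```

The built-in max stands in for any callable with a __name__; it is used here only to keep the sketch free of third-party imports.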

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from plydata import *
>>> df = pd.DataFrame({
...     'alpha': list('aaabbb'),
...     'beta': list('babruq'),
...     'theta': list('cdecde'),
...     'x': [1, 2, 3, 4, 5, 6],
...     'y': [6, 5, 4, 3, 2, 1],
...     'z': [7, 9, 11, 8, 10, 12]
... })

A single summarizing function

>>> df >> select('x', 'z') >> summarize_all('mean')
     x    z
0  3.5  9.5

More than one summarizing function (as a tuple).

>>> df >> select('x', 'z') >> summarize_all(('mean', np.std))
   x_mean  z_mean     x_std     z_std
0     3.5     9.5  1.707825  1.707825
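
The standard deviation of 1.707825 is the population value, because np.std defaults to ddof=0. A short cross-check with NumPy alone, using the x and z columns from the data frame above, shows the difference from the sample standard deviation (ddof=1):

```python
import numpy as np

x = [1, 2, 3, 4, 5, 6]
z = [7, 9, 11, 8, 10, 12]

# np.std divides by N by default (ddof=0): the population standard deviation.
x_population_std = float(np.std(x))
z_population_std = float(np.std(z))

# ddof=1 divides by N - 1: the sample standard deviation.
x_sample_std = float(np.std(x, ddof=1))

print(f"{x_population_std:.6f}")  # 1.707825
print(f"{z_population_std:.6f}")  # 1.707825
print(f"{x_sample_std:.6f}")      # 1.870829
```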

You can use a dictionary to change the postfixes of the column names.

>>> (df
...  >> select('x', 'z')
...  >> summarize_all(dict(MEAN='mean', STD=np.std)))
   x_MEAN  z_MEAN     x_STD     z_STD
0     3.5     9.5  1.707825  1.707825

Group by

>>> (df
...  >> group_by('alpha')
...  >> select('x', 'z')
...  >> summarize_all(('mean', np.std)))
  alpha  x_mean  z_mean     x_std     z_std
0     a     2.0     9.0  0.816497  1.632993
1     b     5.0    10.0  0.816497  1.632993

Passing additional arguments

>>> (df
...  >> group_by('alpha')
...  >> select('x', 'z')
...  >> summarize_all(np.std, ddof=1))
  alpha    x    z
0     a  1.0  2.0
1     b  1.0  2.0

The arguments are passed to all functions, so in most such cases it may only be possible to summarise with a single function.
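
The restriction can be illustrated without plydata. In the sketch below, apply_all is a hypothetical stand-in for the forwarding behaviour described above, not plydata's implementation: every function receives the same keyword arguments, so a function that does not accept ddof raises a TypeError. Binding the argument ahead of time with functools.partial is one possible way around this.

```python
from functools import partial

import numpy as np

values = [1, 2, 3]


def apply_all(functions, data, **kwargs):
    """Hypothetical stand-in: call every function with the same kwargs."""
    return {name: func(data, **kwargs) for name, func in functions.items()}


# np.std accepts ddof, so forwarding works with this single function.
only_std = apply_all({'std': np.std}, values, ddof=1)

# np.mean does not accept ddof, so forwarding to both functions fails.
try:
    apply_all({'mean': np.mean, 'std': np.std}, values, ddof=1)
    forwarding_failed = False
except TypeError:
    forwarding_failed = True

# Binding ddof to np.std up front means nothing has to be forwarded.
bound = apply_all({'mean': np.mean, 'std': partial(np.std, ddof=1)}, values)

print(only_std, forwarding_failed, bound)
```

Whether summarize_all accepts a partial object is an assumption to verify against your plydata version. If it does, the dict form is the natural fit, since partial objects carry no __name__ from which a postfix could be derived.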

The grouping columns are never summarised.

>>> (df
...  >> select('x', 'y', 'z')
...  >> define(parity='x%2')
...  >> group_by('parity')
...  >> summarize_all('mean'))
   parity    x    y         z
0       1  3.0  4.0  9.333333
1       0  4.0  3.0  9.666667
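
For comparison, the same numbers can be reproduced with plain pandas. This is only a cross-check of the output above, under the assumption that a pandas groupby with sort=False matches the group order shown; it does not describe how plydata computes the result.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, 6],
    'y': [6, 5, 4, 3, 2, 1],
    'z': [7, 9, 11, 8, 10, 12],
})

result = (
    df.assign(parity=df['x'] % 2)
      .groupby('parity', sort=False)  # keep groups in order of first appearance
      .mean()                         # the grouping key itself is not averaged
      .reset_index()
)
print(result)
```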