plydata.helper_verbs.group_by_all

class plydata.helper_verbs.group_by_all(*args, **kwargs)

Group by all columns

Parameters
data : dataframe, optional

Useful when not using the >> operator.

functions : callable or tuple or dict or str

Functions to alter the columns:

  • function (any callable) - Function is applied to the column and the result columns replace the original columns.

  • tuple of functions - Each function is applied to all of the columns and the name (__name__) of the function is postfixed to resulting column names.

  • dict of the form {'name': function} - Allows you to apply one or more functions and also control the postfix to the name.

  • str - String can be used for more complex statements, but the resulting names will be terrible.

args : tuple

Arguments to the functions. The arguments are passed to all functions.

kwargs : dict

Keyword arguments to the functions. The keyword arguments are passed to all functions.
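
The forwarding rule for args and kwargs can be illustrated in plain Python. This is only a sketch of the behaviour described above, not plydata's implementation: `apply_to_all`, `scale` and `clip` are hypothetical names, and plain lists stand in for dataframe columns.

```python
# Hypothetical helper, not part of plydata: it illustrates the forwarding
# rule only, using plain lists in place of dataframe columns.
def apply_to_all(columns, functions, *args, **kwargs):
    """Call every function on every column with the same extra arguments."""
    return {
        (name, func.__name__): func(values, *args, **kwargs)
        for name, values in columns.items()
        for func in functions
    }


def scale(values, factor, offset=0):
    """Multiply each value by `factor` and add `offset`."""
    return [factor * v + offset for v in values]


def clip(values, factor, offset=0):
    """Cap each value at `factor + offset`; takes the same extras as `scale`."""
    return [min(v, factor + offset) for v in values]


columns = {"x": [1, 2, 3], "y": [6, 5, 4]}

# The positional argument 2 and the keyword argument offset=1 reach
# both `scale` and `clip`.
result = apply_to_all(columns, (scale, clip), 2, offset=1)
print(result[("x", "scale")])
print(result[("y", "clip")])
```

Because the same extra arguments go to every function, each function supplied has to accept them.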

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from plydata import *
>>> df = pd.DataFrame({
...     'alpha': list('aaabbb'),
...     'beta': list('babruq'),
...     'theta': list('cdecde'),
...     'x': [1, 2, 3, 4, 5, 6],
...     'y': [6, 5, 4, 3, 2, 1],
...     'z': [7, 9, 11, 8, 10, 12]
... })

Grouping by all the columns

>>> df >> group_by_all()
groups: ['alpha', 'beta', 'theta', 'x', 'y', 'z']
  alpha beta theta  x  y   z
0     a    b     c  1  6   7
1     a    a     d  2  5   9
2     a    b     e  3  4  11
3     b    r     c  4  3   8
4     b    u     d  5  2  10
5     b    q     e  6  1  12

Grouping by all columns after they are transformed by a function. The output looks the same as above, but all the columns are now categorical.

>>> result = df >> group_by_all(pd.Categorical)
>>> result
groups: ['alpha', 'beta', 'theta', 'x', 'y', 'z']
  alpha beta theta  x  y   z
0     a    b     c  1  6   7
1     a    a     d  2  5   9
2     a    b     e  3  4  11
3     b    r     c  4  3   8
4     b    u     d  5  2  10
5     b    q     e  6  1  12
>>> result['x']
0    1
1    2
2    3
3    4
4    5
5    6
Name: x, dtype: category
Categories (6, int64): [1, 2, 3, 4, 5, 6]

If you apply more than one function or provide a postfix, the original columns are retained.

>>> (df
...  >> select('x', 'y', 'z')
...  >> group_by_all(dict(cat=pd.Categorical)))
groups: ['x_cat', 'y_cat', 'z_cat']
   x  y   z x_cat y_cat z_cat
0  1  6   7     1     6     7
1  2  5   9     2     5     9
2  3  4  11     3     4    11
3  4  3   8     4     3     8
4  5  2  10     5     2    10
5  6  1  12     6     1    12
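
The naming of the new columns in the results above can be summarized in a short sketch. It is written in plain Python and is not plydata's code: `result_names`, `double` and `negate` are hypothetical. It assumes the underscore separator seen in 'x_cat' also applies to tuples of functions. The ordering of the returned names is likewise an assumption. String specifications are left out because the resulting names are not described here.

```python
# Hypothetical helper, not part of plydata: it only mirrors the naming
# rules described for three of the four forms of `functions`.
def result_names(columns, functions):
    """Return the names of the columns created for each form of `functions`.

    For tuples and dicts these are the names of the added columns; the
    original columns are retained alongside them.
    """
    if callable(functions):
        # Single function: results replace the originals, names unchanged.
        return list(columns)
    if isinstance(functions, tuple):
        # Tuple: each function's __name__ becomes the postfix.
        postfixes = [f.__name__ for f in functions]
    elif isinstance(functions, dict):
        # Dict: each key becomes the postfix.
        postfixes = list(functions)
    else:
        raise TypeError("this sketch does not cover string specifications")
    # Assumption: column and postfix are joined with an underscore, as in
    # the 'x_cat' result shown above.
    return [f"{col}_{post}" for post in postfixes for col in columns]


def double(values):
    return [2 * v for v in values]


def negate(values):
    return [-v for v in values]


print(result_names(["x", "y", "z"], double))
print(result_names(["x", "y", "z"], (double, negate)))
print(result_names(["x", "y", "z"], {"cat": double}))
```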