plydata.helper_verbs.group_by_if

class plydata.helper_verbs.group_by_if(*args, **kwargs)

Group by the columns for which a predicate is true

Parameters
data : dataframe, optional

Useful when not using the >> operator.

predicate : function

A predicate function to be applied to the columns of the dataframe. Good candidates for predicate functions are those that check the type of the column. Such functions are available in pandas.api.types, for example pandas.api.types.is_numeric_dtype().

For convenience, you can reference the is_*_dtype functions with shorter strings:

'is_bool'             # pandas.api.types.is_bool_dtype
'is_categorical'      # pandas.api.types.is_categorical_dtype
'is_complex'          # pandas.api.types.is_complex_dtype
'is_datetime64_any'   # pandas.api.types.is_datetime64_any_dtype
'is_datetime64'       # pandas.api.types.is_datetime64_dtype
'is_datetime64_ns'    # pandas.api.types.is_datetime64_ns_dtype
'is_datetime64tz'     # pandas.api.types.is_datetime64tz_dtype
'is_float'            # pandas.api.types.is_float_dtype
'is_int64'            # pandas.api.types.is_int64_dtype
'is_integer'          # pandas.api.types.is_integer_dtype
'is_interval'         # pandas.api.types.is_interval_dtype
'is_numeric'          # pandas.api.types.is_numeric_dtype
'is_object'           # pandas.api.types.is_object_dtype
'is_period'           # pandas.api.types.is_period_dtype
'is_signed_integer'   # pandas.api.types.is_signed_integer_dtype
'is_string'           # pandas.api.types.is_string_dtype
'is_timedelta64'      # pandas.api.types.is_timedelta64_dtype
'is_timedelta64_ns'   # pandas.api.types.is_timedelta64_ns_dtype
'is_unsigned_integer' # pandas.api.types.is_unsigned_integer_dtype

No other string values are allowed.
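The shorthand strings listed above follow one pattern: appending _dtype gives the name of the corresponding function in pandas.api.types. The block below is a minimal sketch of that lookup using only pandas. The helper names resolve_predicate and select_columns are hypothetical and exist only for illustration; they are not part of plydata, and plydata's own lookup and error handling may differ.

```python
import pandas as pd
import pandas.api.types as pdtypes


def resolve_predicate(predicate):
    """Return a callable for a predicate given as a function or a shorthand string.

    Hypothetical helper: assumes a shorthand such as 'is_numeric' maps to
    pandas.api.types.is_numeric_dtype by appending '_dtype'.
    """
    if callable(predicate):
        return predicate
    return getattr(pdtypes, predicate + '_dtype')


def select_columns(df, predicate):
    """List the columns of df for which the predicate is true."""
    func = resolve_predicate(predicate)
    return [col for col in df.columns if func(df[col])]


df = pd.DataFrame({
    'alpha': list('aab'),
    'x': [1, 2, 3],
    'w': [1.5, 2.5, 3.5],
})

numeric_cols = select_columns(df, 'is_numeric')
string_cols = select_columns(df, 'is_string')
float_cols = select_columns(df, pdtypes.is_float_dtype)
```

In this sketch an unrecognised string fails with an AttributeError from the getattr call, which is one simple way to reject values outside the list.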

functions : callable or tuple or dict or str, optional

Functions to alter the columns:

  • function (any callable) - Function is applied to the column and the result columns replace the original columns.

  • tuple of functions - Each function is applied to all of the columns and the name (__name__) of the function is postfixed to resulting column names.

  • dict of the form {'name': function} - Allows you to apply one or more functions and also control the postfix to the name.

  • str - String can be used for more complex statements, but the resulting names will be terrible.
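The naming rules for the tuple and dict forms can be illustrated with plain pandas. This is a sketch of the convention described above, not plydata's implementation; apply_named is a hypothetical helper, and the small dataframe is made up for the illustration.

```python
import pandas as pd


def m10(x):
    return x - 10  # minus 10


def p10(x):
    return x + 10  # plus 10


def apply_named(df, columns, functions):
    """Add one new column per (function, column) pair, named '<column>_<name>'.

    Hypothetical helper: `functions` is a dict of the form {'name': function}.
    """
    out = df.copy()
    for name, func in functions.items():
        for col in columns:
            out[f'{col}_{name}'] = func(df[col])
    return out


df = pd.DataFrame({'x': [1, 2, 3], 'y': [6, 5, 4]})

# Tuple form: the postfix is taken from each function's __name__.
from_tuple = {func.__name__: func for func in (m10, p10)}
result_tuple = apply_named(df, ['x', 'y'], from_tuple)

# Dict form: the caller chooses the postfix.
result_dict = apply_named(df, ['x'], {'minus': m10})
```

With the tuple form the new columns are x_m10, y_m10, x_p10 and y_p10; with the dict form the single new column is x_minus.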

args : tuple

Arguments to the functions. The arguments are passed to all functions.

kwargs : dict

Keyword arguments to the functions. The keyword arguments are passed to all functions.
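Because the same positional and keyword arguments go to every function, the functions need compatible signatures. The pure-Python sketch below illustrates that forwarding pattern only; apply_all, scale and shift are hypothetical names and do not come from plydata.

```python
def scale(values, factor, offset=0):
    """Multiply each value by factor, then add offset."""
    return [v * factor + offset for v in values]


def shift(values, factor, offset=0):
    """Add factor and offset to each value."""
    return [v + factor + offset for v in values]


def apply_all(functions, values, *args, **kwargs):
    """Call every function with the same extra arguments."""
    return {func.__name__: func(values, *args, **kwargs) for func in functions}


# Both functions receive factor=2 (positional) and offset=1 (keyword).
results = apply_all((scale, shift), [1, 2, 3], 2, offset=1)
```

A function that does not accept one of the shared arguments would raise a TypeError in this sketch, so arguments meant for a single function are better bound beforehand, for example with functools.partial.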

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from plydata import *
>>> df = pd.DataFrame({
...     'alpha': list('aaabbb'),
...     'beta': list('babruq'),
...     'theta': list('cdecde'),
...     'x': [1, 2, 3, 4, 5, 6],
...     'y': [6, 5, 4, 3, 2, 1],
...     'z': [7, 9, 11, 8, 10, 12]
... })

Group by all string type columns. 'is_string' is a shortcut to pandas.api.types.is_string_dtype().

>>> df >> group_by_if('is_string')
groups: ['alpha', 'beta', 'theta']
  alpha beta theta  x  y   z
0     a    b     c  1  6   7
1     a    a     d  2  5   9
2     a    b     e  3  4  11
3     b    r     c  4  3   8
4     b    u     d  5  2  10
5     b    q     e  6  1  12

Applying a function to create the group columns

>>> def double(s):
...     return s + s
>>> df >> group_by_if('is_string', double)
groups: ['alpha', 'beta', 'theta']
  alpha beta theta  x  y   z
0    aa   bb    cc  1  6   7
1    aa   aa    dd  2  5   9
2    aa   bb    ee  3  4  11
3    bb   rr    cc  4  3   8
4    bb   uu    dd  5  2  10
5    bb   qq    ee  6  1  12

Applying more than one function increases the number of columns

>>> def m10(x): return x-10  # minus
>>> def p10(x): return x+10  # plus
>>> df >> group_by_if('is_numeric', (m10, p10))
groups: ['x_m10', 'y_m10', 'z_m10', 'x_p10', 'y_p10', 'z_p10']
  alpha beta theta  x  y   z  x_m10  y_m10  z_m10  x_p10  y_p10  z_p10
0     a    b     c  1  6   7     -9     -4     -3     11     16     17
1     a    a     d  2  5   9     -8     -5     -1     12     15     19
2     a    b     e  3  4  11     -7     -6      1     13     14     21
3     b    r     c  4  3   8     -6     -7     -2     14     13     18
4     b    u     d  5  2  10     -5     -8      0     15     12     20
5     b    q     e  6  1  12     -4     -9      2     16     11     22