plydata.helper_verbs.group_by_at
class plydata.helper_verbs.group_by_at(*args, **kwargs)
Group by select columns
- Parameters
- data : dataframe, optional
  Useful when not using the >> operator.
- names : tuple or dict
  Names of columns in dataframe. If a tuple, they should be names of columns. If a dict, its keys must be among the following:
  - startswith : str or tuple, optional
    All column names that start with this string will be included.
  - endswith : str or tuple, optional
    All column names that end with this string will be included.
  - contains : str or tuple, optional
    All column names that contain this string will be included.
  - matches : str or regex or tuple, optional
    All column names that match the string or a compiled regex pattern will be included. A tuple can be used to match multiple regexes.
  - drop : bool, optional
    If True, the selection is inverted. The unspecified/unmatched columns are returned instead. Default is False.
- functions : callable or tuple or dict or str, optional
  Functions to alter the columns:
  - function (any callable) - Function is applied to the column and the result columns replace the original columns.
  - tuple of functions - Each function is applied to all of the columns and the name (__name__) of the function is postfixed to resulting column names.
  - dict of the form {'name': function} - Allows you to apply one or more functions and also control the postfix to the name.
  - str - String can be used for more complex statements, but the resulting names will be terrible.
- args : tuple
  Arguments to the functions. The arguments are passed to all functions.
- kwargs : dict
  Keyword arguments to the functions. The keyword arguments are passed to all functions.
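The selection keys accepted by a names dict can be read as filters over the column names. The sketch below illustrates those semantics in plain Python; it is not plydata's implementation. The helper name select_names is hypothetical, and two points are assumptions that the parameter descriptions do not state: that a name is kept when any supplied key accepts it, and that matches anchors the pattern at the start of the name.

```python
import re


def select_names(columns, startswith=None, endswith=None, contains=None,
                 matches=None, drop=False):
    """Hypothetical helper: pick column names as the selection keys describe."""

    def as_tuple(value):
        # Each key accepts a single value or a tuple of values.
        if value is None:
            return ()
        return value if isinstance(value, tuple) else (value,)

    def is_selected(name):
        # Assumption: a name is kept if ANY supplied key accepts it.
        # Assumption: regexes are matched from the start of the name.
        return (
            any(name.startswith(s) for s in as_tuple(startswith))
            or any(name.endswith(s) for s in as_tuple(endswith))
            or any(s in name for s in as_tuple(contains))
            or any(re.match(p, name) is not None for p in as_tuple(matches))
        )

    # drop=True inverts the selection and returns the unmatched names.
    return [c for c in columns if is_selected(c) != drop]


columns = ['alpha', 'beta', 'theta', 'x', 'y', 'z']
by_suffix = select_names(columns, endswith='eta')
by_regex = select_names(columns, matches=r'\w+eta$')
inverted = select_names(columns, matches=r'\w+eta$', drop=True)
```

Here by_suffix and by_regex both pick beta and theta, while inverted holds the remaining names in their original order.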
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from plydata import *
>>> df = pd.DataFrame({
...     'alpha': list('aaabbb'),
...     'beta': list('babruq'),
...     'theta': list('cdecde'),
...     'x': [1, 2, 3, 4, 5, 6],
...     'y': [6, 5, 4, 3, 2, 1],
...     'z': [7, 9, 11, 8, 10, 12]
... })
In the simplest form it is not too different from group_by.
>>> df >> group_by_at(('x', 'y'))
groups: ['x', 'y']
  alpha beta theta  x  y   z
0     a    b     c  1  6   7
1     a    a     d  2  5   9
2     a    b     e  3  4  11
3     b    r     c  4  3   8
4     b    u     d  5  2  10
5     b    q     e  6  1  12
The power comes from the ability to do dynamic column selection. For example, match column names with a regex and apply a function to get the group columns.
>>> def double(s): return s + s
>>> df >> group_by_at(dict(matches=r'\w+eta$'), double)
groups: ['beta', 'theta']
  alpha beta theta  x  y   z
0     a   bb    cc  1  6   7
1     a   aa    dd  2  5   9
2     a   bb    ee  3  4  11
3     b   rr    cc  4  3   8
4     b   uu    dd  5  2  10
5     b   qq    ee  6  1  12
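As a rough mental model of the regex example, the following plain-Python sketch traces its two steps on the same string columns: pick the columns whose names match the pattern, then replace each picked column with the result of the function. A dict of lists stands in for the dataframe and the variable names are illustrative; this is a sketch of the observable behaviour, not of how plydata does it.

```python
import re


def double(s):
    return s + s


# The string columns of the example dataframe, as a dict of lists.
data = {
    'alpha': list('aaabbb'),
    'beta': list('babruq'),
    'theta': list('cdecde'),
}

# Step 1: choose the group columns by matching names against the regex.
group_columns = [name for name in data if re.match(r'\w+eta$', name)]

# Step 2: apply the function to each chosen column. pandas would evaluate
# s + s on the whole column at once; here it is done value by value.
# The results replace the original columns.
for name in group_columns:
    data[name] = [double(value) for value in data[name]]
```

After this runs, group_columns is ['beta', 'theta'], the two transformed columns hold the doubled strings seen in the output of the regex example, and alpha is left untouched.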