plydata.helper_verbs.group_by_at
class plydata.helper_verbs.group_by_at(*args, **kwargs)
Group by select columns
- Parameters
- data : dataframe, optional
Useful when not using the >> operator.
- names : tuple or dict
Names of columns in the dataframe. If a tuple, its items should be column names. If a dict, the keys must be among the following:
- startswith : str or tuple, optional
All column names that start with this string will be included.
- endswith : str or tuple, optional
All column names that end with this string will be included.
- contains : str or tuple, optional
All column names that contain this string will be included.
- matches : str or regex or tuple, optional
All column names that match the string or a compiled regex pattern will be included. A tuple can be used to match multiple regexes.
- drop : bool, optional
If True, the selection is inverted. The unspecified/unmatched columns are returned instead. Default is False.
- functions : callable or tuple or dict or str, optional
Functions to alter the columns:
  - function (any callable) - Function is applied to the column and the result columns replace the original columns.
  - tuple of functions - Each function is applied to all of the columns and the name (__name__) of the function is postfixed to the resulting column names.
  - dict of the form {'name': function} - Allows you to apply one or more functions and also control the postfix to the name.
  - str - String can be used for more complex statements, but the resulting names will be terrible.
- args : tuple
Arguments to the functions. The arguments are passed to all functions.
- kwargs : dict
Keyword arguments to the functions. The keyword arguments are passed to all functions.
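The selection rules for a dict-valued names argument can be illustrated without plydata. The following is a minimal pure-Python sketch, not plydata's implementation. The helper names `select_columns` and `_as_tuple` are hypothetical, and two behaviours are assumptions that the parameter descriptions do not state: the criteria are combined as a union (a column is kept if any criterion selects it), and regex patterns are tested with `match`, so they are anchored at the start of the column name.

```python
import re


def _as_tuple(value):
    """Normalise None, a single item, or a tuple to a tuple."""
    if value is None:
        return ()
    if isinstance(value, tuple):
        return value
    return (value,)


def select_columns(columns, startswith=None, endswith=None,
                   contains=None, matches=None, drop=False):
    """Hypothetical helper: return the column names picked by the criteria."""
    starts = _as_tuple(startswith)
    ends = _as_tuple(endswith)
    substrings = _as_tuple(contains)
    # Strings are compiled; compiled patterns are returned unchanged by re.compile
    patterns = [re.compile(p) for p in _as_tuple(matches)]

    def is_selected(name):
        # Assumption: a column is selected if ANY criterion matches it
        return (
            (bool(starts) and name.startswith(starts))
            or (bool(ends) and name.endswith(ends))
            or any(s in name for s in substrings)
            or any(p.match(name) is not None for p in patterns)
        )

    # drop=True inverts the selection
    return [c for c in columns if is_selected(c) != drop]


columns = ['alpha', 'beta', 'theta', 'x', 'y', 'z']
print(select_columns(columns, matches=r'\w+eta$'))         # ['beta', 'theta']
print(select_columns(columns, contains='eta', drop=True))  # ['alpha', 'x', 'y', 'z']
```

With these assumptions, `dict(matches=r'\w+eta$')` picks `beta` and `theta`, and adding `drop=True` returns every other column instead.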
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from plydata import *
>>> df = pd.DataFrame({
...     'alpha': list('aaabbb'),
...     'beta': list('babruq'),
...     'theta': list('cdecde'),
...     'x': [1, 2, 3, 4, 5, 6],
...     'y': [6, 5, 4, 3, 2, 1],
...     'z': [7, 9, 11, 8, 10, 12]
... })
In the simplest form it is not too different from group_by.
>>> df >> group_by_at(('x', 'y'))
groups: ['x', 'y']
  alpha beta theta  x  y   z
0     a    b     c  1  6   7
1     a    a     d  2  5   9
2     a    b     e  3  4  11
3     b    r     c  4  3   8
4     b    u     d  5  2  10
5     b    q     e  6  1  12
The power comes from the ability to do dynamic column selection. For example, select the columns whose names match a regex and apply a function to them to get the group columns.
>>> def double(s): return s + s
>>> df >> group_by_at(dict(matches=r'\w+eta$'), double)
groups: ['beta', 'theta']
  alpha beta theta  x  y   z
0     a   bb    cc  1  6   7
1     a   aa    dd  2  5   9
2     a   bb    ee  3  4  11
3     b   rr    cc  4  3   8
4     b   uu    dd  5  2  10
5     b   qq    ee  6  1  12
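The example above passes a single callable, so the transformed columns keep their original names. The naming rules described for the tuple and dict forms of the functions parameter can be sketched with plain Python lists standing in for dataframe columns. This is only an illustration, not plydata's implementation: the helper `apply_functions`, the list-based `double` and `upper` functions, and the `_` separator between column name and postfix are all assumptions made for this sketch.

```python
def apply_functions(data, columns, functions):
    """Hypothetical helper mirroring the naming rules for ``functions``.

    data      : dict mapping column name -> list of values
    columns   : names of the selected columns
    functions : a callable, a tuple of callables, or a dict {postfix: callable}
    """
    if callable(functions):
        # A single callable: results replace the columns under the same names
        return {col: functions(data[col]) for col in columns}

    if isinstance(functions, tuple):
        # A tuple: each function's __name__ becomes the postfix
        functions = {f.__name__: f for f in functions}

    # A dict: the key is the postfix (the "_" separator is an assumption)
    result = {}
    for col in columns:
        for postfix, func in functions.items():
            result[f"{col}_{postfix}"] = func(data[col])
    return result


def double(col):
    return [v + v for v in col]


def upper(col):
    return [v.upper() for v in col]


data = {'beta': list('babruq'), 'theta': list('cdecde')}
single = apply_functions(data, ['beta', 'theta'], double)
both = apply_functions(data, ['beta'], (double, upper))
named = apply_functions(data, ['beta'], {'twice': double})

print(list(single))  # ['beta', 'theta']
print(list(both))    # ['beta_double', 'beta_upper']
print(list(named))   # ['beta_twice']
```

The dict form is the one to reach for when the function's `__name__` would make a poor postfix, for example with a lambda.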