plydata.one_table_verbs.group_by

class plydata.one_table_verbs.group_by(*args, **kwargs)[source]

Group dataframe by one or more columns/variables

Parameters
data : dataframe, optional

Useful when not using the >> operator.

args : strs, tuples, optional

Expressions or (name, expression) pairs. Use the pair form when the name is not a valid Python variable name. The expression should be of type str or an iterable with the same number of elements as the dataframe has rows.

add_ : bool, optional

If True, add to existing groups. Default is to create new groups.

kwargs : dict, optional

{name: expression} pairs.

Notes

If plydata.options.modify_input_data is True, group_by will modify the original dataframe.

Examples

>>> import pandas as pd
>>> from plydata import group_by, select
>>> df = pd.DataFrame({'x': [1, 5, 2, 2, 4, 0, 4],
...                    'y': [1, 2, 3, 4, 5, 6, 5]})
>>> df >> group_by('x')
groups: ['x']
   x  y
0  1  1
1  5  2
2  2  3
3  2  4
4  4  5
5  0  6
6  4  5

Like define(), group_by() creates any missing columns.

>>> df >> group_by('y-1', xplus1='x+1')
groups: ['y-1', 'xplus1']
   x  y  y-1  xplus1
0  1  1    0       2
1  5  2    1       6
2  2  3    2       3
3  2  4    3       3
4  4  5    4       5
5  0  6    5       1
6  4  5    4       5

Grouping columns remain in the dataframe after any verb operation that does not use the group information, even when that verb does not name them. For example:

>>> df >> group_by('y-1', xplus1='x+1') >> select('y')
groups: ['y-1', 'xplus1']
   y-1  xplus1  y
0    0       2  1
1    1       6  2
2    2       3  3
3    3       3  4
4    4       5  5
5    5       1  6
6    4       5  5