plydata.one_table_verbs.group_by

class plydata.one_table_verbs.group_by(*args, **kwargs)

Group dataframe by one or more columns/variables

Parameters

data (dataframe, optional): Useful when not using the >> operator.
args (strs, tuples, optional): Expressions or (name, expression) pairs. Use the pair form when the name is not a valid Python variable name. The expression should be a str or an iterable with the same number of elements as the dataframe.
add_ (bool, optional): If True, add to existing groups. Default is to create new groups.
kwargs (dict, optional): {name: expression} pairs.

Notes

If plydata.options.modify_input_data is True, group_by will modify the original dataframe.

Examples

>>> import pandas as pd
>>> from plydata import group_by, select
>>> df = pd.DataFrame({'x': [1, 5, 2, 2, 4, 0, 4],
...                    'y': [1, 2, 3, 4, 5, 6, 5]})
>>> df >> group_by('x')
groups: ['x']
   x  y
0  1  1
1  5  2
2  2  3
3  2  4
4  4  5
5  0  6
6  4  5

Like define(), group_by() creates any missing columns.

>>> df >> group_by('y-1', xplus1='x+1')
groups: ['y-1', 'xplus1']
   x  y  y-1  xplus1
0  1  1    0       2
1  5  2    1       6
2  2  3    2       3
3  2  4    3       3
4  4  5    4       5
5  0  6    5       1
6  4  5    4       5
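
For readers who think in plain pandas, the column-creating behaviour described above can be approximated as follows. This is only a sketch, not plydata's implementation: `with_groups` is a hypothetical helper, and it assumes that string expressions are evaluated roughly the way `DataFrame.eval` evaluates them.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 5, 2, 2, 4, 0, 4],
                   'y': [1, 2, 3, 4, 5, 6, 5]})


def with_groups(data, *exprs, **named_exprs):
    """Hypothetical helper: add one column per expression.

    Returns the new frame and the list of group column names.
    """
    out = data.copy()  # leave the caller's frame untouched
    # Positional expressions are named after their own text, e.g. 'y-1'.
    specs = {expr: expr for expr in exprs}
    specs.update(named_exprs)
    for name, expr in specs.items():
        # Only create columns that are missing; existing ones are reused.
        if name not in out.columns:
            out[name] = data.eval(expr)
    return out, list(specs)


out, groups = with_groups(df, 'y-1', xplus1='x+1')
sizes = out.groupby(groups).size()  # number of rows in each group
```

Here `out` gains the columns `y-1` and `xplus1`, and `sizes` reports two rows for the group `(4, 5)`, which corresponds to the two identical `x=4, y=5` rows in the input.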

Columns that are grouped on remain in the dataframe after any verb operations that do not use the group information. For example:

>>> df >> group_by('y-1', xplus1='x+1') >> select('y')
groups: ['y-1', 'xplus1']
   y-1  xplus1  y
0    0       2  1
1    1       6  2
2    2       3  3
3    3       3  4
4    4       5  5
5    5       1  6
6    4       5  5
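
The retention of grouping columns can be mimicked in plain pandas as shown below. This is a sketch under stated assumptions rather than a description of plydata's internals: `select_keeping_groups` is a hypothetical helper, and placing the group columns first is an assumption based only on the column order in the output above.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 5, 2, 2, 4, 0, 4],
                   'y': [1, 2, 3, 4, 5, 6, 5]})

# Materialise the grouping columns first, as group_by('y-1', xplus1='x+1') would.
grouped_df = df.assign(**{'y-1': df['y'] - 1, 'xplus1': df['x'] + 1})
groups = ['y-1', 'xplus1']


def select_keeping_groups(data, groups, columns):
    """Hypothetical helper: select `columns` but always keep the group columns."""
    # Avoid duplicating a column that is both selected and grouped on.
    extra = [c for c in columns if c not in groups]
    return data[groups + extra]


result = select_keeping_groups(grouped_df, groups, ['y'])
```

`result` contains `y-1`, `xplus1` and `y`, while the ungrouped column `x` is dropped.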