plydata.one_table_verbs.group_by
class plydata.one_table_verbs.group_by(*args, **kwargs)
Group dataframe by one or more columns/variables.
- Parameters
  - data : dataframe, optional
    Useful when not using the >> operator.
  - args : strs, tuples, optional
    Expressions or (name, expression) pairs. Use the pair form when the name is not a valid Python variable name. The expression should be of type str or an iterable with the same number of elements as the dataframe.
  - add_ : bool, optional
    If True, add to existing groups. Default is to create new groups.
  - kwargs : dict, optional
    {name: expression} pairs.
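The effect of add_ on the active grouping can be pictured with a toy model in plain Python. The sketch below is not plydata code: resolve_groups is a hypothetical helper, and dropping repeated names while preserving order is an assumption rather than documented behavior.

```python
def resolve_groups(existing, new, add_=False):
    """Hypothetical model of how the grouping columns are chosen."""
    if not add_:
        # Default: the new columns replace whatever grouping existed.
        return list(new)
    # add_=True: keep the existing groups and append the new ones.
    combined = list(existing)
    for name in new:
        if name not in combined:  # assumed: repeated names are kept once
            combined.append(name)
    return combined


replaced = resolve_groups(['x'], ['y'])             # new groups only
extended = resolve_groups(['x'], ['y'], add_=True)  # existing + new
```

Under these assumptions, replaced holds only the new column and extended holds the existing column followed by the new one.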
Notes
If plydata.options.modify_input_data is True, group_by will modify the original dataframe.
Examples
>>> import pandas as pd
>>> from plydata import group_by
>>> df = pd.DataFrame({'x': [1, 5, 2, 2, 4, 0, 4],
...                    'y': [1, 2, 3, 4, 5, 6, 5]})
>>> df >> group_by('x')
groups: ['x']
   x  y
0  1  1
1  5  2
2  2  3
3  2  4
4  4  5
5  0  6
6  4  5
Like define(), group_by() creates any missing columns.
>>> df >> group_by('y-1', xplus1='x+1')
groups: ['y-1', 'xplus1']
   x  y  y-1  xplus1
0  1  1    0       2
1  5  2    1       6
2  2  3    2       3
3  2  4    3       3
4  4  5    4       5
5  0  6    5       1
6  4  5    4       5
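For readers more familiar with plain pandas, the "create missing columns, then group" behavior can be approximated in two explicit steps. This is only an analogue using pandas' own assign and groupby, not a description of how plydata implements group_by; the names with_keys, grouped and group_sizes are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 5, 2, 2, 4, 0, 4],
                   'y': [1, 2, 3, 4, 5, 6, 5]})

# Step 1: materialize the grouping expressions as new columns.
# The dict form is needed because 'y-1' is not a valid Python identifier.
with_keys = df.assign(**{'y-1': df['y'] - 1, 'xplus1': df['x'] + 1})

# Step 2: group on the newly created columns, keeping first-seen order.
grouped = with_keys.groupby(['y-1', 'xplus1'], sort=False)

# Number of rows that fall into each (y-1, xplus1) combination.
group_sizes = grouped.size()
```

In this data, rows 4 and 6 share the key (4, 5), so seven rows fall into six groups.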
Columns that are grouped on remain in the dataframe after any verb operations that do not use the group information. For example:
>>> from plydata import select
>>> df >> group_by('y-1', xplus1='x+1') >> select('y')
groups: ['y-1', 'xplus1']
   y-1  xplus1  y
0    0       2  1
1    1       6  2
2    2       3  3
3    3       3  4
4    4       5  5
5    5       1  6
6    4       5  5
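The retention of grouping columns shown above can be mimicked in plain pandas with a small helper. select_keeping_groups is a hypothetical function written for this sketch, not part of plydata or pandas, and placing the grouping columns first is taken from the output above rather than from any documented guarantee.

```python
import pandas as pd


def select_keeping_groups(frame, groups, columns):
    """Hypothetical helper: select columns but never drop the group keys."""
    # Grouping columns come first, followed by the requested columns.
    kept = list(groups) + [c for c in columns if c not in groups]
    return frame[kept]


df = pd.DataFrame({'x': [1, 5, 2, 2, 4, 0, 4],
                   'y': [1, 2, 3, 4, 5, 6, 5]})
with_keys = df.assign(**{'y-1': df['y'] - 1, 'xplus1': df['x'] + 1})

# Only 'y' is requested, yet both grouping columns survive the selection.
result = select_keeping_groups(with_keys, ['y-1', 'xplus1'], ['y'])
```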