plydata.helper_verbs.summarize_if

class plydata.helper_verbs.summarize_if(*args, **kwargs)

Summarise all columns for which a predicate is true

Parameters

data : dataframe, optional

Useful when not using the >> operator.

predicate : function or str

A predicate function to be applied to the columns of the dataframe. Good candidates for predicate functions are those that check the type of the column. Such functions are available in pandas.api.types, for example pandas.api.types.is_numeric_dtype().

For convenience, you can reference the is_*_dtype functions with shorter strings:

'is_bool'             # pandas.api.types.is_bool_dtype
'is_categorical'      # pandas.api.types.is_categorical_dtype
'is_complex'          # pandas.api.types.is_complex_dtype
'is_datetime64_any'   # pandas.api.types.is_datetime64_any_dtype
'is_datetime64'       # pandas.api.types.is_datetime64_dtype
'is_datetime64_ns'    # pandas.api.types.is_datetime64_ns_dtype
'is_datetime64tz'     # pandas.api.types.is_datetime64tz_dtype
'is_float'            # pandas.api.types.is_float_dtype
'is_int64'            # pandas.api.types.is_int64_dtype
'is_integer'          # pandas.api.types.is_integer_dtype
'is_interval'         # pandas.api.types.is_interval_dtype
'is_numeric'          # pandas.api.types.is_numeric_dtype
'is_object'           # pandas.api.types.is_object_dtype
'is_period'           # pandas.api.types.is_period_dtype
'is_signed_integer'   # pandas.api.types.is_signed_integer_dtype
'is_string'           # pandas.api.types.is_string_dtype
'is_timedelta64'      # pandas.api.types.is_timedelta64_dtype
'is_timedelta64_ns'   # pandas.api.types.is_timedelta64_ns_dtype
'is_unsigned_integer' # pandas.api.types.is_unsigned_integer_dtype

No other string values are allowed.
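
As a rough illustration of how a shortcut string could map onto a pandas type check, the sketch below uses only pandas. The helpers resolve_predicate and matching_columns are hypothetical names introduced for this example. They are not part of plydata, and the library's own lookup may work differently.

```python
import pandas as pd
import pandas.api.types as pdtypes


def resolve_predicate(predicate):
    """Hypothetical helper: turn a shortcut such as 'is_numeric' into a callable."""
    if callable(predicate):
        return predicate
    # Assumption: a shortcut is the pandas function name without the '_dtype' suffix
    if isinstance(predicate, str) and predicate.startswith('is_'):
        func = getattr(pdtypes, predicate + '_dtype', None)
        if func is not None:
            return func
    raise ValueError(f"Unknown predicate shortcut: {predicate!r}")


def matching_columns(df, predicate):
    """Hypothetical helper: names of the columns for which the predicate is true."""
    pred = resolve_predicate(predicate)
    return [name for name in df.columns if pred(df[name])]


df = pd.DataFrame({
    'alpha': list('aaabbb'),
    'x': [1, 2, 3, 4, 5, 6],
    'y': [6, 5, 4, 3, 2, 1],
})

numeric_cols = matching_columns(df, 'is_numeric')
string_cols = matching_columns(df, 'is_string')
```

Here numeric_cols holds the integer columns and string_cols holds the string column. Passing pdtypes.is_numeric_dtype directly gives the same selection as the 'is_numeric' shortcut.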

functions : str or tuple or dict, optional

Expressions or (name, expression) pairs. The pair form should be used when the name is not a valid Python variable name. The expression should be of type str or an iterable with the same number of elements as the dataframe.

Examples

>>> import pandas as pd
>>> import pandas.api.types as pdtypes
>>> import numpy as np
>>> from plydata import *
>>> df = pd.DataFrame({
...     'alpha': list('aaabbb'),
...     'beta': list('babruq'),
...     'theta': list('cdecde'),
...     'x': [1, 2, 3, 4, 5, 6],
...     'y': [6, 5, 4, 3, 2, 1],
...     'z': [7, 9, 11, 8, 10, 12]
... })

Summarizing all numeric columns

>>> df >> summarize_if(pdtypes.is_numeric_dtype, (np.min, np.max))
   x_amin  y_amin  z_amin  x_amax  y_amax  z_amax
0       1       1       7       6       6      12
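
For comparison, the same summary can be approximated with plain pandas. This is only a sketch of the idea (select the columns that satisfy the predicate, then apply each function to each selected column), not plydata's implementation. The column suffixes min and max are chosen here for illustration and differ from the names plydata derives from the functions.

```python
import pandas as pd
import pandas.api.types as pdtypes

df = pd.DataFrame({
    'alpha': list('aaabbb'),
    'x': [1, 2, 3, 4, 5, 6],
    'y': [6, 5, 4, 3, 2, 1],
    'z': [7, 9, 11, 8, 10, 12],
})

# Step 1: keep the columns for which the predicate is true
selected = [col for col in df.columns if pdtypes.is_numeric_dtype(df[col])]

# Step 2: apply every summary function to every selected column,
# building one output column per (column, function) pair
functions = (('min', pd.Series.min), ('max', pd.Series.max))
summary = pd.DataFrame({
    f'{col}_{name}': [func(df[col])]
    for name, func in functions
    for col in selected
})
```

The result is a one-row dataframe with the columns x_min, y_min, z_min, x_max, y_max and z_max, holding the same values as the output above.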

Group by

>>> (df
...  >> group_by('alpha')
...  >> summarize_if(pdtypes.is_numeric_dtype, (np.min, np.max))
... )
  alpha  x_amin  y_amin  z_amin  x_amax  y_amax  z_amax
0     a       1       4       7       3       6      11
1     b       4       1       8       6       3      12

Using 'is_string' as a shortcut to pdtypes.is_string_dtype for the predicate, together with a custom summarizing function.

>>> def first(col): return list(col)[0]
>>> df >> group_by('alpha') >> summarize_if('is_string', first)
  alpha beta theta
0     a    b     c
1     b    r     c

Note that if any of the group columns match the predicate, they are selected.