plydata.helper_verbs.summarize_if

class plydata.helper_verbs.summarize_if(*args, **kwargs)

Summarise all columns that are true for a predicate
Parameters

data : dataframe, optional
    Useful when not using the >> operator.

predicate : function or str
    A predicate function to be applied to the columns of the dataframe. Good candidates for predicate functions are those that check the type of the column. Such functions are available at pandas.api.types, for example pandas.api.types.is_numeric_dtype().

    For convenience, you can reference the is_*_dtype functions with shorter strings:

    'is_bool'              # pandas.api.types.is_bool_dtype
    'is_categorical'       # pandas.api.types.is_categorical_dtype
    'is_complex'           # pandas.api.types.is_complex_dtype
    'is_datetime64_any'    # pandas.api.types.is_datetime64_any_dtype
    'is_datetime64'        # pandas.api.types.is_datetime64_dtype
    'is_datetime64_ns'     # pandas.api.types.is_datetime64_ns_dtype
    'is_datetime64tz'      # pandas.api.types.is_datetime64tz_dtype
    'is_float'             # pandas.api.types.is_float_dtype
    'is_int64'             # pandas.api.types.is_int64_dtype
    'is_integer'           # pandas.api.types.is_integer_dtype
    'is_interval'          # pandas.api.types.is_interval_dtype
    'is_numeric'           # pandas.api.types.is_numeric_dtype
    'is_object'            # pandas.api.types.is_object_dtype
    'is_period'            # pandas.api.types.is_period_dtype
    'is_signed_integer'    # pandas.api.types.is_signed_integer_dtype
    'is_string'            # pandas.api.types.is_string_dtype
    'is_timedelta64'       # pandas.api.types.is_timedelta64_dtype
    'is_timedelta64_ns'    # pandas.api.types.is_timedelta64_ns_dtype
    'is_unsigned_integer'  # pandas.api.types.is_unsigned_integer_dtype

    No other string values are allowed.

functions : str or tuple or dict, optional
    Expressions or (name, expression) pairs. Use the pair form when the name is not a valid Python variable name. The expression should be of type str or an iterable with the same number of elements as the dataframe.

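The string shortcuts follow a regular pattern: each is the name of a pandas.api.types function without its _dtype suffix. The sketch below illustrates that mapping, and how a predicate picks out columns, using pandas alone. The helpers resolve_predicate and matching_columns are hypothetical names written for this illustration; they are not part of plydata, and plydata's own lookup may work differently (for instance, this sketch does not reject strings outside the list above).

```python
import pandas as pd
import pandas.api.types as pdtypes


def resolve_predicate(predicate):
    """Return a callable predicate (hypothetical helper, not plydata API)."""
    if callable(predicate):
        return predicate
    # Assumption: a shortcut such as 'is_numeric' names the pandas function
    # 'is_numeric_dtype', following the table in the parameter description.
    return getattr(pdtypes, predicate + '_dtype')


def matching_columns(df, predicate):
    """List the columns of df for which the predicate is true."""
    check = resolve_predicate(predicate)
    return [name for name in df.columns if check(df[name])]


df = pd.DataFrame({
    'alpha': list('aaabbb'),
    'x': [1, 2, 3, 4, 5, 6],
    'y': [0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
})

# A string shortcut and the full pandas function can be used the same way.
numeric = matching_columns(df, 'is_numeric')
strings = matching_columns(df, pdtypes.is_string_dtype)
```

Here numeric holds the names of the integer and float columns, and strings holds the name of the text column.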
Examples
>>> import pandas as pd
>>> import pandas.api.types as pdtypes
>>> import numpy as np
>>> from plydata import *
>>> df = pd.DataFrame({
...     'alpha': list('aaabbb'),
...     'beta': list('babruq'),
...     'theta': list('cdecde'),
...     'x': [1, 2, 3, 4, 5, 6],
...     'y': [6, 5, 4, 3, 2, 1],
...     'z': [7, 9, 11, 8, 10, 12]
... })
Summarizing all numeric columns
>>> df >> summarize_if(pdtypes.is_numeric_dtype, (np.min, np.max))
   x_amin  y_amin  z_amin  x_amax  y_amax  z_amax
0       1       1       7       6       6      12
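To make the selection and naming in this result easier to follow, here is a minimal plain-Python sketch of the same idea: keep the columns that satisfy the predicate, then apply every function to each of them. The function summarize_columns_if and the predicate is_numeric are hypothetical stand-ins, not plydata code, and the '<column>_<function name>' naming is an assumption inferred from the output above.

```python
def summarize_columns_if(columns, predicate, functions):
    """Apply each function to every column that satisfies the predicate.

    Hypothetical stand-in for illustration; `columns` maps names to lists.
    """
    selected = [name for name, values in columns.items() if predicate(values)]
    summary = {}
    # Functions vary slowest, matching the column order in the output above.
    for func in functions:
        for name in selected:
            # Assumption: result names are '<column>_<function name>'.
            summary[f'{name}_{func.__name__}'] = func(columns[name])
    return summary


def is_numeric(values):
    """Plain-Python predicate: true when every value is an int or a float."""
    return all(isinstance(v, (int, float)) for v in values)


columns = {
    'alpha': list('aaabbb'),
    'x': [1, 2, 3, 4, 5, 6],
    'y': [6, 5, 4, 3, 2, 1],
    'z': [7, 9, 11, 8, 10, 12],
}

# Built-in min and max stand in for np.min and np.max.
result = summarize_columns_if(columns, is_numeric, (min, max))
```

The column alpha is skipped because its values are strings, and the remaining columns yield one value per function, in the same order as the table above.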
Group by
>>> (df
...  >> group_by('alpha')
...  >> summarize_if(pdtypes.is_numeric_dtype, (np.min, np.max))
... )
  alpha  x_amin  y_amin  z_amin  x_amax  y_amax  z_amax
0     a       1       4       7       3       6      11
1     b       4       1       8       6       3      12
Using 'is_string' as a shortcut to pdtypes.is_string_dtype for the predicate, together with a custom summarizing function.

>>> def first(col): return list(col)[0]
>>> df >> group_by('alpha') >> summarize_if('is_string', first)
  alpha beta theta
0     a    b     c
1     b    r     c
Note that if any of the group columns match the predicate, they are selected.
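The note can be checked with pandas alone: the grouping column alpha holds strings, so it satisfies is_string_dtype just as beta and theta do. The sketch below lists the matching columns and then computes the same per-group values with a plain pandas groupby. It does not call plydata and is only an approximate equivalent, not a description of how plydata implements the verb.

```python
import pandas as pd
import pandas.api.types as pdtypes

df = pd.DataFrame({
    'alpha': list('aaabbb'),
    'beta': list('babruq'),
    'theta': list('cdecde'),
    'x': [1, 2, 3, 4, 5, 6],
})


def first(col):
    """Return the first element of a column."""
    return list(col)[0]


# The grouping column 'alpha' also matches the string predicate.
string_columns = [name for name in df.columns
                  if pdtypes.is_string_dtype(df[name])]

# Plain pandas counterpart of the grouped example: summarize the
# non-grouping string columns within each 'alpha' group.
value_columns = [name for name in string_columns if name != 'alpha']
result = df.groupby('alpha')[value_columns].agg(first).reset_index()
```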