plydata.helper_verbs.mutate_if
class plydata.helper_verbs.mutate_if(*args, **kwargs)
Modify the selected columns, that is, the columns for which a predicate is true.
Parameters
- data : dataframe, optional
  Useful when not using the >> operator.
- predicate : function
  A predicate function to be applied to the columns of the dataframe. Good candidates for predicate functions are those that check the type of the column. Such functions are available in pandas.api.types, for example pandas.api.types.is_numeric_dtype().
  For convenience, you can reference the is_*_dtype functions with shorter strings:
    'is_bool'              # pandas.api.types.is_bool_dtype
    'is_categorical'       # pandas.api.types.is_categorical_dtype
    'is_complex'           # pandas.api.types.is_complex_dtype
    'is_datetime64_any'    # pandas.api.types.is_datetime64_any_dtype
    'is_datetime64'        # pandas.api.types.is_datetime64_dtype
    'is_datetime64_ns'     # pandas.api.types.is_datetime64_ns_dtype
    'is_datetime64tz'      # pandas.api.types.is_datetime64tz_dtype
    'is_float'             # pandas.api.types.is_float_dtype
    'is_int64'             # pandas.api.types.is_int64_dtype
    'is_integer'           # pandas.api.types.is_integer_dtype
    'is_interval'          # pandas.api.types.is_interval_dtype
    'is_numeric'           # pandas.api.types.is_numeric_dtype
    'is_object'            # pandas.api.types.is_object_dtype
    'is_period'            # pandas.api.types.is_period_dtype
    'is_signed_integer'    # pandas.api.types.is_signed_integer_dtype
    'is_string'            # pandas.api.types.is_string_dtype
    'is_timedelta64'       # pandas.api.types.is_timedelta64_dtype
    'is_timedelta64_ns'    # pandas.api.types.is_timedelta64_ns_dtype
    'is_unsigned_integer'  # pandas.api.types.is_unsigned_integer_dtype
  No other string values are allowed.
- functions : callable, tuple, dict or str
  Functions to alter the columns:
  - function (any callable) - The function is applied to each selected column and the resulting columns replace the original columns.
  - tuple of functions - Each function is applied to all of the selected columns, and the name (__name__) of the function is postfixed to the resulting column names.
  - dict of the form {'name': function} - Allows you to apply one or more functions and also control the postfix to the name.
  - str - A string can be used for more complex statements, but the resulting column names will be hard to read.
- args : tuple
  Arguments to the functions. The arguments are passed to all functions.
- kwargs : dict
  Keyword arguments to the functions. The keyword arguments are passed to all functions.
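As a rough illustration of how a predicate, or one of the string shortcuts, can pick out columns, the sketch below uses plain pandas only. The helpers `resolve_predicate` and `select_columns` are hypothetical names written for this example; they are not part of plydata and are not claimed to mirror its implementation. The sketch also accepts any `is_*_dtype` function found in `pandas.api.types`, which is an assumption that is looser than the closed list of shortcuts above.

```python
import pandas as pd
import pandas.api.types as pdtypes


def resolve_predicate(predicate):
    """Hypothetical helper: map a shortcut such as 'is_numeric' to
    pandas.api.types.is_numeric_dtype; callables pass through unchanged."""
    if callable(predicate):
        return predicate
    name = f"{predicate}_dtype"
    if not (isinstance(predicate, str)
            and predicate.startswith("is_")
            and hasattr(pdtypes, name)):
        raise ValueError(f"Unknown predicate shortcut: {predicate!r}")
    return getattr(pdtypes, name)


def select_columns(df, predicate):
    """Return the names of the columns for which the predicate is true.

    The predicate receives each column as a Series, so it may inspect
    either the dtype or attributes such as the column name."""
    func = resolve_predicate(predicate)
    return [name for name in df.columns if func(df[name])]


df = pd.DataFrame({
    'alpha': list('aaabbb'),
    'x': [1, 2, 3, 4, 5, 6],
    'y': [0.5, 1.5, 2.5, 3.5, 4.5, 5.5],
})

numeric_cols = select_columns(df, 'is_numeric')              # string shortcut
integer_cols = select_columns(df, pdtypes.is_integer_dtype)  # callable
print(numeric_cols)  # ['x', 'y']
print(integer_cols)  # ['x']
```

Because the predicate is handed the whole column, a custom function that checks `col.name` works just as well as a dtype check.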
Examples
>>> import pandas as pd
>>> import pandas.api.types as pdtypes
>>> import numpy as np
>>> from plydata import *
>>> df = pd.DataFrame({
...     'alpha': list('aaabbb'),
...     'beta': list('babruq'),
...     'theta': list('cdecde'),
...     'x': [1, 2, 3, 4, 5, 6],
...     'y': [6, 5, 4, 3, 2, 1],
...     'z': [7, 9, 11, 8, 10, 12]
... })
A single function with an argument
>>> df >> mutate_if(pdtypes.is_numeric_dtype, np.add, 10)
  alpha beta theta   x   y   z
0     a    b     c  11  16  17
1     a    a     d  12  15  19
2     a    b     e  13  14  21
3     b    r     c  14  13  18
4     b    u     d  15  12  20
5     b    q     e  16  11  22
Two functions that accept the same argument, applied to the columns picked by a crude custom column selector.
>>> def is_x_or_z(col):
...     return col.name in ('x', 'z')
...
>>> df >> mutate_if(is_x_or_z, (np.add, np.subtract), 10)
  alpha beta theta  x  y   z  x_add  z_add  x_subtract  z_subtract
0     a    b     c  1  6   7     11     17          -9          -3
1     a    a     d  2  5   9     12     19          -8          -1
2     a    b     e  3  4  11     13     21          -7           1
3     b    r     c  4  3   8     14     18          -6          -2
4     b    u     d  5  2  10     15     20          -5           0
5     b    q     e  6  1  12     16     22          -4           2
Convert x, y and z from centimeters to inches and round to 2 decimal places.
>>> (df
...  >> mutate_if('is_numeric',
...               dict(inch=lambda col: np.round(col/2.54, 2))))
  alpha beta theta  x  y   z  x_inch  y_inch  z_inch
0     a    b     c  1  6   7    0.39    2.36    2.76
1     a    a     d  2  5   9    0.79    1.97    3.54
2     a    b     e  3  4  11    1.18    1.57    4.33
3     b    r     c  4  3   8    1.57    1.18    3.15
4     b    u     d  5  2  10    1.97    0.79    3.94
5     b    q     e  6  1  12    2.36    0.39    4.72
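The column naming seen with the tuple and dict forms of functions can be reproduced in plain pandas. The sketch below is only an approximation of that behaviour: `apply_named` is a hypothetical helper, not plydata code, and the function-major ordering of the new columns is inferred from the example output above.

```python
import numpy as np
import pandas as pd


def apply_named(df, columns, functions, *args, **kwargs):
    """Hypothetical helper: add one '<column>_<postfix>' column for each
    (function, column) pair, leaving the original columns in place."""
    if isinstance(functions, tuple):
        # For a tuple, the postfix is taken from each function's __name__.
        functions = {func.__name__: func for func in functions}
    out = df.copy()
    for postfix, func in functions.items():
        for col in columns:
            # The same extra arguments are passed to every function.
            out[f"{col}_{postfix}"] = func(df[col], *args, **kwargs)
    return out


df = pd.DataFrame({'x': [1, 2, 3], 'y': [6, 5, 4], 'z': [7, 9, 11]})

# Tuple form: postfixes 'add' and 'subtract' come from the ufunc names.
both = apply_named(df, ['x', 'z'], (np.add, np.subtract), 10)
print(list(both.columns))
# ['x', 'y', 'z', 'x_add', 'z_add', 'x_subtract', 'z_subtract']

# Dict form: the key controls the postfix.
inches = apply_named(df, ['x'], {'inch': lambda col: np.round(col / 2.54, 2)})
print(inches['x_inch'].tolist())  # [0.39, 0.79, 1.18]
```

The dict form is the one to reach for with lambdas, since every lambda shares the same `__name__` and would otherwise produce clashing postfixes.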
Groupwise standardization of multiple variables.
>>> def scale(col):
...     return (col - np.mean(col))/np.std(col)
...
>>> (df
...  >> group_by('alpha')
...  >> mutate_if('is_numeric', scale))
groups: ['alpha']
  alpha beta theta         x         y         z
0     a    b     c -1.224745  1.224745 -1.224745
1     a    a     d  0.000000  0.000000  0.000000
2     a    b     e  1.224745 -1.224745  1.224745
3     b    r     c -1.224745  1.224745 -1.224745
4     b    u     d  0.000000  0.000000  0.000000
5     b    q     e  1.224745 -1.224745  1.224745
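For comparison, the groupwise standardization above can be approximated with pandas alone using `groupby(...).transform`. This is a sketch of an equivalent computation, not a description of how plydata does it. It assumes the population standard deviation (`ddof=0`), which is the default of `np.std` and is consistent with the values of about 1.224745 shown above; the variable names are illustrative.

```python
import numpy as np
import pandas as pd
import pandas.api.types as pdtypes


def scale(col):
    # Population standard deviation (ddof=0), matching np.std's default.
    return (col - col.mean()) / col.std(ddof=0)


df = pd.DataFrame({
    'alpha': list('aaabbb'),
    'x': [1, 2, 3, 4, 5, 6],
    'y': [6, 5, 4, 3, 2, 1],
})

# Pick the numeric columns with the same predicate that 'is_numeric' names.
numeric_cols = [name for name in df.columns
                if pdtypes.is_numeric_dtype(df[name])]

# transform applies scale to each numeric column within each group and
# returns a frame aligned with the original index.
standardized = df.groupby('alpha')[numeric_cols].transform(scale)

# Replace the original numeric columns with their standardized versions.
result = df.assign(**{name: standardized[name] for name in numeric_cols})
print(result)
```

Using pandas' own `std` without `ddof=0` would give the sample standard deviation instead, and the scaled values would be -1, 0 and 1 within each group.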
Using a boolean array to select the columns.
>>> df >> mutate_if(
...     [False, False, False, True, True, True],
...     np.negative)
  alpha beta theta  x  y   z
0     a    b     c -1 -6  -7
1     a    a     d -2 -5  -9
2     a    b     e -3 -4 -11
3     b    r     c -4 -3  -8
4     b    u     d -5 -2 -10
5     b    q     e -6 -1 -12
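A boolean array selects columns by position, so it needs one entry per column. The plain-pandas sketch below shows the same selection outside plydata; `negate_selected` is a hypothetical helper for illustration, and the explicit length check is an assumption about sensible behaviour rather than something the documentation above states.

```python
import numpy as np
import pandas as pd


def negate_selected(df, mask):
    """Hypothetical helper: negate the columns flagged True in mask."""
    mask = np.asarray(mask, dtype=bool)
    if mask.shape != (len(df.columns),):
        raise ValueError("mask needs exactly one entry per column")
    out = df.copy()
    # Indexing the column Index with a boolean array keeps flagged names.
    for name in df.columns[mask]:
        out[name] = np.negative(df[name])
    return out


df = pd.DataFrame({
    'alpha': list('abc'),
    'x': [1, 2, 3],
    'y': [6, 5, 4],
})

result = negate_selected(df, [False, True, True])
print(result['x'].tolist())  # [-1, -2, -3]
print(result['y'].tolist())  # [-6, -5, -4]
```

A positional mask is brittle if the column order changes, so a predicate on the dtype or name is usually the safer choice when the frame's layout is not fixed.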