plydata.helper_verbs.mutate_at

class plydata.helper_verbs.mutate_at(*args, **kwargs)
Change selected columns
- Parameters
  - data : dataframe, optional
    Useful when not using the >> operator.
  - names : tuple or dict
    Names of columns in dataframe. If a tuple, they should be names of columns. If a dict, the keys must be among the following:
    - startswith : str or tuple, optional
      All column names that start with this string will be included.
    - endswith : str or tuple, optional
      All column names that end with this string will be included.
    - contains : str or tuple, optional
      All column names that contain this string will be included.
    - matches : str or regex or tuple, optional
      All column names that match the string or a compiled regex pattern will be included. A tuple can be used to match multiple regexes.
    - drop : bool, optional
      If True, the selection is inverted. The unspecified/unmatched columns are returned instead. Default is False.
  - functions : callable() or tuple or dict or str
    Functions to alter the columns:
    - function (any callable) - Function is applied to the column and the result columns replace the original columns.
    - tuple of functions - Each function is applied to all of the columns and the name (__name__) of the function is postfixed to resulting column names.
    - dict of the form {'name': function} - Allows you to apply one or more functions and also control the postfix to the name.
    - str - String can be used for more complex statements, but the resulting names will be terrible.
  - args : tuple
    Arguments to the functions. The arguments are passed to all functions.
  - kwargs : dict
    Keyword arguments to the functions. The keyword arguments are passed to all functions.
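The selection rules for names can be sketched in plain Python. This is a minimal illustration, not plydata's implementation: `select_columns` is a hypothetical helper, and it assumes that several dict keys combine as a union and that `matches` anchors at the start of the column name, as `re.match` does.

```python
import re


def select_columns(columns, names):
    """Resolve a tuple or dict selection into a list of column names.

    Hypothetical helper for illustration only; not part of plydata.
    """
    if not isinstance(names, dict):
        # A tuple lists the wanted column names directly
        return [c for c in columns if c in names]

    def as_tuple(value):
        # Accept a single value or a tuple of values
        if value is None:
            return ()
        return value if isinstance(value, tuple) else (value,)

    starts = as_tuple(names.get('startswith'))
    ends = as_tuple(names.get('endswith'))
    contains = as_tuple(names.get('contains'))
    # re.compile accepts both strings and already compiled patterns
    patterns = [re.compile(p) for p in as_tuple(names.get('matches'))]

    def is_selected(col):
        # Assumption: a column is kept if any criterion matches
        return bool(
            col.startswith(starts)
            or col.endswith(ends)
            or any(s in col for s in contains)
            or any(p.match(col) for p in patterns)
        )

    selected = [c for c in columns if is_selected(c)]
    if names.get('drop', False):
        # Invert the selection: keep the unmatched columns instead
        selected = [c for c in columns if c not in selected]
    return selected


columns = ['alpha', 'beta', 'theta', 'x', 'y', 'z']
print(select_columns(columns, ('x', 'y', 'z')))
print(select_columns(columns, dict(endswith='a')))
print(select_columns(columns, dict(matches=r'^[xyz]$')))
print(select_columns(columns, dict(contains='eta', drop=True)))
```

With the column names used in the examples, the four calls select `x, y, z`, then `alpha, beta, theta`, then `x, y, z` again, and finally every column except `beta` and `theta`.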
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from plydata import *
>>> df = pd.DataFrame({
...     'alpha': list('aaabbb'),
...     'beta': list('babruq'),
...     'theta': list('cdecde'),
...     'x': [1, 2, 3, 4, 5, 6],
...     'y': [6, 5, 4, 3, 2, 1],
...     'z': [7, 9, 11, 8, 10, 12]
... })
A single function with an argument
>>> df >> mutate_at(('x', 'y', 'z'), np.add, 10)
  alpha beta theta   x   y   z
0     a    b     c  11  16  17
1     a    a     d  12  15  19
2     a    b     e  13  14  21
3     b    r     c  14  13  18
4     b    u     d  15  12  20
5     b    q     e  16  11  22
Two functions that accept the same argument
>>> df >> mutate_at(('x', 'z'), (np.add, np.subtract), 10)
  alpha beta theta  x  y   z  x_add  z_add  x_subtract  z_subtract
0     a    b     c  1  6   7     11     17          -9          -3
1     a    a     d  2  5   9     12     19          -8          -1
2     a    b     e  3  4  11     13     21          -7           1
3     b    r     c  4  3   8     14     18          -6          -2
4     b    u     d  5  2  10     15     20          -5           0
5     b    q     e  6  1  12     16     22          -4           2
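The column names in this output follow the postfix rule described for the functions parameter. The sketch below reproduces only the naming, under the assumption that names are built as `<column>_<postfix>` with the functions varying slowest, as the output suggests. `result_names`, `add` and `subtract` are hypothetical stand-ins written for this illustration, and the str form of functions is left out.

```python
def result_names(columns, functions):
    """Return the output column names for a `functions` specification.

    Hypothetical helper for illustration only; not part of plydata.
    """
    if callable(functions):
        # A single function: the results replace the original columns
        return list(columns)
    if isinstance(functions, dict):
        # A dict: the keys are used as postfixes
        postfixes = list(functions)
    else:
        # A tuple of functions: each function's __name__ is the postfix
        postfixes = [f.__name__ for f in functions]
    return [f'{col}_{post}' for post in postfixes for col in columns]


def add(col, n):
    # Stand-in for np.add, whose __name__ is also 'add'
    return [v + n for v in col]


def subtract(col, n):
    # Stand-in for np.subtract, whose __name__ is also 'subtract'
    return [v - n for v in col]


print(result_names(('x', 'y', 'z'), add))
print(result_names(('x', 'z'), (add, subtract)))
print(result_names(('x', 'y', 'z'), dict(inch=lambda col: col)))
```

A single callable leaves the names unchanged, the tuple yields `x_add, z_add, x_subtract, z_subtract`, and the dict yields `x_inch, y_inch, z_inch`.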
Convert x, y and z from centimeters to inches and round to 2 decimal places.
>>> (df
...  >> mutate_at(('x', 'y', 'z'),
...               dict(inch=lambda col: np.round(col/2.54, 2)))
... )
  alpha beta theta  x  y   z  x_inch  y_inch  z_inch
0     a    b     c  1  6   7    0.39    2.36    2.76
1     a    a     d  2  5   9    0.79    1.97    3.54
2     a    b     e  3  4  11    1.18    1.57    4.33
3     b    r     c  4  3   8    1.57    1.18    3.15
4     b    u     d  5  2  10    1.97    0.79    3.94
5     b    q     e  6  1  12    2.36    0.39    4.72
Groupwise standardization of multiple variables.
>>> def scale(col):
...     return (col - np.mean(col))/np.std(col)
...
>>> (df
...  >> group_by('alpha')
...  >> mutate_at(('x', 'y', 'z'), scale))
groups: ['alpha']
  alpha beta theta         x         y         z
0     a    b     c -1.224745  1.224745 -1.224745
1     a    a     d  0.000000  0.000000  0.000000
2     a    b     e  1.224745 -1.224745  1.224745
3     b    r     c -1.224745  1.224745 -1.224745
4     b    u     d  0.000000  0.000000  0.000000
5     b    q     e  1.224745 -1.224745  1.224745
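The value 1.224745 can be checked by hand for one group. The sketch below redoes the arithmetic with the standard library instead of NumPy; `scale_values` is a hypothetical name for this illustration. It uses the population standard deviation, which is what `np.std` computes by default (`ddof=0`).

```python
from statistics import fmean, pstdev


def scale_values(values):
    """Standardize a list of numbers within one group (illustrative helper)."""
    mean = fmean(values)
    # Population standard deviation, matching np.std's default ddof=0
    sd = pstdev(values)
    return [(v - mean) / sd for v in values]


# Values of x, y and z for the rows where alpha == 'a'
x_group_a = [1, 2, 3]
y_group_a = [6, 5, 4]
z_group_a = [7, 9, 11]

for name, values in [('x', x_group_a), ('y', y_group_a), ('z', z_group_a)]:
    print(name, [round(v, 6) for v in scale_values(values)])
```

For x the mean is 2 and the population standard deviation is sqrt(2/3), about 0.816497, so the first value is (1 - 2)/0.816497, about -1.224745. The z values are twice as spread out around their mean, which is why they standardize to the same numbers.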