plydata.helper_verbs.create_at
class plydata.helper_verbs.create_at(*args, **kwargs)
Create a dataframe with specific columns.
Parameters
- data : dataframe, optional
  Useful when not using the >> operator.
- names : tuple or dict
  Names of columns in dataframe. If a tuple, they should be names of columns. If a dict, the keys must be one of the following:
  - startswith : str or tuple, optional
    All column names that start with this string will be included.
  - endswith : str or tuple, optional
    All column names that end with this string will be included.
  - contains : str or tuple, optional
    All column names that contain this string will be included.
  - matches : str or regex or tuple, optional
    All column names that match the string or a compiled regex pattern will be included. A tuple can be used to match multiple regexes.
  - drop : bool, optional
    If True, the selection is inverted: the unspecified/unmatched columns are returned instead. Default is False.
- functions : callable or tuple or dict or str
  Functions to alter the columns:
  - function (any callable): the function is applied to each selected column, and the resulting columns replace the original columns.
  - tuple of functions: each function is applied to all of the selected columns, and the name (__name__) of the function is postfixed to the resulting column names.
  - dict of the form {'name': function}: allows you to apply one or more functions and also control the postfix added to the column names.
  - str: a string can be used for more complex statements, but the resulting column names will be unwieldy.
- args : tuple
  Arguments to the functions. The arguments are passed to all functions.
- kwargs : dict
  Keyword arguments to the functions. The keyword arguments are passed to all functions.
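The column-selection rules for `names` can be approximated in plain Python and pandas. The helper below, `select_columns`, is hypothetical and written only for illustration; it is not plydata's implementation. It assumes that a column is kept when it satisfies any one criterion (a union), that `matches` behaves like `re.match` (anchored at the start of the name), and that the output follows the dataframe's column order.

```python
import re

import pandas as pd


def select_columns(df, names=(), startswith=(), endswith=(), contains=(),
                   matches=(), drop=False):
    """Hypothetical sketch of the documented selection rules.

    A column is kept if it satisfies any criterion (assumed union).
    """
    def as_tuple(value):
        # Each criterion accepts a single value or a tuple of values
        if isinstance(value, (str, re.Pattern)):
            return (value,)
        return tuple(value)

    names = as_tuple(names)
    starts = as_tuple(startswith)
    ends = as_tuple(endswith)
    subs = as_tuple(contains)
    # Strings are compiled; precompiled patterns are returned unchanged
    patterns = [re.compile(p) for p in as_tuple(matches)]

    selected = [
        col for col in df.columns
        if col in names
        or any(col.startswith(s) for s in starts)
        or any(col.endswith(s) for s in ends)
        or any(s in col for s in subs)
        or any(p.match(col) for p in patterns)  # assumed re.match semantics
    ]
    if drop:
        # Invert the selection: return the unmatched columns instead
        selected = [col for col in df.columns if col not in selected]
    return selected


df = pd.DataFrame({
    'alpha': list('aaabbb'),
    'beta': list('babruq'),
    'theta': list('cdecde'),
    'x': [1, 2, 3, 4, 5, 6],
    'y': [6, 5, 4, 3, 2, 1],
    'z': [7, 9, 11, 8, 10, 12],
})

print(select_columns(df, matches=r'x|y|z'))          # ['x', 'y', 'z']
print(select_columns(df, endswith='ta'))             # ['beta', 'theta']
print(select_columns(df, endswith='ta', drop=True))  # ['alpha', 'x', 'y', 'z']
```

Treating every criterion as a single value or a tuple keeps the helper small; the real verb may order or combine criteria differently.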
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from plydata import *
>>> df = pd.DataFrame({
...     'alpha': list('aaabbb'),
...     'beta': list('babruq'),
...     'theta': list('cdecde'),
...     'x': [1, 2, 3, 4, 5, 6],
...     'y': [6, 5, 4, 3, 2, 1],
...     'z': [7, 9, 11, 8, 10, 12]
... })
Create a new dataframe by doubling selected column values of the input frame.
>>> def double(s):
...     return s + s
>>> df >> create_at(('x', 'y', 'z'), double)
    x   y   z
0   2  12  14
1   4  10  18
2   6   8  22
3   8   6  16
4  10   4  20
5  12   2  24
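For a single callable, the result should agree with applying the function to each selected column in plain pandas. The comparison below is a sketch that uses only pandas, not plydata; `DataFrame.apply` passes each column to `double` as a Series and assembles the returned Series into a new frame.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, 6],
    'y': [6, 5, 4, 3, 2, 1],
    'z': [7, 9, 11, 8, 10, 12],
})


def double(s):
    return s + s


# Apply the function column by column; the results replace the originals
result = df[['x', 'y', 'z']].apply(double)
print(result)
```

The printed frame has the same values as the create_at example above.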
Convert from centimeters to inches and feet.
>>> def inch(col, decimals=0):
...     return np.round(col/2.54, decimals)
>>> def feet(col, decimals=0):
...     return np.round(col/30.48, decimals)
>>> df >> create_at(('x', 'y', 'z'), (inch, feet), decimals=2)
   x_inch  y_inch  z_inch  x_feet  y_feet  z_feet
0    0.39    2.36    2.76    0.03    0.20    0.23
1    0.79    1.97    3.54    0.07    0.16    0.30
2    1.18    1.57    4.33    0.10    0.13    0.36
3    1.57    1.18    3.15    0.13    0.10    0.26
4    1.97    0.79    3.94    0.16    0.07    0.33
5    2.36    0.39    4.72    0.20    0.03    0.39
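The postfix naming for multiple functions can be mimicked in plain pandas. The helper `apply_functions` below is a hypothetical sketch, not plydata's code: it accepts a tuple of functions (postfix taken from `__name__`) or a dict of the form `{'name': function}` (postfix taken from the key), forwards the same extra arguments to every function, and orders the output function by function, as in the column order of the example above. It does not handle the single-callable or string forms.

```python
import numpy as np
import pandas as pd


def inch(col, decimals=0):
    return np.round(col / 2.54, decimals)


def feet(col, decimals=0):
    return np.round(col / 30.48, decimals)


def apply_functions(df, columns, functions, *args, **kwargs):
    """Hypothetical sketch of the tuple and dict forms of ``functions``."""
    if isinstance(functions, dict):
        named = list(functions.items())                # postfix from the key
    else:
        named = [(f.__name__, f) for f in functions]   # postfix from __name__
    out = {}
    for postfix, func in named:
        for col in columns:
            # The same extra arguments are passed to every function
            out[f'{col}_{postfix}'] = func(df[col], *args, **kwargs)
    return pd.DataFrame(out, index=df.index)


df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, 6],
    'y': [6, 5, 4, 3, 2, 1],
    'z': [7, 9, 11, 8, 10, 12],
})

by_name = apply_functions(df, ['x', 'y', 'z'], (inch, feet), decimals=2)
by_key = apply_functions(df, ['x'], {'in': inch, 'ft': feet}, decimals=2)
print(list(by_name.columns))
# ['x_inch', 'y_inch', 'z_inch', 'x_feet', 'y_feet', 'z_feet']
print(list(by_key.columns))  # ['x_in', 'x_ft']
```

The dict form is what lets you replace long function names with short postfixes such as `in` and `ft`.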
Group columns are always included; if they are listed in the selection, the functions act on them as well.
>>> (df
...  >> group_by('x')
...  >> create_at(('x', 'y', 'z'), (inch, feet), decimals=2))
groups: ['x']
   x  x_inch  y_inch  z_inch  x_feet  y_feet  z_feet
0  1    0.39    2.36    2.76    0.03    0.20    0.23
1  2    0.79    1.97    3.54    0.07    0.16    0.30
2  3    1.18    1.57    4.33    0.10    0.13    0.36
3  4    1.57    1.18    3.15    0.13    0.10    0.26
4  5    1.97    0.79    3.94    0.16    0.07    0.33
5  6    2.36    0.39    4.72    0.20    0.03    0.39
Group columns that are not listed by name are not acted upon by the functions, even when a pattern such as matches selects them.
>>> (df
...  >> group_by('x')
...  >> create_at(dict(matches=r'x|y|z'), (inch, feet), decimals=2))
groups: ['x']
   x  y_inch  z_inch  y_feet  z_feet
0  1    2.36    2.76    0.20    0.23
1  2    1.97    3.54    0.16    0.30
2  3    1.57    4.33    0.13    0.36
3  4    1.18    3.15    0.10    0.26
4  5    0.79    3.94    0.07    0.33
5  6    0.39    4.72    0.03    0.39
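The two group-column rules in these examples can be summarised as a small function over column names. `result_columns` is a hypothetical sketch, inferred from the outputs shown above rather than taken from plydata: for the tuple-of-functions form it only works out which columns appear in the result. Group columns are always kept at the front, and a group column is transformed only when it is listed by name, not when it is merely matched by a pattern.

```python
def result_columns(groups, listed, selected, postfixes):
    """Hypothetical sketch of the output columns for grouped data.

    groups:    group column names, always kept in the output
    listed:    column names given explicitly (e.g. a tuple of names)
    selected:  all columns chosen by the selection, including patterns
    postfixes: function-name postfixes, in order
    """
    # A group column is transformed only if it was listed by name
    acted_on = [c for c in selected if c not in groups or c in listed]
    transformed = [f'{c}_{p}' for p in postfixes for c in acted_on]
    return list(groups) + transformed


# Tuple selection: the group column 'x' is listed, so it is transformed
named = result_columns(['x'], ('x', 'y', 'z'), ['x', 'y', 'z'], ['inch', 'feet'])
# Pattern selection: 'x' matches r'x|y|z' but is not listed by name
pattern = result_columns(['x'], (), ['x', 'y', 'z'], ['inch', 'feet'])
print(named)
# ['x', 'x_inch', 'y_inch', 'z_inch', 'x_feet', 'y_feet', 'z_feet']
print(pattern)
# ['x', 'y_inch', 'z_inch', 'y_feet', 'z_feet']
```

Both results reproduce the column headers of the two grouped examples.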