plydata.helper_verbs.create_at

class plydata.helper_verbs.create_at(*args, **kwargs)

Create dataframe with specific columns

Parameters

data : dataframe, optional

Useful when not using the >> operator.

names : tuple or dict

Names of columns in the dataframe. If a tuple, its items should be column names. If a dict, its keys must be among the following:

startswith : str or tuple, optional
All column names that start with this string will be included.
endswith : str or tuple, optional
All column names that end with this string will be included.
contains : str or tuple, optional
All column names that contain this string will be included.
matches : str or regex or tuple, optional
All column names that match the string or a compiled regex pattern will be included. A tuple can be used to match multiple regexes.
drop : bool, optional
If True, the selection is inverted. The unspecified/unmatched columns are returned instead. Default is False.
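
The selection keys above can be illustrated on a plain list of column names. The following is a minimal sketch in plain Python, not plydata's implementation: `select_names` is a hypothetical helper, and two details are assumptions made for illustration, namely that a name is kept if any of the given keys matches it, and that regex patterns are applied with `search` semantics.

```python
import re


def select_names(columns, startswith=None, endswith=None,
                 contains=None, matches=None, drop=False):
    """Hypothetical helper: return the column names picked by the keys."""
    def as_tuple(value):
        # Accept a single value or a tuple of values for every key
        if value is None:
            return ()
        return value if isinstance(value, tuple) else (value,)

    prefixes = as_tuple(startswith)
    suffixes = as_tuple(endswith)
    substrings = as_tuple(contains)
    # Strings are compiled; already compiled patterns are used as they are
    patterns = [re.compile(p) if isinstance(p, str) else p
                for p in as_tuple(matches)]

    def hit(name):
        # Assumption: the keys combine as a union (any key may match)
        return bool(
            name.startswith(prefixes)
            or name.endswith(suffixes)
            or any(s in name for s in substrings)
            or any(p.search(name) for p in patterns)
        )

    # drop=True inverts the selection
    return [c for c in columns if hit(c) != drop]


columns = ['alpha', 'beta', 'theta', 'x', 'y', 'z']
print(select_names(columns, startswith='a'))           # ['alpha']
print(select_names(columns, contains='et'))            # ['beta', 'theta']
print(select_names(columns, matches=r'x|y|z'))         # ['x', 'y', 'z']
print(select_names(columns, endswith='a', drop=True))  # ['x', 'y', 'z']
```

With plydata itself, the corresponding selection would be written as a dict passed in place of `names`, for example `dict(matches=r'x|y|z')`.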

functions : callable or tuple or dict or str

Functions to alter the columns:

function (any callable) - The function is applied to each selected column and the resulting columns replace the original columns.

tuple of functions - Each function is applied to all of the columns and the name (__name__) of the function is appended as a postfix to the resulting column names.

dict of the form {'name': function} - Allows you to apply one or more functions and also control the postfix added to the column names.

str - A string can be used for more complex statements, but the resulting column names will be unwieldy.

args : tuple

Arguments to the functions. The arguments are passed to all functions.

kwargs : dict

Keyword arguments to the functions. The keyword arguments are passed to all functions.
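
The naming rules for `functions` and the forwarding of `args` and `kwargs` can be sketched without plydata. This is an illustrative sketch under stated assumptions, not the library's code: `apply_functions` is a hypothetical helper, columns are modelled as a dict of lists, the postfix is joined to the column name with an underscore (as the example output on this page suggests), and the string form of `functions` is not covered.

```python
def apply_functions(columns, functions, *args, **kwargs):
    """Hypothetical helper: `columns` maps a column name to a list of values."""
    if isinstance(functions, str):
        raise NotImplementedError("the string form is not covered by this sketch")

    # Single callable: results replace the original columns, names unchanged
    if callable(functions):
        return {name: functions(values, *args, **kwargs)
                for name, values in columns.items()}

    # Dict: the keys are the postfixes; tuple: the function names are
    if isinstance(functions, dict):
        named = list(functions.items())
    else:
        named = [(fn.__name__, fn) for fn in functions]

    result = {}
    for postfix, fn in named:
        for name, values in columns.items():
            # The same args and kwargs are forwarded to every function
            result[f"{name}_{postfix}"] = fn(values, *args, **kwargs)
    return result


def inch(col, decimals=0):
    return [round(v / 2.54, decimals) for v in col]


def feet(col, decimals=0):
    return [round(v / 30.48, decimals) for v in col]


cols = {'x': [1, 2, 3], 'y': [6, 5, 4]}

print(list(apply_functions(cols, inch)))
# ['x', 'y']
print(list(apply_functions(cols, (inch, feet), decimals=2)))
# ['x_inch', 'y_inch', 'x_feet', 'y_feet']
print(list(apply_functions(cols, {'in': inch, 'ft': feet}, decimals=2)))
# ['x_in', 'y_in', 'x_ft', 'y_ft']
```

Because the same `args` and `kwargs` reach every function, all functions passed together should accept the same extra parameters, as `inch` and `feet` both accept `decimals` here.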

Examples

>>> import pandas as pd
>>> import numpy as np
>>> from plydata import *
>>> df = pd.DataFrame({
...     'alpha': list('aaabbb'),
...     'beta': list('babruq'),
...     'theta': list('cdecde'),
...     'x': [1, 2, 3, 4, 5, 6],
...     'y': [6, 5, 4, 3, 2, 1],
...     'z': [7, 9, 11, 8, 10, 12]
... })

Create a new dataframe by doubling selected column values of the input frame.

>>> def double(s):
...     return s + s
>>> df >> create_at(('x', 'y', 'z'), double)
    x   y   z
0   2  12  14
1   4  10  18
2   6   8  22
3   8   6  16
4  10   4  20
5  12   2  24

Convert from centimeters to inches and feet.

>>> def inch(col, decimals=0):
...     return np.round(col/2.54, decimals)
>>> def feet(col, decimals=0):
...     return np.round(col/30.48, decimals)
>>> df >> create_at(('x', 'y', 'z'), (inch, feet), decimals=2)
   x_inch  y_inch  z_inch  x_feet  y_feet  z_feet
0    0.39    2.36    2.76    0.03    0.20    0.23
1    0.79    1.97    3.54    0.07    0.16    0.30
2    1.18    1.57    4.33    0.10    0.13    0.36
3    1.57    1.18    3.15    0.13    0.10    0.26
4    1.97    0.79    3.94    0.16    0.07    0.33
5    2.36    0.39    4.72    0.20    0.03    0.39

Group columns are always included and, if listed in the selection, the functions act on them.

>>> (df
...  >> group_by('x')
...  >> create_at(('x', 'y', 'z'), (inch, feet), decimals=2))
groups: ['x']
   x  x_inch  y_inch  z_inch  x_feet  y_feet  z_feet
0  1    0.39    2.36    2.76    0.03    0.20    0.23
1  2    0.79    1.97    3.54    0.07    0.16    0.30
2  3    1.18    1.57    4.33    0.10    0.13    0.36
3  4    1.57    1.18    3.15    0.13    0.10    0.26
4  5    1.97    0.79    3.94    0.16    0.07    0.33
5  6    2.36    0.39    4.72    0.20    0.03    0.39

Group columns that are not listed are not acted upon by the functions.

>>> (df
...  >> group_by('x')
...  >> create_at(dict(matches=r'x|y|z'), (inch, feet), decimals=2))
groups: ['x']
   x  y_inch  z_inch  y_feet  z_feet
0  1    2.36    2.76    0.20    0.23
1  2    1.97    3.54    0.16    0.30
2  3    1.57    4.33    0.13    0.36
3  4    1.18    3.15    0.10    0.26
4  5    0.79    3.94    0.07    0.33
5  6    0.39    4.72    0.03    0.39