plydata.one_table_verbs.select

class plydata.one_table_verbs.select(*args, **kwargs)

Select columns by name

Parameters

data : dataframe, optional
    Useful when not using the >> operator.
names : tuple, optional
    Names of columns in the dataframe. Normally these are strings, but they can include slices, e.g. slice('col2', 'col5'). You can also exclude columns by prepending a -, e.g. select('-col1') will include all columns except col1.
startswith : str or tuple, optional
    All column names that start with this string will be included.
endswith : str or tuple, optional
    All column names that end with this string will be included.
contains : str or tuple, optional
    All column names that contain this string will be included.
matches : str or regex or tuple, optional
    All column names that match the string or a compiled regex pattern will be included. A tuple can be used to match multiple regexes.
drop : bool, optional
    If True, the selection is inverted. The unspecified/unmatched columns are returned instead. Default is False.
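
The name-matching parameters can be pictured on a plain list of column names. The following is a minimal sketch in pure Python, not plydata's implementation: the variable names are hypothetical, and the use of re.search for matches is an assumption made for illustration only.

```python
import re

columns = ['bell', 'whistle', 'nail', 'tail']

# startswith / endswith: Python's str methods accept a string or a tuple of strings
starts = [c for c in columns if c.startswith(('b', 'w'))]
ends = [c for c in columns if c.endswith('ail')]

# contains: a plain substring test
has_st = [c for c in columns if 'st' in c]

# matches: ASSUMPTION, re.search stands in for the library's matching rule
pattern = re.compile(r'\w+tle$')
matched = [c for c in columns if pattern.search(c)]

# drop=True: keep the columns that were NOT selected
selected = ['bell', 'nail']
inverted = [c for c in columns if c not in selected]

print(starts)    # ['bell', 'whistle']
print(ends)      # ['nail', 'tail']
print(has_st)    # ['whistle']
print(matched)   # ['whistle']
print(inverted)  # ['whistle', 'tail']
```

The last list agrees with the drop=True example in the Examples section, where selecting bell and nail with drop=True returns whistle and tail.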

Notes

To exclude columns by prepending a minus, the first name passed to select must itself be prefixed with a minus. select('-a', 'c') will exclude column a, while select('c', '-a') will not.

Examples

>>> import pandas as pd
>>> from plydata.one_table_verbs import select
>>> x = [1, 2, 3]
>>> df = pd.DataFrame({'bell': x, 'whistle': x, 'nail': x, 'tail': x})
>>> df >> select('bell', 'nail')
   bell  nail
0     1     1
1     2     2
2     3     3
>>> df >> select('bell', 'nail', drop=True)
   whistle  tail
0        1     1
1        2     2
2        3     3
>>> df >> select('whistle', endswith='ail')
   whistle  nail  tail
0        1     1     1
1        2     2     2
2        3     3     3
>>> df >> select('bell', matches=r'\w+tle$')
   bell  whistle
0     1        1
1     2        2
2     3        3

You can select column slices too. As with label-based slicing in pandas' .loc, the stop column is included.

>>> df = pd.DataFrame({'a': x, 'b': x, 'c': x, 'd': x,
...                    'e': x, 'f': x, 'g': x, 'h': x})
>>> df
   a  b  c  d  e  f  g  h
0  1  1  1  1  1  1  1  1
1  2  2  2  2  2  2  2  2
2  3  3  3  3  3  3  3  3
>>> df >> select('a', slice('c', 'e'), 'g')
   a  c  d  e  g
0  1  1  1  1  1
1  2  2  2  2  2
2  3  3  3  3  3
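
The inclusive stop can be modelled on a plain list of names. This is a sketch under stated assumptions: expand_slice is a hypothetical helper written for illustration and is not part of plydata or pandas.

```python
def expand_slice(columns, start, stop):
    """Return the names from start through stop, both ends included.

    Hypothetical helper: it only models the inclusive-stop behaviour
    described above on a list of column names.
    """
    i, j = columns.index(start), columns.index(stop)
    return columns[i:j + 1]  # +1 so the stop name is kept


columns = list('abcdefgh')

# Mirrors select('a', slice('c', 'e'), 'g') at the level of column names
picked = ['a'] + expand_slice(columns, 'c', 'e') + ['g']
print(picked)  # ['a', 'c', 'd', 'e', 'g']
```

Unlike an ordinary Python list slice, which stops before the end index, the label slice keeps column e.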

You can exclude columns by prepending - to their names.

>>> df >> select('-a', '-c', '-e')
   b  d  f  g  h
0  1  1  1  1  1
1  2  2  2  2  2
2  3  3  3  3  3

Remove a column and place it at the end.

>>> df >> select('-a', '-c', '-e', 'a')
   b  d  f  g  h  a
0  1  1  1  1  1  1
1  2  2  2  2  2  2
2  3  3  3  3  3  3