plydata.one_table_verbs.select

class plydata.one_table_verbs.select(*args, **kwargs)

Select columns by name

Parameters
data : dataframe, optional

Useful when not using the >> operator.

names : tuple, optional

Names of columns in the dataframe. Normally they are strings, but they can include slices, e.g. slice('col2', 'col5'). You can also exclude columns by prepending a -, e.g. select('-col1') will include all columns except col1.

startswith : str or tuple, optional

All column names that start with this string will be included.

endswith : str or tuple, optional

All column names that end with this string will be included.

contains : str or tuple, optional

All column names that contain this string will be included.

matches : str or regex or tuple, optional

All column names that match the string or a compiled regex pattern will be included. A tuple can be used to match multiple regexes.

drop : bool, optional

If True, the selection is inverted. The unspecified/unmatched columns are returned instead. Default is False.
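The way the filter parameters combine can be sketched in plain Python on a list of column names. This is an illustration only, not plydata's implementation: filter_names is a hypothetical helper, the result is kept in dataframe order, and the use of re.search (rather than re.match) for matches is an assumption that the description above does not settle.

```python
import re


def filter_names(columns, names=(), startswith=None, endswith=None,
                 contains=None, matches=None, drop=False):
    """Hypothetical helper: return the column names that would be kept."""
    def as_tuple(value):
        # Each filter accepts a single value or a tuple of values.
        if value is None:
            return ()
        return value if isinstance(value, tuple) else (value,)

    prefixes = as_tuple(startswith)
    suffixes = as_tuple(endswith)
    substrings = as_tuple(contains)
    # Strings are compiled; already compiled patterns are used as they are.
    patterns = [re.compile(p) if isinstance(p, str) else p
                for p in as_tuple(matches)]

    def selected(col):
        # A column is selected if it is named or passes any filter.
        return (
            col in names
            or col.startswith(prefixes)
            or col.endswith(suffixes)
            or any(s in col for s in substrings)
            or any(p.search(col) for p in patterns)  # assumption: search
        )

    # drop=True inverts the selection.
    return [col for col in columns if selected(col) != drop]


columns = ['bell', 'whistle', 'nail', 'tail']
print(filter_names(columns, names=('whistle',), endswith='ail'))
print(filter_names(columns, names=('bell', 'nail'), drop=True))
```

Under these assumptions the two calls print the same column sets as the corresponding select examples in the Examples section.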

Notes

Excluding columns with a leading minus only takes effect when the first column passed to select is prefixed with a minus. select('-a', 'c') will exclude column a, while select('c', '-a') will not exclude column a.
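The first-name rule can be sketched as a small check on the argument tuple. excluded_names is a hypothetical helper written for illustration; it only reports which names would count as exclusions and makes no claim about how plydata treats '-a' when the rule does not apply.

```python
def excluded_names(*names):
    """Hypothetical helper: names treated as exclusions by the first-name rule."""
    first_is_minus = (
        bool(names)
        and isinstance(names[0], str)
        and names[0].startswith('-')
    )
    if not first_is_minus:
        # Exclusion is only switched on by a leading minus on the first name.
        return set()
    # Strip the minus from every minus-prefixed string name.
    return {n[1:] for n in names if isinstance(n, str) and n.startswith('-')}


print(excluded_names('-a', 'c'))   # {'a'}
print(excluded_names('c', '-a'))   # set()
```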

Examples

>>> import pandas as pd
>>> from plydata import select
>>> x = [1, 2, 3]
>>> df = pd.DataFrame({'bell': x, 'whistle': x, 'nail': x, 'tail': x})
>>> df >> select('bell', 'nail')
   bell  nail
0     1     1
1     2     2
2     3     3
>>> df >> select('bell', 'nail', drop=True)
   whistle  tail
0        1     1
1        2     2
2        3     3
>>> df >> select('whistle',  endswith='ail')
   whistle  nail  tail
0        1     1     1
1        2     2     2
2        3     3     3
>>> df >> select('bell',  matches=r'\w+tle$')
   bell  whistle
0     1        1
1     2        2
2     3        3

You can select column slices too. As with label slicing in pandas' .loc, the stop column is included.

>>> df = pd.DataFrame({'a': x, 'b': x, 'c': x, 'd': x,
...                    'e': x, 'f': x, 'g': x, 'h': x})
>>> df
   a  b  c  d  e  f  g  h
0  1  1  1  1  1  1  1  1
1  2  2  2  2  2  2  2  2
2  3  3  3  3  3  3  3  3
>>> df >> select('a', slice('c', 'e'), 'g')
   a  c  d  e  g
0  1  1  1  1  1
1  2  2  2  2  2
2  3  3  3  3  3
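The inclusive stop can be sketched in plain Python by expanding a slice of labels against the list of column names. expand_names is a hypothetical helper, not part of plydata, and treating a missing start or stop as the first or last column is an assumption made for the sketch.

```python
def expand_names(columns, *names):
    """Hypothetical helper: expand label slices, including the stop label."""
    columns = list(columns)
    result = []
    for name in names:
        if isinstance(name, slice):
            # Assumption: an open end falls back to the first/last column.
            start = 0 if name.start is None else columns.index(name.start)
            stop = (len(columns) - 1 if name.stop is None
                    else columns.index(name.stop))
            # + 1 because the stop label is part of the selection.
            result.extend(columns[start:stop + 1])
        else:
            result.append(name)
    return result


print(expand_names('abcdefgh', 'a', slice('c', 'e'), 'g'))
# ['a', 'c', 'd', 'e', 'g']
```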

You can exclude columns by prepending a - to their names.

>>> df >> select('-a', '-c', '-e')
   b  d  f  g  h
0  1  1  1  1  1
1  2  2  2  2  2
2  3  3  3  3  3

You can also remove a column and place it at the end.

>>> df >> select('-a', '-c', '-e', 'a')
   b  d  f  g  h  a
0  1  1  1  1  1  1
1  2  2  2  2  2  2
2  3  3  3  3  3  3