plydata.one_table_verbs.select
class plydata.one_table_verbs.select(*args, **kwargs)
Select columns by name
Parameters
- data : dataframe, optional
  Useful when not using the >> operator.
- names : tuple, optional
  Names of columns in the dataframe. Normally they are strings, but they can include slices, e.g. slice('col2', 'col5'). You can also exclude columns by prepending a -, e.g. select('-col1') will include all columns except col1.
- startswith : str or tuple, optional
  All column names that start with this string will be included.
- endswith : str or tuple, optional
  All column names that end with this string will be included.
- contains : str or tuple, optional
  All column names that contain this string will be included.
- matches : str or regex or tuple, optional
  All column names that match the string or a compiled regex pattern will be included. A tuple can be used to match multiple regexes.
- drop : bool, optional
  If True, the selection is inverted. The unspecified/unmatched columns are returned instead. Default is False.
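The keyword matchers above can be read as simple string tests on the column names. The following is a minimal sketch of that reading in plain Python, not plydata's implementation. `match_columns` is a hypothetical helper, and two details are assumptions: that `matches` behaves like `re.search`, and that the result keeps the original column order.

```python
import re


def _as_tuple(value):
    """Normalize None, a single item, or a tuple/list into a tuple."""
    if value is None:
        return ()
    if isinstance(value, (tuple, list)):
        return tuple(value)
    return (value,)


def match_columns(columns, startswith=None, endswith=None,
                  contains=None, matches=None, drop=False):
    """Hypothetical helper: return the column names picked by the matchers."""
    prefixes = _as_tuple(startswith)
    suffixes = _as_tuple(endswith)
    substrings = _as_tuple(contains)
    # re.compile accepts both pattern strings and already compiled patterns
    patterns = [re.compile(p) for p in _as_tuple(matches)]

    def selected(name):
        # A column is selected if any one of the matchers accepts it
        return (
            any(name.startswith(p) for p in prefixes)
            or any(name.endswith(s) for s in suffixes)
            or any(s in name for s in substrings)
            or any(p.search(name) is not None for p in patterns)
        )

    # drop=True inverts the selection (assumed to keep the original order)
    return [c for c in columns if selected(c) != drop]


columns = ['bell', 'whistle', 'nail', 'tail']
print(match_columns(columns, endswith='ail'))
# prints ['nail', 'tail']
print(match_columns(columns, startswith=('b', 'w')))
# prints ['bell', 'whistle']
print(match_columns(columns, matches=re.compile(r'\w+tle$'), drop=True))
# prints ['bell', 'nail', 'tail']
```

The sketch covers only the keyword matchers. Positional names, slices and the minus prefix are left out.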
Notes
To exclude columns by prepending a minus, the first column passed to select must be prepended with a minus. select('-a', 'c') will exclude column a, while select('c', '-a') will not exclude column a.
Examples
>>> import pandas as pd
>>> from plydata import select
>>> x = [1, 2, 3]
>>> df = pd.DataFrame({'bell': x, 'whistle': x, 'nail': x, 'tail': x})
>>> df >> select('bell', 'nail')
   bell  nail
0     1     1
1     2     2
2     3     3
>>> df >> select('bell', 'nail', drop=True)
   whistle  tail
0        1     1
1        2     2
2        3     3
>>> df >> select('whistle', endswith='ail')
   whistle  nail  tail
0        1     1     1
1        2     2     2
2        3     3     3
>>> df >> select('bell', matches=r'\w+tle$')
   bell  whistle
0     1        1
1     2        2
2     3        3
You can select column slices too. Like loc(), the stop column is included.
>>> df = pd.DataFrame({'a': x, 'b': x, 'c': x, 'd': x,
...                    'e': x, 'f': x, 'g': x, 'h': x})
>>> df
   a  b  c  d  e  f  g  h
0  1  1  1  1  1  1  1  1
1  2  2  2  2  2  2  2  2
2  3  3  3  3  3  3  3  3
>>> df >> select('a', slice('c', 'e'), 'g')
   a  c  d  e  g
0  1  1  1  1  1
1  2  2  2  2  2
2  3  3  3  3  3
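The inclusive stop is what sets label slices apart from ordinary Python slices. As a rough illustration only, the expansion of a label slice into column names could look like the sketch below. `expand_names` is a hypothetical helper, not part of plydata. Its handling of open-ended slices is an assumption, and it ignores the slice step.

```python
def expand_names(columns, names):
    """Hypothetical helper: expand strings and label slices into column names."""
    columns = list(columns)
    result = []
    for name in names:
        if isinstance(name, slice):
            # Assumption: a missing start or stop means "from the first"
            # or "to the last" column
            start = 0 if name.start is None else columns.index(name.start)
            stop = (len(columns) - 1 if name.stop is None
                    else columns.index(name.stop))
            # Label-based slice: the stop label itself is included
            result.extend(columns[start:stop + 1])
        else:
            result.append(name)
    return result


columns = list('abcdefgh')
print(expand_names(columns, ('a', slice('c', 'e'), 'g')))
# prints ['a', 'c', 'd', 'e', 'g']
```

A positional slice such as `columns[2:4]` would stop before 'e', which is why the sketch adds one to the stop position.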
You can exclude columns by prepending -
>>> df >> select('-a', '-c', '-e')
   b  d  f  g  h
0  1  1  1  1  1
1  2  2  2  2  2
2  3  3  3  3  3
Remove and place column at the end
>>> df >> select('-a', '-c', '-e', 'a')
   b  d  f  g  h  a
0  1  1  1  1  1  1
1  2  2  2  2  2  2
2  3  3  3  3  3  3
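The minus-prefix rule from the Notes, together with the last two examples, can be summarized in a small plain-Python sketch. It is only a model of the documented behaviour, not plydata's code. `uses_exclusion` and `resolve_exclusions` are hypothetical helpers. Raising an error when the first name has no minus is a choice made for the sketch, since this page only says that such a call will not exclude the column.

```python
def uses_exclusion(names):
    """Per the Notes, only the first name decides whether minus means exclude."""
    return bool(names) and names[0].startswith('-')


def resolve_exclusions(columns, names):
    """Hypothetical helper: column order produced by a minus-prefixed select."""
    if not uses_exclusion(names):
        # Sketch-only choice: this helper models exclusion mode and nothing else
        raise ValueError('the first name must start with a minus')
    excluded = {n[1:] for n in names if n.startswith('-')}
    # Names without a minus are placed at the end, in the order given
    appended = [n for n in names if not n.startswith('-')]
    kept = [c for c in columns
            if c not in excluded and c not in appended]
    return kept + appended


columns = list('abcdefgh')
print(uses_exclusion(('-a', 'c')))
# prints True
print(uses_exclusion(('c', '-a')))
# prints False
print(resolve_exclusions(columns, ('-a', '-c', '-e')))
# prints ['b', 'd', 'f', 'g', 'h']
print(resolve_exclusions(columns, ('-a', '-c', '-e', 'a')))
# prints ['b', 'd', 'f', 'g', 'h', 'a']
```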