plydata.tidy.extract

class plydata.tidy.extract(*args, **kwargs)

Split a column using a regular expression with capturing groups.
If the groups don't match, or the input is NA, the output will be NA.
Parameters
- data : dataframe, optional
  Useful when not using the >> operator.
- col : str | int
  Column name or position of the variable to split.
- into : list-like
  Column names. Use None to omit the variable from the output.
- regex : str | regex
  Pattern used to extract columns from col. There should be only one group (defined by ()) for each element of into.
- remove : bool
  If True, remove the input column from the output frame.
- convert : bool
  If True, convert result columns to int, float or bool where appropriate.
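The interaction between into, regex and missing values can be sketched row by row in plain Python. This is only an illustration of the semantics described above, not plydata's implementation: the helper name extract_row is hypothetical, and the use of re.search is an assumption.

```python
import re


def extract_row(value, into, regex):
    """Hypothetical helper: map the capturing groups of `regex` to the names in `into`."""
    names = [name for name in into if name is not None]
    # Missing input: every output is missing.
    if value is None:
        return {name: None for name in names}
    match = re.search(regex, value)  # assumption: search-style matching
    # The pattern did not match: every output is missing.
    if match is None:
        return {name: None for name in names}
    # Pair groups with names by position and skip names given as None.
    return {
        name: group
        for name, group in zip(into, match.groups())
        if name is not None
    }


rows = ['a,1', 'b,2', None]
result = [extract_row(v, ['A', None], r'(\w+),(\w+)') for v in rows]
print(result)  # [{'A': 'a'}, {'A': 'b'}, {'A': None}]
```

Passing None as the second name drops the second group from the output, and the missing input in the last row produces a missing value for A.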
Examples
>>> import pandas as pd
>>> from plydata.tidy import extract
>>> df = pd.DataFrame({
...     'alpha': 1,
...     'x': ['a,1', 'b,2', 'c,3'],
...     'zeta': 6
... })
>>> df
   alpha    x  zeta
0      1  a,1     6
1      1  b,2     6
2      1  c,3     6
>>> df >> extract('x', into='A')
   alpha  A  zeta
0      1  a     6
1      1  b     6
2      1  c     6
>>> df >> extract('x', into=['A', 'B'], regex=r'(\w+),(\w+)')
   alpha  A  B  zeta
0      1  a  1     6
1      1  b  2     6
2      1  c  3     6
>>> df >> extract('x', into=['A', 'B'], regex=r'(\w+),(\w+)', remove=False)
   alpha    x  A  B  zeta
0      1  a,1  a  1     6
1      1  b,2  b  2     6
2      1  c,3  c  3     6
Convert extracted columns to appropriate data types.
>>> result = df >> extract(
...     'x', into=['A', 'B'], regex=r'(\w+),(\w+)', convert=True)
>>> result['B'].dtype
dtype('int64')
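One way to read "int, float or bool where appropriate" is as a chain of attempted casts over the extracted strings. The sketch below is an assumption about that behaviour rather than plydata's actual conversion code, and convert_column is a hypothetical helper.

```python
def convert_column(values):
    """Hypothetical helper: try int, then float, then bool; otherwise keep the strings."""
    for caster in (int, float):
        try:
            return [caster(v) for v in values]
        except (ValueError, TypeError):
            # At least one value is not valid for this type; try the next one.
            continue
    # Assumption: only the literal strings 'True' and 'False' count as booleans.
    if all(v in ('True', 'False') for v in values):
        return [v == 'True' for v in values]
    return list(values)


print(convert_column(['1', '2', '3']))     # [1, 2, 3]
print(convert_column(['1.5', '2']))        # [1.5, 2.0]
print(convert_column(['True', 'False']))   # [True, False]
print(convert_column(['a', 'b']))          # ['a', 'b']
```

In the example above, column B holds '1', '2' and '3', so the integer cast succeeds, which is consistent with the int64 dtype shown.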
The regex must match fully, not just the individual groups.
>>> df >> extract('x', into=['A', 'B'], regex=r'(\w+),([12]+)')
   alpha    A    B  zeta
0      1    a    1     6
1      1    b    2     6
2      1  NaN  NaN     6
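The same effect can be reproduced with the standard re module, which may help explain why both A and B are missing in the last row. This sketch assumes search-style matching; the source does not state which matching function plydata uses internally.

```python
import re

pattern = re.compile(r'(\w+),([12]+)')

results = {}
for value in ['a,1', 'b,2', 'c,3']:
    match = pattern.search(value)
    # A failed match yields no groups at all, even though the first
    # group (\w+) on its own would have matched 'c'.
    results[value] = match.groups() if match else (None, None)

print(results)
# {'a,1': ('a', '1'), 'b,2': ('b', '2'), 'c,3': (None, None)}
```

Because '3' is not in the character class [12], the pattern as a whole fails on 'c,3', so no partial result is kept for the first group.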