plydata.tidy.extract

class plydata.tidy.extract(*args, **kwargs)

Split a column using a regular expression with capturing groups.
If the groups don't match, or the input is NA, the output will be NA.
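The per-value rule above can be sketched in plain Python with the standard `re` module. This is not plydata's code: the helper name `extract_groups` is hypothetical, `None` stands in for NA, and using `re.search` (first match anywhere in the string, as pandas `Series.str.extract` does) is an assumption about the underlying semantics.

```python
import re
from typing import Optional, Tuple


def extract_groups(value: Optional[str], regex: str) -> Tuple[Optional[str], ...]:
    """Hypothetical helper: capture groups of the first match of `regex`.

    If `value` is missing (None) or the pattern does not match,
    every group comes back as None (standing in for NA).
    """
    pattern = re.compile(regex)
    if value is None:
        return (None,) * pattern.groups
    match = pattern.search(value)  # assumption: search, not fullmatch
    if match is None:
        return (None,) * pattern.groups
    return match.groups()


print(extract_groups("a,1", r"(\w+),(\w+)"))    # ('a', '1')
print(extract_groups("c,3", r"(\w+),([12]+)"))  # (None, None)
print(extract_groups(None, r"(\w+),(\w+)"))     # (None, None)
```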
Parameters
- data : dataframe, optional
    Useful when not using the >> operator.
- col : str | int
    Column name or position of the variable to extract from.
- into : list-like | str
    Column names. Use None to omit the variable from the output.
- regex : str | regex
    Pattern used to extract columns from col. There should be exactly one group (defined by ()) for each element of into.
- remove : bool
    If True, remove the input column from the output frame.
- convert : bool
    If True, convert the result columns to int, float or bool where appropriate.
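The following sketch suggests how into, regex and remove might interact, working on a list of plain dicts instead of a DataFrame. It is an illustration under assumptions, not plydata's implementation: the name `extract_rows`, the `ValueError` on a group-count mismatch, and placing the new columns where the input column was (as the examples show) are choices made for this sketch.

```python
import re
from typing import Dict, List, Optional, Sequence


def extract_rows(rows: List[Dict[str, object]], col: str,
                 into: Sequence[Optional[str]], regex: str,
                 remove: bool = True) -> List[Dict[str, object]]:
    """Hypothetical dict-based stand-in for extract (not plydata's code)."""
    pattern = re.compile(regex)
    # Assumed check: exactly one capturing group per element of `into`.
    if pattern.groups != len(into):
        raise ValueError(
            f"regex has {pattern.groups} groups but into has {len(into)} names")
    out = []
    for row in rows:
        value = row[col]
        match = pattern.search(value) if value is not None else None
        groups = match.groups() if match else (None,) * len(into)
        new_row: Dict[str, object] = {}
        for key, cell in row.items():
            if key != col:
                new_row[key] = cell
                continue
            if not remove:
                new_row[key] = cell  # keep the input column
            # New columns take the place of the input column.
            for name, group in zip(into, groups):
                if name is not None:  # None omits this group
                    new_row[name] = group
        out.append(new_row)
    return out


rows = [{"alpha": 1, "x": "a,1", "zeta": 6},
        {"alpha": 1, "x": "b,2", "zeta": 6}]
print(extract_rows(rows, "x", into=["A", None], regex=r"(\w+),(\w+)"))
# [{'alpha': 1, 'A': 'a', 'zeta': 6}, {'alpha': 1, 'A': 'b', 'zeta': 6}]
```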
Examples
>>> import pandas as pd
>>> df = pd.DataFrame({
...     'alpha': 1,
...     'x': ['a,1', 'b,2', 'c,3'],
...     'zeta': 6
... })
>>> df
   alpha    x  zeta
0      1  a,1     6
1      1  b,2     6
2      1  c,3     6
>>> df >> extract('x', into='A')
   alpha  A  zeta
0      1  a     6
1      1  b     6
2      1  c     6
>>> df >> extract('x', into=['A', 'B'], regex=r'(\w+),(\w+)')
   alpha  A  B  zeta
0      1  a  1     6
1      1  b  2     6
2      1  c  3     6
>>> df >> extract('x', into=['A', 'B'], regex=r'(\w+),(\w+)', remove=False)
   alpha    x  A  B  zeta
0      1  a,1  a  1     6
1      1  b,2  b  2     6
2      1  c,3  c  3     6
Convert extracted columns to appropriate data types.
>>> result = df >> extract(
...     'x', into=['A', 'B'], regex=r'(\w+),(\w+)', convert=True)
>>> result['B'].dtype
dtype('int64')
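A rough approximation of convert=True on a column of extracted strings is to try each target type for the whole column and keep the first that fits every non-missing value. The helper `convert_column`, the order (int, then float, then bool) and the accepted boolean spellings are assumptions for this sketch, not plydata's documented rules.

```python
from typing import List, Optional


def convert_column(values: List[Optional[str]]) -> list:
    """Hypothetical sketch: convert a whole column to int, float or bool.

    Missing values (None) stay None. If no single type fits every
    non-missing value, the strings are returned unchanged.
    """
    def try_all(cast):
        try:
            return [None if v is None else cast(v) for v in values]
        except ValueError:
            return None

    for cast in (int, float):
        converted = try_all(cast)
        if converted is not None:
            return converted
    # Assumed boolean spellings; anything else is not treated as bool.
    present = [v for v in values if v is not None]
    if present and all(v in ("True", "False") for v in present):
        return [None if v is None else v == "True" for v in values]
    return list(values)


print(convert_column(["1", "2", "3"]))     # [1, 2, 3]
print(convert_column(["1.5", None, "2"]))  # [1.5, None, 2.0]
print(convert_column(["True", "False"]))   # [True, False]
print(convert_column(["a", "b", "c"]))     # ['a', 'b', 'c']
```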
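The regex must match as a whole, not just in its individual groups. In the last row below, (\w+) alone would match c, but because 3 does not match [12]+, the whole pattern fails and both A and B are NaN.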
>>> df >> extract('x', into=['A', 'B'], regex=r'(\w+),([12]+)')
   alpha    A    B  zeta
0      1    a    1     6
1      1    b    2     6
2      1  NaN  NaN     6
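The all-or-nothing behaviour of the last example can be reproduced with the standard `re` module alone, assuming extraction follows ordinary `re.search` semantics. No plydata calls are used here.

```python
import re

full = re.compile(r"(\w+),([12]+)")
first_group_only = re.compile(r"(\w+)")

results = {}
for value in ["a,1", "b,2", "c,3"]:
    match = full.search(value)
    # If the whole pattern fails, none of its groups are reported,
    # even though (\w+) by itself would still find "c" in "c,3".
    results[value] = match.groups() if match else (None, None)

print(results)
# {'a,1': ('a', '1'), 'b,2': ('b', '2'), 'c,3': (None, None)}
print(first_group_only.search("c,3").group(1))  # c
```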