plydata.tidy.separate
class plydata.tidy.separate(*args, **kwargs)
Split a single column into multiple columns.
- Parameters
  - col : str | int
    Column name or position of variable to separate.
  - into : list-like
    Column names. Use None to omit the variable from the output.
  - sep : str | regex | list-like
    If a string or regex, it is the pattern at which to separate the strings in the column. The default value separates on a string of non-alphanumeric characters.
    If list-like, it must contain positions to split at. The length of the list should be 1 less than the length of into.
  - remove : bool
    If True, remove the input column from the output frame.
  - convert : bool
    If True, convert result columns to int, float or bool where appropriate.
  - extra : 'warn' | 'drop' | 'merge'
    Control what happens when there are too many pieces. Only applies if sep is a string/regex.
    'warn' (the default): warn and drop extra values.
    'drop': drop any extra values without a warning.
    'merge': only splits at most len(into) times.
  - fill : 'warn' | 'right' | 'left'
    Control what happens when there are not enough pieces. Only applies if sep is a string/regex.
    'warn' (the default): warn and fill from the right.
    'right': fill with missing values on the right.
    'left': fill with missing values on the left.
Examples
>>> import pandas as pd
>>> from plydata.tidy import separate
>>> df = pd.DataFrame({
...     'alpha': 1,
...     'x': ['a,1', 'b,2', 'c,3'],
...     'zeta': 6
... })
>>> df
   alpha    x  zeta
0      1  a,1     6
1      1  b,2     6
2      1  c,3     6
>>> df >> separate('x', into=['A', 'B'])
   alpha  A  B  zeta
0      1  a  1     6
1      1  b  2     6
2      1  c  3     6
>>> df >> separate('x', into=['A', 'B'], remove=False)
   alpha    x  A  B  zeta
0      1  a,1  a  1     6
1      1  b,2  b  2     6
2      1  c,3  c  3     6
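The first call above passes no sep, so the comma is found by the default pattern. As a rough illustration in plain Python, the sketch below assumes the default behaves like the regular expression `[^0-9A-Za-z]+`. That exact pattern is an assumption drawn from the parameter description, and `split_default` is a hypothetical helper, not part of plydata.

```python
import re

# Assumed stand-in for the default separator: one or more
# non-alphanumeric characters. The pattern plydata uses may differ.
DEFAULT_SEP = r'[^0-9A-Za-z]+'


def split_default(value):
    """Split a string on runs of non-alphanumeric characters."""
    return re.split(DEFAULT_SEP, value)


# The values of column 'x' from the example above.
pieces = [split_default(v) for v in ['a,1', 'b,2', 'c,3']]
```

Under this assumption a run such as ' -> ' would also count as a single separator, because every character in it is non-alphanumeric.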
Using an array of positions and using None to omit a variable.
>>> df >> separate('x', into=['A', None, 'C'], sep=(1, 2))
   alpha  A  C  zeta
0      1  a  1     6
1      1  b  2     6
2      1  c  3     6
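Splitting at positions can be pictured as slicing each string at the given character offsets and then discarding the pieces whose name is None. The following is a minimal sketch in plain Python; `split_at_positions` and `name_pieces` are hypothetical helpers written for illustration and are not how plydata necessarily implements it.

```python
def split_at_positions(value, positions):
    """Cut a string at the given character offsets."""
    bounds = [0, *positions, len(value)]
    return [value[i:j] for i, j in zip(bounds[:-1], bounds[1:])]


def name_pieces(pieces, into):
    """Pair pieces with column names, skipping names that are None."""
    return {name: piece for name, piece in zip(into, pieces)
            if name is not None}


# Two positions give three pieces; the middle one (the comma) is dropped.
row = name_pieces(split_at_positions('a,1', (1, 2)), ['A', None, 'C'])
```

This also shows why the list of positions should be one shorter than into: n cut points produce n + 1 pieces.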
Dealing with extra pieces
>>> df = pd.DataFrame({
...     'alpha': 1,
...     'x': ['a,1', 'b,2', 'c,3,d'],
...     'zeta': 6
... })
>>> df
   alpha      x  zeta
0      1    a,1     6
1      1    b,2     6
2      1  c,3,d     6
>>> df >> separate('x', into=['A', 'B'], extra='merge')
   alpha  A    B  zeta
0      1  a    1     6
1      1  b    2     6
2      1  c  3,d     6
>>> df >> separate('x', into=['A', 'B'], extra='drop')
   alpha  A  B  zeta
0      1  a  1     6
1      1  b  2     6
2      1  c  3     6
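The three extra modes can be mimicked with the standard re module. This is a sketch under assumptions rather than plydata's implementation: `split_extra` is a hypothetical helper, and it uses `maxsplit=n - 1` so that exactly n pieces come back, which reproduces the 'merge' output shown above.

```python
import re
import warnings


def split_extra(value, sep, n, extra='warn'):
    """Split value on sep and return at most n pieces."""
    if extra == 'merge':
        # Stop splitting early; the remainder stays in the last piece.
        return re.split(sep, value, maxsplit=n - 1)
    pieces = re.split(sep, value)
    if len(pieces) > n:
        if extra == 'warn':
            warnings.warn('Too many pieces; extra values dropped.')
        pieces = pieces[:n]
    return pieces


merged = split_extra('c,3,d', ',', 2, extra='merge')
dropped = split_extra('c,3,d', ',', 2, extra='drop')
```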
Dealing with fewer pieces
>>> df = pd.DataFrame({
...     'alpha': 1,
...     'x': ['a,1', 'b,2', 'c'],
...     'zeta': 6
... })
>>> df
   alpha    x  zeta
0      1  a,1     6
1      1  b,2     6
2      1    c     6
>>> df >> separate('x', into=['A', 'B'], fill='right')
   alpha  A     B  zeta
0      1  a     1     6
1      1  b     2     6
2      1  c  None     6
>>> df >> separate('x', into=['A', 'B'], fill='left')
   alpha     A  B  zeta
0      1     a  1     6
1      1     b  2     6
2      1  None  c     6
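Filling amounts to padding a short list of pieces with missing values on one side. The sketch below illustrates this in plain Python; `pad_pieces` is a hypothetical helper, and None stands in for whatever missing-value marker the library uses.

```python
import warnings


def pad_pieces(pieces, n, fill='warn'):
    """Pad pieces with None until there are n of them."""
    missing = [None] * (n - len(pieces))
    if missing and fill == 'warn':
        warnings.warn('Too few pieces; filling from the right.')
    if fill == 'left':
        return missing + pieces
    # 'right' and 'warn' both place the missing values on the right.
    return pieces + missing


right = pad_pieces(['c'], 2, fill='right')
left = pad_pieces(['c'], 2, fill='left')
```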
Missing values
>>> df = pd.DataFrame({
...     'alpha': 1,
...     'x': ['a,1', None, 'c,3'],
...     'zeta': 6
... })
>>> df
   alpha     x  zeta
0      1   a,1     6
1      1  None     6
2      1   c,3     6
>>> df >> separate('x', into=['A', 'B'])
   alpha     A     B  zeta
0      1     a     1     6
1      1  None  None     6
2      1     c     3     6
Separators longer than one character. Any spaces must be included in the separator.
>>> df = pd.DataFrame({
...     'alpha': 1,
...     'x': ['a -> 1', 'b -> 2', 'c -> 3'],
...     'zeta': 6
... })
>>> df
   alpha       x  zeta
0      1  a -> 1     6
1      1  b -> 2     6
2      1  c -> 3     6
>>> df >> separate('x', into=['A', 'B'], sep=' -> ')
   alpha  A  B  zeta
0      1  a  1     6
1      1  b  2     6
2      1  c  3     6
String values of sep are treated as regular expressions; a compiled regex is also permitted.
>>> import re
>>> pattern = re.compile(r'\s*->\s*')
>>> df >> separate('x', into=['A', 'B'], sep=pattern)
   alpha  A  B  zeta
0      1  a  1     6
1      1  b  2     6
2      1  c  3     6
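Because a string separator is interpreted as a pattern, regex metacharacters in it carry their special meaning. The sketch below uses only the standard re module to show the consequence. It assumes a string sep reaches the regex engine unchanged, which the text above suggests but which has not been checked against plydata's source.

```python
import re

# A pattern string and its compiled form split identically.
arrow = re.compile(r'\s*->\s*')
same = re.split(r'\s*->\s*', 'a -> 1') == arrow.split('a -> 1')

# An unescaped '.' matches every character, so nothing useful is left.
unescaped = re.split('.', 'a.1')

# re.escape turns the separator into a literal match.
escaped = re.split(re.escape('.'), 'a.1')
```

If that assumption holds, passing something like `sep=re.escape('.')` would be the safe way to split on a literal dot.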