plydata.tidy.separate

class plydata.tidy.separate(*args, **kwargs)[source]

Split a single column into multiple columns

Parameters
colstr | int

Column name or position of variable to separate.

intolist-like

Column names. Use None to omit the variable from the output.

sepstr | regex | list-like

If String or regex, it is the pattern at which to separate the strings in the column. The default value separates on a string of non-alphanumeric characters.

If list-like it must contain positions to split at. The length of the list should be 1 less than into.

removebool

If True remove input column from output frame.

convertbool

If True convert result columns to int, float or bool where appropriate.

extra'warn' | 'drop' | 'merge'

Control what happens when there are too many pieces. Only applies if sep is a string/regex.

  • 'warn'(the default): warn and drop extra values.

  • 'drop': drop any extra values without a warning.

  • 'merge': only splits at most len(into) times.

fill'warn' | 'right' | 'left'

Control what happens when there are not enough pieces. Only applies if sep is a string/regex.

  • 'warn' (the default): warn and fill from the right

  • 'right': fill with missing values on the right

  • 'left': fill with missing values on the left

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({
...    'alpha': 1,
...    'x': ['a,1', 'b,2', 'c,3'],
...    'zeta': 6
... })
>>> df
   alpha    x  zeta
0      1  a,1     6
1      1  b,2     6
2      1  c,3     6
>>> df >> separate('x', into=['A', 'B'])
   alpha  A  B  zeta
0      1  a  1     6
1      1  b  2     6
2      1  c  3     6
>>> df >> separate('x', into=['A', 'B'], remove=False)
   alpha    x  A  B  zeta
0      1  a,1  a  1     6
1      1  b,2  b  2     6
2      1  c,3  c  3     6

Using an array of positions and using None to omit a variable.

>>> df >> separate('x', into=['A', None, 'C'], sep=(1, 2))
   alpha  A  C  zeta
0      1  a  1     6
1      1  b  2     6
2      1  c  3     6

Dealing with extra pieces

>>> df = pd.DataFrame({
...    'alpha': 1,
...    'x': ['a,1', 'b,2', 'c,3,d'],
...    'zeta': 6
... })
>>> df
   alpha      x  zeta
0      1    a,1     6
1      1    b,2     6
2      1  c,3,d     6
>>> df >> separate('x', into=['A', 'B'], extra='merge')
   alpha  A    B  zeta
0      1  a    1     6
1      1  b    2     6
2      1  c  3,d     6
>>> df >> separate('x', into=['A', 'B'], extra='drop')
   alpha  A  B  zeta
0      1  a  1     6
1      1  b  2     6
2      1  c  3     6

Dealing with fewer pieces

>>> df = pd.DataFrame({
...    'alpha': 1,
...    'x': ['a,1', 'b,2', 'c'],
...    'zeta': 6
... })
>>> df
   alpha    x  zeta
0      1  a,1     6
1      1  b,2     6
2      1    c     6
>>> df >> separate('x', into=['A', 'B'], fill='right')
   alpha  A     B  zeta
0      1  a     1     6
1      1  b     2     6
2      1  c  None     6
>>> df >> separate('x', into=['A', 'B'], fill='left')
   alpha     A  B  zeta
0      1     a  1     6
1      1     b  2     6
2      1  None  c     6

Missing values

>>> df = pd.DataFrame({
...    'alpha': 1,
...    'x': ['a,1', None, 'c,3'],
...    'zeta': 6
... })
>>> df
   alpha     x  zeta
0      1   a,1     6
1      1  None     6
2      1   c,3     6
>>> df >> separate('x', into=['A', 'B'])
   alpha     A     B  zeta
0      1     a     1     6
1      1  None  None     6
2      1     c     3     6

More than one character separators. Any spaces must be included in the separator

>>> df = pd.DataFrame({
...    'alpha': 1,
...    'x': ['a -> 1', 'b -> 2', 'c -> 3'],
...    'zeta': 6
... })
>>> df
   alpha       x  zeta
0      1  a -> 1     6
1      1  b -> 2     6
2      1  c -> 3     6
>>> df >> separate('x', into=['A', 'B'], sep=' -> ')
   alpha  A  B  zeta
0      1  a  1     6
1      1  b  2     6
2      1  c  3     6

All values of sep are treated ad regular expression, but a compiled regex can is also permitted.

>>> pattern = re.compile(r'\s*->\s*')
>>> df >> separate('x', into=['A', 'B'], sep=pattern)
   alpha  A  B  zeta
0      1  a  1     6
1      1  b  2     6
2      1  c  3     6