plydata.tidy.separate_rows

class plydata.tidy.separate_rows(*args, **kwargs)

Separate values of a variable along multiple rows

Parameters
data : dataframe, optional

Useful when not using the >> operator.

*cols : list-like | select | str | slice

Columns whose values are to be separated into multiple rows.

sep : str | regex

The pattern at which to separate the variable. The default value separates on a string of non-alphanumeric characters.
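As a rough illustration of how such a pattern splits a single value, here is a minimal sketch using only the standard library. The regular expression `DEFAULT_SEP` and the helper `split_value` are assumptions made for this example. They approximate "a string of non-alphanumeric characters" and are not taken from plydata's source.

```python
import re

# Assumed stand-in for the default pattern: one or more characters
# that are neither ASCII letters nor digits.
DEFAULT_SEP = r'[^A-Za-z0-9]+'


def split_value(value, sep=DEFAULT_SEP):
    """Split one cell value into pieces at every match of `sep`."""
    return re.split(sep, value)


# A run of separator characters (", ") counts as a single split point.
pieces_default = split_value('joe, vinny,laura')
# An explicit separator can be passed instead of the default.
pieces_custom = split_value('12;6;4', sep=';')
```

With the assumed default, a value that contains a hyphen or a space would also be split, so an explicit sep is the safer choice for such data.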

convert : bool

If True, convert the result columns to int, float or bool where possible.
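The kind of conversion this describes can be approximated in plain pandas. The sketch below is not plydata's implementation. The helper `convert_column` is hypothetical, and it only covers the numeric (int and float) case.

```python
import pandas as pd


def convert_column(col):
    """Return `col` as a numeric Series if every value parses, else unchanged."""
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        # At least one value is not a number: keep the original strings.
        return col


ages = pd.Series(['3', '12', '6'])
names = pd.Series(['leah', 'joe', 'vinny'])

converted_ages = convert_column(ages)    # becomes an integer column
converted_names = convert_column(names)  # left as strings
```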

Examples

>>> import pandas as pd
>>> from plydata.tidy import separate_rows
>>> from plydata.one_table_verbs import select
>>> df = pd.DataFrame({
...     'parent': ['martha', 'james', 'alice'],
...     'child': ['leah', 'joe,vinny,laura', 'pat,lee'],
...     'age': ['3', '12,6,4', '2,7']
... })
>>> df
   parent            child     age
0  martha             leah       3
1   james  joe,vinny,laura  12,6,4
2   alice          pat,lee     2,7
>>> df >> separate_rows('child', 'age')
   parent  child age
0  martha   leah   3
1   james    joe  12
2   james  vinny   6
3   james  laura   4
4   alice    pat   2
5   alice    lee   7
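For comparison, the same reshaping can be sketched in plain pandas by splitting each cell into a list and then exploding the lists into rows. This is an equivalent written for illustration, not how plydata does it internally. It assumes pandas 1.3 or later (for multi-column `explode`) and a literal comma as the separator.

```python
import pandas as pd

df = pd.DataFrame({
    'parent': ['martha', 'james', 'alice'],
    'child': ['leah', 'joe,vinny,laura', 'pat,lee'],
    'age': ['3', '12,6,4', '2,7']
})

cols = ['child', 'age']

# Turn each cell of the chosen columns into a list of pieces.
split = df.assign(**{c: df[c].str.split(',') for c in cols})

# Give every piece its own row; the lists in one row must have equal
# lengths so that the pieces of 'child' and 'age' stay aligned.
result = split.explode(cols, ignore_index=True)
```

Values in the untouched column (parent) are repeated once per new row, which matches the output shown above.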

Column selection uses plydata.one_table_verbs.select, so you can do:

>>> df >> separate_rows('-parent')
   parent  child age
0  martha   leah   3
1   james    joe  12
2   james  vinny   6
3   james  laura   4
4   alice    pat   2
5   alice    lee   7

or

>>> df >> separate_rows(select(matches=r'^[ac]'))
   parent  child age
0  martha   leah   3
1   james    joe  12
2   james  vinny   6
3   james  laura   4
4   alice    pat   2
5   alice    lee   7
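Both selections above resolve to the same two columns. The sketch below mimics that resolution with the standard library only, to show why. It is an illustration of the matching logic, not the code path used by select.

```python
import re

columns = ['parent', 'child', 'age']

# Analogue of '-parent': every column except the named one.
not_parent = [c for c in columns if c != 'parent']

# Analogue of matches=r'^[ac]': names that start with "a" or "c".
starts_with_a_or_c = [c for c in columns if re.search(r'^[ac]', c)]
```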

You can separate all columns by not specifying any column. In that case, every column must be separable.

>>> df[['child', 'age']] >> separate_rows()
   child age
0   leah   3
1    joe  12
2  vinny   6
3  laura   4
4    pat   2
5    lee   7