plydata.tidy.pivot_longer

class plydata.tidy.pivot_longer(*args, **kwargs)[source]

Lengthen dataframe by reducing the columns & turning them into into values

Parameters
datadataframe, optional

Useful when not using the >> operator.

collist-like | select | str | slice

Columns to pivot into longer format.

names_tostr | list

Name of column to create. If a list, it is the names of columns to create if names_sep or names_pattern is given.

values_tostr | list-like

Column(s) where to get observation values that will be placed in the wide columns.

names_prefixstr | dict

A regular expression string used to remove matching text from the start of each variable name. If each column requires a different prefix, use a dict of the form {'column_name': r'regex'}. The column_name is as given in names_to.

names_sepstr

If names_to contains multiple values, these control how the column name is broken up.

names_patternregex

How the column name is broken up. Should be a regular expression containing matching groups.

values_drop_nabool

If True, drop rows that contain only NA``s in the ``values_to column. This effectively converts explicit missing values to implicit missing values, and should be used only when missing values in the data were created by its structure.

convertbool

If True convert separated or extracted columns to int, float or bool where appropriate. This only applies when names_to has multiple values and names_sep or names_pattern is set. Default is False.

Examples

>>> import pandas as pd
>>> df = pd.DataFrame({
...     'name': ['mary', 'mary', 'john', 'john'],
...     'city':['dakar', 'dakar', 'lome', 'lome'],
...     'year': [1990, 1992, 1996, 1998],
...     'data_t1_sunny': [8, 6, 4, 7],
...     'data_t2_rainy': [9, 7, 7, 6]
... })
>>> df
   name   city  year  data_t1_sunny  data_t2_rainy
0  mary  dakar  1990              8              9
1  mary  dakar  1992              6              7
2  john  lome   1996              4              7
3  john  lome   1998              7              6
>>> df >> pivot_longer(
...     cols=select(startswith='data'),
...     names_to='data_description',
...     values_to='score',
... )
   name   city  year data_description  score
0  mary  dakar  1990    data_t1_sunny      8
1  mary  dakar  1992    data_t1_sunny      6
2  john  lome   1996    data_t1_sunny      4
3  john  lome   1998    data_t1_sunny      7
4  mary  dakar  1990    data_t2_rainy      9
5  mary  dakar  1992    data_t2_rainy      7
6  john  lome   1996    data_t2_rainy      7
7  john  lome   1998    data_t2_rainy      6
>>> df >> pivot_longer(
...     cols=select(startswith='data'),
...     names_to='data_description',
...     values_to='score',
...     names_prefix='data_'
... )
   name   city  year data_description  score
0  mary  dakar  1990         t1_sunny      8
1  mary  dakar  1992         t1_sunny      6
2  john  lome   1996         t1_sunny      4
3  john  lome   1998         t1_sunny      7
4  mary  dakar  1990         t2_rainy      9
5  mary  dakar  1992         t2_rainy      7
6  john  lome   1996         t2_rainy      7
7  john  lome   1998         t2_rainy      6

The column names are hiding data (data_description). We can extract it.

>>> df >> pivot_longer(
...     cols=select(startswith='data'),
...     names_to=['take', 'season'],
...     values_to='score',
...     names_pattern=r'data_t(\d)_(\w+)'
... )
   name   city  year take season  score
0  mary  dakar  1990    1  sunny      8
1  mary  dakar  1992    1  sunny      6
2  john  lome   1996    1  sunny      4
3  john  lome   1998    1  sunny      7
4  mary  dakar  1990    2  rainy      9
5  mary  dakar  1992    2  rainy      7
6  john  lome   1996    2  rainy      7
7  john  lome   1998    2  rainy      6

Using a dictionary to specify different prefixes for more than than one column.

>>> df >> pivot_longer(
...     cols=select(startswith='data'),
...     names_to=['take', 'season'],
...     values_to='score',
...     names_pattern=r'data_(t\d)(_\w+)',
...     names_prefix={'take': 't', 'season': '_'}
... )
   name   city  year take season  score
0  mary  dakar  1990    1  sunny      8
1  mary  dakar  1992    1  sunny      6
2  john  lome   1996    1  sunny      4
3  john  lome   1998    1  sunny      7
4  mary  dakar  1990    2  rainy      9
5  mary  dakar  1992    2  rainy      7
6  john  lome   1996    2  rainy      7
7  john  lome   1998    2  rainy      6