plydata.tidy.pivot_longer¶
-
class
plydata.tidy.
pivot_longer
(*args, **kwargs)[source]¶ Lengthen dataframe by reducing the columns & turning them into into values
- Parameters
- data
dataframe
, optional Useful when not using the
>>
operator.- collist-like |
select
|str
|slice
Columns to pivot into longer format.
- names_to
str
|list
Name of column to create. If a list, it is the names of columns to create if
names_sep
ornames_pattern
is given.- values_to
str
| list-like Column(s) where to get observation values that will be placed in the wide columns.
- names_prefix
str
|dict
A regular expression string used to remove matching text from the start of each variable name. If each column requires a different prefix, use a
dict
of the form{'column_name': r'regex'}
. Thecolumn_name
is as given innames_to
.- names_sep
str
If
names_to
contains multiple values, these control how the column name is broken up.- names_pattern
regex
How the column name is broken up. Should be a regular expression containing matching groups.
- values_drop_nabool
If
True
, drop rows that contain onlyNA``s in the ``values_to
column. This effectively converts explicit missing values to implicit missing values, and should be used only when missing values in thedata
were created by its structure.- convertbool
If
True
convert separated or extracted columns to int, float or bool where appropriate. This only applies whennames_to
has multiple values andnames_sep
ornames_pattern
is set. Default isFalse
.
- data
Examples
>>> import pandas as pd >>> df = pd.DataFrame({ ... 'name': ['mary', 'mary', 'john', 'john'], ... 'city':['dakar', 'dakar', 'lome', 'lome'], ... 'year': [1990, 1992, 1996, 1998], ... 'data_t1_sunny': [8, 6, 4, 7], ... 'data_t2_rainy': [9, 7, 7, 6] ... }) >>> df name city year data_t1_sunny data_t2_rainy 0 mary dakar 1990 8 9 1 mary dakar 1992 6 7 2 john lome 1996 4 7 3 john lome 1998 7 6 >>> df >> pivot_longer( ... cols=select(startswith='data'), ... names_to='data_description', ... values_to='score', ... ) name city year data_description score 0 mary dakar 1990 data_t1_sunny 8 1 mary dakar 1992 data_t1_sunny 6 2 john lome 1996 data_t1_sunny 4 3 john lome 1998 data_t1_sunny 7 4 mary dakar 1990 data_t2_rainy 9 5 mary dakar 1992 data_t2_rainy 7 6 john lome 1996 data_t2_rainy 7 7 john lome 1998 data_t2_rainy 6 >>> df >> pivot_longer( ... cols=select(startswith='data'), ... names_to='data_description', ... values_to='score', ... names_prefix='data_' ... ) name city year data_description score 0 mary dakar 1990 t1_sunny 8 1 mary dakar 1992 t1_sunny 6 2 john lome 1996 t1_sunny 4 3 john lome 1998 t1_sunny 7 4 mary dakar 1990 t2_rainy 9 5 mary dakar 1992 t2_rainy 7 6 john lome 1996 t2_rainy 7 7 john lome 1998 t2_rainy 6
The column names are hiding data (data_description). We can extract it.
>>> df >> pivot_longer( ... cols=select(startswith='data'), ... names_to=['take', 'season'], ... values_to='score', ... names_pattern=r'data_t(\d)_(\w+)' ... ) name city year take season score 0 mary dakar 1990 1 sunny 8 1 mary dakar 1992 1 sunny 6 2 john lome 1996 1 sunny 4 3 john lome 1998 1 sunny 7 4 mary dakar 1990 2 rainy 9 5 mary dakar 1992 2 rainy 7 6 john lome 1996 2 rainy 7 7 john lome 1998 2 rainy 6
Using a dictionary to specify different prefixes for more than than one column.
>>> df >> pivot_longer( ... cols=select(startswith='data'), ... names_to=['take', 'season'], ... values_to='score', ... names_pattern=r'data_(t\d)(_\w+)', ... names_prefix={'take': 't', 'season': '_'} ... ) name city year take season score 0 mary dakar 1990 1 sunny 8 1 mary dakar 1992 1 sunny 6 2 john lome 1996 1 sunny 4 3 john lome 1998 1 sunny 7 4 mary dakar 1990 2 rainy 9 5 mary dakar 1992 2 rainy 7 6 john lome 1996 2 rainy 7 7 john lome 1998 2 rainy 6