plydata.tidy.unite

class plydata.tidy.unite(*args, **kwargs)

Join multiple columns into one

Parameters

data : dataframe, optional
    Useful when not using the >> operator.
col : str
    Name of the new column.
*unite_cols : list-like | select | str | slice
    Columns to join. Uses select.
sep : str
    Separator between values. Default is '_'.
remove : bool
    If True, remove the input columns from the output dataframe.
na_rm : bool
    If True, missing values are removed before the values in each row are united.
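
The effect of sep and na_rm on a single row can be sketched in plain Python. This is only an illustration of the joining rule described above, not plydata's implementation; the helper name unite_values and its missing-value check are assumptions made for the example.

```python
import math


def unite_values(values, sep="_", na_rm=False):
    """Join one row's values into a single string (hypothetical helper)."""

    def is_missing(v):
        # Assumption: None and float NaN count as missing values.
        return v is None or (isinstance(v, float) and math.isnan(v))

    if na_rm:
        # Drop missing values before joining, as na_rm=True describes.
        values = [v for v in values if not is_missing(v)]
    return sep.join(str(v) for v in values)


rows = [(1.0, "a"), (2.0, "b"), (float("nan"), "e")]
joined = [unite_values(r) for r in rows]
joined_na_rm = [unite_values(r, na_rm=True) for r in rows]
```

With na_rm left off, a missing value is converted to text like any other value; with na_rm=True it is skipped and no separator is emitted for it.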

Examples

>>> import pandas as pd
>>> from plydata import select
>>> from plydata.tidy import unite

>>> df = pd.DataFrame({
...     'c1': [1, 2, 3, 4, None],
...     'c2': list('abcde'),
...     'c3': list('vwxyz')
... })
>>> df
    c1 c2 c3
0  1.0  a  v
1  2.0  b  w
2  3.0  c  x
3  4.0  d  y
4  NaN  e  z
>>> df >> unite('c1c2', 'c1', 'c2')
    c1c2  c3
0  1.0_a   v
1  2.0_b   w
2  3.0_c   x
3  4.0_d   y
4  nan_e   z
>>> df >> unite('c1c2', 'c1', 'c2', na_rm=True)
    c1c2  c3
0  1.0_a   v
1  2.0_b   w
2  3.0_c   x
3  4.0_d   y
4      e   z
>>> df >> unite('c2c3', 'c2', 'c3', sep=',')
    c1 c2c3
0  1.0  a,v
1  2.0  b,w
2  3.0  c,x
3  4.0  d,y
4  NaN  e,z
>>> df >> unite('c2c3', 'c2', 'c3', remove=False)
    c1 c2c3 c2 c3
0  1.0  a_v  a  v
1  2.0  b_w  b  w
2  3.0  c_x  c  x
3  4.0  d_y  d  y
4  NaN  e_z  e  z

Columns can be chosen in any way that select understands, and a select verb can also be passed directly.

>>> df >> unite('c2c3', '-c1')
    c1 c2c3
0  1.0  a_v
1  2.0  b_w
2  3.0  c_x
3  4.0  d_y
4  NaN  e_z

>>> df >> unite('c2c3', select(matches=r'c[23]$'))
    c1 c2c3
0  1.0  a_v
1  2.0  b_w
2  3.0  c_x
3  4.0  d_y
4  NaN  e_z
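
For comparison, the regex-based selection above can be approximated with pandas alone. This is a rough sketch under the assumption that matching is done on column names with a regular-expression search; it is not how plydata implements unite, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'c1': [1, 2, 3, 4, None],
    'c2': list('abcde'),
    'c3': list('vwxyz'),
})

# Pick the columns whose names match the pattern (here c2 and c3).
matched = df.filter(regex=r'c[23]$')

# Drop the input columns, then add the row-wise joined values
# under the new column name.
result = df.drop(columns=matched.columns)
result['c2c3'] = matched.astype(str).apply('_'.join, axis=1)
```

The unite verb expresses the same steps in one call and accepts the selection either as select arguments or as a select verb.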