plydata.one_table_verbs.distinct

class plydata.one_table_verbs.distinct(*args, **kwargs)

Select distinct/unique rows

Parameters
data : dataframe, optional

Useful when not using the >> operator.

columns : list-like, optional

Column names to use when determining uniqueness.

keep : {'first', 'last', False}, optional
  • first : Keep the first occurrence.

  • last : Keep the last occurrence.

  • False : Do not keep any of the duplicates.

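Default is 'first', as the examples below show: distinct() with no arguments keeps the first of each set of duplicate rows.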

kwargs : dict, optional

Computed columns, given as {name: expression} pairs. If specified, they are combined with columns when determining unique rows, and they are added to the result.

Examples

>>> import pandas as pd
>>> from plydata import define, distinct
>>> df = pd.DataFrame({'x': [1, 1, 2, 3, 4, 4, 5],
...                    'y': [1, 2, 3, 4, 5, 5, 6]})
>>> df >> distinct()
   x  y
0  1  1
1  1  2
2  2  3
3  3  4
4  4  5
6  5  6
>>> df >> distinct(['x'])
   x  y
0  1  1
2  2  3
3  3  4
4  4  5
6  5  6
>>> df >> distinct(['x'], 'last')
   x  y
1  1  2
2  2  3
3  3  4
5  4  5
6  5  6
>>> df >> distinct(z='x%2')
   x  y  z
0  1  1  1
2  2  3  0
>>> df >> distinct(['x'], z='x%2')
   x  y  z
0  1  1  1
2  2  3  0
3  3  4  1
4  4  5  0
6  5  6  1
>>> df >> define(z='x%2') >> distinct(['x', 'z'])
   x  y  z
0  1  1  1
2  2  3  0
3  3  4  1
4  4  5  0
6  5  6  1
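
The examples above do not cover keep=False. The sketch below illustrates that option, along with the computed-column case, using plain pandas. It assumes that DataFrame.drop_duplicates is a fair stand-in for distinct here. That is an illustration of the semantics described in Parameters, not a statement about how plydata is implemented.

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 1, 2, 3, 4, 4, 5],
                   'y': [1, 2, 3, 4, 5, 5, 6]})

# keep=False: every row that has a duplicate is dropped, including
# the first occurrence. Rows 4 and 5 are identical, so both go.
# (Assumed analogue of: df >> distinct(keep=False))
no_dups = df.drop_duplicates(keep=False)

# Same idea, but uniqueness is judged on column 'x' only.
# x == 1 (rows 0, 1) and x == 4 (rows 4, 5) are duplicated.
# (Assumed analogue of: df >> distinct(['x'], keep=False))
no_dups_x = df.drop_duplicates(subset=['x'], keep=False)

# Computed column taken together with 'x' when checking uniqueness.
# (Assumed analogue of: df >> distinct(['x'], z='x%2'))
with_z = df.assign(z=df['x'] % 2).drop_duplicates(subset=['x', 'z'])

print(no_dups)
print(no_dups_x)
print(with_z)
```

With the default keep='first', the last call keeps rows 0, 2, 3, 4 and 6, matching the distinct(['x'], z='x%2') output shown above.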