plydata.cat_tools.cat_lump_lowfreq

plydata.cat_tools.cat_lump_lowfreq(c, other_category='other')

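Lump together the least frequent categories

The rarest categories are merged into a single "other" category. As many are merged as possible while keeping "other" strictly smaller than every category that remains.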
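The following is a minimal pure-Python sketch of how the lumped categories could be chosen. It assumes this rule: sort the counts from most to least frequent, then cut at the first category that outnumbers all less frequent categories combined. The helper `categories_to_lump` is hypothetical. It is not part of plydata and is not plydata's implementation.

```python
from collections import Counter


def categories_to_lump(values):
    """Return the set of low-frequency categories to merge into "other".

    Hypothetical sketch of the rule, not plydata's actual code.
    """
    counts = Counter(values)
    # Most frequent first
    ordered = counts.most_common()
    remaining = sum(counts.values())
    for i, (_, n) in enumerate(ordered):
        # After this subtraction, `remaining` is the combined count of all
        # categories less frequent than the current one
        remaining -= n
        if n > remaining:
            # This category outnumbers everything below it, so lumping the
            # rest yields an "other" group smaller than every kept category
            return {cat for cat, _ in ordered[i + 1:]}
    # Reached only when `values` is empty
    return set()
```

Ties never straddle the cutoff, because a category cannot outnumber an equally frequent one plus the rest. The order among equally frequent categories therefore does not affect the result.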

Parameters
c : list-like

Values that will make up the categorical.

other_category : object (default: 'other')

Label given to the lumped values. It is placed at the end of the categories.
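The relabeling step can be sketched the same way. `relabel_lumped` is a hypothetical helper, not plydata's API. It assumes the starting categories are the sorted unique values, which is what pandas uses when it builds a categorical from a plain list. It appends `other_category` only when something was actually lumped, which matches the examples.

```python
def relabel_lumped(values, to_lump, other_category='other'):
    """Replace lumped values and put other_category last among the categories.

    Hypothetical helper for illustration only.
    """
    # Assumed starting order: sorted unique values
    categories = sorted(set(values))
    if not to_lump:
        # Nothing to lump: values and categories are unchanged
        return list(values), categories
    new_values = [other_category if v in to_lump else v for v in values]
    kept = [cat for cat in categories if cat not in to_lump]
    # The "other" label goes at the end of the categories
    return new_values, kept + [other_category]
```

For example, `relabel_lumped(list('abcdddd'), {'a', 'b', 'c'}, other_category='rare')` gives `(['rare', 'rare', 'rare', 'd', 'd', 'd', 'd'], ['d', 'rare'])`.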

Examples

>>> from plydata.cat_tools import cat_lump_lowfreq
>>> cat_lump_lowfreq(list('abbccc'))
['other', 'b', 'b', 'c', 'c', 'c']
Categories (3, object): ['b', 'c', 'other']

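When the least frequent categories put together are not smaller than the next smallest category, nothing is lumped. In the first call below, the combined count of a, b and c (3) equals that of d (3), so all categories are kept. In the second call, their combined count (3) is below that of d (4), so they are lumped into "other".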

>>> cat_lump_lowfreq(list('abcddd'))
['a', 'b', 'c', 'd', 'd', 'd']
Categories (4, object): ['a', 'b', 'c', 'd']
>>> cat_lump_lowfreq(list('abcdddd'))
['other', 'other', 'other', 'd', 'd', 'd', 'd']
Categories (2, object): ['d', 'other']