plydata.cat_tools.cat_lump_min
plydata.cat_tools.cat_lump_min(c, min, w=None, other_category='other')
Lump categories, preserving those that appear at least min times.
Parameters
c : list-like
    Values that will make up the categorical.
min : int
    Minimum number of times a category must be represented to be preserved.
w : list[int|float] (optional)
    Weights for the frequency of each value. It should be the same length as c.
other_category : object (default: 'other')
    Value used for the 'other' values. It is placed at the end of the categories.
Examples
>>> from plydata.cat_tools import cat_lump_min
>>> c = list('abccdd')
>>> cat_lump_min(c, min=1)
['a', 'b', 'c', 'c', 'd', 'd']
Categories (4, object): ['a', 'b', 'c', 'd']
>>> cat_lump_min(c, min=2)
['other', 'other', 'c', 'c', 'd', 'd']
Categories (3, object): ['c', 'd', 'other']
Weighted Lumping
>>> weights = [2, 2, .5, .5, 1, 1]
>>> cat_lump_min(c, min=2, w=weights)
['a', 'b', 'other', 'other', 'd', 'd']
Categories (4, object): ['a', 'b', 'd', 'other']
Unlike cat_lump(), cat_lump_min() can lump values together into a category that is larger than the preserved categories.
>>> c = list('abxyzccdd')
>>> cat_lump_min(c, min=2)
['other', 'other', 'other', 'other', 'other', 'c', 'c', 'd', 'd']
Categories (3, object): ['c', 'd', 'other']
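The lumping rule shown in the examples above can be approximated in plain Python. This is a minimal sketch and not plydata's implementation: the helper name lump_min_sketch and its parameter names are hypothetical, it returns a plain list plus a list of categories rather than a pandas Categorical, and it assumes the values are sortable and that a category is preserved when its (weighted) count is greater than or equal to the threshold.

```python
from collections import defaultdict


def lump_min_sketch(values, min_count, weights=None, other="other"):
    """Hypothetical illustration of min-count lumping (not plydata's code)."""
    if weights is None:
        # Unweighted case: every value counts once.
        weights = [1] * len(values)
    if len(weights) != len(values):
        raise ValueError("weights must be the same length as values")

    # Sum the weight contributed by each distinct value.
    totals = defaultdict(float)
    for value, weight in zip(values, weights):
        totals[value] += weight

    # Preserve values whose total reaches the threshold (assumed >=).
    keep = {value for value, total in totals.items() if total >= min_count}
    lumped = [value if value in keep else other for value in values]

    # Sorted preserved categories, with `other` last if anything was lumped.
    categories = sorted(keep)
    if len(keep) < len(totals):
        categories.append(other)
    return lumped, categories


values = list("abccdd")
print(lump_min_sketch(values, 2))
print(lump_min_sketch(values, 2, weights=[2, 2, 0.5, 0.5, 1, 1]))
```

With weights, the values 'a' and 'b' each total 2 and are preserved, while the two 'c' values total only 1 and are lumped, which matches the weighted example above.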