plydata.cat_tools.cat_lump_prop

plydata.cat_tools.cat_lump_prop(c, prop, w=None, other_category='other')

Lump together the least or most common categories by proportion.
Parameters
- c : list-like
  Values that will make up the categorical.
- prop : float
  Proportion above/below which the values of a category will be preserved (not lumped together). A positive prop preserves categories whose proportion of values is more than prop. A negative prop preserves categories whose proportion of values is less than abs(prop).
  Lumping happens on condition that the lumped category "other" will have the smallest number of items. This function takes only prop; it has no n argument.
- w : list[int|float], optional
  Weights for the frequency of each value. It should be the same length as c.
- other_category : object, default 'other'
  Value used for the 'other' values. It is placed at the end of the categories.
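To make the selection rule concrete, here is a minimal sketch in plain Python (no pandas). It is not plydata's implementation: lump_prop_sketch is a hypothetical name, it works on lists instead of a pandas Categorical, and it keeps categories in order of first appearance. It also assumes that "more than" and "less than" are strict comparisons, that 'other' is appended only when something was actually lumped, and it does not model the condition that "other" must end up with the fewest items.

```python
def lump_prop_sketch(values, prop, w=None, other_category="other"):
    """Hypothetical stand-in for cat_lump_prop, working on plain lists.

    Returns (lumped_values, categories). Not plydata's implementation.
    """
    values = list(values)
    # Unweighted input counts every value once.
    weights = [1.0] * len(values) if w is None else [float(x) for x in w]
    if len(weights) != len(values):
        raise ValueError("w must be the same length as values")

    # Weighted frequency per category; dicts keep first-appearance order.
    freq = {}
    for value, weight in zip(values, weights):
        freq[value] = freq.get(value, 0.0) + weight
    total = sum(weights)

    # Positive prop: keep categories whose share is more than prop.
    # Negative prop: keep categories whose share is less than abs(prop).
    threshold = abs(prop)
    if prop >= 0:
        keep = [cat for cat, f in freq.items() if f / total > threshold]
    else:
        keep = [cat for cat, f in freq.items() if f / total < threshold]

    keep_set = set(keep)
    lumped = [v if v in keep_set else other_category for v in values]
    # Assumption: "other" is appended (last) only if something was lumped.
    categories = keep + ([other_category] if len(keep) < len(freq) else [])
    return lumped, categories


c = list("abccdd")
print(lump_prop_sketch(c, 1 / 3.01))
# (['other', 'other', 'c', 'c', 'd', 'd'], ['c', 'd', 'other'])
print(lump_prop_sketch(c, -1 / 3.01))
# (['a', 'b', 'other', 'other', 'other', 'other'], ['a', 'b', 'other'])
print(lump_prop_sketch(c, 1 / 2))
# (['other', 'other', 'other', 'other', 'other', 'other'], ['other'])

# Weights change the shares: "a" now has 4 of 9 units, the only share above 1/3.01.
print(lump_prop_sketch(c, 1 / 3.01, w=[4, 1, 1, 1, 1, 1]))
# (['a', 'other', 'other', 'other', 'other', 'other'], ['a', 'other'])
```

The first three calls mirror the outcomes of the documented examples; the weighted call only illustrates how w shifts each category's share under this sketch's assumptions.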
Examples
Lumping by proportion. A positive prop keeps the categories that make up more than prop of the items; a negative prop keeps those that make up less than abs(prop).

>>> import pandas as pd
>>> from plydata.cat_tools import cat_lump_prop
>>> c = pd.Categorical(list('abccdd'))
>>> cat_lump_prop(c, 1/3.01)
['other', 'other', 'c', 'c', 'd', 'd']
Categories (3, object): ['c', 'd', 'other']
>>> cat_lump_prop(c, -1/3.01)
['a', 'b', 'other', 'other', 'other', 'other']
Categories (3, object): ['a', 'b', 'other']
>>> cat_lump_prop(c, 1/2)
['other', 'other', 'other', 'other', 'other', 'other']
Categories (1, object): ['other']
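The examples use 1/3.01 rather than 1/3, presumably because of the "more than" wording. In 'abccdd', c and d each make up exactly 2/6 = 1/3 of the items, so if "more than" is a strict comparison, prop=1/3 would keep nothing. The check below uses exact rational arithmetic (Python's fractions module) and is independent of plydata's internals; 100/301 is 1/3.01 written as an exact fraction.

```python
from collections import Counter
from fractions import Fraction

values = list("abccdd")
counts = Counter(values)
total = len(values)

# Exact share of each category: a and b are 1/6, c and d are 1/3.
shares = {cat: Fraction(n, total) for cat, n in counts.items()}

# Compare each share strictly against 1/3 and against 1/3.01 (= 100/301).
for prop in (Fraction(1, 3), Fraction(100, 301)):
    kept = [cat for cat, share in shares.items() if share > prop]
    print(prop, kept)
# 1/3 []
# 100/301 ['c', 'd']
```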