plydata.expressions.case_when

class plydata.expressions.case_when

Vectorized case

Parameters

args (mapping, iterable): (predicate, value) pairs, ordered from most specific to most general.
kwargs (collections.OrderedDict): {predicate: value} pairs, ordered from most specific to most general.

Notes

Because dicts preserve insertion order in Python 3.6 and above, you can get away with passing a plain dict:

df >> define(divisible=case_when({
    'x%2 == 0': 'x+200',
    'x%3 == 0': 'x+300',
    True: -1
}))

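However, be careful: this may not always be the case. Dict ordering is an implementation detail in CPython 3.6 and is only guaranteed by the language from Python 3.7 onwards.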

Examples

>>> import pandas as pd
>>> from plydata import define
>>> from plydata.expressions import case_when
>>> df = pd.DataFrame({'x': range(10)})

Here we use an iterable of (predicate, value) tuples.

>>> df >> define(divisible=case_when([
...     ('x%2 == 0', 2),
...     ('x%3 == 0', 3),
...     (True, -1)
... ]))
   x  divisible
0  0          2
1  1         -1
2  2          2
3  3          3
4  4          2
5  5         -1
6  6          2
7  7         -1
8  8          2
9  9          3

When the most general predicate comes first, it obscures the rest, because every row is matched by at most one predicate: the first one it satisfies.

>>> df >> define(divisible=case_when([
...     (True, -1),
...     ('x%2 == 0', 2),
...     ('x%3 == 0', 3)
... ]))
   x  divisible
0  0         -1
1  1         -1
2  2         -1
3  3         -1
4  4         -1
5  5         -1
6  6         -1
7  7         -1
8  8         -1
9  9         -1
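
The ordering behaviour shown above can be sketched in plain Python. This is only an illustration of first-match-wins semantics, not plydata's implementation: the function `first_match` is hypothetical, the predicates are ordinary callables standing in for the string expressions, and returning `None` for a row that matches nothing is an assumption made for the sketch.

```python
def first_match(rows, cases):
    """Return, for each row, the value of the first case whose predicate matches.

    `cases` is an ordered list of (predicate, value) pairs. Each predicate
    is a plain callable, a stand-in for the string predicates used above.
    """
    result = []
    for row in rows:
        for predicate, value in cases:
            if predicate(row):
                result.append(value)
                break  # later cases are never consulted for this row
        else:
            result.append(None)  # assumption: no predicate matched
    return result


xs = list(range(10))

# Most specific predicates first, catch-all last.
specific_first = [
    (lambda x: x % 2 == 0, 2),
    (lambda x: x % 3 == 0, 3),
    (lambda x: True, -1),
]

# Catch-all first: it matches every row, so nothing else is reached.
general_first = [
    (lambda x: True, -1),
    (lambda x: x % 2 == 0, 2),
    (lambda x: x % 3 == 0, 3),
]

print(first_match(xs, specific_first))
print(first_match(xs, general_first))
```

The first call reproduces the divisible column from the earlier example, while the second yields -1 for every row, as in the table above.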

String values must be quoted inside the string, otherwise they are treated as expressions.

>>> df >> define(divisible=case_when([
...     ('x%2 == 0', '"by-2"'),
...     ('x%3 == 0', '"by-3"'),
...     (True, '"neither-by-2or3"')
... ]))
   x        divisible
0  0             by-2
1  1  neither-by-2or3
2  2             by-2
3  3             by-3
4  4             by-2
5  5  neither-by-2or3
6  6             by-2
7  7  neither-by-2or3
8  8             by-2
9  9             by-3
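
Why the inner quotes matter can be seen with Python's built-in eval. The sketch below is an analogy only: it assumes that string values are evaluated as Python expressions, and the helper `evaluate` and the namespace `env` are hypothetical names introduced for illustration, not part of plydata.

```python
def evaluate(value, env):
    """Evaluate string values as expressions; pass other values through."""
    if isinstance(value, str):
        return eval(value, {}, env)
    return value


env = {'x': 4}  # hypothetical namespace standing in for one row of the dataframe

# The inner quotes make the expression a string literal.
print(evaluate('"by-2"', env))

# Without inner quotes the string is an expression over the column.
print(evaluate('x+200', env))

# Non-string values are used as they are.
print(evaluate(-1, env))

# Unquoted text is parsed as the name `by` minus 2, and `by` is undefined.
try:
    evaluate('by-2', env)
except NameError as err:
    print('NameError:', err)
```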

The values can be expressions

>>> df >> define(divisible=case_when([
...     ('x%2 == 0', 'x+200'),
...     ('x%3 == 0', 'x+300'),
...     (True, -1)
... ]))
   x  divisible
0  0        200
1  1         -1
2  2        202
3  3        303
4  4        204
5  5         -1
6  6        206
7  7         -1
8  8        208
9  9        309

Combining Predicates

When combining predicate statements, use the bitwise operators |, &, ^ and ~. Each statement must be enclosed in parentheses, ().

>>> df >> define(y=case_when([
...     ('(x < 5) & (x % 2 == 0)', '"less-than-5-and-even"'),
...     ('(x < 5) & (x % 2 != 0)', '"less-than-5-and-odd"'),
...     ('(x > 5) & (x % 2 == 0)', '"greater-than-5-and-even"'),
...     ('(x > 5) & (x % 2 != 0)', '"greater-than-5-and-odd"'),
...     (True, '"Just 5"')
... ]))
   x                        y
0  0     less-than-5-and-even
1  1      less-than-5-and-odd
2  2     less-than-5-and-even
3  3      less-than-5-and-odd
4  4     less-than-5-and-even
5  5                   Just 5
6  6  greater-than-5-and-even
7  7   greater-than-5-and-odd
8  8  greater-than-5-and-even
9  9   greater-than-5-and-odd
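
The parentheses are needed because of Python's operator precedence: & binds more tightly than the comparison operators. The following sketch shows the difference with a plain integer; the variable names are illustrative only.

```python
x = 2

# With parentheses: each comparison is evaluated first, then combined.
with_parens = (x < 5) & (x % 2 == 0)

# Without parentheses: % and & are applied before < and ==, so this
# parses as the chained comparison  x < (5 & (x % 2)) == 0.
without_parens = x < 5 & x % 2 == 0

print(with_parens)
print(without_parens)
```

For x = 2 the parenthesised form is True, but the unparenthesised form is False, since 5 & (2 % 2) is 0 and 2 < 0 fails. With a pandas Series instead of an integer, the chained comparison would typically raise an error about an ambiguous truth value rather than silently give a wrong answer.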