plydata.helper_verbs.query_all
class plydata.helper_verbs.query_all(*args, **kwargs)
Query all columns
Parameters
- data : dataframe, optional
  Useful when not using the >> operator.
- all_vars : str, optional
  A predicate statement to evaluate. It should conform to Python syntax and should return an array of boolean values (one for every item in the column) or a single boolean (for the whole column). Use {_} to refer to the column names.
  After the statement is evaluated for all columns, the intersection (&) of the results is used to select the output rows.
- any_vars : str, optional
  A predicate statement to evaluate. It should conform to Python syntax and should return an array of boolean values (one for every item in the column) or a single boolean (for the whole column). Use {_} to refer to the column names.
  After the statement is evaluated for all columns, the union (|) of the results is used to select the output rows.
- reset_index : bool, optional (default: True)
  If True, the index is reset to a sequential range index. If False, the original index is maintained.
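The way a {_} predicate turns into a row selection can be sketched in plain pandas. This is only an illustration of the semantics described above, not plydata's implementation: the helper column_masks is hypothetical, it uses Python's built-in eval, and the frame holds only numeric columns to keep the comparisons simple.

```python
import operator
from functools import reduce

import pandas as pd

df = pd.DataFrame({
    'x': [1, 2, 3, 4, 5, 6],
    'y': [6, 5, 4, 3, 2, 1],
    'z': [7, 9, 11, 8, 10, 12],
})


def column_masks(data, predicate):
    """Evaluate the predicate once per column (hypothetical helper)."""
    namespace = {name: data[name] for name in data.columns}
    masks = []
    for name in data.columns:
        # '({_} == 4)' becomes '(x == 4)', '(y == 4)', ...
        expr = predicate.format(_=name)
        result = eval(expr, {}, namespace)
        if not isinstance(result, pd.Series):
            # A single boolean applies to every row of the column
            result = pd.Series(bool(result), index=data.index)
        masks.append(result)
    return masks


# any_vars: a row is kept if at least one column mask is True
any_mask = reduce(operator.or_, column_masks(df, '({_} == 4)'))
# all_vars: a row is kept only if every column mask is True
all_mask = reduce(operator.and_, column_masks(df, '({_} != 4)'))

any_rows = df[any_mask].reset_index(drop=True)
all_rows = df[all_mask].reset_index(drop=True)
```

Here any_rows holds the two rows that contain a 4 and all_rows holds the four rows that do not.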
Examples
>>> import pandas as pd
>>> import numpy as np
>>> from plydata import *
>>> df = pd.DataFrame({
...     'alpha': list('aaabbb'),
...     'beta': list('babruq'),
...     'theta': list('cdecde'),
...     'x': [1, 2, 3, 4, 5, 6],
...     'y': [6, 5, 4, 3, 2, 1],
...     'z': [7, 9, 11, 8, 10, 12]
... })
Select all rows where any of the entries along the columns is a 4.
>>> df >> query_all(any_vars='({_} == 4)')
  alpha beta theta  x  y   z
0     a    b     e  3  4  11
1     b    r     c  4  3   8
The opposite: select all rows where none of the entries along the columns is a 4.
>>> df >> query_all(all_vars='({_} != 4)')
  alpha beta theta  x  y   z
0     a    b     c  1  6   7
1     a    a     d  2  5   9
2     b    u     d  5  2  10
3     b    q     e  6  1  12
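Both outputs above are numbered from 0 because reset_index defaults to True. The effect of that parameter can be approximated in plain pandas. This is a rough sketch, assuming that reset_index=False simply leaves the filtered index untouched as the parameter description states; it does not call plydata.

```python
import pandas as pd

df = pd.DataFrame({
    'alpha': list('aaabbb'),
    'x': [1, 2, 3, 4, 5, 6],
    'y': [6, 5, 4, 3, 2, 1],
    'z': [7, 9, 11, 8, 10, 12],
})

# Rows where any numeric column equals 4
mask = (df[['x', 'y', 'z']] == 4).any(axis=1)

# Filtering alone keeps the original row labels
kept = df[mask]

# Renumbering gives the sequential range index seen in the outputs above
renumbered = kept.reset_index(drop=True)
```

In this sketch kept retains the labels of the matching rows in df, while renumbered starts again at 0.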
For something more complicated, group-wise selection.
Select groups where any of the columns has a large (> 28) sum. First, using summarize_all, we see that there is one such group. Then query_all selects it.
>>> (df
...  >> group_by('alpha')
...  >> select('x', 'y', 'z')
...  >> summarize_all('sum'))
  alpha   x   y   z
0     a   6  15  27
1     b  15   6  30
>>> (df
...  >> group_by('alpha')
...  >> select('x', 'y', 'z')
...  >> query_all(any_vars='(sum({_}) > 28)'))
groups: ['alpha']
  alpha  x  y   z
0     b  4  3   8
1     b  5  2  10
2     b  6  1  12
Note that sum({_}) > 28 is a column operation: it returns a single boolean for the whole column. Therefore, within each group, the whole column is either selected or not selected. Column operations are what enable group-wise selection.
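The same group-wise selection can be written directly in pandas, which may help show why a single boolean per column selects whole groups. This is a sketch of equivalent logic under the any_vars semantics, not what plydata does internally. The names value_columns, large_sum and keep_group are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'alpha': list('aaabbb'),
    'x': [1, 2, 3, 4, 5, 6],
    'y': [6, 5, 4, 3, 2, 1],
    'z': [7, 9, 11, 8, 10, 12],
})

value_columns = ['x', 'y', 'z']

# One boolean per group and column: does the column sum exceed 28?
large_sum = df.groupby('alpha')[value_columns].sum() > 28

# any_vars semantics: keep a group if at least one column qualifies
keep_group = large_sum.any(axis=1)
kept_labels = keep_group[keep_group].index

# All rows of a qualifying group are kept together
selected = df[df['alpha'].isin(kept_labels)].reset_index(drop=True)
```

Only group b qualifies, because its z column sums to 30, so selected contains the three b rows.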