API Reference
One table verbs
Sort rows by column variables |
|
Create DataFrame with columns |
|
Add column to DataFrame |
|
Select distinct/unique rows |
|
Do arbitrary operations on a dataframe |
|
Group dataframe by one or more columns/variables |
|
Generate a unique id for each group |
|
Select the top n rows |
|
alias of |
|
Pull a single column from the dataframe |
|
Return rows with matching conditions |
|
Rename columns |
|
Sample a fraction of rows from dataframe |
|
Sample n rows from dataframe |
|
Select columns by name |
|
Select rows |
|
Summarise multiple values to a single value |
|
Select the bottom n rows |
|
alias of |
|
Remove the grouping variables for dataframe |
|
alias of |
Helpers
Add column with number of items in each group |
|
Add column with tally of items in each group |
|
Count observations by group |
|
Tally observations by group |
|
Call external function or dataframe method |
|
Arrange by all columns |
|
Arrange by specific columns |
|
Arrange by all columns that match a predicate |
|
Create a new dataframe with all columns |
|
Create dataframe with specific columns |
|
Create a new dataframe with columns selected by a predicate |
|
Group by all columns |
|
Group by selected columns |
|
Group by selected columns that are true for a predicate |
|
Modify all columns that are true for a predicate |
|
Change selected columns |
|
Modify selected columns that are true for a predicate |
|
Query all columns |
|
Query specific columns |
|
Query all columns that match a predicate |
|
Rename all columns |
|
Rename specific columns |
|
Rename all columns that match a predicate |
|
Select all columns |
|
Select specific columns |
|
Select all columns that match a predicate |
|
alias of |
|
alias of |
|
alias of |
|
Summarise all non-grouping columns |
|
Summarise selected columns |
|
Summarise all columns that are true for a predicate |
|
alias of |
|
alias of |
|
alias of |
Two table verbs
Join and keep rows only found in left frame |
|
alias of |
|
Join dataframes using the intersection of keys from both frames |
|
Join dataframes using only keys from left frame |
|
Join dataframes using the union of keys from both frames |
|
Join dataframes using only keys from right frame |
|
Join and keep columns only found in left frame & no duplicate rows |
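The join descriptions above differ mainly in which keys and columns survive. The following is a minimal sketch in plain pandas (the dataframe library plydata operates on), not plydata's own join functions, whose names and signatures are not shown in this section. The frames `left` and `right` and their columns are made up for illustration.

```python
import pandas as pd

# Hypothetical example frames; key 4 appears twice in `right`.
left = pd.DataFrame({"key": [1, 2, 3, 4], "x": ["a", "b", "c", "d"]})
right = pd.DataFrame({"key": [2, 4, 4, 5], "y": [20, 40, 41, 50]})

# Intersection of keys, keys from left, union of keys, keys from right.
inner = left.merge(right, on="key", how="inner")
left_only_keys = left.merge(right, on="key", how="left")
outer = left.merge(right, on="key", how="outer")
right_only_keys = left.merge(right, on="key", how="right")

# "Keep rows only found in left frame": left rows with no match in right.
anti = left[~left["key"].isin(right["key"])]

# "Keep columns only found in left frame & no duplicate rows": left rows
# that have a match, without right's columns and without the row
# duplication an inner join would cause for key 4.
semi = left[left["key"].isin(right["key"])]
```

Filtering with `isin` rather than merging is what keeps the last two results free of duplicated rows.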
Expression helpers
These classes can be used to construct complicated conditional assignment expressions.
Vectorized case |
|
Vectorized if |
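To make "vectorized if" and "vectorized case" concrete, here is a small sketch that uses NumPy's `np.where` and `np.select` on a pandas column. These are stand-ins chosen for illustration, not the plydata classes listed above, and the column names are hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [1, 5, 10, 15]})

# Vectorized if: one condition, one value for True and one for False.
df["size"] = np.where(df["x"] > 5, "big", "small")

# Vectorized case: conditions are checked in order and the first one
# that holds decides the value; `default` covers rows matching none.
conditions = [df["x"] < 5, df["x"] < 12]
choices = ["low", "mid"]
df["band"] = np.select(conditions, choices, default="high")
```

Because the first matching condition wins, the value 1 is labelled "low" even though it also satisfies the second condition.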
Options
For actions where it may be more efficient, if |
|
Get plydata option |
|
Set plydata option |
|
Options context manager |
Useful Functions
Quote a variable name |
|
Size of a group |
|
Find first value of y when sorted by x |
|
Find last value of y when sorted by x |
|
Pipe data through the verbs |
Tidy Verbs
These verbs help create tidy data.
You can import them with `from plydata.tidy import *`.
Pivoting
Pivoting changes the representation of a rectangular dataset, without changing the data inside it.
Collapse multiple columns into key-value pairs. |
|
Lengthen dataframe by reducing the columns & turning them into values |
|
Spread a key-value pair across multiple columns |
|
Spread a key-value pair across multiple columns |
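The idea that pivoting changes the representation but not the data can be sketched as a round trip. The example below uses plain pandas (`melt` and `pivot`) as an assumption-laden illustration of the concept, not the plydata verbs themselves. The `wide` frame is invented for the example.

```python
import pandas as pd

# Hypothetical wide data: one row per name, one column per subject.
wide = pd.DataFrame({"name": ["a", "b"], "math": [90, 70], "art": [60, 80]})

# Lengthen: the subject columns collapse into key-value pairs.
long = wide.melt(id_vars="name", var_name="subject", value_name="score")

# Widen: spread the key-value pairs back across columns.
back = long.pivot(index="name", columns="subject", values="score").reset_index()
back.columns.name = None  # drop the leftover "subject" label on the columns
```

The long form has four rows and the wide form two, yet both hold the same four scores.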
String Columns
These verbs help separate multiple variables that are joined together in a single column.
Split a column using a regular expression with capturing groups. |
|
Split a single column into multiple columns |
|
Separate values of a variable along multiple rows |
|
Join multiple columns into one |
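The four operations described above can be sketched with pandas string methods. This is an illustration of the transformations only, written in plain pandas rather than with the plydata verbs. The column names and separators are assumptions made for the example.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "code": ["a-1", "b-2"], "tags": ["x,y", "z"]})

# Split a single column into multiple columns at a separator.
df[["letter", "number"]] = df["code"].str.split("-", expand=True)

# Split a column using a regular expression with capturing groups.
parts = df["code"].str.extract(r"([a-z])-(\d)")

# Separate the values of a variable along multiple rows.
rows = df.assign(tags=df["tags"].str.split(",")).explode("tags")

# Join multiple columns into one.
df["joined"] = df["letter"].str.cat(df["number"], sep="_")
```

Note that the split pieces remain strings. Converting `number` to an integer would be a separate step.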
Categorical Tools
Functions to solve common problems when working with categorical variables.
You can import them with `from plydata.cat_tools import *`.
Change the order of categories
These functions keep the values the same but change the order of the categories.
Reorder categorical by frequency of the values |
|
Reorder categorical by appearance |
|
Reorder categorical by numerical order |
|
Reorder categories explicitly |
|
Reorder categorical by sorting along another variable |
|
Reorder categorical by sorting along another variable |
|
Reverse order of categories |
|
Shift and wrap-around categories to the left or right |
|
Reverse order of categories |
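As a rough illustration of reordering categories while leaving the values alone, the sketch below uses `pandas.Categorical.reorder_categories`. It is not the plydata API, and the sample values are made up.

```python
import pandas as pd

values = ["b", "a", "c", "b", "b", "c"]
cat = pd.Categorical(values)  # categories default to sorted order: a, b, c

# Order by frequency of the values, most common first.
freq_order = pd.Series(values).value_counts().index.tolist()
by_freq = cat.reorder_categories(freq_order)

# Order by first appearance in the data.
appearance_order = list(dict.fromkeys(values))
by_appearance = cat.reorder_categories(appearance_order)

# Reverse the current order of the categories.
reversed_cat = cat.reorder_categories(list(cat.categories[::-1]))
```

In every case the values themselves are unchanged. Only the category order differs, which affects sorting and plotting.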
Change the value of categories
These functions change the categories while preserving the order (as much as possible).
Anonymise categories |
|
Collapse categories into manually defined groups |
|
Lump together least or most common categories |
|
Lump together least common categories |
|
Lump categories, preserving those that appear min number of times |
|
Lump together most/least common n categories |
|
Lump together least or most common categories by proportion |
|
Replace categories with 'other' |
|
Change/rename categories manually |
|
Change/rename categories and collapse as necessary |
|
Change/rename categories manually |
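Renaming and lumping both change the category labels rather than their order. A minimal pandas-only sketch follows. The labels and the "other" grouping rule are assumptions chosen for the example, and the code does not use the plydata functions.

```python
import pandas as pd

cat = pd.Categorical(["a", "b", "c", "a"])

# Rename one category; categories missing from the mapping pass through.
renamed = cat.rename_categories({"a": "A"})

# Lump every category except "a" into a single 'other' category.
lumped = pd.Categorical(
    [v if v == "a" else "other" for v in cat],
    categories=["a", "other"],
)
```

The lumped result is rebuilt from the values because `rename_categories` requires the new category names to be unique.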
Add or Remove Categories
These functions leave the data values as is, but they add or remove categories.
Remove unused categories |
|
Add additional categories to a categorical |
|
Give missing values an explicit category |
|
Remove unused categories |
|
Unify (union of all) the categories in a list of categoricals |
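The operations above can be sketched with the pandas `.cat` accessor. This is an illustration in plain pandas, not the plydata functions, and the category names (including the `"(missing)"` label) are hypothetical.

```python
import pandas as pd

# "z" is a declared category that never occurs; one value is missing.
s = pd.Series(["a", None, "b"], dtype=pd.CategoricalDtype(["a", "b", "z"]))

# Remove unused categories.
dropped = s.cat.remove_unused_categories()

# Add additional categories without touching the values.
added = s.cat.add_categories(["new"])

# Give missing values an explicit category: the category must exist
# before it can be used as a fill value.
explicit = s.cat.add_categories(["(missing)"]).fillna("(missing)")
```

Only the last step changes what the data looks like, and only by replacing missing entries with a label.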
Combine Multiple Categoricals
Concatenate categoricals and combine the categories |
|
Create a new categorical (zip style) combined from two or more |
Datasets
These datasets ship with plydata; you can import them from the plydata.data sub-package.