API Reference¶

One table verbs¶

`arrange`	Sort rows by column variables
`create`	Create DataFrame with columns
`define`	Add column to DataFrame
`distinct`	Select distinct/unique rows
`do`	Do arbitrary operations on a dataframe
`group_by`	Group dataframe by one or more columns/variables
`group_indices`	Generate a unique id for each group
`head`	Select the top n rows
`mutate`	alias of `plydata.one_table_verbs.define`
`pull`	Pull a single column from the dataframe
`query`	Return rows with matching conditions
`rename`	Rename columns
`sample_frac`	Sample a fraction of rows from dataframe
`sample_n`	Sample n rows from dataframe
`select`	Select columns by name
`slice_rows`	Select rows
`summarize`	Summarise multiple values to a single value
`tail`	Select the bottom n rows
`transmute`	alias of `plydata.one_table_verbs.create`
`ungroup`	Remove the grouping variables for dataframe
`unique`	alias of `plydata.one_table_verbs.distinct`
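
To make the grouping verbs concrete, here is a minimal plain-Python sketch of what grouping and then summarising rows amounts to. It is not plydata's implementation and does not call plydata; the row data and the names `groups` and `summary` are made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows; each dict stands in for one dataframe row.
rows = [
    {"g": "a", "x": 1},
    {"g": "a", "x": 3},
    {"g": "b", "x": 5},
]

# "group_by": collect rows that share the same value of the grouping column.
groups = defaultdict(list)
for row in rows:
    groups[row["g"]].append(row)

# "summarize": reduce each group to a single row of summary values.
summary = [
    {
        "g": key,
        "n": len(members),  # size of the group
        "x_mean": sum(r["x"] for r in members) / len(members),
    }
    for key, members in groups.items()
]
```

Each group collapses to one output row, which is the sense in which a summary reduces multiple values to a single value.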

Helpers¶

`add_count`	Add column with number of items in each group
`add_tally`	Add column with tally of items in each group
`count`	Count observations by group
`tally`	Tally observations by group
`call`	Call external function or dataframe method
`arrange_all`	Arrange by all columns
`arrange_at`	Arrange by specific columns
`arrange_if`	Arrange by all columns that match a predicate
`create_all`	Create a new dataframe with all columns
`create_at`	Create dataframe with specific columns
`create_if`	Create a new dataframe with columns selected by a predicate
`group_by_all`	Group by all columns
`group_by_at`	Group by select columns
`group_by_if`	Group by selected columns that are true for a predicate
`mutate_all`	Modify all columns
`mutate_at`	Change selected columns
`mutate_if`	Modify selected columns that are true for a predicate
`query_all`	Query all columns
`query_at`	Query specific columns
`query_if`	Query all columns that match a predicate
`rename_all`	Rename all columns
`rename_at`	Rename specific columns
`rename_if`	Rename all columns that match a predicate
`select_all`	Select all columns
`select_at`	Select specific columns
`select_if`	Select all columns that match a predicate
`summarise_all`	alias of `plydata.helper_verbs.summarize_all`
`summarise_at`	alias of `plydata.helper_verbs.summarize_at`
`summarise_if`	alias of `plydata.helper_verbs.summarize_if`
`summarize_all`	Summarise all non-grouping columns
`summarize_at`	Summarize select columns
`summarize_if`	Summarise all columns that are true for a predicate
`transmute_all`	alias of `plydata.helper_verbs.create_all`
`transmute_at`	alias of `plydata.helper_verbs.create_at`
`transmute_if`	alias of `plydata.helper_verbs.create_if`
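
The `_all`, `_at` and `_if` suffixes differ only in how the target columns are chosen. The sketch below illustrates that selection rule in plain Python under simplifying assumptions: a table is a dict of column lists, and `pick_all`, `pick_at`, `pick_if` and `is_numeric` are hypothetical helpers, not plydata functions.

```python
# Hypothetical table: column name -> list of values.
table = {
    "name": ["ann", "bob"],
    "age": [31, 25],
    "height": [1.5, 2.0],
}

def pick_all(table):
    """`_all` style: every column."""
    return list(table)

def pick_at(table, names):
    """`_at` style: only the columns named explicitly."""
    return [col for col in table if col in names]

def pick_if(table, predicate):
    """`_if` style: only the columns for which the predicate is true."""
    return [col for col in table if predicate(table[col])]

def is_numeric(values):
    return all(isinstance(v, (int, float)) for v in values)

# A "summarize_if"-like operation: mean of every numeric column.
numeric_means = {
    col: sum(table[col]) / len(table[col])
    for col in pick_if(table, is_numeric)
}
```

Once the columns are picked, the same operation (arrange, rename, summarize, and so on) is applied to each of them.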

Two table verbs¶

`anti_join`	Join and keep rows only found in left frame
`full_join`	alias of `plydata.two_table_verbs.outer_join`
`inner_join`	Join dataframes using the intersection of keys from both frames
`left_join`	Join dataframes using only keys from left frame
`outer_join`	Join dataframes using the union of keys from both frames
`right_join`	Join dataframes using only keys from right frame
`semi_join`	Join and keep columns only found in left frame & no duplicate rows
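
The filtering joins (`semi_join`, `anti_join`) are easy to confuse with `inner_join`. The following plain-Python sketch shows the intended semantics on lists of dicts. It is an illustration only, not plydata code, and the tables `left` and `right` are invented.

```python
left = [
    {"id": 1, "name": "ann"},
    {"id": 2, "name": "bob"},
    {"id": 3, "name": "cat"},
]
right = [
    {"id": 2, "score": 10},
    {"id": 2, "score": 20},  # duplicate key on the right side
    {"id": 4, "score": 30},
]

right_keys = {row["id"] for row in right}

# Semi join: left rows that have a match; only left columns, no duplicates.
semi = [row for row in left if row["id"] in right_keys]

# Anti join: left rows that have no match in the right table.
anti = [row for row in left if row["id"] not in right_keys]

# Inner join, for contrast: one row per matching pair, columns merged.
inner = [
    {**lrow, **rrow}
    for lrow in left
    for rrow in right
    if lrow["id"] == rrow["id"]
]
```

Note that the duplicate key on the right duplicates `bob` in the inner join but not in the semi join.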

Expression helpers¶

These classes can be used to construct complicated conditional assignment expressions.

`case_when`	Vectorized case
`if_else`	Vectorized if
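
As a rough picture of what "vectorized if" and "vectorized case" mean, here is a plain-Python sketch that works element by element on lists. The functions `if_else_sketch` and `case_when_sketch` are hypothetical stand-ins with invented signatures, and the rule that the first matching condition wins is an assumption borrowed from SQL's `CASE`, not something this page states.

```python
def if_else_sketch(predicate, yes, no):
    """Pick from `yes` where the predicate holds, otherwise from `no`."""
    return [y if p else n for p, y, n in zip(predicate, yes, no)]

def case_when_sketch(cases, default=None):
    """cases is a list of (mask, value) pairs; the first true mask wins."""
    length = len(cases[0][0])
    result = [default] * length
    for i in range(length):
        for mask, value in cases:
            if mask[i]:
                result[i] = value
                break
    return result

x = [1, 6, 12]

parity = if_else_sketch(
    [v % 2 == 0 for v in x],
    ["even"] * len(x),
    ["odd"] * len(x),
)

labels = case_when_sketch([
    ([v > 10 for v in x], "high"),
    ([v > 5 for v in x], "mid"),
    ([True] * len(x), "low"),  # catch-all case
])
```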

Options¶

`modify_input_data`	For actions where it may be more efficient, if `True` the verb modifies the input data.
`get_option`	Get plydata option
`set_option`	Set plydata option
`options`	Options context manager
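
The general pattern behind a get/set pair plus an options context manager can be sketched as follows. Everything here is hypothetical (`_OPTIONS`, `get_option_sketch`, `set_option_sketch`, `options_sketch`); it illustrates the usual "override inside the block, restore afterwards" behaviour and makes no claim about how plydata implements it.

```python
from contextlib import contextmanager

# Hypothetical option store, seeded with the option named on this page.
_OPTIONS = {"modify_input_data": False}

def get_option_sketch(name):
    return _OPTIONS[name]

def set_option_sketch(name, value):
    """Set an option and return the value it had before."""
    old = _OPTIONS[name]
    _OPTIONS[name] = value
    return old

@contextmanager
def options_sketch(**overrides):
    """Temporarily override options; restore the old values on exit."""
    previous = {
        name: set_option_sketch(name, value)
        for name, value in overrides.items()
    }
    try:
        yield
    finally:
        for name, value in previous.items():
            set_option_sketch(name, value)

with options_sketch(modify_input_data=True):
    inside = get_option_sketch("modify_input_data")
after = get_option_sketch("modify_input_data")
```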

Useful Functions¶

`Q`	Quote a variable name
`n`	Size of a group
`first2`	Find first value of y when sorted by x
`last2`	Find last value of y when sorted by x
`ply`	Pipe data through the verbs
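
plydata verbs are typically chained with the `>>` operator. One way such piping can work in Python is through the reflected right-shift method `__rrshift__`; the sketch below shows that mechanism on lists of dicts. `VerbSketch`, `define_sketch` and `query_sketch` are hypothetical and take callables for simplicity, so they do not mirror plydata's real signatures.

```python
class VerbSketch:
    """Wrap a function so that `data >> verb` calls it on the data."""

    def __init__(self, func):
        self.func = func

    def __rrshift__(self, data):
        # Python falls back to this when `data` does not implement `>>`.
        return self.func(data)

def define_sketch(**new_columns):
    """Add columns computed from each row."""
    def apply(rows):
        return [
            {**row, **{name: f(row) for name, f in new_columns.items()}}
            for row in rows
        ]
    return VerbSketch(apply)

def query_sketch(predicate):
    """Keep the rows for which the predicate is true."""
    return VerbSketch(lambda rows: [row for row in rows if predicate(row)])

rows = [{"x": 1}, {"x": 2}, {"x": 3}]

result = (
    rows
    >> define_sketch(y=lambda row: row["x"] * 2)
    >> query_sketch(lambda row: row["y"] > 2)
)
```

Because `>>` is left-associative, the data flows through the verbs from left to right.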

Tidy Verbs¶

These verbs help create tidy data. You can import them with `from plydata.tidy import *`.

Pivoting¶

Pivoting changes the representation of a rectangular dataset, without changing the data inside it.

`gather`	Collapse multiple columns into key-value pairs.
`pivot_longer`	Lengthen dataframe by reducing the columns & turning them into values
`pivot_wider`	Spread a key-value pair across multiple columns
`spread`	Spread a key-value pair across multiple columns
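
The claim that pivoting changes the representation without changing the data can be checked with a round trip. The sketch below does this in plain Python on lists of dicts. `to_long` and `to_wide` are hypothetical helpers, not the plydata verbs, and the column names are invented.

```python
wide = [
    {"name": "ann", "math": 90, "art": 70},
    {"name": "bob", "math": 60, "art": 80},
]

def to_long(rows, id_col, key="subject", value="score"):
    """One output row per (id, column) pair: column names become values."""
    return [
        {id_col: row[id_col], key: col, value: val}
        for row in rows
        for col, val in row.items()
        if col != id_col
    ]

def to_wide(rows, id_col, key="subject", value="score"):
    """Inverse of to_long: key values become column names again."""
    out = {}
    for row in rows:
        record = out.setdefault(row[id_col], {id_col: row[id_col]})
        record[row[key]] = row[value]
    return list(out.values())

long = to_long(wide, "name")
round_trip = to_wide(long, "name")
```

Lengthening turns two rows with three columns into four rows with three columns, and widening recovers the original table.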

String Columns¶

These verbs help separate multiple variables that are joined together in a single column.

`extract`	Split a column using a regular expression with capturing groups.
`separate`	Split a single column into multiple columns
`separate_rows`	Separate values of a variable along multiple rows
`unite`	Join multiple columns into one
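
For orientation, the four operations can be sketched with plain string handling. This is an illustration on lists of dicts using only the standard library; the data, the `-` and `,` separators, and the output column names are assumptions, and none of this is plydata code.

```python
import re

rows = [{"id": 1, "code": "a-1"}, {"id": 2, "code": "b-2"}]

# "separate": split one column into several on a separator.
separated = [
    {"id": r["id"], **dict(zip(("letter", "number"), r["code"].split("-")))}
    for r in rows
]

# "extract": same idea, but driven by regex capturing groups.
pattern = re.compile(r"([a-z])-(\d)")
extracted = [
    {"id": r["id"], **dict(zip(("letter", "number"), pattern.match(r["code"]).groups()))}
    for r in rows
]

# "unite": join several columns back into one.
united = [
    {"id": r["id"], "code": "-".join((r["letter"], r["number"]))}
    for r in separated
]

# "separate_rows": one output row per value packed into a cell.
tagged = [{"id": 1, "tags": "x,y"}, {"id": 2, "tags": "z"}]
tag_rows = [
    {"id": r["id"], "tags": tag}
    for r in tagged
    for tag in r["tags"].split(",")
]
```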

Categorical Tools¶

Functions to solve common problems when working with categorical variables. You can import them with `from plydata.cat_tools import *`.

Change the order of categories¶

These functions keep the values the same but change the order of the categories.

`cat_infreq`	Reorder categorical by frequency of the values
`cat_inorder`	Reorder categorical by appearance
`cat_inseq`	Reorder categorical by numerical order
`cat_relevel`	Reorder categories explicitly
`cat_reorder`	Reorder categorical by sorting along another variable
`cat_reorder2`	Reorder categorical by sorting along another variable
`cat_rev`	Reverse order of categories
`cat_shift`	Shift and wrap-around categories to the left or right
`cat_shuffle`	Randomly shuffle the order of categories
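
To see what "same values, different category order" means, the sketch below models a categorical as a list of values plus an ordered list of categories, using only the standard library. The helper names are hypothetical and the orderings are simplified versions of the ideas listed above, not calls into plydata.

```python
from collections import Counter

values = ["b", "a", "c", "b", "b", "c"]
categories = sorted(set(values))  # starting order: a, b, c

counts = Counter(values)

# By frequency, most common first (ties keep their current order).
by_frequency = sorted(categories, key=lambda cat: -counts[cat])

# By first appearance in the data.
by_appearance = list(dict.fromkeys(values))

# Reversed, and shifted one position to the left with wrap-around.
reversed_order = categories[::-1]
shifted_left = categories[1:] + categories[:1]

def encode(values, categories):
    """Integer codes of the values under a given category order."""
    return [categories.index(v) for v in values]

def decode(codes, categories):
    return [categories[code] for code in codes]

# The codes change with the category order, the decoded values do not.
codes_before = encode(values, categories)
codes_after = encode(values, by_frequency)
```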

Change the value of categories¶

These functions change the categories while preserving the order (as much as possible).

`cat_anon`	Anonymise categories
`cat_collapse`	Collapse categories into manually defined groups
`cat_lump`	Lump together least or most common categories
`cat_lump_lowfreq`	Lump together the least frequent categories
`cat_lump_min`	Lump categories, preserving those that appear min number of times
`cat_lump_n`	Lump together most/least common n categories
`cat_lump_prop`	Lump together least or most common categories by proportion
`cat_other`	Replace categories with 'other'
`cat_recode`	Change/rename categories manually
`cat_relabel`	Change/rename categories and collapse as necessary
`cat_rename`	Change/rename categories manually
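
Lumping and recoding both rewrite category labels. A minimal plain-Python sketch of the idea follows. `lump_n_sketch`, `lump_min_sketch` and `recode_sketch` are hypothetical helpers with invented signatures, and the `"other"` label is an assumption.

```python
from collections import Counter

def lump_n_sketch(values, n, other="other"):
    """Keep the n most common categories; replace the rest with `other`."""
    keep = {cat for cat, _ in Counter(values).most_common(n)}
    return [v if v in keep else other for v in values]

def lump_min_sketch(values, min_count, other="other"):
    """Keep categories that appear at least min_count times."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

def recode_sketch(values, mapping):
    """Rename categories manually; unmapped categories stay as they are."""
    return [mapping.get(v, v) for v in values]

values = ["a", "a", "a", "b", "b", "c", "d"]

top_two = lump_n_sketch(values, 2)
at_least_three = lump_min_sketch(values, 3)
renamed = recode_sketch(values, {"a": "alpha"})
```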

Add or Remove Categories¶

These functions leave the data values as is, but they add or remove categories.

`cat_drop`	Remove unused categories
`cat_expand`	Add additional categories to a categorical
`cat_explicit_na`	Give missing values an explicit category
`cat_remove_unused`	Remove unused categories
`cat_unify`	Unify (union of all) the categories in a list of categoricals
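
A categorical can declare categories that never occur in the data, which is why adding, dropping and unifying categories leaves the values untouched. The sketch below shows this with plain lists. The helper names and the `"(missing)"` label are hypothetical, not plydata's.

```python
values = ["a", "c", "a", None]
categories = ["a", "b", "c"]  # "b" is declared but never used

# Drop unused categories: keep only those that occur in the values.
present = set(values)
used = [cat for cat in categories if cat in present]

# Expand: append extra categories that are not declared yet.
expanded = categories + [cat for cat in ["c", "d"] if cat not in categories]

# Explicit NA: give missing values their own category.
explicit_values = ["(missing)" if v is None else v for v in values]
explicit_categories = categories + ["(missing)"]

def unify_sketch(category_lists):
    """Union of several category lists, keeping first-seen order."""
    unified = []
    for cats in category_lists:
        for cat in cats:
            if cat not in unified:
                unified.append(cat)
    return unified

unified = unify_sketch([["a", "b"], ["b", "c"], ["d"]])
```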

Combine Multiple Categoricals¶

`cat_concat`	Concatenate categoricals and combine the categories
`cat_zip`	Create a new categorical (zip style) combined from two or more

Datasets¶

These datasets ship with plydata and you can import them from the `plydata.data` sub-package.