API Reference

One table verbs

arrange

Sort rows by column variables

create

Create DataFrame with columns

define

Add column to DataFrame

distinct

Select distinct/unique rows

do

Do arbitrary operations on a dataframe

group_by

Group dataframe by one or more columns/variables

group_indices

Generate a unique id for each group

head

Select the top n rows

mutate

alias of plydata.one_table_verbs.define

pull

Pull a single column from the dataframe

query

Return rows with matching conditions

rename

Rename columns

sample_frac

Sample a fraction of rows from dataframe

sample_n

Sample n rows from dataframe

select

Select columns by name

slice_rows

Select rows

summarize

Summarise multiple values to a single value

tail

Select the bottom n rows

transmute

alias of plydata.one_table_verbs.create

ungroup

Remove the grouping variables for dataframe

unique

alias of plydata.one_table_verbs.distinct

Helpers

add_count

Add column with number of items in each group

add_tally

Add column with tally of items in each group

count

Count observations by group

tally

Tally observations by group

call

Call external function or dataframe method

arrange_all

Arrange by all columns

arrange_at

Arrange by specific columns

arrange_if

Arrange by all columns that match a predicate

create_all

Create a new dataframe with all columns

create_at

Create dataframe with specific columns

create_if

Create a new dataframe with columns selected by a predicate

group_by_all

Group by all columns

group_by_at

Group by select columns

group_by_if

Group by selected columns that are true for a predicate

mutate_all

Modify all columns

mutate_at

Change selected columns

mutate_if

Modify selected columns that are true for a predicate

query_all

Query all columns

query_at

Query specific columns

query_if

Query all columns that match a predicate

rename_all

Rename all columns

rename_at

Rename specific columns

rename_if

Rename all columns that match a predicate

select_all

Select all columns

select_at

Select specific columns

select_if

Select all columns that match a predicate

summarise_all

alias of plydata.helper_verbs.summarize_all

summarise_at

alias of plydata.helper_verbs.summarize_at

summarise_if

alias of plydata.helper_verbs.summarize_if

summarize_all

Summarise all non-grouping columns

summarize_at

Summarize select columns

summarize_if

Summarise all columns that are true for a predicate

transmute_all

alias of plydata.helper_verbs.create_all

transmute_at

alias of plydata.helper_verbs.create_at

transmute_if

alias of plydata.helper_verbs.create_if

Two table verbs

anti_join

Join and keep rows only found in left frame

full_join

alias of plydata.two_table_verbs.outer_join

inner_join

Join dataframes using the intersection of keys from both frames

left_join

Join dataframes using only keys from left frame

outer_join

Join dataframes using the union of keys from both frames

right_join

Join dataframes using only keys from right frame

semi_join

Join and keep columns only found in left frame & no duplicate rows
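The difference between inner_join, semi_join and anti_join is easiest to see on a small example. The following is a plain-Python sketch of the semantics described above, not plydata's implementation; the frames, the column names and the `*_sketch` helpers are hypothetical and exist only for illustration.

```python
# Two tiny "frames" represented as lists of row dictionaries (hypothetical data).
left = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}, {"id": 3, "x": "c"}]
right = [{"id": 2, "y": 10}, {"id": 2, "y": 20}, {"id": 4, "y": 40}]


def inner_join_sketch(left, right, on):
    """Keep every left/right pairing whose keys match; columns come from both."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]


def semi_join_sketch(left, right, on):
    """Keep left rows that have a match; left columns only, no row duplication."""
    keys = {r[on] for r in right}
    return [dict(l) for l in left if l[on] in keys]


def anti_join_sketch(left, right, on):
    """Keep left rows that have no match in the right frame."""
    keys = {r[on] for r in right}
    return [dict(l) for l in left if l[on] not in keys]


inner = inner_join_sketch(left, right, "id")
semi = semi_join_sketch(left, right, "id")
anti = anti_join_sketch(left, right, "id")
```

Here `id == 2` matches two rows on the right, so the inner join yields two rows while the semi join yields one; the anti join returns the rows with `id` 1 and 3.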

Expression helpers

These classes can be used to construct complicated conditional assignment expressions.

case_when

Vectorized case

if_else

Vectorized if
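To make "vectorized" concrete, here is a plain-Python sketch of element-wise if and case logic. It does not use plydata, and the `if_else_sketch` and `case_when_sketch` helpers are hypothetical stand-ins. The sketch assumes that conditions are checked in order and that the first match wins, as in an SQL CASE expression; consult the class documentation for the library's exact behaviour.

```python
def if_else_sketch(condition, true_value, false_value):
    """Choose between two values, element by element."""
    return [true_value if c else false_value for c in condition]


def case_when_sketch(cases, n):
    """Apply (mask, value) pairs in order; the first matching mask wins.

    Elements that match no mask are left as None.
    """
    result = [None] * n
    for i in range(n):
        for mask, value in cases:
            if mask[i]:
                result[i] = value
                break
    return result


x = [1, 5, 10, 20]

labels = if_else_sketch([v > 5 for v in x], "big", "small")

sizes = case_when_sketch(
    [
        ([v < 5 for v in x], "low"),
        ([v < 15 for v in x], "mid"),
        ([True] * len(x), "high"),  # catch-all, plays the role of "else"
    ],
    len(x),
)
```

The value 1 satisfies both `v < 5` and `v < 15`, but it is labelled "low" because that condition comes first.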

Options

modify_input_data

If True, verbs modify the input data in place for actions where doing so may be more efficient.

get_option

Get plydata option

set_option

Set plydata option

options

Options context manager

Useful Functions

Q

Quote a variable name

n

Size of a group

first2

Find first value of y when sorted by x

last2

Find last value of y when sorted by x

ply

Pipe data through the verbs

Tidy Verbs

These verbs help create tidy data. You can import them with from plydata.tidy import *.

Pivoting

Pivoting changes the representation of a rectangular dataset, without changing the data inside it.

gather

Collapse multiple columns into key-value pairs.

pivot_longer

Lengthen dataframe by reducing the number of columns and turning them into values

pivot_wider

Spread a key-value pair across multiple columns

spread

Spread a key-value pair across multiple columns
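The claim that pivoting changes the representation without changing the data can be checked with a round trip. This is a plain-Python sketch, not plydata code; the `to_long` and `to_wide` helpers, their parameter names and the sample data are hypothetical.

```python
# A "wide" table: one row per person, one column per subject (hypothetical data).
wide = [
    {"name": "ann", "math": 90, "art": 70},
    {"name": "bob", "math": 60, "art": 85},
]


def to_long(rows, id_col, names_to, values_to):
    """Turn every non-id column into a (name, value) pair on its own row."""
    out = []
    for row in rows:
        for col, val in row.items():
            if col != id_col:
                out.append({id_col: row[id_col], names_to: col, values_to: val})
    return out


def to_wide(rows, id_col, names_from, values_from):
    """Spread (name, value) pairs back into one column per name."""
    out = {}
    for row in rows:
        record = out.setdefault(row[id_col], {id_col: row[id_col]})
        record[row[names_from]] = row[values_from]
    return list(out.values())


long_rows = to_long(wide, "name", "subject", "score")
round_trip = to_wide(long_rows, "name", "subject", "score")
```

The long form has four rows (one per person and subject), and widening it again reproduces the original table.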

String Columns

These verbs help separate multiple variables that are joined together in a single column.

extract

Split a column using a regular expression with capturing groups.

separate

Split a single column into multiple columns

separate_rows

Separate values of a variable along multiple rows

unite

Join multiple columns into one

Categorical Tools

Functions to solve common problems when working with categorical variables. You can import them with from plydata.cat_tools import *.

Change the order of categories

These functions keep the values the same but change the order of the categories.

cat_infreq

Reorder categorical by frequency of the values

cat_inorder

Reorder categorical by appearance

cat_inseq

Reorder categorical by numerical order

cat_relevel

Reorder categories explicitly

cat_reorder

Reorder categorical by sorting along another variable

cat_reorder2

Reorder categorical by sorting along another variable

cat_rev

Reverse order of categories

cat_shift

Shift and wrap-around categories to the left or right

cat_shuffle

Randomly shuffle the order of categories
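The reordering ideas behind several of these functions can be sketched with plain lists. This is not plydata code and does not use pandas categoricals; the data is hypothetical, and the sketch assumes that ordering by frequency puts the most frequent category first and that a shift of one moves categories to the left.

```python
from collections import Counter

values = ["b", "a", "c", "a", "c", "c"]  # the data itself never changes
categories = ["a", "b", "c"]             # the starting category order

# Order of first appearance in the data.
by_appearance = list(dict.fromkeys(values))

# Most frequent first; sorted() is stable, so ties keep their original order.
counts = Counter(values)
by_frequency = sorted(categories, key=lambda c: -counts[c])

# Reversed order.
reversed_order = categories[::-1]

# Shift left by one position, wrapping the first category around to the end.
shifted = categories[1:] + categories[:1]
```

In every case only the order of the categories changes; `values` is left untouched.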

Change the value of categories

These functions change the categories while preserving the order (as much as possible).

cat_anon

Anonymise categories

cat_collapse

Collapse categories into manually defined groups

cat_lump

Lump together least or most common categories

cat_lump_lowfreq

Lump together the least frequent categories

cat_lump_min

Lump categories, preserving those that appear a minimum number of times

cat_lump_n

Lump together most/least common n categories

cat_lump_prop

Lump together least or most common categories by proportion

cat_other

Replace categories with 'other'

cat_recode

Change/rename categories manually

cat_relabel

Change/rename categories and collapse as necessary

cat_rename

Change/rename categories manually
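As an illustration of lumping, the sketch below keeps the n most common values and replaces everything else with a single label. It is plain Python rather than plydata, and the `lump_n_sketch` helper, its default label and the sample data are hypothetical.

```python
from collections import Counter

values = ["a", "b", "a", "c", "d", "a", "b"]


def lump_n_sketch(values, n, other="other"):
    """Keep the n most common values; replace the rest with `other`."""
    keep = {cat for cat, _ in Counter(values).most_common(n)}
    return [v if v in keep else other for v in values]


lumped = lump_n_sketch(values, 2)
```

With n = 2 the values "a" (three occurrences) and "b" (two occurrences) survive, while "c" and "d" are both replaced by "other".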

Add or Remove Categories

These functions leave the data values as is, but they add or remove categories.

cat_drop

Remove unused categories

cat_expand

Add additional categories to a categorical

cat_explicit_na

Give missing values an explicit category

cat_remove_unused

Remove unused categories

cat_unify

Unify (union of all) the categories in a list of categoricals

Combine Multiple Categoricals

cat_concat

Concatenate categoricals and combine the categories

cat_zip

Create a new categorical (zip style) combined from two or more

Datasets

These datasets ship with plydata, and you can import them from the plydata.data sub-package.