Usage

For data manipulation, there are two types of verbs:

  1. One table verbs

  2. Two table verbs

We define the usage by how a verb accepts the data argument. There are three types of usage:

  1. Piping - The data is on the left side of the pipe operator (>>).

  2. Composing - The data is the first argument of the verb.

  3. Currying - The data is the only argument of an instantiated verb.

The single table verbs support all three types, while two table verbs support only composing. Here is the same operation written in all three ways.

import pandas as pd
from plydata import define, arrange

df = pd.DataFrame({'x': [0, 1, 2, 3, 4, 5]})

# Piping
df >> define(w='x%2', y='x+1', z='x+2.5') >> arrange('w')

# Composing
arrange(define(df, w='x%2', y='x+1', z='x+2.5'), 'w')

# Currying
arrange('w')(define(w='x%2', y='x+1', z='x+2.5')(df))

Composing is the normal way to call functions, but data manipulation often involves consecutive calls, and the nested invocations become hard to read. Piping improves readability. Currying exists only to spite the Zen of Python.
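To make the three usages concrete, here is a minimal sketch of how a verb could accept its data in all three ways. It is not plydata's implementation: the Verb, Define and Arrange classes are hypothetical stand-ins, and a table is assumed to be a plain list of row dicts so that the example needs nothing beyond the standard library.

```python
class Verb:
    """Hypothetical base class for toy verbs; a table is a list of row dicts."""

    def __new__(cls, *args, **kwargs):
        self = super().__new__(cls)
        # Composing: the data arrives as the first positional argument,
        # so configure the verb with the rest and evaluate immediately.
        if args and isinstance(args[0], list):
            data, *rest = args
            self.__init__(*rest, **kwargs)
            return self(data)
        # Otherwise return the instance; Python then calls __init__ itself.
        return self

    def __init__(self, *args, **kwargs):
        self.args, self.kwargs = args, kwargs

    def __call__(self, data):
        # Currying: verb(...)(data). Rows are copied so the input is untouched.
        return self.evaluate([dict(row) for row in data])

    def __rrshift__(self, data):
        # Piping: data >> verb(...). A list has no __rshift__, so Python
        # falls back to the right operand's reflected method.
        return self(data)


class Define(Verb):
    """Toy verb: add columns computed from each row by a callable."""

    def evaluate(self, rows):
        for row in rows:
            for name, func in self.kwargs.items():
                row[name] = func(row)
        return rows


class Arrange(Verb):
    """Toy verb: sort rows by the named columns."""

    def evaluate(self, rows):
        return sorted(rows, key=lambda row: [row[k] for k in self.args])


def parity(row):
    return row['x'] % 2


rows = [{'x': 0}, {'x': 1}, {'x': 2}, {'x': 3}]

piped = rows >> Define(w=parity) >> Arrange('w')
composed = Arrange(Define(rows, w=parity), 'w')
curried = Arrange('w')(Define(w=parity)(rows))
```

All three calls go through the same evaluate method, so they produce the same result; only the route by which the data reaches the verb differs.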

Data mutability

By default, plydata does not modify the input dataframe. This means that the verbs have no side effects. The advantage is that the user never has to worry about creating copies to avoid contaminating the input dataframe. It is normal in most data-analysis workflows for the user to manipulate the data many times in different ways. It is also the case that most datasets are small enough that they can be copied many times with no noticeable effect on performance.

If you have a dataset small enough to fit in memory but too large to copy all the time without affecting performance, then consider using the modify_input_data option. However, only a few verbs can modify the input data, and those that can say so in their documentation.

Datastore Support

Single Table Verbs

Verb             Dataframe    Sqlite
---------------  -----------  ------
arrange          Yes          No
create           Yes          No
define           Yes          No
distinct         Yes          No
do               Yes          No
group_by         Yes          No
group_indices    Yes          No
head             Yes          No
pull             Yes          No
query            Yes          No
rename           Yes          No
sample_frac      Yes          No
sample_n         Yes          No
select           Yes          No
slice_rows       Yes          No
summarize        Yes          No
tail             Yes          No
ungroup          Yes          No

Helper verbs

Verb                                        Dataframe    Sqlite
------------------------------------------  -----------  ------
count                                       Yes          No
tally                                       Yes          No
add_count                                   Yes          No
add_tally                                   Yes          No
call                                        Yes          No
arrange_all, arrange_at, arrange_if         Yes          No
create_all, create_at, create_if            Yes          No
group_by_all, group_by_at, group_by_if      Yes          No
mutate_all, mutate_at, mutate_if            Yes          No
query_all, query_at, query_if               Yes          No
rename_all, rename_at, rename_if            Yes          No
summarize_all, summarize_at, summarize_if   Yes          No

Two table verbs

Verb             Dataframe    Sqlite
---------------  -----------  ------
anti_join        Yes          No
inner_join       Yes          No
left_join        Yes          No
outer_join       Yes          No
right_join       Yes          No
semi_join        Yes          No