Changelog¶

v0.4.3¶

(2020-12-08)

Bug Fixes¶

This release makes Plydata depend on pandas >= 1.1.5.

v0.4.2¶

(2020-09-12)

This release makes Plydata depend on pandas < 1.1.0. See Issue 23 for details.

v0.4.1¶

(2020-06-10)

Bug Fixes¶

Fixed bug in define where you could not create a new column from array-like or series-like iterables. (GH21)
Fixed bug in arrange where dataframes with irregular indices would give wrong output. (GH22)

v0.4.0¶

(2020-03-15)

Bug Fixes¶

query now works within groups.

New Features¶

Added gather to transform dataframe from wide-form to long-form.
Added spread to transform dataframe from long-form to wide-form.
Added separate to split a string variable/column into different variables/columns.
Added extract which uses a regular expression with groups to extract one or more variables into different columns.
Added pivot_wider to transform dataframe from long-form to wide-form. This is a more general version of spread.
Added pivot_longer to transform dataframe from wide-form to long-form. This is a more general version of gather.
Added separate_rows to split multiple delimited values and place each one in its own row.
Added unite to join multiple columns into one.
Added cat_inorder which creates a categorical with categories in order of how they appear in the sequence.
Added cat_infreq which creates a categorical with categories in order of the number of times they appear in the sequence.
Added cat_inseq which creates a categorical with categories in ascending numerical order.
Added cat_reorder which creates a categorical with categories ordered according to another variable.
Added cat_reorder2 which creates a categorical with categories ordered according to a relationship between two other variables.
Added cat_rev which creates a categorical with reversed categories.
Added cat_shuffle which creates a categorical with the categories in a random order.
Added cat_shift which creates a categorical with the categories shifted to the left or to the right.
Added cat_move (cat_relevel) which creates a categorical with the categories moved to a given position.
Added cat_anon which creates a categorical with the categories renamed and reordered with arbitrary numeric identifiers.
Added cat_collapse which creates a categorical with new umbrella categories that combine one or more of the original categories.
Added cat_other which creates a categorical with a new umbrella category that combines one or more of the original categories.
Added cat_lump which lumps together most/least common categories.
Added cat_lump_min which lumps together common enough categories.
Added cat_rename with which you can manually change category names (and values).
Added cat_relabel to change category names using a function.
Added cat_expand to add or remove categories to a categorical.
Added cat_explicit_na to create a category for missing values.
Added cat_remove_unused to remove/drop unused categories.
Added cat_unify to unify (union of all) the categories in a list of categoricals.
Added cat_concat to concatenate categoricals and combine the categories.
Added cat_zip to combine two or more categoricals.
Added ply function. It makes it possible to use plydata with implied piping without abusing the >> operator. It is also more efficient as it minimises the copying of data.
Added cat_lump_n, cat_lump_prop, and cat_lump_lowfreq as the distinct cases of cat_lump.
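
The wide-form and long-form shapes that gather, spread, pivot_longer and pivot_wider convert between can be illustrated without plydata. The following is a minimal plain-Python sketch on lists of dicts. The helpers to_long and to_wide are hypothetical names invented for this illustration; they are not plydata functions and do not reflect plydata's signatures.

```python
# Plain-Python sketch; rows are dicts, not plydata or pandas objects.
wide = [
    {"name": "a", "x": 1, "y": 2},
    {"name": "b", "x": 3, "y": 4},
]


def to_long(rows, id_col, key, value):
    """Wide to long: one output row per (id, measured column) pair."""
    return [
        {id_col: row[id_col], key: col, value: val}
        for row in rows
        for col, val in row.items()
        if col != id_col
    ]


def to_wide(rows, id_col, key, value):
    """Long to wide: one output row per id, one column per distinct key."""
    out = {}
    for row in rows:
        # Create the output row for this id on first sight, then fill a column.
        out.setdefault(row[id_col], {id_col: row[id_col]})[row[key]] = row[value]
    return list(out.values())


long_rows = to_long(wide, "name", "variable", "value")
for row in long_rows:
    print(row)

# Reshaping back recovers the original wide rows.
round_trip = to_wide(long_rows, "name", "variable", "value")
print(round_trip == wide)
```

In this sketch the long form has four rows (two ids times two measured columns), and converting back yields the original two wide rows.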

Enhancements¶

You can no longer modify variables that have been grouped on; an exception is raised.

import pandas as pd
from plydata import define, group_by

df = pd.DataFrame({'x': [1, 1, 2], 'y': [1, 2, 3]})
df >> define(x='2*x')                   # Correct
df >> group_by('x') >> define(x='2*x')  # Error: x is a grouping variable

select can now exclude columns that are prefixed with a -.

v0.3.3¶

(2018-08-02)

Fixed group_indices for the case with no groups.

v0.3.2¶

(2017-11-27)

New Features¶

You can now use slices to select columns (GH9).

v0.3.1¶

(2017-11-21)

Fixed exception with evaluation of grouped categorical columns when there are missing categories in the data.
Fixed issue with ignored groups when count and add_count are used with a grouped dataframe. The groups listed in the verb call were ignored.
Fixed issue where a column named n in a dataframe could not be referenced (GH6).

v0.3.0¶

(2017-11-03)

Fixed define (mutate) and create (transmute) so that they work with group_by.
Fixed tally to work with external arrays.
Fixed tally to sort in descending order.
Fixed the nth function of summarize to return NaN when the requested value is out of bounds.
The contains and matches parameters of select can now accept a tuple of values.
Fixed verbs that create columns (i.e. create, define and do) so that they can create categorical columns.
The join verbs gained left_on and right_on parameters.
Fixed verb reuse. You can create a verb and assign it to a variable and pipe to the same variable in different operations.
Fixed issue where select did not maintain the order in which the columns are listed.

New Features¶

Added special verb call, which allows one to use external functions that accept a dataframe as the first argument.
For define, create and group_by, you can now use the special function n() to count the number of elements in the current group.
Added the single table helper verbs:
Added pull verb.
Added slice_rows verb.
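
The idea behind the call verb, piping data into an external function that takes the data as its first argument, can be sketched in plain Python. This is an illustration only and not plydata's implementation: the class PipeCall and the function top are hypothetical names, and a list stands in for a dataframe.

```python
class PipeCall:
    """Hypothetical stand-in for a pipeable verb; not plydata's code."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __rrshift__(self, data):
        # Python evaluates `data >> verb` as verb.__rrshift__(data) when
        # type(data) does not implement >> for this operand.
        return self.func(data, *self.args, **self.kwargs)


def top(rows, n):
    """An 'external' function: it takes the data as its first argument."""
    return rows[:n]


rows = [5, 3, 8, 1]
first_two = PipeCall(top, 2)  # build the verb once, reuse it below

result = rows >> PipeCall(sorted) >> first_two
print(result)

# The verb object holds no per-call state, so it can be piped to again.
again = [9, 7, 6] >> first_two
print(again)
```

Because the verb only stores the function and its extra arguments, the same object can be used in several pipelines, which is the reuse behaviour the fixes above describe.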

API Changes¶

The internal function for summarize that counts the number of elements in the current group changed from {n} to n().
You can now use piping with the two table verbs (the joins).
modify_where and define_where helper verbs have been removed. Using the new expression helper functions case_when and if_else is more readable.
Removed dropna and fillna in favour of using call with pandas.DataFrame.dropna() and pandas.DataFrame.fillna().

v0.2.1¶

(2017-09-20)

Fixed issue with do and summarize where the categorical group columns are not categorical in the result.
Fixed issue with internal modules being imported with from plydata import *.
Added dropna and fillna verbs. They both wrap around pandas methods of the same name. Now you can maintain the pipelining when dealing with most NaN values.

v0.2.0¶

(2017-05-06)

distinct now uses pandas.unique() instead of numpy.unique().
Added function Q() for quoting non-pythonic column names in a dataframe.
Fixed query and modify_where query expressions to handle environment variables.
Added options context manager.

Fixed bug where some verbs were not reusable, e.g.

import pandas as pd
from plydata import define

df = pd.DataFrame({'x': range(5)})
v = define(y='x*2')
df >> v  # first use
df >> v  # reuse of v

Added define_where verb, a combination of define and modify_where.

v0.1.1¶

(2017-04-11)

Re-release of v0.1.0

v0.1.0¶

(2017-04-11)

First public release