Changelog

v0.5.0

(not-yet-released)

Bug Fixes

  • Fixed bug in arrange where the sorting would not work in some cases when the dataframe index was out of order.

API Changes

  • query, query_all, query_at, query_if, arrange_all, arrange_at and arrange_if now return dataframes with their indices reset.
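
As a rough illustration of what a reset index means here, the sketch below uses plain pandas rather than plydata itself. The variable names are made up for the example, and it only mirrors the shape of the result described above.

```python
import pandas as pd

# Dataframe whose index is deliberately out of order.
df = pd.DataFrame({'x': [3, 1, 2]}, index=[10, 30, 20])

# A plain pandas sort keeps the original index labels attached to each row.
sorted_keep = df.sort_values('x')

# Resetting the index afterwards yields a fresh 0..n-1 index.
sorted_reset = df.sort_values('x').reset_index(drop=True)

print(list(sorted_keep.index))
print(list(sorted_reset.index))
```

Code that relied on the old index labels surviving a query or arrange call would need to carry those labels in a regular column instead.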

v0.4.3

(2020-12-08)

Bug Fixes

  • This release makes Plydata depend on pandas >= 1.1.5.

v0.4.2

(2020-09-12)

  • This release makes Plydata depend on pandas < 1.1.0. See Issue 23 for details.

v0.4.1

(2020-06-10)

Bug Fixes

  • Fixed bug in define where you could not create a new column from array-like or series-like iterables. (GH21)

  • Fixed bug in arrange where dataframes with irregular indices would give wrong output. (GH22)

v0.4.0

(2020-03-15)

Bug Fixes

  • query now works within groups.

New Features

  • Added gather to transform dataframe from wide-form to long-form.

  • Added spread to transform dataframe from long-form to wide-form.

  • Added separate to split a string variable/column into different variables/columns.

  • Added extract which uses a regular expression with groups to extract one or more variables into different columns.

  • Added pivot_wider to transform dataframe from long-form to wide-form. This is a more general version of spread.

  • Added pivot_longer to transform dataframe from wide-form to long-form. This is a more general version of gather.

  • Added separate_rows to split multiple delimited values and place each one in its own row.

  • Added unite to join multiple columns into one.

  • Added cat_inorder which creates a categorical with categories in order of how they appear in the sequence.

  • Added cat_infreq which creates a categorical with categories in order of the number of times they appear in the sequence.

  • Added cat_inseq which creates a categorical with categories in ascending numerical order.

  • Added cat_reorder which creates a categorical with categories ordered according to another variable.

  • Added cat_reorder2 which creates a categorical with categories ordered according to a relationship between two other variables.

  • Added cat_rev which creates a categorical with reversed categories.

  • Added cat_shuffle which creates a categorical with the categories in a random order.

  • Added cat_shift which creates a categorical with the categories shifted to the left or to the right.

  • Added cat_move (cat_relevel) which creates a categorical with the categories moved to a given position.

  • Added cat_anon which creates a categorical with the categories renamed and reordered with arbitrary numeric identifiers.

  • Added cat_collapse which creates a categorical with new umbrella categories that combine one or more of the original categories.

  • Added cat_other which creates a categorical with a new umbrella category that combines one or more of the original categories.

  • Added cat_lump which lumps together most/least common categories.

  • Added cat_lump_min which lumps together common enough categories.

  • Added cat_rename with which you can manually change category names (and values).

  • Added cat_relabel to change category names using a function.

  • Added cat_expand to add or remove categories to a categorical.

  • Added cat_explicit_na to create a category for missing values.

  • Added cat_remove_unused to remove/drop unused categories.

  • Added cat_unify to unify (union of all) the categories in a list of categoricals.

  • Added cat_concat to concatenate categoricals and combine the categories.

  • Added cat_zip to combine two or more categoricals.

  • Added the ply function, which makes it possible to use plydata with implied piping without abusing the >> operator. It is also more efficient as it minimises the copying of data.

  • Added cat_lump_n, cat_lump_prop, and cat_lump_lowfreq as the distinct cases of cat_lump.
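
The category-ordering rules listed above can be pictured with plain pandas categoricals. This is only a sketch of the ordering semantics for cat_inorder, cat_infreq and cat_rev, not plydata's implementation. The variable names are hypothetical, and "most frequent first" for the frequency ordering is an assumption borrowed from forcats' fct_infreq.

```python
import pandas as pd

values = ['b', 'a', 'c', 'b', 'c', 'b']

# Categories in order of first appearance (the rule described for cat_inorder).
inorder = pd.Categorical(values, categories=list(dict.fromkeys(values)))

# Categories by decreasing frequency (assumed direction for cat_infreq).
# The sample data has no ties, so the ordering is unambiguous.
counts = pd.Series(values).value_counts()
infreq = pd.Categorical(values, categories=list(counts.index))

# Same values, categories reversed (the rule described for cat_rev).
rev = pd.Categorical(values, categories=list(inorder.categories[::-1]))

print(list(inorder.categories))
print(list(infreq.categories))
print(list(rev.categories))
```

In each case only the order of the categories changes; the values themselves are left as they were.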

Enhancements

  • You cannot modify variables that have been grouped on; an exception is raised.

    df = pd.DataFrame({'x': [1, 1, 2], 'y': [1, 2, 3]})
    df >> define(x='2*x')                   # Correct
    df >> group_by('x') >> define(x='2*x')  # Error

  • select can now exclude columns that are prefixed with a -.

v0.3.3

(2018-08-02)

v0.3.2

(2017-11-27)

New Features

  • You can now use slices to select columns (GH9).

v0.3.1

(2017-11-21)

  • Fixed exception with evaluation of grouped categorical columns when there are missing categories in the data.

  • Fixed issue with ignored groups when count and add_count are used with a grouped dataframe. The groups listed in the verb call were ignored.

  • Fixed issue where a dataframe column named n could not be referenced (GH6).

v0.3.0

(2017-11-03)

  • Fixed define (mutate) and create (transmute) so that they work with group_by.

  • Fixed tally to work with external arrays.

  • Fixed tally to sort in descending order.

  • Fixed the nth function of summarize to return NaN when the requested value is out of bounds.

  • The contains and matches parameters of select can now accept a tuple of values.

  • Fixed verbs that create columns (i.e. create, define and do) so that they can create categorical columns.

  • The join verbs gained left_on and right_on parameters.

  • Fixed verb reuse. You can create a verb and assign it to a variable and pipe to the same variable in different operations.

  • Fixed issue where select did not maintain the order in which the columns are listed.
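
The left_on and right_on names mentioned above match parameters of pandas.merge. The sketch below shows what they do in plain pandas. Whether the plydata join verbs forward them unchanged is an assumption, and the dataframes are made up for the example.

```python
import pandas as pd

left = pd.DataFrame({'id': [1, 2, 3], 'x': ['a', 'b', 'c']})
right = pd.DataFrame({'key': [2, 3, 4], 'y': [20, 30, 40]})

# Join on columns that carry different names in the two tables.
merged = pd.merge(left, right, how='inner', left_on='id', right_on='key')

print(merged)
```

Both key columns are kept in the result because their names differ.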

New Features

API Changes

  • The internal function for summarize that counts the number of elements in the current group changed from {n} to n().

  • You can now use piping with the two table verbs (the joins).

  • modify_where and define_where helper verbs have been removed. Using the new expression helper functions case_when and if_else is more readable.

  • Removed dropna and fillna in favour of using call with pandas.DataFrame.dropna() and pandas.DataFrame.fillna().
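
For readers unsure what "the number of elements in the current group" refers to, the quantity can be reproduced in plain pandas. This sketch does not call plydata, and the data and names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'g': ['a', 'a', 'b'], 'v': [1, 2, 3]})

# Number of rows in each group: the per-group count that n() refers to.
group_sizes = df.groupby('g').size()

print(group_sizes.to_dict())
```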

v0.2.1

(2017-09-20)

  • Fixed issue with do and summarize where the categorical group columns are not categorical in the result.

  • Fixed issue with internal modules being imported with from plydata import *.

  • Added dropna and fillna verbs. They both wrap around pandas methods of the same name. Now you can maintain the pipelining when dealing with most NaN values.

v0.2.0

(2017-05-06)

  • distinct now uses pandas.unique() instead of numpy.unique().

  • Added function Q() for quoting non-pythonic column names in a dataframe.

  • Fixed query and modify_where query expressions to handle environment variables.

  • Added options context manager.

  • Fixed bug where some verbs were not reusable. e.g.

    df = pd.DataFrame({'x': range(5)})
    v = define(y='x*2')
    df >> v  # first use
    df >> v  # Reuse of v

  • Added define_where verb, a combination of define and modify_where.

v0.1.1

(2017-04-11)

Re-release of v0.1.0

v0.1.0

(2017-04-11)

First public release