Changelog¶
v0.5.0¶
(not-yet-released)
Bug Fixes¶
Fixed bug in arrange where the sorting would not work in some cases when the dataframe index was out of order.
API Changes¶
v0.4.3¶
(2020-12-08)
Bug Fixes¶
This release makes Plydata depend on pandas >= 1.1.5.
v0.4.2¶
(2020-09-12)
This release makes Plydata depend on pandas < 1.1.0. See GH23 for details.
v0.4.1¶
(2020-06-10)
Bug Fixes¶
v0.4.0¶
(2020-03-15)
Bug Fixes¶
query now works within groups.
New Features¶
Added gather to transform dataframe from wide-form to long-form.
Added spread to transform dataframe from long-form to wide-form.
Added separate to split a string variable/column into different variables/columns.
Added extract which uses a regular expression with groups to extract one or more variables into different columns.
Added pivot_wider to transform dataframe from long-form to wide-form. This is a more general version of spread.
Added pivot_longer to transform dataframe from wide-form to long-form. This is a more general version of gather.
Added separate_rows to split multiple delimited values and place each one in its own row.
Added unite to join multiple columns into one.
Added cat_inorder which creates a categorical with categories in order of how they appear in the sequence.
Added cat_infreq which creates a categorical with categories in order of the number of times they appear in the sequence.
Added cat_inseq which creates a categorical with categories in ascending numerical order.
Added cat_reorder which creates a categorical with categories ordered according to another variable.
Added cat_reorder2 which creates a categorical with categories ordered according to a relationship between two other variables.
Added cat_rev which creates a categorical with reversed categories.
Added cat_shuffle which creates a categorical with the categories in a random order.
Added cat_shift which creates a categorical with the categories shifted to the left or to the right.
Added cat_move (cat_relevel) which creates a categorical with the categories moved to a given position.
Added cat_anon which creates a categorical with the categories renamed and reordered with arbitrary numeric identifiers.
Added cat_collapse which creates a categorical with new umbrella categories that combine one or more of the original categories.
Added cat_other which creates a categorical with a new umbrella category that combines one or more of the original categories.
Added cat_lump which lumps together most/least common categories.
Added cat_lump_min which lumps together common enough categories.
Added cat_rename with which you can manually change category names (and values).
Added cat_relabel to change category names using a function.
Added cat_expand to add or remove categories to a categorical.
Added cat_explicit_na to create a category for missing values.
Added cat_remove_unused to remove/drop unused categories.
Added cat_unify to unify (union of all) the categories in a list of categoricals.
Added cat_concat to concatenate categoricals and combine the categories.
Added cat_zip to combine two or more categoricals.
Added ply function. Makes it possible to use plydata with implied piping without abusing the >> operator. It is also more efficient as it minimises the copying of data.
Added cat_lump_n, cat_lump_prop, and cat_lump_lowfreq as the distinct cases of cat_lump.
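To give a feel for the reshaping and categorical helpers listed above, the sketch below reproduces a wide-to-long and long-to-wide round trip and two category orderings with plain pandas. It illustrates the ideas behind gather/spread and cat_inorder/cat_infreq and is not plydata's implementation; the data and column names are made up, and plydata's own signatures are not given in this changelog, so they are not reproduced here.

```python
import pandas as pd

# Made-up wide-form data: one row per name, one column per year.
wide = pd.DataFrame({
    'name': ['a', 'b'],
    '2019': [1, 2],
    '2020': [3, 4],
})

# Wide-form to long-form (the idea behind gather / pivot_longer).
long = wide.melt(id_vars='name', var_name='year', value_name='value')

# Long-form back to wide-form (the idea behind spread / pivot_wider).
wide_again = (
    long.pivot(index='name', columns='year', values='value')
    .reset_index()
)
wide_again.columns.name = None  # drop the leftover 'year' axis label

# Categories in order of first appearance (the idea behind cat_inorder).
s = pd.Series(['b', 'a', 'c', 'a', 'c', 'c'])
inorder = pd.Categorical(s, categories=list(s.unique()))

# Categories from most to least frequent (the idea behind cat_infreq).
infreq = pd.Categorical(s, categories=list(s.value_counts().index))
```

The round trip is lossless here only because every name has a value for every year; with missing combinations the wide form would contain NaN.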
Enhancements¶
Variables that have been grouped on can no longer be modified; attempting to do so raises an exception.
import pandas as pd
from plydata import define, group_by

df = pd.DataFrame({'x': [1, 1, 2], 'y': [1, 2, 3]})
df >> define(x='2*x')  # Correct
df >> group_by('x') >> define(x='2*x')  # Error
select can now exclude columns that are prefixed with a -.
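The exclusion rule for select can be pictured as a small name-resolution step. The helper below is hypothetical (resolve_columns is not a plydata function) and works on plain lists of column names; it only illustrates the convention that a name prefixed with - means "leave this column out". How plydata treats a mix of plain and prefixed names is not stated here, so the handling of that case in this sketch is an assumption.

```python
def resolve_columns(all_columns, names):
    """Hypothetical helper: return the columns that remain selected.

    Names starting with '-' are exclusions; other names are inclusions.
    Assumption: if only exclusions are given, start from every column.
    """
    include = [n for n in names if not n.startswith('-')]
    exclude = {n[1:] for n in names if n.startswith('-')}
    # With no explicit inclusions, every column is a candidate.
    candidates = include if include else list(all_columns)
    return [c for c in candidates if c not in exclude]


columns = ['x', 'y', 'z']
only_excluded = resolve_columns(columns, ['-y'])   # drop y, keep the rest
included = resolve_columns(columns, ['z', 'x'])    # keep the listed order
```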
v0.3.3¶
(2018-08-02)
Fixed group_indices for the case with no groups.
v0.3.2¶
(2017-11-27)
New Features¶
v0.3.1¶
(2017-11-21)
Fixed exception with evaluation of grouped categorical columns when there are missing categories in the data.
Fixed issue with ignored groups when count and add_count are used with a grouped dataframe. The groups listed in the verb call were ignored.
Fixed issue where a column named n in a dataframe could not be referenced (GH6).
v0.3.0¶
(2017-11-03)
Fixed define (mutate) and create (transmute) so that they work with group_by.
Fixed tally to work with external arrays.
Fixed tally to sort in descending order.
Fixed the nth function of summarize to return NaN when the requested value is out of bounds.
The contains and matches parameters of select can now accept a tuple of values.
Fixed verbs that create columns (i.e. create, define and do) so that they can create categorical columns.
The join verbs gained left_on and right_on parameters.
Fixed verb reuse. You can create a verb, assign it to a variable and pipe to the same variable in different operations.
Fixed issue where select did not maintain the order in which the columns are listed.
New Features¶
Added special verb call, which allows one to use external functions that accept a dataframe as the first argument.
For define, create and group_by, you can now use the special function n() to count the number of elements in the current group.
Added the single table helper verbs:
Added pull verb.
Added slice_rows verb.
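For readers more familiar with pandas, the first two additions in this list have close analogues there. The sketch below uses plain pandas, not plydata: a group-wise transform('size') stands in for n(), and DataFrame.pipe stands in for call. The data and the add_total function are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'g': ['a', 'a', 'b'], 'x': [1, 2, 3]})

# Per-row count of elements in the row's group (the idea behind n()).
df['n'] = df.groupby('g')['x'].transform('size')


def add_total(data, column):
    """Made-up external function taking a dataframe as first argument."""
    return data.assign(total=data[column].sum())


# Pass the dataframe through an external function (the idea behind call).
result = df.pipe(add_total, 'x')
```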
API Changes¶
The internal function for summarize that counts the number of elements in the current group changed from {n} to n().
You can now use piping with the two table verbs (the joins).
The modify_where and define_where helper verbs have been removed. Using the new expression helper functions case_when and if_else is more readable.
Removed dropna and fillna in favour of using call with pandas.DataFrame.dropna() and pandas.DataFrame.fillna().
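The conditional helpers named above correspond to a familiar pattern: a two-way choice and a multi-way choice where the first matching condition wins. The sketch below shows that pattern with numpy.where and numpy.select on a pandas column. It does not use plydata, and the exact signatures of if_else and case_when are not given in this changelog, so they are not reproduced here. The final line shows the pandas method that the removed dropna verb wrapped.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': [1.0, 5.0, np.nan, 12.0]})

# Two-way choice (the idea behind if_else).
# Comparisons with NaN are False, so the missing value falls to 'small'.
df['label'] = np.where(df['x'] > 4, 'big', 'small')

# Multi-way choice; the first matching condition wins
# (the idea behind case_when).
conditions = [df['x'] < 3, df['x'] < 10]
choices = ['low', 'mid']
df['band'] = np.select(conditions, choices, default='high')

# The pandas method that the removed dropna verb wrapped.
clean = df.dropna(subset=['x'])
```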
v0.2.1¶
(2017-09-20)
Fixed issue with do and summarize where the categorical group columns are not categorical in the result.
Fixed issue with internal modules being imported with from plydata import *.
Added dropna and fillna verbs. They both wrap around pandas methods of the same name. Now you can maintain the pipelining when dealing with most NaN values.
v0.2.0¶
(2017-05-06)
distinct now uses pandas.unique() instead of numpy.unique().
Added function Q() for quoting non-pythonic column names in a dataframe.
Fixed query and modify_where query expressions to handle environment variables.
Added options context manager.
Fixed bug where some verbs were not reusable, e.g.
import pandas as pd
from plydata import define

df = pd.DataFrame({'x': range(5)})
v = define(y='x*2')
df >> v  # first use
df >> v  # reuse of v
Added define_where verb, a combination of define and modify_where.
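The reuse fix is easiest to see with a toy model of piping. In the sketch below, Verb and double are hypothetical stand-ins (not plydata classes), and a list plays the role of the dataframe. Python falls back to the right operand's __rrshift__ when the left operand does not support >>, and because this verb keeps no reference to the data it was last applied to, the same object can be piped to repeatedly.

```python
class Verb:
    """Hypothetical, minimal verb: wraps a function applied by `>>`."""

    def __init__(self, func):
        self.func = func

    def __rrshift__(self, data):
        # Called for `data >> verb` when `data` does not implement `>>`.
        # Nothing about `data` is stored on the verb, so the verb
        # stays reusable.
        return self.func(data)


double = Verb(lambda values: [2 * v for v in values])

first = [1, 2, 3] >> double        # first use
second = [10, 20] >> double        # reuse of the same verb object
chained = [1] >> double >> double  # evaluated left to right
```

A verb that cached its input on first use would return stale results on the second pipe, which is the kind of failure a reuse bug implies; whether that was the actual cause in plydata is not stated in this changelog.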
v0.1.1¶
(2017-04-11)
Re-release of v0.1.0
v0.1.0¶
(2017-04-11)
First public release