Changelog
v0.4.2
(2020-09-12)
This release makes plydata depend on pandas < 1.1.0. See Issue 23 for details.
v0.4.1
(2020-06-10)
v0.4.0
(2020-03-15)
New Features
Added gather to transform a dataframe from wide-form to long-form.
Added spread to transform a dataframe from long-form to wide-form.
Added separate to split a string variable/column into different variables/columns.
Added extract, which uses a regular expression with groups to extract one or more variables into different columns.
Added pivot_wider to transform a dataframe from long-form to wide-form. This is a more general version of spread.
Added pivot_longer to transform a dataframe from wide-form to long-form. This is a more general version of gather.
Added separate_rows to split multiple delimited values and place each one in its own row.
Added unite to join multiple columns into one.
Added cat_inorder, which creates a categorical with categories in the order in which they appear in the sequence.
Added cat_infreq, which creates a categorical with categories ordered by the number of times they appear in the sequence.
Added cat_inseq, which creates a categorical with categories in ascending numerical order.
Added cat_reorder, which creates a categorical with categories ordered according to another variable.
Added cat_reorder2, which creates a categorical with categories ordered according to a relationship between two other variables.
Added cat_rev, which creates a categorical with reversed categories.
Added cat_shuffle, which creates a categorical with the categories in a random order.
Added cat_shift, which creates a categorical with the categories shifted to the left or to the right.
Added cat_move (cat_relevel), which creates a categorical with the categories moved to a given position.
Added cat_anon, which creates a categorical with the categories renamed and reordered with arbitrary numeric identifiers.
Added cat_collapse, which creates a categorical with new umbrella categories that combine one or more of the original categories.
Added cat_other, which creates a categorical with a new umbrella category that combines one or more of the original categories.
Added cat_lump, which lumps together the most/least common categories.
Added cat_lump_min, which lumps together common enough categories.
Added cat_rename, with which you can manually change category names (and values).
Added cat_relabel to change category names using a function.
Added cat_expand to add or remove categories to a categorical.
Added cat_explicit_na to create a category for missing values.
Added cat_remove_unused to remove/drop unused categories.
Added cat_unify to unify (union of all) the categories in a list of categoricals.
Added cat_concat to concatenate categoricals and combine the categories.
Added cat_zip to combine two or more categoricals.
Added the ply function, which makes it possible to use plydata with implied piping without abusing the >> operator. It is also more efficient, as it minimises the copying of data.
Added cat_lump_n, cat_lump_prop, and cat_lump_lowfreq as the distinct cases of cat_lump.
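To make the category orderings concrete, here is a minimal pure-Python sketch of the ordering rules described for cat_inorder, cat_infreq and cat_rev. The helper names are hypothetical and work on plain lists. This is not plydata's implementation, and the tie-breaking rule (ties keep first-appearance order) is an assumption.

```python
from collections import Counter


def categories_inorder(values):
    """Unique values in order of first appearance (hypothetical helper)."""
    return list(dict.fromkeys(values))


def categories_infreq(values):
    """Unique values, most frequent first (hypothetical helper).

    Assumption: ties keep their first-appearance order, which the
    stable sort below guarantees.
    """
    counts = Counter(values)
    return sorted(categories_inorder(values), key=lambda v: -counts[v])


def categories_rev(categories):
    """Categories in reversed order (hypothetical helper)."""
    return list(reversed(categories))


values = ["b", "a", "c", "a", "c", "c"]
inorder = categories_inorder(values)
infreq = categories_infreq(values)
reversed_inorder = categories_rev(inorder)
```

The same values produce different category orders depending on the rule, which is the distinction the cat_* functions expose.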
v0.3.1
(2017-11-21)
Fixed exception with evaluation of grouped categorical columns when there are missing categories in the data.
Fixed issue with ignored groups when count and add_count are used with a grouped dataframe. The groups listed in the verb call were ignored.
Fixed issue where, in a dataframe with a column named n, the column could not be referenced (GH6).
v0.3.0
(2017-11-03)
Fixed define (mutate) and create (transmute) so that they work with group_by.
Fixed tally to work with external arrays.
Fixed tally to sort in descending order.
Fixed the nth function of summarize to return NaN when the requested value is out of bounds.
The contains and matches parameters of select can now accept a tuple of values.
Fixed verbs that create columns (i.e. create, define and do) so that they can create categorical columns.
The join verbs gained left_on and right_on parameters.
Fixed verb reuse. You can create a verb, assign it to a variable, and pipe to the same variable in different operations.
Fixed issue where select did not maintain the order in which the columns are listed.
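The verb reuse item above can be pictured with a toy model of piping. The sketch below is an assumption-laden illustration, not plydata's code: Verb and define_double are hypothetical names, and the data is a plain list of dicts rather than a dataframe. It shows one way a verb can stay reusable, by keeping no per-call state on the verb object.

```python
class Verb:
    """Toy verb: wraps a function and is applied with `data >> verb`."""

    def __init__(self, func):
        self.func = func

    def __rrshift__(self, data):
        # Python calls this for `data >> verb` when `data` itself does
        # not implement `>>`. Nothing is stored on the verb, so the same
        # object can be piped to again.
        return self.func(data)


def define_double(column, new_column):
    """Hypothetical verb factory: add new_column = 2 * column to each row."""
    def apply(rows):
        return [{**row, new_column: 2 * row[column]} for row in rows]
    return Verb(apply)


v = define_double("x", "y")
first = [{"x": 1}, {"x": 2}] >> v
second = [{"x": 10}] >> v  # reuse of the same verb object
```

Because each pipe returns a new result and leaves the verb untouched, the second use is unaffected by the first.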
New Features
Added special verb call; it allows one to use external functions that accept a dataframe as the first argument.
For define, create and group_by, you can now use the special function n() to count the number of elements in the current group.
Added the single table helper verbs:
Added pull verb.
Added slice_rows verb.
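As a rough picture of what counting "the number of elements in the current group" means, the pure-Python sketch below attaches the group size to every row. The function add_group_size and its parameters are hypothetical, and rows are plain dicts. It only models the idea behind n() and is not how plydata evaluates it.

```python
from collections import Counter


def add_group_size(rows, by, name="n"):
    """Attach to every row the size of the group it belongs to."""
    sizes = Counter(row[by] for row in rows)
    return [{**row, name: sizes[row[by]]} for row in rows]


rows = [{"g": "a"}, {"g": "b"}, {"g": "a"}]
result = add_group_size(rows, by="g")
```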
API Changes
The internal function for summarize that counts the number of elements in the current group changed from {n} to n().
You can now use piping with the two table verbs (the joins).
The modify_where and define_where helper verbs have been removed. Using the new expression helper functions case_when and if_else is more readable.
Removed dropna and fillna in favour of using call with pandas.DataFrame.dropna() and pandas.DataFrame.fillna().
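The idea behind conditional helpers such as if_else and case_when can be sketched in plain Python. The versions below are toy stand-ins that take predicates over a list of values. Their signatures are assumptions made for illustration and differ from plydata's own helpers. The sketch shows a two-way choice and a multi-way choice in which the first matching case wins.

```python
def if_else(values, predicate, true_value, false_value):
    """Two-way choice per element (toy version)."""
    return [true_value if predicate(v) else false_value for v in values]


def case_when(values, cases, default=None):
    """Multi-way choice per element (toy version).

    `cases` is an ordered list of (predicate, result) pairs; the first
    predicate that holds decides the result for that element.
    """
    out = []
    for v in values:
        for predicate, result in cases:
            if predicate(v):
                out.append(result)
                break
        else:
            # No case matched this element.
            out.append(default)
    return out


xs = [1, 5, 12]
parity = if_else(xs, lambda x: x % 2 == 0, "even", "odd")
level = case_when(
    xs,
    [
        (lambda x: x > 10, "high"),
        (lambda x: x > 3, "mid"),
        (lambda x: True, "low"),
    ],
)
```

Listing the cases in order keeps the whole decision in one expression, which is the readability gain over chaining several conditional modifications.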
v0.2.1
(2017-09-20)
Fixed issue with do and summarize where the categorical group columns are not categorical in the result.
Fixed issue with internal modules being imported with from plydata import *.
Added dropna and fillna verbs. They both wrap around pandas methods of the same name. Now you can maintain the pipelining when dealing with most NaN values.
v0.2.0
(2017-05-06)
distinct now uses pandas.unique() instead of numpy.unique().
Added function Q() for quoting non-pythonic column names in a dataframe.
Fixed query and modify_where query expressions to handle environment variables.
Added options context manager.
Fixed bug where some verbs were not reusable, e.g.

    import pandas as pd
    from plydata import define

    df = pd.DataFrame({'x': range(5)})
    v = define(y='x*2')
    df >> v  # first use
    df >> v  # reuse of v

Added define_where verb, a combination of define and modify_where.
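To illustrate what "non-pythonic" column names are in the Q() item above, the sketch below flags names that cannot be written as bare Python names in an expression. The helper needs_quoting is hypothetical and uses only the standard library. It is meant as a rough test for when quoting is likely needed, not as part of plydata.

```python
import keyword


def needs_quoting(name):
    """True if `name` cannot be used as a bare Python name in an expression."""
    return not name.isidentifier() or keyword.iskeyword(name)


columns = ["x", "class", "my col", "2019", "total_count"]
flagged = [c for c in columns if needs_quoting(c)]
```

Names that are reserved words, contain spaces, or start with a digit are the ones flagged here.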