class plydata.two_table_verbs.semi_join(*args, **kwargs)

Join, keeping only the columns of the left frame and never duplicating its rows

A semi join differs from an inner join in that an inner join returns one row of the left frame for each matching row of the right frame, whereas a semi join never duplicates rows of the left frame.


Parameters

x : dataframe

Left dataframe

y : dataframe

Right dataframe

on : str or tuple or list

Columns on which to join. Must be found in both DataFrames.

left_on : label or list, or array-like

Field names to join on in the left DataFrame. Can also be a vector, or a list of vectors, of the same length as the DataFrame, in which case the vector itself is used as the join key instead of a column.

right_on : label or list, or array-like

Field names to join on in the right DataFrame, or a vector or list of vectors as described for left_on.

suffixes : 2-length sequence

Suffixes to apply to overlapping column names from the left and right frames, respectively.
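
The left_on, right_on and suffixes descriptions above read like those of the same-named arguments of pandas.merge. Assuming they behave the same way (this page does not say so explicitly), the following sketch uses pandas.merge directly to show what they control. The frames and the column names key_l, key_r and value are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical frames: the key columns have different names,
# and both frames share a non-key column called 'value'.
left = pd.DataFrame({'key_l': ['a', 'b'], 'value': [1, 2]})
right = pd.DataFrame({'key_r': ['a', 'a'], 'value': [10, 20]})

# Join on differently named key columns. The overlapping 'value'
# column receives the custom suffixes instead of the default '_x'/'_y'.
merged = pd.merge(
    left, right,
    left_on='key_l', right_on='key_r',
    suffixes=('_left', '_right'),
)
print(merged)
```

This is plain pandas, not plydata, so it only illustrates the meaning of the arguments. A semi join returns columns of the left frame only, so suffixes may have no visible effect on its result.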


Notes

Groups are ignored for the purpose of joining, but the result preserves the grouping of x.


Examples

>>> import pandas as pd
>>> from plydata.two_table_verbs import semi_join, inner_join
>>> df1 = pd.DataFrame({
...     'col1': ['one', 'two', 'three'],
...     'col2': [1, 2, 3]
... })
>>> df2 = pd.DataFrame({
...     'col1': ['one', 'four', 'three', 'three'],
...     'col2': [1, 4, 3, 3]
... })
>>> semi_join(df1, df2, on='col1')
    col1  col2
0    one     1
2  three     3

Compare this with an inner_join, which returns the 'three' row of df1 twice because it matches two rows of df2:

>>> inner_join(df1, df2, on='col1')
    col1  col2_x  col2_y
0    one       1       1
1  three       3       3
2  three       3       3
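
For readers who want to see the semantics without plydata, the difference can be sketched in plain pandas. This is a minimal illustration of the idea under the same example data, not necessarily how plydata implements semi_join: filtering the left frame with a boolean mask can only keep or drop rows, so no left row is ever repeated.

```python
import pandas as pd

df1 = pd.DataFrame({
    'col1': ['one', 'two', 'three'],
    'col2': [1, 2, 3]
})
df2 = pd.DataFrame({
    'col1': ['one', 'four', 'three', 'three'],
    'col2': [1, 4, 3, 3]
})

# Semi join sketch: keep the rows of df1 whose key appears in df2.
# Only df1's columns survive, and each df1 row appears at most once.
semi = df1[df1['col1'].isin(df2['col1'])]

# Inner join for comparison: one output row per matching pair of rows,
# so 'three' appears twice and the overlapping 'col2' gets suffixes.
inner = pd.merge(df1, df2, on='col1', how='inner')

print(semi)
print(inner)
```

The isin filter handles a single key column. Joining on several columns would need a different approach, such as merging against the de-duplicated key columns of the right frame.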