Selecting rows based on a condition is a common operation in data wrangling. In this short how-to article, we will learn how to use a list of values to select rows from Pandas and PySpark DataFrames.
Pandas
The isin method of Pandas can be used to select rows based on a list of values. We just need to pass the values inside a Python list.
df = df[df.group.isin(["A","B","D"])]
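As a quick, self-contained sketch (the sample data and column names here are illustrative, not from the original):

```python
import pandas as pd

# Hypothetical sample data; the "group" column matches the snippet above
df = pd.DataFrame({"group": ["A", "B", "C", "D", "E"],
                   "value": [1, 2, 3, 4, 5]})

# Keep only the rows whose group is in the list
selected = df[df.group.isin(["A", "B", "D"])]
print(selected)
```

The condition `df.group.isin(["A", "B", "D"])` returns a boolean Series, which is then used as a mask to filter the DataFrame.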
If you want to select rows that are not in this list, you can either add the tilde (~) operator at the beginning of the condition or compare the condition to False.
# Tilde operator
df = df[~df.group.isin(["A","B","D"])]
# False condition
df = df[df.group.isin(["A","B","D"])==False]
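Both approaches select the same rows, as this small sketch (with illustrative sample data) shows:

```python
import pandas as pd

df = pd.DataFrame({"group": ["A", "B", "C", "D", "E"],
                   "value": [1, 2, 3, 4, 5]})

# Tilde operator negates the boolean mask
excluded_tilde = df[~df.group.isin(["A", "B", "D"])]

# Comparing the mask to False achieves the same result
excluded_false = df[df.group.isin(["A", "B", "D"]) == False]

# The two filtered DataFrames are identical (groups C and E here)
print(excluded_tilde.equals(excluded_false))
```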
PySpark
PySpark has an isin method that works similarly to that of Pandas. We use it inside the filter function.
df = df.filter(df.group.isin(["A","B","D"]))
As with Pandas, we can select rows whose values are not in the list by using the tilde (~) operator or by comparing the condition to False.
# Tilde operator
df = df.filter(~df.group.isin(["A","B","D"]))
# False condition
df = df.filter(df.group.isin(["A","B","D"])==False)
This question is also asked as:
- Filter DataFrame rows if value in column is in a set list of values
- Efficiently select rows that match one of several values in Pandas DataFrame