
How to Select Rows From a DataFrame Based on a List of Values

Aporia Team | 2 min read | Sep 06, 2022

Selecting rows based on a condition is a common operation in data wrangling. In this short how-to article, we will learn how to use a list of values to select rows from Pandas and PySpark DataFrames.


Pandas

The isin method of Pandas can be used for selecting rows based on a list of values. We just need to pass the values as a Python list.

df = df[df.group.isin(["A","B","D"])]
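To make the one-liner above concrete, here is a minimal sketch on a small DataFrame. The sample data and the names wanted, mask, and selected are assumptions made for illustration; only the group column and the list of values come from the example above.

```python
import pandas as pd

# Hypothetical sample data; the "group" column mirrors the article's example.
df = pd.DataFrame({
    "group": ["A", "B", "C", "D", "E", "A"],
    "value": [10, 20, 30, 40, 50, 60],
})

wanted = ["A", "B", "D"]

# isin returns a boolean Series: True where the value is in the list.
mask = df["group"].isin(wanted)

# Indexing with the mask keeps only the matching rows.
selected = df[mask]
print(selected)
```

Building the mask as a separate step is optional, but it makes it easy to inspect which rows matched before filtering.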

If you want the rows whose values are not in the list, you can either negate the condition with the tilde (~) operator or compare the result to False. The tilde form is the more idiomatic of the two.

# Tilde operator
df = df[~df.group.isin(["A","B","D"])]

# False condition
df = df[df.group.isin(["A","B","D"])==False]
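As a quick check, the sketch below applies both forms of negation to the same data and confirms that they return the same rows. The sample DataFrame and the variable names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "group": ["A", "B", "C", "D", "E"],
    "value": [1, 2, 3, 4, 5],
})

excluded = ["A", "B", "D"]

# Negate the boolean mask with the tilde operator.
with_tilde = df[~df["group"].isin(excluded)]

# Compare the boolean mask to False instead.
with_false = df[df["group"].isin(excluded) == False]

print(with_tilde)
# Both approaches select the same rows.
print(with_tilde.equals(with_false))
```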

PySpark

PySpark has an isin method that works similarly to the Pandas one. It is called on a column, and the resulting condition is passed to filter.

df = df.filter(df.group.isin(["A","B","D"]))

As in Pandas, we can select the rows that are not in the list by adding a tilde (~) operator at the beginning or by comparing the condition to False.

# Tilde operator
df = df.filter(~df.group.isin(["A","B","D"]))

# False condition
df = df.filter(df.group.isin(["A","B","D"])==False)

This question is also being asked as:

  • Filter DataFrame rows if value in column is in a set list of values
  • Efficiently select rows that match one of several values in Pandas DataFrame
