How to Drop Rows of Pandas DataFrame Whose Value in a Certain Column is NaN

Dor Schwartz, 1 min read, Sep 06, 2022

In this short how-to article, we will learn how to drop the rows of a Pandas or PySpark DataFrame that have a missing value in a particular column.

Pandas

Rows that contain missing values can be dropped with the dropna method. To check only a specific column rather than every column, pass its name in a list to the subset parameter.

df = df.dropna(subset=["id"])

Alternatively, pass inplace=True to modify the DataFrame directly. The call then returns None, so do not assign its result:

df.dropna(subset=["id"], inplace=True)
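To see what the subset parameter changes, here is a minimal sketch on a small made-up DataFrame. The column names and values are illustrative assumptions, not data from a real project. Each column has one missing value, but only the row with a missing id should be removed.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: "id" and "score" each contain one missing value.
df = pd.DataFrame(
    {
        "id": [1.0, np.nan, 3.0, 4.0],
        "score": [0.5, 0.7, np.nan, 0.9],
    }
)

# Only "id" is checked, so the row with a missing "score" is kept.
cleaned = df.dropna(subset=["id"])

print(cleaned)
```

Note that the original index labels are preserved in the result; if a fresh 0-based index is needed, calling reset_index(drop=True) on the result is one option.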

PySpark

The PySpark approach is quite similar to the Pandas one: the drop method on the DataFrame's na attribute also accepts a subset parameter.

df = df.na.drop(subset=["id"])

To check multiple columns for missing values, in either PySpark or Pandas, add the extra column names to the list passed to the subset parameter.
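As a rough illustration of the multi-column case in Pandas, the sketch below uses a made-up DataFrame (the column names and values are assumptions). It also shows the how parameter of dropna, which the text above does not cover: with the default how="any" a row is dropped if any listed column is missing, while how="all" drops it only when every listed column is missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with missing values in "id" and "name".
df = pd.DataFrame(
    {
        "id": [1.0, np.nan, 3.0, np.nan],
        "name": ["a", "b", None, None],
        "score": [0.1, 0.2, 0.3, 0.4],
    }
)

# Default how="any": drop a row if "id" OR "name" is missing.
any_missing = df.dropna(subset=["id", "name"])

# how="all": drop a row only if "id" AND "name" are both missing.
all_missing = df.dropna(subset=["id", "name"], how="all")

print(any_missing)
print(all_missing)
```

The "score" column is never inspected in either call, because it is not listed in subset.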

This question is also being asked as:

  • Exclude rows that have NAN value for a column
  • Removing rows where the object is NaN
