Aporia How to's

How to Drop Rows of Pandas DataFrame Whose Value in a Certain Column is NaN?

In this short how-to article, we will learn how to drop rows in Pandas and PySpark DataFrames that have a missing value in a certain column.

Drop Rows with Value NaN in Certain Column - Pandas DataFrame

Pandas

Rows that contain missing values can be dropped with the dropna method. Without any arguments, dropna removes every row that has a missing value in any column. To check only specific columns, pass their names to the subset parameter.

df = df.dropna(subset=["id"])
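Here is a minimal sketch of the call above on a small, made-up DataFrame. The "id" and "score" columns and their values are hypothetical. Only "id" is checked, so a row with a missing "score" is kept:

```python
import numpy as np
import pandas as pd

# Hypothetical example data: "id" has one missing value, "score" has another.
df = pd.DataFrame({
    "id": [1.0, np.nan, 3.0, 4.0],
    "score": [0.5, 0.7, np.nan, 0.9],
})

# Keep only rows where "id" is present; NaN in other columns is ignored.
cleaned = df.dropna(subset=["id"])
print(cleaned)
```

Without inplace, dropna returns a new DataFrame and leaves df unchanged. The surviving rows keep their original index labels (0, 2 and 3 here).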

Or, to modify the DataFrame in place with the inplace parameter (dropna then returns None, so don't assign the result back to df):

df.dropna(subset=["id"], inplace=True)
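The following sketch uses hypothetical data to show the in-place variant. The DataFrame itself is modified, and the return value is None:

```python
import numpy as np
import pandas as pd

# Hypothetical example data; None in the object column "name" also counts as missing.
df = pd.DataFrame({"id": [1.0, np.nan, 3.0], "name": ["a", "b", None]})

# With inplace=True, dropna mutates df and returns None.
result = df.dropna(subset=["id"], inplace=True)

print(result)  # None
print(df)      # rows 0 and 2 remain; row 2 is kept because only "id" is checked
```

The original index labels are preserved, so you may want to call df.reset_index(drop=True) afterwards if you need a contiguous index.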

PySpark

The PySpark syntax is quite similar. The na.drop method also accepts a subset parameter:

df = df.na.drop(subset=["id"])

In both Pandas and PySpark, to check multiple columns for missing values, add the extra column names to the list passed to the subset parameter.
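The following Pandas sketch checks two columns at once using hypothetical data. By default (how="any"), a row is dropped if any of the listed columns is missing. With how="all", a row is dropped only when all of them are missing. Columns outside subset, like "note" here, are ignored:

```python
import numpy as np
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "id": [1.0, np.nan, 3.0, np.nan],
    "score": [0.5, 0.7, np.nan, np.nan],
    "note": ["x", None, None, None],
})

# Default how="any": drop a row if ANY listed column is missing.
any_missing = df.dropna(subset=["id", "score"])

# how="all": drop a row only if ALL listed columns are missing.
all_missing = df.dropna(subset=["id", "score"], how="all")

print(any_missing)  # only row 0 remains
print(all_missing)  # rows 0, 1 and 2 remain
```

PySpark's na.drop accepts the same how="any" / how="all" choice alongside subset.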

This question is also asked as:

  • Exclude rows that have a NaN value in a column
  • Removing rows where the object is NaN
