How to Drop Rows with Missing (NaN) Values in a Certain Column

In this short how-to article, we will learn how to drop rows in Pandas and PySpark DataFrames that have a missing value in a certain column.

Pandas

Rows with missing values can be dropped with the dropna function. To check only a specific column, pass its name in a list to the subset parameter.

df = df.dropna(subset=["id"])

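Alternatively, set the inplace parameter to True to modify the DataFrame directly. In that case dropna returns None, so the result should not be assigned back: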

df.dropna(subset=["id"], inplace=True)
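To make the difference between the two forms concrete, here is a minimal sketch on a small made-up DataFrame. The column names and values are illustrative assumptions, not data from a real project.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: "id" is missing in rows 1 and 3,
# "name" is missing in row 2.
df = pd.DataFrame(
    {
        "id": [1.0, np.nan, 3.0, np.nan],
        "name": ["a", "b", None, "d"],
    }
)

# Returns a new DataFrame; only the "id" column is checked,
# so row 2 (missing "name") is kept.
cleaned = df.dropna(subset=["id"])

# With inplace=True the DataFrame is modified directly
# and the call itself returns None.
in_place = df.copy()
result = in_place.dropna(subset=["id"], inplace=True)

print(cleaned)
print(result)
```

Note that the surviving rows keep their original index labels. If a fresh 0-based index is wanted, calling reset_index(drop=True) on the result is one option.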

PySpark

The approach in PySpark is similar: the drop function is accessed through df.na, and it also takes a subset parameter.

df = df.na.drop(subset=["id"])

In both Pandas and PySpark, to check multiple columns for missing values, add the extra column names to the list passed to the subset parameter. By default, a row is dropped if any of the listed columns is missing.
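As a sketch of the multi-column case in Pandas, the example below checks two columns at once. The data is made up for illustration, and the how parameter is an addition not covered above: it controls whether one missing value or all missing values in the subset trigger the drop.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with gaps in "id" and "name".
df = pd.DataFrame(
    {
        "id": [1.0, np.nan, 3.0, np.nan],
        "name": ["a", "b", None, None],
        "score": [10, 20, 30, 40],
    }
)

# Default how="any": drop a row if "id" OR "name" is missing.
any_missing = df.dropna(subset=["id", "name"])

# how="all": drop a row only if "id" AND "name" are both missing.
all_missing = df.dropna(subset=["id", "name"], how="all")

print(any_missing)
print(all_missing)
```

Here only the first row survives the default check, while how="all" removes just the last row, where both columns are empty.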

This question is also being asked as:

Exclude rows that have NaN value for a column
Removing rows where the object is NaN

Dor Schwartz

Dor is the Community and Partnerships Manager at Aporia.
