How to Count NaN Values in a DataFrame

NaN values, also called missing values, indicate data that we do not have. We would rather not have missing values in a dataset, but in some cases they are unavoidable.

The first step in handling missing values is to check how many there are. We often want to count the NaN values in a specific column to better understand the data.

This short how-to article shows how to count the missing values in Pandas and PySpark DataFrames.

How to Count the NaN Values in a DataFrame?

Pandas

We can use the isna or isnull function to detect missing values; isnull is an alias of isna, so the two behave identically. Both return a DataFrame of boolean values (True or False) that mark the missing entries. To count the missing values in each column separately, we chain the sum function after isna or isnull.

df.isna().sum()

f1    2
f2    2
f3    1
f4    0
dtype: int64
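To make the output above reproducible, here is a minimal sketch. The column names f1 to f4 come from the output shown; the values themselves are hypothetical and were chosen only so that the per-column counts match. It also assumes that None in an object column should count as missing, which is how isna treats it.

```python
import numpy as np
import pandas as pd

# Hypothetical data: values are made up so the NaN counts are 2, 2, 1, 0.
df = pd.DataFrame(
    {
        "f1": [1.0, np.nan, 3.0, np.nan, 5.0],
        "f2": [np.nan, "b", "c", None, "e"],  # None is also treated as missing
        "f3": [0.5, 1.5, np.nan, 3.5, 4.5],
        "f4": [10, 20, 30, 40, 50],
    }
)

# isna() returns a boolean DataFrame; True marks a missing value.
mask = df.isna()

# Summing the booleans column by column counts the True values.
nan_counts = mask.sum()
print(nan_counts)
```

Because True is counted as 1 and False as 0, summing the mask gives the number of missing values in each column.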

If we apply the sum function a second time, we get the total number of missing values in the whole DataFrame.

df.isna().sum().sum()
5
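The same pattern also works for a single column, which is often what we want when inspecting one feature. The sketch below uses a small hypothetical DataFrame (the values are illustrative, not the article's data) to show both the single-column count and the overall total.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame(
    {
        "f1": [1.0, np.nan, np.nan],
        "f2": [np.nan, 2.0, 3.0],
    }
)

# Single column: select it first, then sum its boolean mask.
f1_missing = int(df["f1"].isna().sum())

# Whole DataFrame: the first sum counts per column, the second adds those counts.
total_missing = int(df.isna().sum().sum())

print(f1_missing, total_missing)
```

Wrapping the results in int converts the NumPy integers that pandas returns into plain Python integers, which is convenient when logging or comparing the counts.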

PySpark

We can count the NaN values in each column separately in PySpark. The functions to use are select, count, when, and isnan.

from pyspark.sql import functions as F

df.select(
    F.count(F.when(F.isnan("number"), F.col("number"))).alias("NaN_count")
).show()

+---------+
|NaN_count|
+---------+
|        2|
+---------+

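The isnan function checks whether each value is NaN. The when function keeps the value only in rows where that condition is True and returns null otherwise, and the count function then counts the non-null results. Note that isnan detects only floating-point NaN; null values have to be checked separately with isNull.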

This question is also being asked as:

Python DataFrame get null value counts

Aporia Team
