How to Show the Distinct Column Values in a DataFrame?

The number of distinct values of an attribute (i.e. column) can be important in data analytics, visualization, or modeling. In this short how-to article, we will learn how to find the distinct values in columns of Pandas and PySpark DataFrames.

Pandas

The unique method returns an array that contains the distinct values in a column, whereas the nunique method returns the number of distinct values. By default, nunique does not count missing values.

# distinct values
df["Brand"].unique()

# number of distinct values
df["Brand"].nunique()
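
The snippet above assumes that df already exists. As a minimal, self-contained sketch, the example below builds a small DataFrame with made-up data (only the "Brand" column name comes from this article) and shows how the two methods treat a missing value.

```python
import pandas as pd

# Hypothetical sample data; only the "Brand" column name comes from the article.
df = pd.DataFrame({"Brand": ["Ford", "Toyota", "Ford", None, "BMW", "Toyota"]})

# unique() keeps values in order of first appearance and includes the missing value.
distinct_values = df["Brand"].unique()

# nunique() skips missing values by default.
n_distinct = df["Brand"].nunique()

# Pass dropna=False to count the missing value as its own category.
n_distinct_with_na = df["Brand"].nunique(dropna=False)

# DataFrame.nunique() returns the distinct count for every column at once.
per_column = df.nunique()

print(distinct_values)
print(n_distinct, n_distinct_with_na)
print(per_column)
```

If the missing value should not appear among the distinct values, calling df["Brand"].dropna().unique() is one way to exclude it.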

PySpark

We can see the distinct values in a column by selecting the column and calling the distinct method as follows:

df.select("name").distinct().show()

To count the number of distinct values, PySpark provides a function called countDistinct (also available under the name count_distinct since Spark 3.2). Null values are not counted.

from pyspark.sql import functions as F

df.select(F.countDistinct("name")).show()
