How to Show the Distinct Column Values in a DataFrame

Aporia Team | 1 min read | Sep 06, 2022

Knowing which distinct values an attribute (i.e. a column) takes, and how many of them there are, is often useful in data analytics, visualization, and modeling. In this short how-to article, we will learn how to find the distinct values in the columns of Pandas and PySpark DataFrames.

Pandas

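Calling unique on a column returns an array of its distinct values in order of first appearance, whereas nunique returns the number of distinct values. Note that unique includes missing values (NaN) in its result, while nunique ignores them by default.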

# distinct values
df["Brand"].unique()

# number of distinct values
df["Brand"].nunique()
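To make the difference concrete, here is a minimal, self-contained sketch. The DataFrame and its Brand and Year values are made up for illustration; only unique, nunique, and the dropna parameter come from Pandas itself.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, including one missing Brand value
df = pd.DataFrame(
    {
        "Brand": ["Ford", "BMW", "Ford", np.nan, "Audi", "BMW"],
        "Year": [2019, 2020, 2019, 2021, 2020, 2020],
    }
)

# Distinct values in order of first appearance (NaN is included)
distinct_brands = df["Brand"].unique()

# Number of distinct values; NaN is excluded by default
n_distinct = df["Brand"].nunique()

# Pass dropna=False to count NaN as its own value
n_distinct_with_nan = df["Brand"].nunique(dropna=False)

# Calling nunique on the whole DataFrame gives one count per column
per_column = df.nunique()

print(distinct_brands)
print(n_distinct, n_distinct_with_nan)
print(per_column)
```

If missing values should not appear in the list of distinct values, df["Brand"].dropna().unique() is one way to exclude them.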

PySpark

To see the distinct values in a column, we select the column and then call the distinct function:

df.select("name").distinct().show()

To count the number of distinct values, PySpark provides a function called countDistinct in the pyspark.sql.functions module.

from pyspark.sql import functions as F

df.select(F.countDistinct("name")).show()
