How to Show the Distinct Column Values in a DataFrame?

Get Distinct Column Values in a DataFrame - Pandas and PySpark

The number of distinct values of an attribute (i.e. column) can be important in data analytics, visualization, or modeling. In this short how-to article, we will learn how to find the distinct values in columns of Pandas and PySpark DataFrames.


Pandas

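The unique function returns an array that contains the distinct values in a column, whereas the nunique function returns the number of distinct values. Note that unique includes missing values (NaN) in its result, while nunique ignores them by default.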

				
# distinct values in the "Brand" column, in order of first appearance
df["Brand"].unique()

# number of distinct values (missing values are not counted by default)
df["Brand"].nunique()
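As a minimal, self-contained sketch of the two functions, the snippet below builds a small DataFrame and applies them. The sample data and the "Price" column are hypothetical and only serve as an illustration; the `dropna=False` argument and the DataFrame-level `nunique` call are additions that go beyond the snippet above.

```python
import pandas as pd

# Hypothetical sample data; the values and the "Price" column are illustrative only.
df = pd.DataFrame(
    {
        "Brand": ["Acme", "Bolt", "Acme", None, "Core"],
        "Price": [10, 20, 10, 30, 20],
    }
)

# Array of distinct values; the missing value appears as its own entry.
distinct_brands = df["Brand"].unique()

# Count of distinct values; missing values are excluded by default.
n_brands = df["Brand"].nunique()

# Pass dropna=False to count the missing value as a distinct value too.
n_brands_with_na = df["Brand"].nunique(dropna=False)

# Calling nunique on the whole DataFrame gives one count per column.
per_column = df.nunique()

print(distinct_brands)
print(n_brands, n_brands_with_na)
print(per_column)
```

If the missing value should not show up among the distinct values, one option is to call `df["Brand"].dropna().unique()` instead.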
				
			

PySpark

We can see the distinct values in a column using the distinct function as follows:

				
df.select("name").distinct().show()


				
			

To count the number of distinct values, PySpark provides a function called countDistinct.

				
from pyspark.sql import functions as F

# countDistinct is also available under the name count_distinct in PySpark 3.2 and later
df.select(F.countDistinct("name")).show()
				
			
