The number of distinct values of an attribute (i.e., a column) can be important in data analytics, visualization, or modeling. In this short how-to article, we will learn how to find the distinct values in columns of Pandas and PySpark DataFrames.
In Pandas, the unique function returns an array containing the distinct values in a column, whereas the nunique function returns the number of distinct values.
# distinct values
df["Brand"].unique()

# number of distinct values
df["Brand"].nunique()
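The Pandas calls above can be tried on a small DataFrame. The following is a minimal sketch; the sample data is hypothetical, and only the "Brand" column name comes from the snippet. It also shows one detail worth keeping in mind: unique keeps a missing value as one of the distinct values, while nunique excludes missing values unless dropna=False is passed.

```python
import pandas as pd

# Hypothetical sample data for illustration; one entry is missing on purpose.
df = pd.DataFrame({"Brand": ["Ford", "Toyota", "Ford", None, "BMW", "Toyota"]})

# Distinct values in order of first appearance (the missing value is included).
distinct_values = df["Brand"].unique()

# Number of distinct values; missing values are excluded by default.
n_distinct = df["Brand"].nunique()

# Pass dropna=False to count the missing value as its own distinct value.
n_distinct_with_na = df["Brand"].nunique(dropna=False)

print(distinct_values)
print(n_distinct)
print(n_distinct_with_na)
```

If the column has no missing values, len(df["Brand"].unique()) and df["Brand"].nunique() agree; they differ by one when a missing value is present.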
In PySpark, we can display the distinct values in a column using the distinct function as follows:
df.select("name").distinct().show()
To count the number of distinct values, PySpark provides a function called countDistinct.
from pyspark.sql import functions as F

df.select(F.countDistinct("name")).show()