In this short how-to article, we will learn how to group DataFrame rows into a list in Pandas and PySpark. Groups will be based on the distinct values in a column. The values will be taken from another column and combined into a list.
We group the rows with the groupby function and then apply the list constructor to the column that contains the values. We can perform this task as follows:
Members = df.groupby("Team", as_index=False).agg(Members=("Member", list))
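To see what this produces, here is a minimal, self-contained sketch. The sample data is hypothetical and made up purely for illustration; only the "Team" and "Member" column names come from the snippet above.

```python
import pandas as pd

# Hypothetical sample data (an assumption for illustration only).
df = pd.DataFrame(
    {
        "Team": ["A", "B", "A", "B", "A"],
        "Member": ["Ann", "Bob", "Cid", "Dee", "Eve"],
    }
)

# Group by "Team" and collect each group's "Member" values into a list.
# Named aggregation creates an output column called "Members".
members = df.groupby("Team", as_index=False).agg(Members=("Member", list))

print(members)
```

Each row of the result holds one team and a Python list of its members, in the order they appeared in the original DataFrame. Passing as_index=False keeps "Team" as a regular column instead of turning it into the index.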
To do the same in PySpark, we can use the collect_list function together with groupby. Keep in mind that collect_list does not guarantee the order of the elements within each list.
from pyspark.sql import functions as F
Members = df.groupby("Team").agg(F.collect_list("Member"))