How to Group DataFrame Rows into a List?

Pandas and PySpark DataFrame: Group Rows into a List Using groupby()

In this short how-to article, we will learn how to group DataFrame rows into a list in Pandas and PySpark. The groups are defined by the distinct values in one column. For each group, the values from another column are combined into a single list.

Pandas

We group the rows with the groupby function and then apply the list constructor to the column that contains the values. Passing as_index=False keeps the grouping column as a regular column instead of moving it to the index. We can perform this task as follows:

				
# Named aggregation: the new "Members" column holds the list of "Member" values per team.
Members = df.groupby("Team", as_index=False).agg(
    Members=("Member", list)
)
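To make the result easier to picture, here is a minimal, self-contained sketch of the same pattern. The sample data is hypothetical and made up for illustration; only the column names "Team" and "Member" come from the snippet above.

```python
import pandas as pd

# Hypothetical sample data (an assumption for illustration only).
df = pd.DataFrame(
    {
        "Team": ["A", "B", "A", "C", "B"],
        "Member": ["John", "Jane", "Emily", "Max", "Ashley"],
    }
)

# Group by team and collect each team's members into a Python list.
members = df.groupby("Team", as_index=False).agg(
    Members=("Member", list)
)

print(members)
```

With this data, the result should contain one row per team, with the "Members" column holding the lists ["John", "Emily"], ["Jane", "Ashley"], and ["Max"] for teams A, B, and C. Within each list, the values keep the order in which they appear in the original DataFrame.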
				
			

PySpark

To perform the same operation in PySpark, we can use the collect_list function together with groupby.

				
from pyspark.sql import functions as F

# Element order within each collected list is not guaranteed.
Members = df.groupby("Team").agg(F.collect_list("Member").alias("Members"))
				
			

This question is also asked as:

  • Pandas get all groupby values in an array
