How to Group DataFrame Rows into a List?

Pandas Pyspark DataFrame Group Rows into List Using groupby()

In this short how-to article, we will learn how to group DataFrame rows into a list in Pandas and PySpark. Rows are grouped by the distinct values in one column, and the values from another column are collected into one list per group.

Pandas

The rows are grouped using the groupby function, and the list constructor is then applied to the column that contains the values. Using named aggregation, where the keyword (Members) becomes the name of the output column, we can perform this task as follows:

Members = df.groupby("Team", as_index=False).agg(
    Members = ("Member", list)
)
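
To make the result concrete, here is a minimal, self-contained sketch of the same call on a small DataFrame. The sample data is made up for illustration and is not from the original article; only the Team and Member column names are taken from the snippet above.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the snippet above.
df = pd.DataFrame(
    {
        "Team": ["A", "B", "A", "C", "B", "A"],
        "Member": ["John", "Jane", "Ashley", "Max", "Emily", "Matt"],
    }
)

# Named aggregation: the output column "Members" holds a Python list
# of the "Member" values for each distinct "Team".
members = df.groupby("Team", as_index=False).agg(Members=("Member", list))

print(members)
```

With pandas' default settings, the groups come out sorted by Team, and each list keeps its members in the order in which they appear in the original DataFrame.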

PySpark

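To do this operation in PySpark, we can use the collect_list function together with groupby. Unlike the Pandas version, collect_list does not guarantee the order of the elements in each list, because the order depends on how the rows are arranged after a shuffle.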

from pyspark.sql import functions as F

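# alias() names the output column; without it the column is called "collect_list(Member)"
Members = df.groupby("Team").agg(F.collect_list("Member").alias("Members"))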

This question is also asked as:

  • Pandas get all groupby values in an array
