How to Group DataFrame Rows into a List

Aporia Team, 1 min read, Sep 06, 2022

In this short how-to article, we will learn how to group DataFrame rows into a list in Pandas and PySpark. Rows are grouped by the distinct values in one column, and for each group the values from another column are combined into a list. In the examples, the rows are grouped by the Team column and the values are taken from the Member column.

Pandas

We group the rows with the groupby function and then apply the list constructor to the column that contains the values:

# Group rows by "Team" and collect each group's "Member" values into a list
members = df.groupby("Team", as_index=False).agg(
    Members=("Member", list)
)
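To try this end to end, here is a minimal sketch that assumes a small, made-up DataFrame with Team and Member columns. The sample names and values are hypothetical and only serve as an illustration.

```python
import pandas as pd

# Hypothetical sample data: one row per team member
df = pd.DataFrame({
    "Team": ["A", "A", "B", "B", "B"],
    "Member": ["John", "Jane", "Max", "Emily", "Ashley"],
})

# Group by "Team" and build a list of "Member" values for each group
members = df.groupby("Team", as_index=False).agg(
    Members=("Member", list)
)

print(members)
```

The result has one row per team. Within each list, the members keep the order in which they appear in the original DataFrame.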

PySpark

In PySpark, we can use the collect_list function together with groupby.

from pyspark.sql import functions as F

# Group rows by "Team" and collect each group's "Member" values into a list
members = df.groupby("Team").agg(F.collect_list("Member").alias("Members"))

This question is also asked as:

  • Pandas get all groupby values in an array
