In this short how-to article, we will learn how to rename a column in Pandas and PySpark DataFrames.
Pandas
The rename method renames columns. Pass a dictionary that maps old names to new names to the columns parameter; by default, it returns a new DataFrame.
# Rename one column
df = df.rename(columns={"date": "purchase_date"})
# Rename multiple columns
df = df.rename(columns={"date": "purchase_date", "qty": "quantity"})
Alternatively, pass inplace=True to modify the existing DataFrame instead of returning a new one:
df.rename(columns={"date": "purchase_date"}, inplace=True)
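To see how rename behaves on real data, here is a minimal sketch. The sample DataFrame and its values are made up for illustration; only the column names date and qty come from the examples above. It shows that rename leaves the original DataFrame untouched, and that a key matching no column is silently ignored unless errors="raise" is passed.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"date": ["2024-01-05", "2024-01-06"], "qty": [3, 1]})

# rename returns a new DataFrame; df itself keeps its original column names
renamed = df.rename(columns={"date": "purchase_date", "qty": "quantity"})
print(list(renamed.columns))
print(list(df.columns))

# A key that matches no column ("dat" is a deliberate typo) is ignored by default
unchanged = df.rename(columns={"dat": "purchase_date"})

# With errors="raise", the same typo raises a KeyError instead
missing_column_error = None
try:
    df.rename(columns={"dat": "purchase_date"}, errors="raise")
except KeyError as exc:
    missing_column_error = exc
print(type(missing_column_error).__name__)
```

Passing errors="raise" can be a useful safeguard when the column names come from user input or a configuration file, since a misspelled name would otherwise pass unnoticed.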
PySpark
The withColumnRenamed method renames a column in a PySpark DataFrame. It takes the existing name and the new name, and returns a new DataFrame.
# Rename one column
df = df.withColumnRenamed("date", "purchase_date")
# Rename multiple columns by chaining calls
df = df.withColumnRenamed("date", "purchase_date").withColumnRenamed("qty", "quantity")
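When there are many columns to rename, chaining calls by hand gets long. One possible approach is a small helper that applies withColumnRenamed once per entry of a dictionary. The name rename_columns is hypothetical and not part of PySpark; the sketch assumes only that the object passed in has a withColumnRenamed method, as PySpark DataFrames do.

```python
from functools import reduce


def rename_columns(df, mapping):
    """Apply withColumnRenamed once for each old -> new pair in mapping.

    rename_columns is a hypothetical helper, not a PySpark function.
    df is expected to be a PySpark DataFrame (or any object that exposes
    a withColumnRenamed(existing, new) method returning a new object).
    """
    return reduce(
        lambda acc, pair: acc.withColumnRenamed(pair[0], pair[1]),
        mapping.items(),
        df,
    )
```

It could then be used as df = rename_columns(df, {"date": "purchase_date", "qty": "quantity"}). Note that withColumnRenamed does nothing if the existing name is not in the schema, so a misspelled column name will not raise an error. Spark 3.4 and later also provide withColumnsRenamed, which accepts such a dictionary directly.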
This question is also asked as:
- How to change DataFrame column names in PySpark?
- Renaming column names in Pandas