A DataFrame may contain redundant columns, or columns we simply don't need for the task at hand. In this short how-to article, we will learn how to delete one or more columns from Pandas and PySpark DataFrames.
Pandas
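We can use the drop method to delete one or more columns from a DataFrame. Setting axis=1 tells drop to look for the given labels among the columns rather than the row index. Since drop returns a new DataFrame, we assign the result back to df.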
# delete one column
df = df.drop("NO", axis=1)
# delete multiple columns
df = df.drop(["f1", "f2"], axis=1)
When deleting multiple columns, pass the column names as a list.
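The snippet below is a small self-contained sketch on a made-up DataFrame (the data is illustrative; only the column names mirror the ones above). It shows the columns= keyword, which is equivalent to axis=1, and errors="ignore", which skips labels that are not present instead of raising a KeyError. The name "missing_col" is hypothetical and stands in for any label that doesn't exist.

```python
import pandas as pd

# Illustrative DataFrame; the column names mirror the examples above
df = pd.DataFrame({"NO": [1, 2], "f1": [0.5, 0.7], "f2": ["a", "b"], "f3": [True, False]})

# columns= targets column labels directly, so no axis argument is needed
df_one = df.drop(columns="NO")
df_many = df.drop(columns=["f1", "f2"])

# errors="ignore" silently skips labels that do not exist ("missing_col" is hypothetical)
df_safe = df.drop(columns=["f1", "missing_col"], errors="ignore")

print(list(df_one.columns))   # ['f1', 'f2', 'f3']
print(list(df_many.columns))  # ['NO', 'f3']
print(list(df_safe.columns))  # ['NO', 'f2', 'f3']
```

Because drop returns a new object, the original df still has all four columns unless the result is assigned back to it.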
PySpark
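The PySpark DataFrame also has a drop method to delete one or more columns. Unlike in Pandas, there is no axis parameter, and multiple column names are passed as separate arguments.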
# delete one column
df = df.drop("NO")
# delete multiple columns
df = df.drop("f1", "f2")