In this short how-to article, we will learn how to select multiple columns in Pandas and PySpark DataFrames.
Pandas
We can select multiple columns by passing a list of column names to the indexing operator.
cols = ["f2", "f4"]
df[cols]
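To make the list-based selection concrete, here is a minimal sketch on a small DataFrame. The sample data is made up for illustration, and the column names f1 to f4 are assumed to match the ones used in this article.

```python
import pandas as pd

# Hypothetical sample data; only the column names matter here.
df = pd.DataFrame({"f1": [1, 2], "f2": [3, 4], "f3": [5, 6], "f4": [7, 8]})

cols = ["f2", "f4"]
subset = df[cols]          # a list of labels returns a DataFrame, in the listed order
same = df.loc[:, cols]     # equivalent label-based form using loc

single = df[["f2"]]        # a one-element list still returns a DataFrame
series = df["f2"]          # a bare label returns a Series instead
```

Note that the result follows the order of the list, not the order of the columns in the original DataFrame.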
The iloc indexer selects columns by their integer positions, which start at 0. Suppose you have a DataFrame with 30 columns and want to select the first 10. You can do this as follows:
# Select the first 10 columns
df.iloc[:,:10]
# Select from the second to the fifth column (positions 1 through 4; the end of the slice is excluded)
df.iloc[:,1:5]
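Because iloc positions are zero-based and the end of a slice is excluded, it is easy to be off by one. The following sketch checks which columns each slice returns. The 30-column DataFrame and its column names c0 to c29 are assumptions made for this example.

```python
import pandas as pd

# Hypothetical one-row DataFrame with 30 columns named c0 .. c29.
df = pd.DataFrame([list(range(30))], columns=[f"c{i}" for i in range(30)])

first_ten = df.iloc[:, :10]         # positions 0 through 9
second_to_fifth = df.iloc[:, 1:5]   # positions 1 through 4
picked = df.iloc[:, [0, 3, 7]]      # a list of positions also works

print(list(second_to_fifth.columns))
```

A slice such as 2:5 would start at the third column, since position 2 is the third one.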
PySpark
The select method can be used for selecting multiple columns from a PySpark DataFrame. It accepts column names as strings or Column objects.
# first method
df.select("f1", "f2")
# second method
df.select(df.f1, df.f2)
This question is also asked as:
- How to choose specific columns in a DataFrame?