Change Datatype of Column in a DataFrame

Each column in a DataFrame has a data type (dtype). Some functions and methods expect columns in a specific data type, and therefore it is a common operation to convert the data type of columns. In this short how-to article, we will learn how to change the data type of a column in Pandas and PySpark DataFrames.

Pandas

In a Pandas DataFrame, we can check the data types of columns with the dtypes attribute.

df.dtypes

Name    string
City    string
Age     string
dtype: object

The astype method changes the data type of a column. Suppose we have a column that holds numerical values but whose data type is string. This is a problem because we cannot perform numerical analysis on text data.

df["Age"] = df["Age"].astype("int")

We just need to pass the desired data type to astype. Let’s confirm the change by checking the data types again.

df.dtypes

Name    string
City    string
Age      int64
dtype: object
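The steps above can be put together in a small runnable sketch. The sample rows here are made up for illustration, and the target type is written as "int64" so the result does not depend on the platform's default integer size.

```python
import pandas as pd

# Hypothetical sample data: every column, including Age, is stored as text.
df = pd.DataFrame(
    {
        "Name": ["Jane", "John", "Emily"],
        "City": ["Houston", "Dallas", "Austin"],
        "Age": ["24", "31", "28"],
    }
)

# Convert the Age column from text to a 64-bit integer.
df["Age"] = df["Age"].astype("int64")

# Numerical operations now work on the column.
print(df.dtypes)
print(df["Age"].mean())
```

Note that astype raises an error if any value in the column cannot be parsed as an integer.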

It is possible to change the data type of multiple columns in a single operation. The columns and their data types are written as key-value pairs in a dictionary. The example below assumes the DataFrame also has a Score column stored as string.

df = df.astype({"Age": "int", "Score": "int"})
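As a minimal sketch of the dictionary form, the example below builds a small DataFrame with made-up values and converts two columns at once. The Score column and the choice of "float64" for it are assumptions for illustration; columns that are not in the dictionary keep their data type.

```python
import pandas as pd

# Hypothetical sample data with two numeric columns stored as text.
df = pd.DataFrame(
    {
        "Name": ["Jane", "John"],
        "Age": ["24", "31"],
        "Score": ["88", "92"],
    }
)

# Keys are column names, values are the target data types.
df = df.astype({"Age": "int64", "Score": "float64"})

print(df.dtypes)
```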

PySpark

In PySpark, we can use the cast method to change the data type.

from pyspark.sql.types import IntegerType
from pyspark.sql import functions as F

# first method: pass the type name as a string
df = df.withColumn("Age", df.Age.cast("int"))

# second method: pass a type object
df = df.withColumn("Age", df.Age.cast(IntegerType()))

# third method: reference the column with F.col
df = df.withColumn("Age", F.col("Age").cast(IntegerType()))

To change the data type of multiple columns, we can combine operations by chaining them.

df = df.withColumn("Age", df.Age.cast("int")) \
       .withColumn("Score", df.Score.cast("int"))
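When there are many columns to convert, the chain can be replaced with a loop. The helper below is a sketch, not part of PySpark: the name cast_columns and its type_map argument are hypothetical. It relies only on the withColumn and cast methods shown above and expects a PySpark DataFrame as input.

```python
def cast_columns(df, type_map):
    """Return df with each column named in type_map cast to the given type.

    type_map maps column names to Spark type names such as "int" or "double".
    """
    for name, dtype in type_map.items():
        # withColumn with an existing name replaces that column with its cast version.
        df = df.withColumn(name, df[name].cast(dtype))
    return df
```

With a DataFrame that has Age and Score columns, the call would look like df = cast_columns(df, {"Age": "int", "Score": "int"}).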

This question is also being asked as:

How to force all strings to floats?
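For this variant of the question, one possible approach in Pandas is to pass a single data type to astype, which applies it to every column. The sketch below uses made-up data. It also shows pd.to_numeric with errors="coerce", which is an alternative we can use when some values may not be valid numbers.

```python
import pandas as pd

# Hypothetical sample data: every column holds numbers stored as text.
df = pd.DataFrame({"Height": ["1.72", "1.80"], "Weight": ["68", "75.5"]})

# A single data type (no dictionary) is applied to all columns.
df_float = df.astype("float64")
print(df_float.dtypes)

# astype raises an error on text that is not a number.
# pd.to_numeric with errors="coerce" converts such values to NaN instead.
messy = pd.Series(["1.5", "n/a", "3"])
cleaned = pd.to_numeric(messy, errors="coerce")
print(cleaned)
```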

Aporia Team

Sometimes, writing is a joint effort.
