Dictionary is a built-in data structure of Python, which consists of key-value pairs. In this short how-to article, we will learn how to convert a dictionary to a DataFrame in Pandas and PySpark.
Pandas DataFrame from Dictionary .dict()
The DataFrame constructor can be used to create a DataFrame from a dictionary. The keys represent the column names and the dictionary values become the rows.
import pandas as pd
# create a dictionary
A = {
"name": ["John", "Jane"],
"age": [20, 24]
}
# convert to a DataFrame
df = pd.DataFrame(A)
PySpark DataFrame from Dictionary .dict()
Although there exist some alternatives, the most practical way of creating a PySpark DataFrame from a dictionary is to first convert the dictionary to a Pandas DataFrame and then converting it to a PySpark DataFrame.
import pandas as pd
spark = SparkSession.builder.getOrCreate()
# create a dictionary
A = {
"name": ["John", "Jane"],
"age": [20, 24]
}
# convert to a Pandas DataFrame
df = pd.DataFrame(A)
# from Pandas to PySpark
df_pyspark = spark.createDataFrame(df)
This question is also being asked as:
- Convert dict of scalars to Pandas DataFrame