How to convert JSON to a dataframe in Python
Discover multiple ways to convert JSON to a Python DataFrame. Get tips, see real-world applications, and learn to debug common errors.

The conversion of JSON to a pandas DataFrame is a fundamental step for data scientists. Python provides robust libraries that make this transformation straightforward for structured analysis and manipulation.
In this article, you'll explore methods to convert JSON to a DataFrame. You will find practical tips for nested data, see real-world applications, and get debugging advice for common errors.
Using pd.read_json() with a simple JSON string
import pandas as pd
json_data = '{"Name":["John","Anna"], "Age":[28,22], "City":["New York","Paris"]}'
df = pd.read_json(json_data)
print(df)

Output:
   Name  Age      City
0  John   28  New York
1  Anna   22     Paris
The pd.read_json() function offers a direct path for converting a JSON string into a DataFrame. It automatically parses the string and maps its structure to a tabular format. The key to its simplicity here is the JSON's structure—it's a dictionary where each key represents a column name and its value is a list of the data for that column.
This column-oriented format is a default that pandas recognizes, which is why the conversion works seamlessly without extra parameters. The function intelligently aligns the data from each list into its corresponding column, producing a clean DataFrame ready for analysis.
Basic JSON to DataFrame conversion techniques
While pd.read_json() is perfect for strings, your data might also come as a Python dictionary, a list of dictionaries, or from a file.
Converting a Python dictionary to DataFrame
import pandas as pd
import json
json_str = '{"Name":"John", "Age":28, "City":"New York"}'
data = json.loads(json_str)
df = pd.DataFrame([data])
print(df)

Output:
   Name  Age      City
0  John   28  New York
When your JSON represents a single object, you first need to parse it into a Python dictionary using json.loads(). This step makes the data workable for pandas.
- The key to this conversion is passing the dictionary to pd.DataFrame() inside a list, as in [data].
This simple trick tells pandas to interpret the dictionary as one row. The dictionary's keys are used as column headers, and its values populate that single row of data.
Reading JSON from a file-like object
import pandas as pd
import io
# Creating a file-like object with JSON data
json_file = io.StringIO('{"Name":["John","Anna"], "Age":[28,22]}')
df = pd.read_json(json_file)
print(df)

Output:
   Name  Age
0  John   28
1  Anna   22
Sometimes your JSON data isn't in a file on your disk but exists as a string in your program, perhaps from an API call. The pd.read_json() function can handle this elegantly by reading from a file-like object.
- The io.StringIO() function creates an in-memory text buffer from your string.
- This buffer behaves just like a real file, which allows pd.read_json() to process it directly without needing to save anything to your computer's storage. It's a memory-efficient way to handle string-based data.
Recent pandas releases (2.1 and later) deprecate passing a raw JSON string straight to read_json(), so wrapping strings in io.StringIO() is also the forward-compatible approach.
Converting a list of dictionaries to DataFrame
import pandas as pd
data = [
{"Name": "John", "Age": 28, "City": "New York"},
{"Name": "Anna", "Age": 22, "City": "Paris"}
]
df = pd.DataFrame(data)
print(df)

Output:
   Name  Age      City
0  John   28  New York
1  Anna   22     Paris
When your data is a list of dictionaries, creating a DataFrame is straightforward. This format is common for API responses, and pandas handles it natively when you pass the list directly to the pd.DataFrame() constructor.
- Each dictionary in the list is interpreted as a single row in the DataFrame.
- The dictionary keys are automatically used as the column headers, with their corresponding values filling the cells.
This method is highly intuitive because the structure of your list directly maps to the resulting table.
Advanced JSON to DataFrame techniques
When the basic methods aren't enough, you can use advanced tools like json_normalize() and the orient parameter to parse nested data and other complex JSON formats.
Handling nested JSON with json_normalize()
import pandas as pd
from pandas import json_normalize
nested_json = {
"users": [
{"name": "John", "info": {"age": 28, "city": "New York"}},
{"name": "Anna", "info": {"age": 22, "city": "Paris"}}
]
}
df = json_normalize(nested_json['users'])
print(df)

Output:
   name  info.age info.city
0  John        28  New York
1  Anna        22     Paris
When your JSON has dictionaries inside other dictionaries, pd.DataFrame() can struggle. This is where json_normalize() becomes essential. It's built to flatten these nested structures into a usable, tabular format.
- The function unpacks nested objects, like the info dictionary here.
- It then creates new columns by joining the parent and child keys with a dot, giving you clear headers like info.age and info.city.
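When the nesting goes one level deeper, for example when each record carries a list of sub-records, json_normalize() can expand those too via its record_path and meta parameters. A minimal sketch, using invented user/order data for illustration:

```python
import pandas as pd
from pandas import json_normalize

# Hypothetical data: each user carries a list of orders.
data = [
    {"name": "John", "orders": [{"id": 1, "total": 9.5}, {"id": 2, "total": 3.0}]},
    {"name": "Anna", "orders": [{"id": 3, "total": 7.25}]},
]

# record_path points at the nested list to expand into rows;
# meta carries parent-level fields down to each expanded row.
df = json_normalize(data, record_path="orders", meta="name")
print(df)  # three rows, one per order, with the parent name attached
```

Each order becomes its own row, so John contributes two rows and Anna one, with the parent name repeated alongside.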
Working with JSON lines (JSONL) format
import pandas as pd
# Each line is a separate JSON object
jsonl_data = '''{"Name":"John", "Age":28}
{"Name":"Anna", "Age":22}'''
df = pd.read_json(jsonl_data, lines=True)
print(df)

Output:
   Name  Age
0  John   28
1  Anna   22
The JSON Lines (JSONL) format is useful for streaming data, as each line is a separate JSON object. It's a straightforward way to handle large datasets without loading everything at once. Pandas can parse this format directly using pd.read_json().
- The key is the lines=True argument. This tells pandas to treat each line as an individual record.
- The function then reads each JSON object and converts it into a row, assembling them all into a single, clean DataFrame.
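For JSONL files too large to load in one go, read_json() also accepts a chunksize, which yields smaller DataFrames one at a time instead of a single large one. A minimal sketch, with an in-memory buffer standing in for a real file (the User0 to User4 records are invented):

```python
import io
import pandas as pd

# Five invented JSONL records, one JSON object per line.
jsonl = "\n".join(
    '{"Name": "User%d", "Age": %d}' % (i, 20 + i) for i in range(5)
)

# lines=True parses one object per line; chunksize=2 returns a
# reader that yields DataFrames of at most two rows each.
reader = pd.read_json(io.StringIO(jsonl), lines=True, chunksize=2)
chunks = [chunk for chunk in reader]
print(len(chunks))  # 3 chunks: 2 + 2 + 1 rows
```

You can process each chunk inside the loop and discard it, keeping memory usage bounded regardless of file size.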
Customizing JSON parsing with orient parameter
import pandas as pd
json_records = '[{"Name":"John","Age":28},{"Name":"Anna","Age":22}]'
df = pd.read_json(json_records, orient='records')
print(df)

Output:
   Name  Age
0  John   28
1  Anna   22
The orient parameter in pd.read_json() gives you control over how pandas interprets your JSON's structure. It's how you tell the function what format to expect, which is crucial when the data isn't in the default column-based layout.
Here, orient='records' signals that the JSON is a list of dictionaries.
- With this setting, pandas treats each dictionary as a single row.
- The dictionary keys are automatically used as column headers, and their values populate the cells.
This is a common format for API responses, making orient='records' a practical option for real-world data.
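orient='records' is only one of several layouts. For contrast, here is a sketch of orient='index', where the top-level keys become row labels rather than column names (the row1/row2 labels are invented for illustration):

```python
import io
import pandas as pd

# Each top-level key is a row label; each inner dict is that row's data.
json_index = '{"row1": {"Name": "John", "Age": 28}, "row2": {"Name": "Anna", "Age": 22}}'
df = pd.read_json(io.StringIO(json_index), orient='index')
print(df)  # rows labeled row1 and row2, columns Name and Age
```

Picking the orient that matches your JSON's shape saves you from reshaping the DataFrame afterward.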
Move faster with Replit
Replit is an AI-powered development platform that transforms natural language into working applications. Describe what you want to build, and Replit Agent creates it—complete with databases, APIs, and deployment.
For the JSON conversion techniques we've explored, Replit Agent can turn them into production-ready tools. For example, you could:
- Build a dashboard that pulls live JSON data from a public API and displays it in a structured table.
- Create a data cleaning utility that uses json_normalize() to flatten nested JSON from user profiles into a clean DataFrame.
- Deploy a log analyzer that processes JSONL files from application events and organizes them into a searchable format.
Describe your app idea, and Replit Agent will write the code, test it, and handle deployment automatically. Try building your next project with Replit Agent.
Common errors and challenges
Even with powerful tools, converting JSON can introduce subtle errors related to data types, dates, and missing values.
Handling inconsistent data types in JSON
JSON doesn't enforce strict data types, so a single key might have values that are a mix of numbers and strings. When pandas encounters this, it often defaults the entire column to the generic object dtype, which is less efficient for analysis.
- You can inspect column types with df.dtypes to identify these mixed-type columns.
- After loading, use functions like pd.to_numeric() to convert columns to their proper type. The errors='coerce' argument is useful for turning any values that fail conversion into NaN.
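A minimal sketch of that cleanup, using an invented price column with one unparseable entry:

```python
import pandas as pd

# Invented data: numeric strings plus one placeholder value.
df = pd.DataFrame({"price": ["19.99", "24.50", "n/a"]})
print(df["price"].dtype)  # object: everything imported as strings

# errors='coerce' converts what it can and turns the rest into NaN.
df["price"] = pd.to_numeric(df["price"], errors="coerce")
print(df["price"].dtype)          # float64
print(df["price"].isna().sum())   # 1: the "n/a" entry
```

After coercion you can decide how to treat the resulting NaN values, for example by dropping or filling them.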
Fixing datetime parsing with read_json()
Unless a column's name matches pandas' short date-like list (for example date, datetime, or names ending in _at), read_json() imports date strings as plain text, not as actual datetime objects. This prevents you from performing time-based operations, like sorting or calculating durations.
- The convert_dates parameter in read_json() lets you specify which columns to parse as dates during the import.
- Alternatively, you can convert columns after loading using pd.to_datetime(), which is highly flexible and can parse many date formats.
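As a sketch of the post-load route, here is pd.to_datetime() applied to an invented created column; errors='coerce' turns anything unparseable into NaT, pandas' missing-date marker:

```python
import pandas as pd

# Invented data: dates loaded as plain strings.
df = pd.DataFrame({"created": ["2023-05-15", "2023-06-20"]})

# Convert after loading; unparseable values would become NaT.
df["created"] = pd.to_datetime(df["created"], errors="coerce")
print(df["created"].dt.year.tolist())  # [2023, 2023]
```

Once the column is a true datetime type, the .dt accessor unlocks year, month, and other time-based operations.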
Dealing with missing values in JSON data
Missing information in JSON can appear as a null value, an empty string, or an omitted key. Pandas handles null well, automatically converting it to NaN (Not a Number), its standard marker for missing data.
- The challenge is with other representations, like an empty string, which pandas will import as a literal string.
- To standardize your data, use the df.replace() method to convert other placeholders, like empty strings, into NaN for consistent handling.
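A minimal sketch of that standardization, using an invented city column containing one empty string:

```python
import numpy as np
import pandas as pd

# Invented data: one city is an empty string, not a JSON null.
df = pd.DataFrame({"name": ["John", "Anna"], "city": ["New York", ""]})

# Empty strings import as literal strings; replace them with NaN
# so pandas treats them as missing data.
df = df.replace("", np.nan)
print(df["city"].isna().sum())  # 1
```

With all placeholders normalized to NaN, methods like dropna() and fillna() see every missing value consistently.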
Handling inconsistent data types in JSON
A common pitfall with JSON is inconsistent data types within the same field—for example, an age column containing both integers and strings. This forces pandas to use a generic object dtype, preventing mathematical operations. See what happens in the following example.
import pandas as pd
json_data = '''[
{"name": "John", "age": 30},
{"name": "Anna", "age": "25"}
]'''
df = pd.read_json(json_data)
print(df.dtypes)
print(df['age'].sum()) # This will fail with TypeError
The sum() method fails because the string "25" forces the entire age column into a generic object dtype. This prevents mathematical operations. The code below shows how to correct this by ensuring the column is numeric.
import pandas as pd
json_data = '''[
{"name": "John", "age": 30},
{"name": "Anna", "age": "25"}
]'''
df = pd.read_json(json_data)
df['age'] = pd.to_numeric(df['age'])
print(df.dtypes)
print(df['age'].sum()) # Now works correctly: 55
The fix is to explicitly convert the column to a numeric type. This standardizes the data, resolving the TypeError and enabling mathematical operations like sum() to work correctly.
- You can do this by applying the pd.to_numeric() function to the entire column.
This is a common cleanup step when your JSON source doesn't enforce strict data types, sometimes mixing numbers and strings within the same field.
Fixing datetime parsing with read_json()
Date strings in JSON are a common tripwire. The read_json() function only auto-parses columns whose names match a short date-like list (such as date, datetime, or names ending in _at); any other column of date strings is imported as generic objects, not true datetime values, which blocks reliable time-based analysis like filtering for dates after a certain point.
The following code relies on that name-based heuristic: the column happens to be named date, so pandas parses it here, but a column named, say, event_day would stay as strings, and string comparison is not a dependable way to handle chronological data.
import pandas as pd
json_data = '''[
{"event": "Conference", "date": "2023-05-15"},
{"event": "Workshop", "date": "2023-06-20"}
]'''
df = pd.read_json(json_data)
print(df.dtypes)
print(df[df['date'] > '2023-06-01']) # String comparison, may be unreliable
Comparing raw date strings with an expression like df['date'] > '2023-06-01' is risky because strings sort alphabetically, not chronologically, and whether pandas has parsed the column at all depends on its name. The explicit approach in the code below removes the guesswork.
import pandas as pd
json_data = '''[
{"event": "Conference", "date": "2023-05-15"},
{"event": "Workshop", "date": "2023-06-20"}
]'''
df = pd.read_json(json_data, convert_dates=['date'])
print(df.dtypes)
print(df[df['date'] > pd.to_datetime('2023-06-01')]) # Proper date comparison
The correct approach is to tell pandas to treat the date strings as actual datetime objects during the import process. This ensures your data is ready for time-based analysis from the moment it's loaded.
- The convert_dates=['date'] argument in read_json() handles this conversion for you automatically.
- This gives the column the correct data type, enabling true chronological comparisons.
- You can then reliably filter and analyze your data without the risk of string-based sorting errors.
Dealing with missing values in JSON data
JSON uses null to represent missing information, which pandas converts to NaN (Not a Number) during import. Most pandas reductions, including mean(), skip NaN silently by default, which is convenient but can hide how much of your data is actually missing. The following code demonstrates this behavior.
import pandas as pd
json_data = '''[
{"name": "John", "score": 85},
{"name": "Anna", "score": null},
{"name": "Mike", "score": 92}
]'''
df = pd.read_json(json_data)
average = df['score'].mean()
print(f"Average score: {average}") # NaN is skipped: (85 + 92) / 2 = 88.5
Because skipna=True is the default, mean() quietly ignores Anna's missing score and returns 88.5. That may be what you want, but it hides the fact that only two of the three records contributed. The following example makes the handling of missing values explicit.
import pandas as pd
json_data = '''[
{"name": "John", "score": 85},
{"name": "Anna", "score": null},
{"name": "Mike", "score": 92}
]'''
df = pd.read_json(json_data)
average = df['score'].dropna().mean()
print(f"Average score: {average}") # Same result, but the handling is explicit
Chaining the dropna() method before mean() makes the treatment of missing entries explicit rather than relying on the skipna default.
- It also makes it easy to report how many values actually contributed, for example with df['score'].dropna().count().
This matters when working with real-world datasets where some fields are often optional or incomplete, because an average computed over only a handful of complete records can be misleading.
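Dropping is not the only strategy. When you would rather keep every row, fillna() substitutes a value of your choosing for the missing entries. A minimal sketch (the fill value 0 is just for illustration; a domain-appropriate default or the column mean is often a better choice):

```python
import pandas as pd

# Invented scores with one missing value; None becomes NaN.
df = pd.DataFrame({"name": ["John", "Anna", "Mike"], "score": [85, None, 92]})

# fillna() replaces missing entries with a chosen value, so no
# rows are lost; here the gap is filled with 0 for illustration.
filled = df["score"].fillna(0)
print(filled.tolist())  # [85.0, 0.0, 92.0]
```

Whether to drop or fill depends on the analysis: filling with 0 would lower this average, so choose the strategy that matches what the missing value actually means.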
Real-world applications
With the fundamentals and error-handling covered, you can apply these techniques to real-world scenarios like parsing API data and analyzing weather information.
Parsing JSON API responses with requests
A common workflow involves using the requests library to fetch data from an API, which you can then load directly into a pandas DataFrame for analysis.
import pandas as pd
import requests
response = requests.get('https://jsonplaceholder.typicode.com/users')
users_data = response.json()
users_df = pd.DataFrame(users_data)
print(users_df[['id', 'name', 'email']].head(3))
This code demonstrates a complete pipeline from an API request to a formatted DataFrame preview. The requests.get() function first fetches data from the specified URL.
- The response.json() method efficiently decodes the API's JSON output into a Python list of dictionaries.
- This list is then directly converted into a DataFrame using pd.DataFrame().
- Finally, it selects the id, name, and email columns and prints the first three rows with .head(3) for a concise summary.
Analyzing weather data with pd.read_json()
With pd.read_json(), you can load weather data and use pandas' grouping features to compare statistics like temperature across multiple locations.
import pandas as pd
weather_json = '''[
{"city": "New York", "date": "2023-01-01", "temp": 32, "condition": "Snow"},
{"city": "New York", "date": "2023-01-02", "temp": 35, "condition": "Cloudy"},
{"city": "Miami", "date": "2023-01-01", "temp": 75, "condition": "Sunny"},
{"city": "Miami", "date": "2023-01-02", "temp": 78, "condition": "Sunny"}
]'''
weather_df = pd.read_json(weather_json)
weather_stats = weather_df.groupby('city')['temp'].agg(['mean', 'min', 'max'])
print(weather_stats)
This code snippet shows how you can quickly summarize data from a JSON string. After loading the data into a DataFrame, it uses groupby('city') to sort the weather entries by location. This creates distinct groups for "New York" and "Miami".
- The .agg() function is then used on the temp column for each city.
- It efficiently computes multiple statistics at once (here the mean, minimum, and maximum temperatures), giving you a clean summary table.
Get started with Replit
Put your new skills to use by building a real tool. Give Replit Agent a prompt like "Build a tool that converts a JSONL file to a CSV" or "Create a web app that flattens nested JSON into a table."
The agent writes the code, tests for errors, and deploys your application automatically. Start building with Replit today.
Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.