How to convert JSON to a dataframe in Python
Discover multiple ways to convert JSON to a Python DataFrame. Get tips, see real-world applications, and learn to debug common errors.

The conversion of JSON to a pandas DataFrame is a fundamental step for data scientists. Python provides robust libraries that make this transformation straightforward for structured analysis and manipulation.
In this article, you'll explore methods to convert JSON to a DataFrame. You will find practical tips for nested data, see real-world applications, and get debugging advice for common errors.
Using pd.read_json() with a simple JSON string
import pandas as pd
json_data = '{"Name":["John","Anna"], "Age":[28,22], "City":["New York","Paris"]}'
df = pd.read_json(json_data)
print(df)

Output:
   Name  Age      City
0  John   28  New York
1  Anna   22     Paris
The pd.read_json() function offers a direct path for converting a JSON string into a DataFrame. It automatically parses the string and maps its structure to a tabular format. The key to its simplicity here is the JSON's structure—it's a dictionary where each key represents a column name and its value is a list of the data for that column.
This column-oriented format is a default that pandas recognizes, which is why the conversion works seamlessly without extra parameters. The function intelligently aligns the data from each list into its corresponding column, producing a clean DataFrame ready for analysis.
Basic JSON to DataFrame conversion techniques
While pd.read_json() is perfect for strings, your data might also come as a Python dictionary, a list of dictionaries, or from a file.
Converting a Python dictionary to DataFrame
import pandas as pd
import json
json_str = '{"Name":"John", "Age":28, "City":"New York"}'
data = json.loads(json_str)
df = pd.DataFrame([data])
print(df)

Output:
   Name  Age      City
0  John   28  New York
When your JSON represents a single object, you first need to parse it into a Python dictionary using json.loads(). This step makes the data workable for pandas.
- The key to this conversion is passing the dictionary to pd.DataFrame() inside a list, as in [data].
This simple trick tells pandas to interpret the dictionary as one row. The dictionary's keys are used as column headers, and its values populate that single row of data.
Reading JSON from a file-like object
import pandas as pd
import io
# Creating a file-like object with JSON data
json_file = io.StringIO('{"Name":["John","Anna"], "Age":[28,22]}')
df = pd.read_json(json_file)
print(df)

Output:
   Name  Age
0  John   28
1  Anna   22
Sometimes your JSON data isn't in a file on your disk but exists as a string in your program, perhaps from an API call. The pd.read_json() function can handle this elegantly by reading from a file-like object.
- The io.StringIO() function creates an in-memory text buffer from your string.
- This buffer behaves just like a real file, which allows pd.read_json() to process it directly without needing to save anything to your computer's storage. It's a memory-efficient way to handle string-based data.
Recent pandas releases (2.1 and later) deprecate passing a raw JSON string straight to read_json(), so wrapping strings in io.StringIO() is also the forward-compatible approach.
Converting a list of dictionaries to DataFrame
import pandas as pd
data = [
{"Name": "John", "Age": 28, "City": "New York"},
{"Name": "Anna", "Age": 22, "City": "Paris"}
]
df = pd.DataFrame(data)
print(df)

Output:
   Name  Age      City
0  John   28  New York
1  Anna   22     Paris
When your data is a list of dictionaries, creating a DataFrame is straightforward. This format is common for API responses, and pandas handles it natively when you pass the list directly to the pd.DataFrame() constructor.
- Each dictionary in the list is interpreted as a single row in the DataFrame.
- The dictionary keys are automatically used as the column headers, with their corresponding values filling the cells.
This method is highly intuitive because the structure of your list directly maps to the resulting table.
Advanced JSON to DataFrame techniques
When the basic methods aren't enough, you can use advanced tools like json_normalize() and the orient parameter to parse nested data and other complex JSON formats.
Handling nested JSON with json_normalize()
import pandas as pd
from pandas import json_normalize
nested_json = {
"users": [
{"name": "John", "info": {"age": 28, "city": "New York"}},
{"name": "Anna", "info": {"age": 22, "city": "Paris"}}
]
}
df = json_normalize(nested_json['users'])
print(df)

Output:
   name  info.age info.city
0  John        28  New York
1  Anna        22     Paris
When your JSON has dictionaries inside other dictionaries, pd.DataFrame() can struggle. This is where json_normalize() becomes essential. It's built to flatten these nested structures into a usable, tabular format.
- The function unpacks nested objects, like the info dictionary here.
- It then creates new columns by joining the parent and child keys with a dot, giving you clear headers like info.age and info.city.
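When the nesting goes one level deeper, for example when each record carries a list of sub-records, json_normalize() can expand those too via its record_path and meta parameters. A minimal sketch, using invented user/order data for illustration:

```python
import pandas as pd
from pandas import json_normalize

# Hypothetical data: each user carries a list of orders.
data = [
    {"name": "John", "orders": [{"id": 1, "total": 9.5}, {"id": 2, "total": 3.0}]},
    {"name": "Anna", "orders": [{"id": 3, "total": 7.25}]},
]

# record_path points at the nested list to expand into rows;
# meta carries parent-level fields down to each expanded row.
df = json_normalize(data, record_path="orders", meta="name")
print(df)  # three rows, one per order, with the parent name attached
```

Each order becomes its own row, so John contributes two rows and Anna one, with the parent name repeated alongside.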
Working with JSON lines (JSONL) format
import pandas as pd
# Each line is a separate JSON object
jsonl_data = '''{"Name":"John", "Age":28}
{"Name":"Anna", "Age":22}'''
df = pd.read_json(jsonl_data, lines=True)
print(df)

Output:
   Name  Age
0  John   28
1  Anna   22
The JSON Lines (JSONL) format is useful for streaming data, as each line is a separate JSON object. It's a straightforward way to handle large datasets without loading everything at once. Pandas can parse this format directly using pd.read_json().
- The key is the lines=True argument. This tells pandas to treat each line as an individual record.
- The function then reads each JSON object and converts it into a row, assembling them all into a single, clean DataFrame.
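For JSONL files too large to load in one go, read_json() also accepts a chunksize, which yields smaller DataFrames one at a time instead of a single large one. A minimal sketch, with an in-memory buffer standing in for a real file (the User0 to User4 records are invented):

```python
import io
import pandas as pd

# Five invented JSONL records, one JSON object per line.
jsonl = "\n".join(
    '{"Name": "User%d", "Age": %d}' % (i, 20 + i) for i in range(5)
)

# lines=True parses one object per line; chunksize=2 returns a
# reader that yields DataFrames of at most two rows each.
reader = pd.read_json(io.StringIO(jsonl), lines=True, chunksize=2)
chunks = [chunk for chunk in reader]
print(len(chunks))  # 3 chunks: 2 + 2 + 1 rows
```

You can process each chunk inside the loop and discard it, keeping memory usage bounded regardless of file size.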
Customizing JSON parsing with orient parameter
import pandas as pd
json_records = '[{"Name":"John","Age":28},{"Name":"Anna","Age":22}]'
df = pd.read_json(json_records, orient='records')
print(df)

Output:
   Name  Age
0  John   28
1  Anna   22
The orient parameter in pd.read_json() gives you control over how pandas interprets your JSON's structure. It's how you tell the function what format to expect, which is crucial when the data isn't in the default column-based layout.
Here, orient='records' signals that the JSON is a list of dictionaries.
- With this setting, pandas treats each dictionary as a single row.
- The dictionary keys are automatically used as column headers, and their values populate the cells.
This is a common format for API responses, making orient='records' a practical option for real-world data.
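orient='records' is only one of several layouts. For contrast, here is a sketch of orient='index', where the top-level keys become row labels rather than column names (the row1/row2 labels are invented for illustration):

```python
import io
import pandas as pd

# Each top-level key is a row label; each inner dict is that row's data.
json_index = '{"row1": {"Name": "John", "Age": 28}, "row2": {"Name": "Anna", "Age": 22}}'
df = pd.read_json(io.StringIO(json_index), orient='index')
print(df)  # rows labeled row1 and row2, columns Name and Age
```

Picking the orient that matches your JSON's shape saves you from reshaping the DataFrame afterward.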
Move faster with Replit
Replit is an AI-powered development platform that transforms natural language into working applications. Describe what you want to build, and Replit Agent creates it—complete with databases, APIs, and deployment.
For the JSON conversion techniques we've explored, Replit Agent can turn them into production-ready tools. For example, you could:
- Build a dashboard that pulls live JSON data from a public API and displays it in a structured table.
- Create a data cleaning utility that uses json_normalize() to flatten nested JSON from user profiles into a clean DataFrame.
- Deploy a log analyzer that processes JSONL files from application events and organizes them into a searchable format.
Describe your app idea, and Replit Agent will write the code, test it, and handle deployment automatically. Try building your next project with Replit Agent.
Common errors and challenges
Even with powerful tools, converting JSON can introduce subtle errors related to data types, dates, and missing values.
Handling inconsistent data types in JSON
JSON doesn't enforce strict data types, so a single key might have values that are a mix of numbers and strings. When pandas encounters this, it often defaults the entire column to the generic object dtype, which is less efficient for analysis.
- You can inspect column types with df.dtypes to identify these mixed-type columns.
- After loading, use functions like pd.to_numeric() to convert columns to their proper type. The errors='coerce' argument is useful for turning any values that fail conversion into NaN.
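A minimal sketch of that cleanup, using an invented price column with one unparseable entry:

```python
import pandas as pd

# Invented data: numeric strings plus one placeholder value.
df = pd.DataFrame({"price": ["19.99", "24.50", "n/a"]})
print(df["price"].dtype)  # object: everything imported as strings

# errors='coerce' converts what it can and turns the rest into NaN.
df["price"] = pd.to_numeric(df["price"], errors="coerce")
print(df["price"].dtype)          # float64
print(df["price"].isna().sum())   # 1: the "n/a" entry
```

After coercion you can decide how to treat the resulting NaN values, for example by dropping or filling them.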
Fixing datetime parsing with read_json()
Unless a column's name matches pandas' short date-like list (for example date, datetime, or names ending in _at), read_json() imports date strings as plain text, not as actual datetime objects. This prevents you from performing time-based operations, like sorting or calculating durations.
- The convert_dates parameter in read_json() lets you specify which columns to parse as dates during the import.
- Alternatively, you can convert columns after loading using pd.to_datetime(), which is highly flexible and can parse many date formats.
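As a sketch of the post-load route, here is pd.to_datetime() applied to an invented created column; errors='coerce' turns anything unparseable into NaT, pandas' missing-date marker:

```python
import pandas as pd

# Invented data: dates loaded as plain strings.
df = pd.DataFrame({"created": ["2023-05-15", "2023-06-20"]})

# Convert after loading; unparseable values would become NaT.
df["created"] = pd.to_datetime(df["created"], errors="coerce")
print(df["created"].dt.year.tolist())  # [2023, 2023]
```

Once the column is a true datetime type, the .dt accessor unlocks year, month, and other time-based operations.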
Dealing with missing values in JSON data
Missing information in JSON can appear as a null value, an empty string, or an omitted key. Pandas handles null well, automatically converting it to NaN (Not a Number), its standard marker for missing data.
- The challenge is with other representations, like an empty string, which pandas will import as a literal string.
- To standardize your data, use the df.replace() method to convert other placeholders, like empty strings, into NaN for consistent handling.
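A minimal sketch of that standardization, using an invented city column containing one empty string:

```python
import numpy as np
import pandas as pd

# Invented data: one city is an empty string, not a JSON null.
df = pd.DataFrame({"name": ["John", "Anna"], "city": ["New York", ""]})

# Empty strings import as literal strings; replace them with NaN
# so pandas treats them as missing data.
df = df.replace("", np.nan)
print(df["city"].isna().sum())  # 1
```

With all placeholders normalized to NaN, methods like dropna() and fillna() see every missing value consistently.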
Handling inconsistent data types in JSON
A common pitfall with JSON is inconsistent data types within the same field—for example, an age column containing both integers and strings. This forces pandas to use a generic object dtype, preventing mathematical operations. See what happens in the following example.
import pandas as pd
json_data = '''[
{"name": "John", "age": 30},
{"name": "Anna", "age": "25"}
]'''
df = pd.read_json(json_data)
print(df.dtypes)
print(df['age'].sum()) # This will fail with TypeError
The sum() method fails because the string "25" forces the entire age column into a generic object dtype. This prevents mathematical operations. The code below shows how to correct this by ensuring the column is numeric.
import pandas as pd
json_data = '''[
{"name": "John", "age": 30},
{"name": "Anna", "age": "25"}
]'''
df = pd.read_json(json_data)
df['age'] = pd.to_numeric(df['age'])
print(df.dtypes)
print(df['age'].sum()) # Now works correctly: 55
The fix is to explicitly convert the column to a numeric type. This standardizes the data, resolving the TypeError and enabling mathematical operations like sum() to work correctly.
- You can do this by applying the pd.to_numeric() function to the entire column.
This is a common cleanup step when your JSON source doesn't enforce strict data types, sometimes mixing numbers and strings within the same field.
Fixing datetime parsing with read_json()
Date strings in JSON are a common tripwire. The read_json() function only auto-parses columns whose names match a short date-like list (such as date, datetime, or names ending in _at); any other column of date strings is imported as generic objects, not true datetime values, which blocks reliable time-based analysis like filtering for dates after a certain point.
The following code relies on that name-based heuristic: the column happens to be named date, so pandas parses it here, but a column named, say, event_day would stay as strings, and string comparison is not a dependable way to handle chronological data.
import pandas as pd
json_data = '''[
{"event": "Conference", "date": "2023-05-15"},
{"event": "Workshop", "date": "2023-06-20"}
]'''
df = pd.read_json(json_data)
print(df.dtypes)
print(df[df['date'] > '2023-06-01']) # String comparison, may be unreliable
Comparing raw date strings with an expression like df['date'] > '2023-06-01' is risky because strings sort alphabetically, not chronologically, and whether pandas has parsed the column at all depends on its name. The explicit approach in the code below removes the guesswork.
import pandas as pd
json_data = '''[
{"event": "Conference", "date": "2023-05-15"},
{"event": "Workshop", "date": "2023-06-20"}
]'''
df = pd.read_json(json_data, convert_dates=['date'])
print(df.dtypes)
print(df[df['date'] > pd.to_datetime('2023-06-01')]) # Proper date comparison
The correct approach is to tell pandas to treat the date strings as actual datetime objects during the import process. This ensures your data is ready for time-based analysis from the moment it's loaded.
- The convert_dates=['date'] argument in read_json() handles this conversion for you automatically.
- This gives the column the correct data type, enabling true chronological comparisons.
- You can then reliably filter and analyze your data without the risk of string-based sorting errors.
Dealing with missing values in JSON data
JSON uses null to represent missing information, which pandas converts to NaN (Not a Number) during import. Most pandas reductions, including mean(), skip NaN silently by default, which is convenient but can hide how much of your data is actually missing. The following code demonstrates this behavior.
import pandas as pd
json_data = '''[
{"name": "John", "score": 85},
{"name": "Anna", "score": null},
{"name": "Mike", "score": 92}
]'''
df = pd.read_json(json_data)
average = df['score'].mean()
print(f"Average score: {average}") # NaN is skipped: (85 + 92) / 2 = 88.5
Because skipna=True is the default, mean() quietly ignores Anna's missing score and returns 88.5. That may be what you want, but it hides the fact that only two of the three records contributed. The following example makes the handling of missing values explicit.
import pandas as pd
json_data = '''[
{"name": "John", "score": 85},
{"name": "Anna", "score": null},
{"name": "Mike", "score": 92}
]'''
df = pd.read_json(json_data)
average = df['score'].dropna().mean()
print(f"Average score: {average}") # Same result, but the handling is explicit
Chaining the dropna() method before mean() makes the treatment of missing entries explicit rather than relying on the skipna default.
- It also makes it easy to report how many values actually contributed, for example with df['score'].dropna().count().
This matters when working with real-world datasets where some fields are often optional or incomplete, because an average computed over only a handful of complete records can be misleading.
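Dropping is not the only strategy. When you would rather keep every row, fillna() substitutes a value of your choosing for the missing entries. A minimal sketch (the fill value 0 is just for illustration; a domain-appropriate default or the column mean is often a better choice):

```python
import pandas as pd

# Invented scores with one missing value; None becomes NaN.
df = pd.DataFrame({"name": ["John", "Anna", "Mike"], "score": [85, None, 92]})

# fillna() replaces missing entries with a chosen value, so no
# rows are lost; here the gap is filled with 0 for illustration.
filled = df["score"].fillna(0)
print(filled.tolist())  # [85.0, 0.0, 92.0]
```

Whether to drop or fill depends on the analysis: filling with 0 would lower this average, so choose the strategy that matches what the missing value actually means.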
Real-world applications
With the fundamentals and error-handling covered, you can apply these techniques to real-world scenarios like parsing API data and analyzing weather information.
Parsing JSON API responses with requests
A common workflow involves using the requests library to fetch data from an API, which you can then load directly into a pandas DataFrame for analysis.
import pandas as pd
import requests
response = requests.get('https://jsonplaceholder.typicode.com/users')
users_data = response.json()
users_df = pd.DataFrame(users_data)
print(users_df[['id', 'name', 'email']].head(3))
This code demonstrates a complete pipeline from an API request to a formatted DataFrame preview. The requests.get() function first fetches data from the specified URL.
- The response.json() method efficiently decodes the API's JSON output into a Python list of dictionaries.
- This list is then directly converted into a DataFrame using pd.DataFrame().
- Finally, it selects the id, name, and email columns and prints the first three rows with .head(3) for a concise summary.
Analyzing weather data with pd.read_json()
With pd.read_json(), you can load weather data and use pandas' grouping features to compare statistics like temperature across multiple locations.
import pandas as pd
weather_json = '''[
{"city": "New York", "date": "2023-01-01", "temp": 32, "condition": "Snow"},
{"city": "New York", "date": "2023-01-02", "temp": 35, "condition": "Cloudy"},
{"city": "Miami", "date": "2023-01-01", "temp": 75, "condition": "Sunny"},
{"city": "Miami", "date": "2023-01-02", "temp": 78, "condition": "Sunny"}
]'''
weather_df = pd.read_json(weather_json)
weather_stats = weather_df.groupby('city')['temp'].agg(['mean', 'min', 'max'])
print(weather_stats)
This code snippet shows how you can quickly summarize data from a JSON string. After loading the data into a DataFrame, it uses groupby('city') to sort the weather entries by location. This creates distinct groups for "New York" and "Miami".
- The .agg() function is then used on the temp column for each city.
- It efficiently computes multiple statistics at once (here the mean, minimum, and maximum temperatures), giving you a clean summary table.
Get started with Replit
Put your new skills to use by building a real tool. Give Replit Agent a prompt like "Build a tool that converts a JSONL file to a CSV" or "Create a web app that flattens nested JSON into a table."
The agent writes the code, tests for errors, and deploys your application automatically. Start building with Replit today.
Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.