How to find missing values in Python
Learn how to find missing values in Python. This guide covers various methods, tips, real-world applications, and common error debugging.

Missing values are a common challenge in data analysis that can distort your results. Python provides powerful, straightforward tools to identify and handle these gaps, which ensures your conclusions remain accurate.
In this article, you'll learn several techniques to locate missing values. You'll find practical tips, real-world applications, and debugging advice to help you choose the right method for your dataset.
Using is None to check for missing values
data = [1, None, 3, None, 5]
missing_indices = [i for i, x in enumerate(data) if x is None]
print(f"Missing values found at indices: {missing_indices}")

Output:
Missing values found at indices: [1, 3]
The simplest way to find missing data is by checking for Python's built-in None object. The identity operator, is None, is the most reliable method for this because it directly checks if a variable points to the single None object in memory. It's more explicit and often faster than using the equality operator ==.
The example uses a list comprehension to efficiently build a new list. It iterates through the data list with enumerate() to get both the index and value of each item, then collects the indices where the value is None.
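The same identity check extends naturally to counting gaps and filtering them out. A minimal sketch, using only built-ins:

```python
data = [1, None, 3, None, 5]

# A generator expression counts True results, where True means "this item is None"
missing_count = sum(x is None for x in data)

# Keep only the present values with `is not None`
clean_data = [x for x in data if x is not None]

print(f"Missing: {missing_count}, clean values: {clean_data}")
```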
Basic techniques with pandas
While is None is effective for lists, the pandas library offers more powerful tools for finding missing values in structured data.
Using isna() to identify missing values
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, None]})
print(df.isna())

Output:
       A      B
0  False  False
1   True  False
2  False   True
This example creates a DataFrame in Python with deliberate gaps, using both np.nan and None, so you can practice data cleaning techniques on the two most common missing-value markers.
The isna() method is a powerful tool for detecting missing data across an entire DataFrame. It returns a boolean DataFrame where each cell is marked True if the original value was missing and False otherwise. This gives you a quick, comprehensive map of all the gaps in your dataset.
- It recognizes multiple types of missing values, including NumPy's np.nan.
- It also correctly identifies Python's built-in None.
This flexibility is why isna() is a go-to for data cleaning in pandas.
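The same check is also available as the top-level function pd.isna(), which accepts scalars. A quick sketch of how it treats each kind of value:

```python
import pandas as pd
import numpy as np

# pd.isna() treats None and np.nan alike, and real values as present
print(pd.isna(None))    # True
print(pd.isna(np.nan))  # True
print(pd.isna(3.5))     # False
```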
Counting missing values with sum()
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, None], 'C': [7, 8, 9]})
missing_count = df.isna().sum()
print(missing_count)

Output:
A    1
B    1
C    0
dtype: int64
After identifying missing values with isna(), you can chain the sum() method to get a quick count. This is a common pandas idiom that works because sum() treats boolean True values as 1 and False values as 0.
- By default, sum() operates column-wise, giving you a total count of missing entries for each column.
- This provides a concise summary that helps you quickly gauge which parts of your dataset need cleaning.
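Two small variations on the same idiom are worth knowing: chaining sum() twice gives a grand total for the whole DataFrame, and passing axis=1 counts per row instead. A sketch using the same data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, None], 'C': [7, 8, 9]})

# The first sum() counts per column; the second adds those counts together
total_missing = df.isna().sum().sum()

# axis=1 counts missing entries per row instead of per column
per_row = df.isna().sum(axis=1)

print(f"Total missing: {total_missing}")
print(per_row)
```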
Finding rows with any missing values
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, None]})
rows_with_missing = df[df.isna().any(axis=1)]
print(rows_with_missing)

Output:
     A    B
1  NaN  5.0
2  3.0  NaN
To isolate entire rows that contain missing data, you can chain the any() method after isna(). This technique is useful for quickly filtering out incomplete records from your dataset.
- The any(axis=1) method scans across each row (since axis=1 specifies the horizontal axis) and returns True if it finds at least one missing value.
- This creates a boolean Series that you can use to select and display only the rows from the original DataFrame that have gaps.
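Inverting the same mask with ~ gives the complement: rows with no missing values at all. A minimal sketch:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, None]})

# ~ negates the boolean mask, selecting only fully populated rows
complete_rows = df[~df.isna().any(axis=1)]
print(complete_rows)
```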
Advanced missing value analysis
Once you've identified where the gaps are, you can move beyond simple counts to analyze their impact and uncover patterns in their distribution.
Getting missing value percentage
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3, np.nan], 'B': [4, 5, None, 7]})
missing_percentage = df.isna().mean() * 100
print(missing_percentage)

Output:
A    50.0
B    25.0
dtype: float64
Calculating the percentage of missing values often provides more context than a raw count. You can do this by chaining the mean() method after isna(). Because pandas treats boolean True values as 1 and False as 0, calling mean() directly calculates the proportion of missing data in each column.
- Simply multiply the result by 100 to convert the proportion into an easy-to-read percentage.
- This gives you a normalized measure of data quality, which is useful for comparing the completeness of different columns, especially when handling large datasets in Python.
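These percentages plug directly into threshold checks, for example flagging columns whose missing share exceeds a cutoff (the 40% here is just an illustrative value):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, np.nan, 3, np.nan], 'B': [4, 5, None, 7]})

missing_percentage = df.isna().mean() * 100

# Boolean indexing on the percentage Series flags the sparse columns
sparse_cols = missing_percentage[missing_percentage > 40].index.tolist()
print(f"Columns above threshold: {sparse_cols}")
```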
Using dropna() to find complete data
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, None]})
complete_data = df.dropna()
print(f"Complete rows: {len(complete_data)} out of {len(df)}")
print(complete_data)

Output:
Complete rows: 1 out of 3
     A    B
0  1.0  4.0
The dropna() method provides a straightforward way to isolate complete records. It works by removing NaN values in Python - specifically, removing any row from your DataFrame that contains at least one missing value, such as np.nan or None.
- This gives you a clean subset of your data, which is ideal for analyses that can't handle gaps.
- Be mindful that this approach can discard a significant amount of information if missing values are widespread.
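dropna() also takes parameters that soften this all-or-nothing behavior: subset restricts the check to certain columns, and thresh keeps rows with a minimum number of present values. A sketch of both:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, None]})

# Drop a row only if column 'A' is missing; gaps in 'B' are tolerated
subset_clean = df.dropna(subset=['A'])

# Keep rows that have at least 2 non-missing values
thresh_clean = df.dropna(thresh=2)

print(f"subset: {len(subset_clean)} rows, thresh: {len(thresh_clean)} rows")
```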
Finding patterns in missing data
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3, np.nan],
                   'B': [np.nan, 5, np.nan, 8],
                   'C': [7, 8, np.nan, np.nan]})
missing_correlation = df.isna().corr()
print(missing_correlation)

Output:
          A         B         C
A  1.000000 -1.000000  0.000000
B -1.000000  1.000000  0.000000
C  0.000000  0.000000  1.000000
You can uncover hidden relationships in your dataset by analyzing the correlation of missing values. Chaining isna() with corr() creates a matrix that reveals how the absence of data in one column relates to another. This technique pairs well with vibe coding to quickly explore patterns and understand if missing data is random or systematic.
- A correlation of 1 means two columns tend to have missing values in the same rows.
- A correlation of -1, as seen between columns A and B, means if a value is missing in one, it's present in the other.
- A value near 0 suggests there's no discernible pattern.
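Another way to inspect the structure, assuming pandas 1.1 or later, is to count how often each combination of present and missing columns occurs. Each unique True/False row in the isna() frame is one missingness pattern:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, np.nan, 3, np.nan],
                   'B': [np.nan, 5, np.nan, 8],
                   'C': [7, 8, np.nan, np.nan]})

# value_counts() on the boolean frame tallies each distinct missingness pattern
pattern_counts = df.isna().value_counts()
print(pattern_counts)
```

In this small example every row has a distinct pattern, but on real data the counts quickly reveal which combinations of gaps dominate.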
Move faster with Replit
Replit is an AI-powered development platform where you can skip setup and start coding instantly. It comes with all Python dependencies pre-installed, so you can go straight from learning new techniques to applying them.
With Agent 4, you can move from piecing together individual functions to building complete applications. Instead of just identifying missing data with isna(), you can ask the Agent to build a finished tool:
- A data quality dashboard that scans a CSV and reports the percentage of missing values in each column.
- A data cleaning utility that filters out records with missing contact information from a customer list.
- A validation tool that analyzes patterns in missing data, like checking if an incomplete user profile also lacks transaction history.
Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.
Common errors and challenges
Even with powerful tools, a few common pitfalls can trip you up when handling missing data in Python.
- Mistakenly using == instead of is with None: Using the equality operator == to check for None can lead to unpredictable results. While it often works, custom objects can override the == operator to behave in unexpected ways. The identity operator is, however, cannot be overridden. It guarantees you're checking whether a variable is the one true None object, making your code more robust and reliable.
- Incorrectly detecting np.nan with the equality operator: A unique and often surprising property of NumPy's not-a-number value, np.nan, is that it never equals itself. This means a check like x == np.nan will always return False, even if x is np.nan. This is why you must use dedicated functions like pandas' isna() or NumPy's isnan(), which are designed to correctly identify these special floating-point values.
- Losing NaN values during filtering operations: Be careful when changing a column's data type. Since NaN is technically a float, trying to convert a column containing NaN values to an integer type will raise an error in standard pandas. If you're not prepared for this, you might filter out rows or make other changes that inadvertently cause data loss. It's best practice to handle missing values before attempting such type conversions.
Mistakenly using == instead of is with None
It's tempting to use the == operator to find None values, and in many simple scenarios, it appears to work just fine. However, this approach isn't guaranteed to be accurate across the board, making your code less predictable. The following example demonstrates this common practice.
data = [1, None, 3, None, 5]
missing_indices = [i for i, x in enumerate(data) if x == None]
print(f"Missing values found at indices: {missing_indices}")
While this code works for a simple list, using the == operator is risky. Custom objects can define their own equality rules, causing this check to fail unexpectedly. The following example shows the correct, reliable approach.
data = [1, None, 3, None, 5]
missing_indices = [i for i, x in enumerate(data) if x is None]
print(f"Missing values found at indices: {missing_indices}")
This solution uses the identity operator, is None, which is the most reliable way to find missing values because it checks if a variable is the one true None object. The equality operator, ==, can be misleading since custom objects can change its behavior. Always use is None to make your code more predictable and robust, especially when you're working with data from different sources or complex object types.
Incorrectly detecting np.nan with equality operator
Unlike Python's None, NumPy's special np.nan value has a peculiar property: it never equals itself. This means using the equality operator, ==, to find missing data silently fails, returning an empty list. The code below shows this common pitfall in action.
import numpy as np
data = [1, np.nan, 3, np.nan, 5]
missing_indices = [i for i, x in enumerate(data) if x == np.nan]
print(f"Missing values found at indices: {missing_indices}") # Returns []
The expression x == np.nan always returns False, so the list comprehension's condition is never met, resulting in an empty list. The following example demonstrates the proper way to detect these values.
import numpy as np
data = [1, np.nan, 3, np.nan, 5]
missing_indices = [i for i, x in enumerate(data) if isinstance(x, float) and np.isnan(x)]
print(f"Missing values found at indices: {missing_indices}") # Returns [1, 3]
The solution uses NumPy's isnan() function, which is designed to correctly identify np.nan values. Since isnan() requires a number, the code first checks if an item is a float using isinstance(x, float). This two-step check ensures you can reliably find np.nan without errors. You'll encounter this most often when cleaning numerical datasets, especially those imported into pandas, where np.nan is the standard marker for missing numeric data.
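If pandas is already imported, pd.isna() offers a simpler alternative that handles np.nan and None in a single call, with no type guard needed:

```python
import pandas as pd
import numpy as np

data = [1, np.nan, 3, None, 5]

# pd.isna() accepts scalars and recognizes both np.nan and None
missing_indices = [i for i, x in enumerate(data) if pd.isna(x)]
print(f"Missing values found at indices: {missing_indices}")
```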
Losing NaN values during filtering operations
It's easy to accidentally lose data when filtering a DataFrame. If you apply a condition to a column with np.nan values, those rows might disappear because comparisons like np.nan > 2 always evaluate to False. The following code shows this in action.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, 6]})
filtered = df[df['A'] > 2]
print(filtered) # NaN row is lost because NaN > 2 is False
The boolean mask created by df['A'] > 2 is False for the np.nan entry, causing that row to be excluded from the output. The following code demonstrates a more robust approach to filtering.
import pandas as pd
import numpy as np
df = pd.DataFrame({'A': [1, np.nan, 3], 'B': [4, 5, 6]})
mask = (df['A'] > 2) | (df['A'].isna())
filtered = df[mask]
print(filtered) # Both NaN and values > 2 are kept
The solution prevents data loss by building a more inclusive filter. It uses the | (OR) operator to combine two conditions: one for your data (df['A'] > 2) and another to explicitly keep missing values (df['A'].isna()). This ensures rows with NaN aren't silently dropped. You should use this technique whenever filtering columns that might contain missing data, as it's a common way to lose valuable information without realizing it.
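The underlying reason is that NaN compares as False in both directions, so a condition and its apparent opposite together still miss the NaN rows. A minimal sketch:

```python
import pandas as pd
import numpy as np

s = pd.Series([1.0, np.nan, 3.0])

# NaN fails both comparisons, so the two filters together cover only 2 of 3 rows
above = s[s > 2]
at_or_below = s[s <= 2]
print(len(above), len(at_or_below), len(s))
```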
Real-world applications
Beyond just identifying gaps, these methods are crucial for practical tasks like imputing customer data or building custom quality reports. When working with real datasets, you'll often start by reading CSV files in Python before applying these missing value techniques. AI coding with Python can further enhance these data processing workflows.
Using fillna() to impute missing values in customer data
When dropping incomplete records isn't an option, the fillna() method allows you to impute missing values with a calculated substitute, such as the column's average, to keep your customer dataset whole.
import pandas as pd
import numpy as np
# Customer dataset with missing values
customers = pd.DataFrame({
    'age': [25, np.nan, 34, np.nan, 41],
    'income': [50000, 60000, np.nan, 75000, 65000]
})
# Fill missing values with mean
customers_imputed = customers.fillna(customers.mean())
print(customers_imputed)
This example shows how to replace missing data with a calculated value. The key is chaining two pandas methods: mean() and fillna().
- First, customers.mean() computes the average for each column, automatically skipping over any np.nan values.
- Then, fillna() takes the resulting averages and plugs them into the empty cells of their respective columns.
This approach ensures you're substituting missing ages with the average age and missing incomes with the average income, preserving the integrity of each feature.
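fillna() also accepts a dictionary mapping column names to fill values, which lets you mix strategies per column. A sketch using the same customer data, with the median for age as an illustrative choice:

```python
import pandas as pd
import numpy as np

customers = pd.DataFrame({
    'age': [25, np.nan, 34, np.nan, 41],
    'income': [50000, 60000, np.nan, 75000, 65000]
})

# Per-column mapping: median for age (robust to outliers), mean for income
filled = customers.fillna({'age': customers['age'].median(),
                           'income': customers['income'].mean()})
print(filled)
```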
Creating a custom missing value report for data quality assessment
You can create a custom data quality report by combining isna() and sum() with basic arithmetic to show the exact count and percentage of missing values in each column.
import pandas as pd
import numpy as np
# Sales dataset with missing values
sales = pd.DataFrame({
    'product': ['A', 'B', 'C', np.nan, 'E'],
    'units_sold': [100, 150, np.nan, 200, 175],
    'revenue': [1000, np.nan, 1500, 2000, 1750]
})
# Create a missing value report
missing_counts = sales.isna().sum()
missing_percent = (missing_counts / len(sales) * 100).round(1)
report = pd.DataFrame({'Missing Count': missing_counts, 'Percent': missing_percent})
print(report)
This snippet creates a concise data quality summary. It leverages familiar methods to build a new DataFrame that reports on missing values.
- It calculates the percentage of missing data by dividing the isna().sum() count by the total number of rows, then uses round(1) for a clean look.
- It then constructs a new DataFrame, organizing the raw counts and calculated percentages into two columns, 'Missing Count' and 'Percent', for a clear, side-by-side comparison.
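A small extension, shown here on hypothetical data with uneven gaps, is to sort the report so the worst-affected columns appear first:

```python
import pandas as pd
import numpy as np

# Illustrative dataset where columns have different amounts of missing data
sales = pd.DataFrame({
    'product': ['A', 'B', 'C', 'D', 'E'],
    'units_sold': [100, 150, np.nan, 200, 175],
    'revenue': [1000, np.nan, 1500, np.nan, 1750]
})

missing_counts = sales.isna().sum()
missing_percent = (missing_counts / len(sales) * 100).round(1)
report = pd.DataFrame({'Missing Count': missing_counts, 'Percent': missing_percent})

# Sort descending so the columns most in need of cleaning come first
report = report.sort_values('Percent', ascending=False)
print(report)
```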
Get started with Replit
Now, turn your knowledge into a real tool. Ask Replit Agent to “build a data quality dashboard from a CSV” or “create a utility that filters out records with missing contact information.”
The Agent will write the code, test for errors, and help you deploy the app. Start building with Replit.
Describe what you want to build, and Replit Agent writes the code, handles the infrastructure, and ships it live. Go from idea to real product, all in your browser.