How to filter a dataframe in Python

Learn how to filter a Python DataFrame with our guide. Discover methods, tips, real-world uses, and how to debug common errors.

Published on: Fri, Feb 20, 2026
Updated on: Mon, Apr 6, 2026
The Replit Team

Filtering pandas DataFrames is a fundamental skill for data analysis in Python. It lets you isolate specific information for further review or computation.

Here, you’ll learn powerful techniques to filter data, from basic operators to the query() method. You'll also get practical tips, see real-world applications, and receive advice to debug common errors.

Basic filtering with boolean conditions

import pandas as pd

data = {
    'Name': ['John', 'Emma', 'Bob', 'Lisa', 'Mike', 'Sarah'],
    'Age': [28, 24, 32, 19, 40, 35],
    'City': ['New York', 'Boston', 'Chicago', 'Boston', 'New York', 'Chicago']
}
df = pd.DataFrame(data)

filtered_df = df[df['Age'] > 30]
print(filtered_df)

Output:

    Name  Age      City
2    Bob   32   Chicago
4   Mike   40  New York
5  Sarah   35   Chicago

The code filters the DataFrame using boolean indexing, similar to techniques used when creating a DataFrame in Python. The expression df['Age'] > 30 generates a pandas Series of boolean values—True for rows that meet the condition and False for those that don't.

This Series then acts as a mask. When you pass this mask back into the DataFrame, pandas selects only the rows corresponding to a True value. This is why the output contains only the people older than 30. It’s a highly efficient method for selecting data based on your criteria.
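
To make the mask concrete, the sketch below (rebuilding the same sample df so it runs on its own) prints the boolean Series before using it as a filter:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Bob', 'Lisa', 'Mike', 'Sarah'],
    'Age': [28, 24, 32, 19, 40, 35],
    'City': ['New York', 'Boston', 'Chicago', 'Boston', 'New York', 'Chicago']
})

# The comparison alone produces a boolean Series, one value per row
mask = df['Age'] > 30
print(mask.tolist())  # [False, False, True, False, True, True]

# Passing the mask back into df keeps only the True rows
print(df[mask]['Name'].tolist())  # ['Bob', 'Mike', 'Sarah']
```

Inspecting the mask this way is a handy debugging step when a filter returns fewer (or more) rows than you expect.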

Intermediate filtering techniques

Beyond basic boolean indexing, you can create more powerful filters with .loc[], combine multiple conditions with & and |, or use the expressive .query() method.

Using .loc[] for label-based filtering

# Filter rows where Age > 30 and select specific columns
result = df.loc[df['Age'] > 30, ['Name', 'City']]
print(result)

Output:

    Name      City
2    Bob   Chicago
4   Mike  New York
5  Sarah   Chicago

The .loc[] accessor is a powerful tool for selecting data by labels. It lets you filter rows and select specific columns in a single, readable command. The accessor takes two arguments separated by a comma: the first specifies the rows, and the second specifies the columns.

  • The row condition, df['Age'] > 30, is the same boolean mask you saw earlier.
  • The second argument, ['Name', 'City'], is a list of column names you want to keep.

This approach makes your code more explicit and often easier to read than chaining multiple operations.

Using .query() method for string expressions

result = df.query('Age < 30 and City == "New York"')
print(result)

Output:

   Name  Age      City
0  John   28  New York

The query() method offers a more intuitive way to filter your data. Instead of chaining boolean conditions, you pass a single string that reads much like plain English. This approach can make your filtering logic cleaner and easier to understand at a glance.

  • The expression 'Age < 30 and City == "New York"' is evaluated directly on the DataFrame. Pandas interprets the column names and conditions within the string.
  • You can use standard logical operators like and and or, which is often more readable than using the bitwise operators & and |.
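
query() can also reference local Python variables with the @ prefix, which keeps your thresholds out of the string itself. A self-contained sketch using the same sample df:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Bob', 'Lisa', 'Mike', 'Sarah'],
    'Age': [28, 24, 32, 19, 40, 35],
    'City': ['New York', 'Boston', 'Chicago', 'Boston', 'New York', 'Chicago']
})

# @ lets the query string read local variables
max_age = 30
target_city = 'New York'
result = df.query('Age < @max_age and City == @target_city')
print(result['Name'].tolist())  # ['John']
```

This makes it easy to reuse the same query string with different cutoff values.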

Combining multiple conditions with & and | operators

mask = (df['Age'] > 25) & (df['City'] == 'New York')
result = df[mask]
print(result)

Output:

   Name  Age      City
0  John   28  New York
4  Mike   40  New York

You can create more complex filters by combining multiple boolean conditions. Use the & operator for "and" and the | operator for "or". It's crucial to wrap each individual condition in parentheses because of Python's operator precedence rules.

  • The first condition, (df['Age'] > 25), finds people older than 25.
  • The second, (df['City'] == 'New York'), finds people in New York.

The & operator combines these, so only rows that satisfy both conditions are returned in the final DataFrame.
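
The | operator works the same way but keeps rows that satisfy either condition. A quick self-contained sketch with the same sample df:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Bob', 'Lisa', 'Mike', 'Sarah'],
    'Age': [28, 24, 32, 19, 40, 35],
    'City': ['New York', 'Boston', 'Chicago', 'Boston', 'New York', 'Chicago']
})

# | keeps any row that matches at least one condition
mask = (df['Age'] < 25) | (df['City'] == 'Chicago')
result = df[mask]
print(result['Name'].tolist())  # ['Emma', 'Bob', 'Lisa', 'Sarah']
```

As with &, each condition still needs its own parentheses.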

Advanced filtering approaches

To tackle more nuanced filtering tasks, you can leverage powerful methods for checking multiple values, applying custom logic, and searching within text data.

Using .isin() for membership tests

cities = ['Boston', 'Chicago']
result = df[df['City'].isin(cities)]
print(result)

Output:

    Name  Age     City
1   Emma   24   Boston
2    Bob   32  Chicago
3   Lisa   19   Boston
5  Sarah   35  Chicago

The .isin() method is your go-to for filtering rows based on a list of allowed values. It’s a clean alternative to chaining multiple | (or) conditions, especially when your list is long.

  • The expression df['City'].isin(cities) checks each entry in the 'City' column to see if it exists within the cities list.
  • This generates a boolean Series that you can use as a mask, efficiently selecting only the rows that match your criteria—in this case, anyone living in Boston or Chicago.
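
You can also invert an .isin() mask with the ~ operator to keep everything not in the list. A self-contained sketch with the same sample df:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Bob', 'Lisa', 'Mike', 'Sarah'],
    'Age': [28, 24, 32, 19, 40, 35],
    'City': ['New York', 'Boston', 'Chicago', 'Boston', 'New York', 'Chicago']
})

# ~ flips the mask: rows whose city is NOT in the list
cities = ['Boston', 'Chicago']
result = df[~df['City'].isin(cities)]
print(result['Name'].tolist())  # ['John', 'Mike']
```

This is the idiomatic pandas equivalent of a "not in" check.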

Filtering with custom functions using .apply()

def is_valid(row):
    return row['Age'] >= 25 and row['City'].startswith('N')

result = df[df.apply(is_valid, axis=1)]
print(result)

Output:

   Name  Age      City
0  John   28  New York
4  Mike   40  New York

When your filtering logic gets too complex for a one-liner, the .apply() method is your best bet. It lets you use a custom function—in this case, is_valid—to evaluate each row based on your specific rules. This approach gives you maximum flexibility for intricate conditions, especially when combined with AI coding to generate these functions.

  • The crucial part is setting axis=1, which tells pandas to pass each row to your function.
  • Your function returns True for rows to keep and False for rows to discard, creating a boolean mask for filtering.
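
For shorter conditions, the same filter can be written inline as a lambda, skipping the named function. A self-contained sketch with the same sample df:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Bob', 'Lisa', 'Mike', 'Sarah'],
    'Age': [28, 24, 32, 19, 40, 35],
    'City': ['New York', 'Boston', 'Chicago', 'Boston', 'New York', 'Chicago']
})

# Inline lambda, equivalent to the is_valid function above
result = df[df.apply(lambda row: row['Age'] >= 25 and row['City'].startswith('N'),
                     axis=1)]
print(result['Name'].tolist())  # ['John', 'Mike']
```

Keep in mind that .apply() with axis=1 runs Python code per row, so for large datasets vectorized boolean masks are usually faster.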

Using string methods with .str.contains()

result = df[df['Name'].str.contains('a')]
print(result)

Output:

    Name  Age     City
1   Emma   24   Boston
3   Lisa   19   Boston
5  Sarah   35  Chicago

The .str.contains() method is your tool for filtering based on text patterns. It’s ideal for finding rows where a string column contains a specific character or sequence of characters.

  • The .str accessor lets you apply string-processing methods to each element in the 'Name' column.
  • .contains('a') then checks each name for the letter 'a', returning True or False.

This creates a boolean mask that filters the DataFrame, keeping only the rows where the condition is met. It’s an efficient way to search for substrings within your data.
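
Because .str.contains() treats its pattern as a regular expression by default, you can match more than literal substrings. A self-contained sketch with the same sample df:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Bob', 'Lisa', 'Mike', 'Sarah'],
    'Age': [28, 24, 32, 19, 40, 35],
    'City': ['New York', 'Boston', 'Chicago', 'Boston', 'New York', 'Chicago']
})

# Regex pattern: names starting with J or S
result = df[df['Name'].str.contains('^[JS]', regex=True)]
print(result['Name'].tolist())  # ['John', 'Sarah']
```

If the column can contain missing values, pass na=False so NaN entries are treated as non-matches instead of raising issues in the mask.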

Move faster with Replit

Replit is an AI-powered development platform that comes with all Python dependencies pre-installed, so you can skip setup and start coding instantly. This environment helps you move from learning individual techniques to building complete applications with Replit's Agent 4.

Instead of piecing together methods like .query() and .isin(), you can describe the app you want to build, and the Agent will handle the code, databases, and deployment. For example, you could create practical tools that use the filtering logic you've just learned:

  • A customer segmenting tool that filters a user list based on multiple criteria like location and activity.
  • A content moderation utility that automatically flags comments containing specific keywords from a blocklist.
  • A data validation dashboard that applies custom rules to a dataset to find and display invalid entries.

Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.

Common errors and challenges

When filtering DataFrames, you'll likely run into a few common pitfalls that can cause unexpected results or errors.

  • Forgetting parentheses with & and | operators
  • A frequent mistake is leaving out the parentheses around individual conditions when using the & (and) or | (or) operators. Due to Python's operator precedence, expressions like df['Age'] > 25 & df['City'] == 'New York' will fail. Python tries to evaluate 25 & df['City'] first, which results in a TypeError.
  • Always wrap each condition in parentheses to ensure the comparisons are performed before the logical operators are applied. This forces the correct order of operations, producing the boolean Series you need for filtering.
  • Overlooking case sensitivity in .str.contains()
  • By default, the .str.contains() method is case-sensitive. A search for 'boston' won't match the value 'Boston', which can lead to incomplete or empty results. This is an easy detail to miss when you're focused on the logic of your filter.
  • To perform a case-insensitive search, simply add the argument case=False to the method. For example, df['City'].str.contains('boston', case=False) will correctly find all variations of the city name.
  • Handling missing values in filtering operations
  • Missing data, represented as NaN (Not a Number), can behave unexpectedly during filtering. When you apply a condition like df['Age'] > 30, any row with a NaN value in the 'Age' column will evaluate to False and be excluded from the result.
  • While this might be what you want, it's better to be explicit. Use the .isnull() and .notnull() methods to consciously decide how to handle missing data. This makes your code clearer and helps you avoid accidentally dropping important rows from your analysis.
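
The points above can be sketched with .notnull() and .isnull() on a small DataFrame that includes a missing age (constructed here just for illustration):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Name': ['John', 'Emma', 'Bob'],
    'Age': [28, np.nan, 32]
})

# Keep only rows with a known age, then apply the condition
result = df[df['Age'].notnull() & (df['Age'] > 25)]
print(result['Name'].tolist())  # ['John', 'Bob']

# Or inspect which rows are missing before deciding how to treat them
missing = df[df['Age'].isnull()]
print(missing['Name'].tolist())  # ['Emma']
```

Checking the missing rows separately makes the decision to drop or fill them an explicit step in your analysis.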

Forgetting parentheses with & and | operators

Because of Python's operator precedence, you must wrap individual conditions in parentheses when using & or |. Forgetting them is a common error that results in a TypeError instead of a filtered DataFrame. The code below shows this mistake in action.

# This will raise an error
filtered_df = df[df['Age'] > 30 & df['City'] == 'New York']
print(filtered_df)

The expression fails because the & operator binds more tightly than >, causing an invalid comparison between the number 30 and the City column. See the corrected syntax below, which ensures each condition is evaluated separately.

# Correct: use parentheses around each condition
filtered_df = df[(df['Age'] > 30) & (df['City'] == 'New York')]
print(filtered_df)

By wrapping each condition in parentheses, you ensure they are evaluated independently before the logical & operator is applied. This creates two separate boolean Series—one for age and one for city—which are then correctly combined into a final mask. It’s a simple but crucial step to remember whenever you're building filters with multiple conditions using either the & or | operators, as it guarantees your logic executes as intended.

Overlooking case sensitivity in .str.contains()

A common pitfall with text filtering is forgetting that string comparisons are case-sensitive. When using .str.contains(), a search for a lowercase pattern like 'new' will fail to find capitalized words such as 'New York'. See this in action below.

# This won't find "New York" if searching for "new"
filtered_df = df[df['City'].str.contains('new')]
print(filtered_df)

The code returns an empty DataFrame because the default search is case-sensitive and can't match 'new' with 'New York'. The following example shows how to adjust your code to get the expected result.

# Case-insensitive search
filtered_df = df[df['City'].str.contains('new', case=False)]
print(filtered_df)

By adding the case=False argument, you tell the .str.contains() method to ignore capitalization. This makes your search for 'new' correctly match values like 'New York'. It’s an essential adjustment when working with text data from varied sources where casing isn't consistent. This simple change ensures your filters are robust and don't miss entries due to something as simple as a capital letter.

Handling missing values in filtering operations

Missing data, represented as NaN (Not a Number), can cause silent errors in your filters. When a boolean condition is applied, any row with a NaN value evaluates to False and is excluded, which might not be your intention. The following code shows how a row with a missing age disappears from the result.

import pandas as pd
df_with_na = df.copy()
df_with_na.loc[2, 'Age'] = None
filtered_df = df_with_na[df_with_na['Age'] > 25]
print(filtered_df)

Because Bob's age is set to None, the filter df_with_na['Age'] > 25 evaluates his row as False. This silently removes him from the result, which can skew your analysis. The code below shows a better approach.

# Handle missing values explicitly
filtered_df = df_with_na[df_with_na['Age'].fillna(0) > 25]
print(filtered_df)

By using fillna(0), you temporarily replace any NaN values with 0 before the comparison runs. This prevents rows with missing data from being silently dropped from your results. Instead, they're explicitly evaluated in the filter—in this case, 0 > 25 becomes False. This approach gives you direct control over how your filter handles incomplete data, making your analysis more robust and predictable, especially when working with datasets where missing values are common.

Real-world applications

Beyond debugging, these filtering skills are essential for real-world tasks, from segmenting sales figures to analyzing trends in time series data.

Filtering sales data with the & operator

You can combine multiple conditions with the & operator to identify high-value segments in your sales data, such as products that are both expensive and popular.

import pandas as pd

sales_data = {
    'Product': ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Keyboard', 'Mouse'],
    'Category': ['Electronics', 'Electronics', 'Electronics', 'Computer', 'Computer', 'Computer'],
    'Price': [1200, 800, 300, 250, 80, 25],
    'Units_Sold': [50, 120, 75, 30, 45, 100]
}
sales_df = pd.DataFrame(sales_data)

# Filter high-performing products (price > 200 and units sold > 40)
high_performers = sales_df[(sales_df['Price'] > 200) & (sales_df['Units_Sold'] > 40)]
print(high_performers)

This code isolates products from sales_df that meet two specific criteria, which is especially useful when working with real sales data loaded from CSV files in Python. It creates a new DataFrame, high_performers, by applying a filter that requires products to satisfy both of the following conditions:

  • The price must be greater than 200 (sales_df['Price'] > 200).
  • The number of units sold must exceed 40 (sales_df['Units_Sold'] > 40).

The & operator ensures that only rows where both conditions are true are returned, making it a powerful tool for targeted data segmentation.

Analyzing time series data with .loc[]

The .loc[] accessor is also highly effective for time series data, letting you isolate specific date ranges or filter for conditions like ideal weather.

import pandas as pd

dates = pd.date_range(start='2023-01-01', periods=10, freq='D')
data = {
    'Date': dates,
    'Temperature': [20, 22, 19, 24, 25, 23, 18, 16, 20, 22],
    'Humidity': [65, 70, 80, 75, 65, 60, 70, 80, 75, 60]
}
weather_df = pd.DataFrame(data)
weather_df.set_index('Date', inplace=True)

# Filter to find days with high temperature and low humidity
comfortable_days = weather_df.loc[(weather_df['Temperature'] >= 20) &
                                  (weather_df['Humidity'] < 70)]
print(comfortable_days)

This example first prepares the data for time series analysis by setting the 'Date' column as the index with set_index(). With the index in place, the code uses .loc[] to filter the weather_df based on two conditions joined by the & operator.

  • The first condition selects days where Temperature is 20 or higher.
  • The second condition finds days where Humidity is less than 70.

Only rows that satisfy both rules are included in the final comfortable_days DataFrame, giving you a targeted subset of the original data that could be further analyzed or combined with other datasets using techniques for merging DataFrames in Python.
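
With a DatetimeIndex in place, .loc also supports slicing by date labels, and label slices include both endpoints. A self-contained sketch using the same weather data:

```python
import pandas as pd

dates = pd.date_range(start='2023-01-01', periods=10, freq='D')
weather_df = pd.DataFrame({
    'Temperature': [20, 22, 19, 24, 25, 23, 18, 16, 20, 22],
    'Humidity': [65, 70, 80, 75, 65, 60, 70, 80, 75, 60]
}, index=dates)

# Label-based slice on a DatetimeIndex: both endpoints are included
first_week = weather_df.loc['2023-01-01':'2023-01-05']
print(len(first_week))  # 5
```

This endpoint-inclusive behavior differs from Python's usual half-open slicing, so it's worth remembering when isolating date ranges.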

Get started with Replit

Turn your new filtering skills into a real application. Describe what you want to build to Replit Agent, like “a script to find high-performer products from a sales CSV” or “a tool that filters weather data for comfortable days.”

The Agent writes the code, tests for errors, and deploys your application. Start building with Replit.

Build your first app today

Describe what you want to build, and Replit Agent writes the code, handles the infrastructure, and ships it live. Go from idea to real product, all in your browser.
