How to index a DataFrame in Python
Learn to index a pandas DataFrame in Python. This guide covers methods, tips, real-world applications, and how to debug common errors.

To index a pandas DataFrame is to select specific rows and columns for analysis. It's a fundamental skill for data manipulation in Python, allowing you to access subsets of your data efficiently.
In this article, you'll learn key indexing techniques with .loc and .iloc. We'll cover practical tips, real-world applications, and common debugging advice to help you master DataFrame selection and filtering.
Basic indexing with .iloc and .loc
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['row1', 'row2', 'row3'])
# Basic row and column selection
print(df.loc['row2', 'B'])
# Output: 5
The example creates a DataFrame with custom row labels—'row1', 'row2', and 'row3'—using the index parameter. This setup is crucial for understanding label-based indexing, which is what .loc is designed for.
The .loc indexer selects data by these explicit labels. When you use df.loc['row2', 'B'], you're telling pandas to find the value at the intersection of the row named 'row2' and the column named 'B'. This method is readable and doesn't depend on where a row or column sits in the DataFrame, so your code keeps working if the data is reordered.
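The heading for this section names .iloc as well, so here is a minimal sketch of the same lookup done by position. It reuses the DataFrame from the example; the variable names by_label and by_position are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['row1', 'row2', 'row3'])

# Label-based: the row labelled 'row2' and the column labelled 'B'
by_label = df.loc['row2', 'B']

# Position-based: second row (position 1) and second column (position 1)
by_position = df.iloc[1, 1]

print(by_label, by_position)  # 5 5
```

Both lookups reach the same cell; they differ only in how the cell is addressed.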
Common indexing methods
Building on that basic selection, you can access data in three main ways: with simple brackets, label-based indexing via .loc, and integer-based indexing via .iloc.
Selecting columns with bracket notation
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
# Select a single column
column_a = df['A']
# Select multiple columns
subset = df[['A', 'C']]
print(subset)
# Output:
#    A  C
# 0  1  7
# 1  2  8
# 2  3  9
Bracket notation is the most direct way to select columns. To grab a single column, you just pass its name as a string, like df['A']. This operation returns a pandas Series containing the column's data.
- For multiple columns, you'll need to use double brackets. The syntax df[['A', 'C']] works because the inner brackets create a list of column names, which is then passed to the outer selection brackets.
- This returns a new DataFrame with only the specified columns.
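As a quick check of the Series-versus-DataFrame distinction, the sketch below compares single and double brackets on the same column. The names single and wrapped are hypothetical and used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

single = df['A']      # one column name -> Series
wrapped = df[['A']]   # list holding one name -> single-column DataFrame

print(type(single).__name__)   # Series
print(type(wrapped).__name__)  # DataFrame
print(wrapped.shape)           # (3, 1)
```

Double brackets are worth using even for one column when the next step expects a DataFrame rather than a Series.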
Using .loc[] for label-based indexing
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['row1', 'row2', 'row3'])
# Select rows and columns by label
subset = df.loc['row1':'row2', 'A':'B']
print(subset)
# Output:
#       A  B
# row1  1  4
# row2  2  5
The .loc indexer is also powerful for slicing data by labels. In the expression df.loc['row1':'row2', 'A':'B'], you're carving out a rectangular section of the DataFrame. The slice before the comma selects the rows, and the slice after it selects the columns.
- A key difference from integer-based slicing is that .loc is inclusive. It includes both the starting label ('row1') and the ending label ('row2') in the output.
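To make the inclusive-versus-exclusive difference concrete, here is a small sketch that slices the same DataFrame both ways. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['row1', 'row2', 'row3'])

# Label slice: the end label 'row2' is included
label_slice = df.loc['row1':'row2']

# Position slice: stops before position 1, so only the first row is returned
position_slice = df.iloc[0:1]

print(len(label_slice))     # 2
print(len(position_slice))  # 1
```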
Using .iloc[] for integer-based indexing
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})
# Select rows and columns by position
subset = df.iloc[0:2, 1:3]
print(subset)
# Output:
#    B   C
# 0  5   9
# 1  6  10
The .iloc indexer selects data by integer position, just like standard Python lists. It's your go-to for positional access. In the expression df.iloc[0:2, 1:3], you're slicing the DataFrame by position, not by label.
- The 0:2 part selects rows at positions 0 and 1. Following Python's convention, the slice is exclusive of the end index.
- Similarly, 1:3 selects columns at positions 1 and 2, which correspond to columns 'B' and 'C' in this case.
This makes .iloc ideal when you need to work with data based on its order rather than its explicit name.
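Because .iloc follows list-style positions, it also accepts negative indices and explicit lists of positions. The following is a brief sketch on the same DataFrame; last_row and picked are hypothetical names.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})

# Negative positions count from the end, as with Python lists
last_row = df.iloc[-1]

# Lists of positions pick non-adjacent rows and columns
picked = df.iloc[[0, 3], [0, 2]]

print(last_row.tolist())       # [4, 8, 12]
print(picked.values.tolist())  # [[1, 9], [4, 12]]
```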
Advanced indexing techniques
Beyond basic slicing, you can tackle complex filtering with boolean indexing, manage layered data with MultiIndex, and write expressive queries with the .query() method.
Boolean indexing for filtering data
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})
# Filter rows where column A is greater than 2
filtered_df = df[df['A'] > 2]
print(filtered_df)
# Output:
#    A  B
# 2  3  7
# 3  4  8
Boolean indexing lets you filter a DataFrame based on a condition. The core of this technique is an expression that returns `True` or `False` for each row, like df['A'] > 2.
- This creates a boolean Series that acts as a mask. When you pass this mask back into the DataFrame, pandas returns only the rows where the condition was `True`. It’s a highly efficient and readable way to select subsets of your data based on its values.
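It can help to look at the mask itself before using it. The sketch below prints the mask and then inverts or combines masks with ~ and |. The variable names are illustrative and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

# The condition produces one True/False value per row
mask = df['A'] > 2
print(mask.tolist())  # [False, False, True, True]

# ~ inverts a mask: rows where A is NOT greater than 2
not_matching = df[~mask]

# | keeps rows that satisfy either condition
either = df[(df['A'] == 1) | (df['B'] == 8)]

print(not_matching.index.tolist())  # [0, 1]
print(either.index.tolist())        # [0, 3]
```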
Working with MultiIndex for hierarchical data
import pandas as pd
import numpy as np
arrays = [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
df = pd.DataFrame(np.random.randn(4, 2), index=index, columns=['data1', 'data2'])
print(df.loc[('A', 'one')])
# Example output (the values are random, so yours will differ):
# data1    0.123456
# data2    0.789012
# Name: (A, one), dtype: float64
A MultiIndex lets you handle hierarchical data by creating multiple index levels within a single DataFrame. The code uses pd.MultiIndex.from_arrays() to build a two-level index from a list of lists, naming the levels 'first' and 'second'. This is a powerful way to organize complex datasets without creating extra columns.
To select data, you simply pass a tuple of labels to .loc. The expression df.loc[('A', 'one')] targets the row where the 'first' level is 'A' and the 'second' level is 'one', making it easy to navigate the data's layers.
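You don't always need the full tuple. As a sketch, the example below selects by one level at a time. It swaps the random values for fixed numbers (an assumption made here so the result is reproducible) and uses DataFrame.xs for the inner level.

```python
import pandas as pd

arrays = [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
# Fixed values instead of random ones, purely for reproducibility
df = pd.DataFrame({'data1': [1, 2, 3, 4], 'data2': [10, 20, 30, 40]}, index=index)

# Outer level only: every row under 'A', indexed by the remaining 'second' level
outer = df.loc['A']

# Inner level only: every row whose 'second' level is 'one'
inner = df.xs('one', level='second')

print(outer['data1'].tolist())  # [1, 2]
print(inner['data2'].tolist())  # [10, 30]
```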
Using .query() method for SQL-like filtering
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
# Use query method for filtering
result = df.query('A > 1 and B < 6')
print(result)
# Output:
#    A  B  C
# 1  2  5  8
The .query() method offers a more expressive, SQL-like way to filter your DataFrame. Instead of chaining boolean conditions with brackets, you pass a single string that specifies your filtering logic. This approach often makes your code cleaner and easier to read, especially when dealing with multiple criteria.
- In the example, df.query('A > 1 and B < 6') evaluates the string as a condition. It selects rows where the value in column 'A' is greater than 1 and the value in column 'B' is less than 6, returning a new filtered DataFrame.
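Query strings can also refer to Python variables by prefixing them with @. The sketch below assumes a hypothetical threshold variable and checks that the result matches the equivalent boolean-mask filter.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

threshold = 1  # hypothetical cutoff, defined outside the query string

# @threshold pulls the Python variable into the query expression
result = df.query('A > @threshold and B < 6')

# The same filter written with boolean masks
same = df[(df['A'] > threshold) & (df['B'] < 6)]

print(result.equals(same))  # True
```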
Move faster with Replit
Replit is an AI-powered development platform that transforms natural language into working applications. Describe what you want to build, and Replit Agent creates it—complete with databases, APIs, and deployment.
The indexing techniques you've learned are the building blocks for powerful data tools. With Replit Agent, you can turn these concepts into production applications:
- Build an interactive sales dashboard that filters customer data using expressive conditions with the .query() method.
- Create a data extraction utility that pulls specific records from a large dataset using .loc for label-based lookups.
- Deploy a financial analysis tool that navigates hierarchical market data with MultiIndex to generate custom reports.
Describe your app idea, and Replit Agent writes the code, tests it, and fixes issues automatically, all in your browser.
Common errors and challenges
Even experienced developers run into indexing issues; here’s how to navigate the most common ones you'll face.
Handling KeyError when using incorrect index types with .loc
A KeyError is one of the most frequent errors you'll encounter. It pops up when you try to access a row or column label with .loc that doesn't exist in the DataFrame's index.
For example, if you have row labels 'row1' and 'row2', asking for df.loc['row3'] will trigger this error because 'row3' isn't a valid key. Always double-check that your labels match the DataFrame's index exactly, including capitalization and data type.
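One defensive pattern, sketched here under the assumption that a missing label should be handled rather than crash the script, is to test membership in the index first or to catch the KeyError. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [4, 5]}, index=['row1', 'row2'])

label = 'row3'

# Option 1: check the index before looking the label up
if label in df.index:
    row = df.loc[label]
else:
    row = None
    print(f"{label!r} is not in the index: {list(df.index)}")

# Option 2: attempt the lookup and handle the failure
try:
    df.loc[label]
    found = True
except KeyError:
    found = False

print(found)  # False
```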
Fixing syntax errors when selecting multiple columns
A subtle syntax mistake often trips people up when selecting more than one column. You might instinctively write df['A', 'C'], but pandas treats that as a single tuple key and raises a KeyError.
The correct way is to use double brackets: df[['A', 'C']]. This works because the inner brackets create a Python list of the column names you want, and the outer brackets pass that list to the DataFrame for selection.
Avoiding the SettingWithCopyWarning with chained indexing
The SettingWithCopyWarning isn't an error, but it's a crucial message from pandas. It suggests that you might be modifying a temporary copy of your data instead of the original DataFrame, in which case your changes won't reach the original.
This often happens with "chained indexing," where you select and assign in two separate steps, like df[df['A'] > 2]['B'] = 0. Pandas can't guarantee whether this modifies the original df or a copy.
- To fix this, use .loc to perform both the row selection and column assignment in a single operation. The correct, reliable syntax is df.loc[df['A'] > 2, 'B'] = 0. This ensures you're always working directly on the original DataFrame.
Handling KeyError when using incorrect index types with .loc
A KeyError doesn't just happen with missing labels. It also occurs when you give .loc an integer position instead of the label it expects. This common mix-up happens because .loc is strictly for label-based indexing. The code below shows this error in action.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['row1', 'row2', 'row3'])
# This causes a KeyError because .loc expects labels, not integer positions
result = df.loc[0, 'A']
print(result)
Using df.loc[0, 'A'] causes a KeyError because the index contains string labels, such as 'row1'. The integer 0 isn't a valid label in this index, so the lookup fails. See the correct syntax for positional access below.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['row1', 'row2', 'row3'])
# Fixed: Use the correct label with .loc or switch to .iloc for integer positions
result = df.loc['row1', 'A'] # Using correct label
# Alternative: result = df.iloc[0, 0] # Using position-based indexing
print(result)
The fix boils down to a simple rule: use the right tool for the job. You'll want to use df.loc['row1', 'A'] because .loc exclusively works with labels. If you need to select by integer position, you must switch to df.iloc[0, 0]. This mix-up is especially common when your DataFrame has a custom, non-numeric index. Always match your indexer—.loc or .iloc—to your selection type, whether it's a label or a position.
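When you know a row by position but a column by name (or the reverse), you can translate one into the other rather than mixing them in a single indexer. This is a minimal sketch using df.index and Index.get_loc; the variable names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['row1', 'row2', 'row3'])

# Position -> label: look up the label at position 0, then use .loc
first_label = df.index[0]
value_by_label = df.loc[first_label, 'A']

# Label -> position: find the column's position, then use .iloc
col_position = df.columns.get_loc('A')
value_by_position = df.iloc[0, col_position]

print(first_label, value_by_label, value_by_position)  # row1 1 1
```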
Fixing syntax errors when selecting multiple columns
A frequent KeyError stems from a subtle syntax mistake when selecting multiple columns. You might intuitively write df['A', 'B'], but pandas interprets this as a single tuple key, which typically fails. The code below demonstrates this common pitfall.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
# This raises a KeyError - incorrect syntax for multiple column selection
columns = df['A', 'B']
print(columns)
The expression df['A', 'B'] fails because pandas searches for a single column named with the tuple ('A', 'B'), not two separate columns. This mismatch triggers the KeyError. The corrected syntax is shown in the next example.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
# Fixed: Use a list inside brackets to select multiple columns
columns = df[['A', 'B']]
print(columns)
The fix is to wrap the column names in an extra set of brackets: df[['A', 'B']]. This works because you're passing a list of strings to the DataFrame's selection operator. The inner brackets create the list, and the outer brackets perform the selection. This error often appears when you're trying to create a subset of your data for analysis. Just remember to use double brackets whenever you need more than one column.
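The tuple interpretation is not arbitrary: a tuple is a valid key when the columns form a MultiIndex. The sketch below contrasts a hypothetical two-level column layout, where df['A', 'B'] succeeds, with a flat one, where it raises a KeyError.

```python
import pandas as pd

# Two-level columns: each column is identified by a tuple
cols = pd.MultiIndex.from_tuples([('A', 'B'), ('A', 'C')])
wide = pd.DataFrame([[1, 2], [3, 4]], columns=cols)
one_column = wide['A', 'B']   # the single column keyed by ('A', 'B')
print(one_column.tolist())    # [1, 3]

# Flat columns: no column is named ('A', 'B'), so the lookup fails
flat = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
try:
    flat['A', 'B']
    failed = False
except KeyError:
    failed = True
print(failed)  # True
```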
Avoiding the SettingWithCopyWarning with chained indexing
This warning isn't an error, but it's a critical message from pandas. It appears when you use chained indexing, such as df[...][...], to modify data, which creates ambiguity over whether you're changing the original DataFrame or a temporary copy. The code below shows this in action.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})
# This can produce a SettingWithCopyWarning, and the change never reaches df
subset = df[df['A'] > 2]
subset['B'] = 0
print(df) # Unchanged: the assignment modified the filtered copy, not df
The operation is split into two steps. First, df[df['A'] > 2] creates a new DataFrame. Then, you modify this new object, leaving the original df untouched. See the correct, single-step approach in the code below.
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})
# Fixed: Use .loc for clear and predictable modifications
df.loc[df['A'] > 2, 'B'] = 0
print(df)
The fix is to use .loc to combine row and column selection into a single, unambiguous operation. The expression df.loc[df['A'] > 2, 'B'] = 0 tells pandas exactly which rows to filter and which column to modify all at once. This guarantees your changes apply directly to the original DataFrame, not a temporary copy. You'll want to watch for this warning whenever you filter and then try to assign a new value in a separate step.
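Sometimes you do want a separate subset that you can edit freely. In that case, one common approach (sketched here, not the only option) is to call .copy() on the filtered result so the intent is explicit and the original stays untouched.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

# Explicit copy: subset is an independent DataFrame
subset = df[df['A'] > 2].copy()
subset['B'] = 0

print(subset['B'].tolist())  # [0, 0]
print(df['B'].tolist())      # [5, 6, 7, 8]
```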
Real-world applications
With a solid handle on debugging, you can confidently apply these indexing skills to real-world tasks like sales analysis and customer segmentation.
Analyzing monthly sales data with .loc[]
You can use .loc to easily slice time-series data, such as pulling out all sales records from the first quarter of the year.
import pandas as pd
# Monthly sales data for 2023
# Note: pandas 2.2 and later prefer the alias 'ME' for month-end; 'M' is deprecated there
dates = pd.date_range('2023-01-01', periods=6, freq='M')
sales = pd.DataFrame({
'amount': [12000, 15000, 18000, 17000, 19000, 22000]
}, index=dates)
# Extract Q1 sales data
q1_sales = sales.loc['2023-01-31':'2023-03-31']
print(q1_sales)
This example showcases how to filter time-series data. It starts by creating a DatetimeIndex using pd.date_range(), which generates six month-end dates to serve as the DataFrame's index.
- The core of the operation is the slice sales.loc['2023-01-31':'2023-03-31'].
- Since the index consists of dates, you can use date strings to define the start and end points of your selection.
- The .loc indexer includes both the start and end labels, so this expression effectively pulls all records within that three-month window.
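With a DatetimeIndex, .loc also accepts partial date strings such as a year and month, so you don't need to know the exact month-end day. The sketch below builds the same six dates explicitly with pd.to_datetime (an assumption made here to sidestep frequency aliases) and slices by month.

```python
import pandas as pd

# The same six month-end dates, written out explicitly
dates = pd.to_datetime([
    '2023-01-31', '2023-02-28', '2023-03-31',
    '2023-04-30', '2023-05-31', '2023-06-30',
])
sales = pd.DataFrame(
    {'amount': [12000, 15000, 18000, 17000, 19000, 22000]},
    index=dates,
)

# Partial strings: everything from January through the end of March
q1_sales = sales.loc['2023-01':'2023-03']

# A single month
march_sales = sales.loc['2023-03']

print(q1_sales['amount'].tolist())     # [12000, 15000, 18000]
print(march_sales['amount'].tolist())  # [18000]
```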
Using boolean indexing and .loc[] for customer segmentation
You can combine multiple conditions with boolean indexing and .loc[] to perform targeted customer segmentation, like isolating high-value members for a promotion.
import pandas as pd
# Customer purchase data
customers = pd.DataFrame({
'customer_id': [101, 102, 103, 104, 105],
'purchases': [5, 10, 3, 20, 7],
'membership': ['Silver', 'Gold', 'Bronze', 'Gold', 'Silver']
})
# Find high-value Gold members for premium promotion
premium_targets = customers.loc[(customers['membership'] == 'Gold') &
(customers['purchases'] > 5)]
print(premium_targets)
This snippet demonstrates how to chain multiple conditions to filter a DataFrame. The logic inside .loc[] builds a boolean mask to identify specific rows.
- The first condition, customers['membership'] == 'Gold', finds all Gold members.
- The second, customers['purchases'] > 5, identifies customers with more than five purchases.
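The & operator ensures only rows satisfying both criteria are returned. Notice each condition is wrapped in parentheses. This is required because & binds more tightly than comparison operators such as == and > in Python, so without the parentheses the expression would be grouped incorrectly.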
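One way to sidestep the parentheses issue is to give each condition its own name before combining them. The sketch below does that and also passes a column label to .loc so only the customer IDs come back. The names is_gold, is_frequent, and target_ids are illustrative.

```python
import pandas as pd

customers = pd.DataFrame({
    'customer_id': [101, 102, 103, 104, 105],
    'purchases': [5, 10, 3, 20, 7],
    'membership': ['Silver', 'Gold', 'Bronze', 'Gold', 'Silver'],
})

# Named masks: each comparison is evaluated before they are combined
is_gold = customers['membership'] == 'Gold'
is_frequent = customers['purchases'] > 5

# Rows come from the combined mask; the second argument picks one column
target_ids = customers.loc[is_gold & is_frequent, 'customer_id']

print(target_ids.tolist())  # [102, 104]
```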
Get started with Replit
Turn your new indexing skills into a real tool. Tell Replit Agent: "Build a dashboard to filter sales data by date" or "Create a tool to find high-value customers based on purchase history."
Replit Agent writes the code, tests for errors, and deploys your app. Start building with Replit.
Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.