How to index a DataFrame in Python

Learn to index a pandas DataFrame in Python. This guide covers methods, tips, real-world applications, and how to debug common errors.

Published on: Wed, Mar 25, 2026

Updated on: Thu, Mar 26, 2026

The Replit Team

To index a pandas DataFrame is to select specific rows and columns for analysis. It's a fundamental skill for data manipulation in Python, allowing you to access subsets of your data efficiently.

In this article, you'll learn key indexing techniques with .loc and .iloc. We'll cover practical tips, real-world applications, and common debugging advice to help you master DataFrame selection and filtering.

Basic indexing with `.iloc` and `.loc`

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
                      index=['row1', 'row2', 'row3'])

    # Basic row and column selection
    print(df.loc['row2', 'B'])

Output:

    5

The example creates a DataFrame with custom row labels—'row1', 'row2', and 'row3'—using the index parameter. This setup is crucial for understanding label-based indexing, which is what .loc is designed for.

The .loc indexer selects data by these explicit labels. When you use df.loc['row2', 'B'], you're telling pandas to find the value at the intersection of the row named 'row2' and the column named 'B'. This method is powerful because it's readable and isn't affected by the DataFrame's integer position, making your code more robust.
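The heading for this section also names .iloc, which the example above does not use. As a minimal sketch on the same DataFrame, the positional equivalent might look like this (the variable names `by_label` and `by_position` are illustrative, not from the original example):

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
                  index=['row1', 'row2', 'row3'])

# Label-based: the row labeled 'row2' and the column labeled 'B'
by_label = df.loc['row2', 'B']

# Position-based: second row (position 1) and second column (position 1)
by_position = df.iloc[1, 1]

print(by_label, by_position)  # both lookups reach the same cell
```

Both lookups land on the same cell here; they would diverge if the rows were reordered, because the label stays with its row while the position does not.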

Common indexing methods

Building on that basic selection, you can access data in three main ways: with simple brackets, label-based indexing via .loc, and integer-based indexing via .iloc.

Selecting columns with bracket notation

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

    # Select a single column
    column_a = df['A']

    # Select multiple columns
    subset = df[['A', 'C']]
    print(subset)

Output:

       A  C
    0  1  7
    1  2  8
    2  3  9

Bracket notation is the most direct way to select columns. To grab a single column, you just pass its name as a string, like df['A']. This operation returns a pandas Series containing the column's data.

For multiple columns, you'll need to use double brackets. The syntax df[['A', 'C']] works because the inner brackets create a list of column names, which is then passed to the outer selection brackets.
This returns a new DataFrame with only the specified columns.
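The difference in return type is easy to check directly. The sketch below (variable names are illustrative) shows that a string key gives a Series, while a list key gives a DataFrame even when the list holds a single name:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

single = df['A']        # string key: returns a Series
one_col_df = df[['A']]  # list key: returns a DataFrame, even with one name

print(type(single).__name__)
print(type(one_col_df).__name__)
print(one_col_df.shape)
```

This matters when later code expects DataFrame methods or a two-dimensional shape.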

Using `.loc[]` for label-based indexing

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
                      index=['row1', 'row2', 'row3'])

    # Select rows and columns by label
    subset = df.loc['row1':'row2', 'A':'B']
    print(subset)

Output:

          A  B
    row1  1  4
    row2  2  5

The .loc indexer is also powerful for slicing data by labels. In the expression df.loc['row1':'row2', 'A':'B'], you're carving out a rectangular section of the DataFrame. The slice before the comma selects the rows, and the slice after it selects the columns.

A key difference from integer-based slicing is that .loc is inclusive. It includes both the starting label ('row1') and the ending label ('row2') in the output.
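To make the inclusive/exclusive contrast concrete, here is a small sketch that slices between the same two rows both ways. In this DataFrame 'row1' sits at position 0 and 'row2' at position 1; the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
                  index=['row1', 'row2', 'row3'])

# Label slice: the end label 'row2' is included
label_slice = df.loc['row1':'row2']

# Position slice with the matching endpoints: position 1 is excluded
position_slice = df.iloc[0:1]

print(len(label_slice), len(position_slice))
```

The label slice returns two rows and the position slice returns one, even though the endpoints refer to the same rows.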

Using `.iloc[]` for integer-based indexing

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})

    # Select rows and columns by position
    subset = df.iloc[0:2, 1:3]
    print(subset)

Output:

       B   C
    0  5   9
    1  6  10

The .iloc indexer selects data based on integer positions, just like standard Python lists. It's your go-to for positional access. In the expression df.iloc[0:2, 1:3], you're slicing the DataFrame by its numerical index, not its labels.

The 0:2 part selects rows at positions 0 and 1. Following Python's convention, the slice is exclusive of the end index.
Similarly, 1:3 selects columns at positions 1 and 2, which correspond to columns 'B' and 'C' in this case.

This makes .iloc ideal when you need to work with data based on its order rather than its explicit name.
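Because .iloc follows list-style positions, negative indices and explicit lists of positions work too. A short sketch on the same DataFrame, with illustrative variable names:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8], 'C': [9, 10, 11, 12]})

# Negative positions count from the end, as with Python lists
last_row = df.iloc[-1]

# Lists of positions pick non-adjacent rows and columns
picked = df.iloc[[0, 2], [0, 2]]

print(last_row.tolist())
print(picked)
```

Here `last_row` holds the values of the fourth row, and `picked` keeps rows 0 and 2 of columns 'A' and 'C'.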

Advanced indexing techniques

Beyond basic slicing, you can tackle complex filtering with boolean indexing, manage layered data with MultiIndex, and write expressive queries with the .query() method.

Boolean indexing for filtering data

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

    # Filter rows where column A is greater than 2
    filtered_df = df[df['A'] > 2]
    print(filtered_df)

Output:

       A  B
    2  3  7
    3  4  8

Boolean indexing lets you filter a DataFrame based on a condition. The core of this technique is an expression that returns `True` or `False` for each row, like df['A'] > 2.

This creates a boolean Series that acts as a mask. When you pass this mask back into the DataFrame, pandas returns only the rows where the condition was `True`. It’s a highly efficient and readable way to select subsets of your data based on its values.
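It can help to look at the mask on its own before using it. The sketch below prints the mask, inverts it with `~`, and builds a second mask with `isin`; the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

# The mask is an ordinary boolean Series aligned with the rows
mask = df['A'] > 2
print(mask.tolist())

# ~ inverts the mask, keeping the rows that failed the condition
not_matching = df[~mask]

# isin builds a mask from membership in a set of values
in_set = df[df['A'].isin([1, 4])]

print(not_matching)
print(in_set)
```

Storing the mask in a variable also makes it reusable across several selections.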

Working with `MultiIndex` for hierarchical data

    import pandas as pd
    import numpy as np

    arrays = [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']]
    index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
    df = pd.DataFrame(np.random.randn(4, 2), index=index, columns=['data1', 'data2'])

    print(df.loc[('A', 'one')])

Output (the values are random, so yours will differ):

    data1    0.123456
    data2    0.789012
    Name: (A, one), dtype: float64

A MultiIndex lets you handle hierarchical data by creating multiple index levels within a single DataFrame. The code uses pd.MultiIndex.from_arrays() to build a two-level index from a list of lists, naming the levels 'first' and 'second'. This is a powerful way to organize complex datasets without creating extra columns.

To select data, you simply pass a tuple of labels to .loc. The expression df.loc[('A', 'one')] targets the row where the 'first' level is 'A' and the 'second' level is 'one', making it easy to navigate the data's layers.
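You don't always need the full tuple. As a sketch with fixed numbers in place of random data (so the results are predictable), selecting on only one level might look like this:

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [['A', 'A', 'B', 'B'], ['one', 'two', 'one', 'two']],
    names=('first', 'second'))
# Fixed values instead of random data, purely for illustration
df = pd.DataFrame({'data1': [1, 2, 3, 4], 'data2': [10, 20, 30, 40]},
                  index=index)

# Outer level only: every row where 'first' is 'A'
outer = df.loc['A']

# Inner level only: every row where 'second' is 'one'
inner = df.xs('one', level='second')

print(outer)
print(inner)
```

In both results the level used for selection is dropped, so `outer` is indexed by 'second' and `inner` by 'first'.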

Using `.query()` method for SQL-like filtering

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

    # Use query method for filtering
    result = df.query('A > 1 and B < 6')
    print(result)

Output:

       A  B  C
    1  2  5  8

The .query() method offers a more expressive, SQL-like way to filter your DataFrame. Instead of chaining boolean conditions with brackets, you pass a single string that specifies your filtering logic. This approach often makes your code cleaner and easier to read, especially when dealing with multiple criteria.

In the example, df.query('A > 1 and B < 6') evaluates the string as a condition. It selects rows where the value in column 'A' is greater than 1 and the value in column 'B' is less than 6, returning a new filtered DataFrame.
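Query strings can also refer to Python variables by prefixing the name with `@`. The sketch below uses a hypothetical threshold variable, `min_a`, and compares the result with the equivalent bracket-based filter:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

min_a = 1  # hypothetical threshold, defined outside the query string

# @min_a pulls the value of the Python variable into the expression
result = df.query('A > @min_a and B < 6')

# The same filter written with boolean indexing
same = df[(df['A'] > min_a) & (df['B'] < 6)]

print(result.equals(same))
```

Keeping the threshold in a variable avoids building query strings by hand with string formatting.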

Move faster with Replit

Replit is an AI-powered development platform that transforms natural language into working applications. Describe what you want to build, and Replit Agent creates it—complete with databases, APIs, and deployment.

The indexing techniques you've learned are the building blocks for powerful data tools. With Replit Agent, you can turn these concepts into production applications:

Build an interactive sales dashboard that filters customer data using expressive conditions with the .query() method.
Create a data extraction utility that pulls specific records from a large dataset using .loc for label-based lookups.
Deploy a financial analysis tool that navigates hierarchical market data with MultiIndex to generate custom reports.

Describe your app idea, and Replit Agent writes the code, tests it, and fixes issues automatically, all in your browser.

Common errors and challenges

Even experienced developers run into indexing issues; here’s how to navigate the most common ones you'll face.

Handling `KeyError` when using incorrect index types with `.loc`

A KeyError is one of the most frequent errors you'll encounter. It pops up when you try to access a row or column label with .loc that doesn't exist in the DataFrame's index.

For example, if you have row labels 'row1' and 'row2', asking for df.loc['row3'] will trigger this error because 'row3' isn't a valid key. Always double-check that your labels match the DataFrame's index exactly, including capitalization and data type.
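If a label might be missing, you can guard the lookup instead of letting it fail. A minimal sketch of two common approaches, where the fallback value `None` is just a choice made for this illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2]}, index=['row1', 'row2'])

label = 'row3'  # hypothetical label that is not in the index

# Option 1: test membership before the lookup
if label in df.index:
    value = df.loc[label, 'A']
else:
    value = None  # fallback chosen for this sketch

# Option 2: attempt the lookup and handle the failure
try:
    df.loc['Row1', 'A']  # wrong capitalization, so there is no match
    found = True
except KeyError:
    found = False

print(value, found)
```

The second lookup fails only because of the capital "R", which shows why labels must match exactly.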

Fixing syntax errors when selecting multiple columns

A subtle syntax mistake often trips people up when selecting more than one column. You might instinctively write df['A', 'C'], but this will result in an error.

The correct way is to use double brackets: df[['A', 'C']]. This works because the inner brackets create a Python list of the column names you want, and the outer brackets pass that list to the DataFrame for selection.

Avoiding the `SettingWithCopyWarning` with chained indexing

The SettingWithCopyWarning isn't an error, but it's a crucial message from pandas. It suggests that you might be modifying a temporary copy of your data instead of the original DataFrame, so your changes won't stick.

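This often happens with "chained indexing," where you select and assign in two separate steps, like df[df['A'] > 2]['B'] = 0. Pandas can't guarantee whether this modifies the original df or a copy. Under Copy-on-Write, which is the default behavior from pandas 3.0, a chained assignment like this never updates the original.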

To fix this, use .loc to perform both the row selection and column assignment in a single operation. The correct, reliable syntax is df.loc[df['A'] > 2, 'B'] = 0. This ensures you're always working directly on the original DataFrame.

Handling `KeyError` when using incorrect index types with `.loc`

A KeyError doesn't just happen with missing labels. It also occurs when you give .loc an integer position instead of the label it expects. This common mix-up happens because .loc is strictly for label-based indexing. The code below shows this error in action.

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
                      index=['row1', 'row2', 'row3'])

    # This causes a KeyError because .loc expects labels, not integer positions
    result = df.loc[0, 'A']
    print(result)

Using df.loc[0, 'A'] causes a KeyError because the index contains string labels, such as 'row1'. The integer 0 isn't a valid label in this index, so the lookup fails. See the correct syntax for positional access below.

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]},
                      index=['row1', 'row2', 'row3'])

    # Fixed: Use the correct label with .loc or switch to .iloc for integer positions
    result = df.loc['row1', 'A']  # Using correct label
    # Alternative: result = df.iloc[0, 0]  # Using position-based indexing
    print(result)

The fix boils down to a simple rule: use the right tool for the job. You'll want to use df.loc['row1', 'A'] because .loc looks up labels (it also accepts boolean masks) and never treats an integer as a position. If you need to select by integer position, you must switch to df.iloc[0, 0]. This mix-up is especially common when your DataFrame has a custom, non-numeric index. Always match your indexer, .loc or .iloc, to your selection type, whether it's a label or a position.

Fixing syntax errors when selecting multiple columns

A frequent KeyError stems from a subtle syntax mistake when selecting multiple columns. You might intuitively write df['A', 'B'], but pandas interprets this as a single tuple key, which typically fails. The code below demonstrates this common pitfall.

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

    # This raises a KeyError - incorrect syntax for multiple column selection
    columns = df['A', 'B']
    print(columns)

The expression df['A', 'B'] fails because pandas searches for a single column named with the tuple ('A', 'B'), not two separate columns. This mismatch triggers the KeyError. The corrected syntax is shown in the next example.

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

    # Fixed: Use a list inside brackets to select multiple columns
    columns = df[['A', 'B']]
    print(columns)

The fix is to wrap the column names in an extra set of brackets: df[['A', 'B']]. This works because you're passing a list of strings to the DataFrame's selection operator. The inner brackets create the list, and the outer brackets perform the selection. This error often appears when you're trying to create a subset of your data for analysis. Just remember to use double brackets whenever you need more than one column.

Avoiding the `SettingWithCopyWarning` with chained indexing

This warning isn't an error, but it's a critical message from pandas. It appears when you use chained indexing, like df[...][...], to modify data, creating ambiguity over whether you're changing the original DataFrame or a temporary copy. The code below shows this in action.

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

    # This can produce a SettingWithCopyWarning
    subset = df[df['A'] > 2]  # boolean filtering returns a new DataFrame
    subset['B'] = 0
    print(df)  # df is unchanged; only subset was modified

The operation is split into two steps. First, df[df['A'] > 2] creates a new DataFrame. Then, you modify this new object, leaving the original df untouched. See the correct, single-step approach in the code below.

    import pandas as pd

    df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

    # Fixed: Use .loc for clear and predictable modifications
    df.loc[df['A'] > 2, 'B'] = 0
    print(df)

The fix is to use .loc to combine row and column selection into a single, unambiguous operation. The expression df.loc[df['A'] > 2, 'B'] = 0 tells pandas exactly which rows to filter and which column to modify all at once. This guarantees your changes apply directly to the original DataFrame, not a temporary copy. You'll want to watch for this warning whenever you filter and then try to assign a new value in a separate step.
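Sometimes the goal really is a separate subset that you can change without touching the original. In that case, one common approach is to call `.copy()` on the filtered result, which makes the intent explicit. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [5, 6, 7, 8]})

# Take an explicit, independent copy of the filtered rows
subset = df[df['A'] > 2].copy()
subset['B'] = 0  # modifies only the copy

print(subset)
print(df)  # the original keeps its values in column 'B'
```

Use `.loc` on the original when you want the change to land in `df`, and `.copy()` when you want it confined to the subset.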

Real-world applications

With a solid handle on debugging, you can confidently apply these indexing skills to real-world tasks like sales analysis and customer segmentation.

Analyzing monthly sales data with `.loc[]`

You can use .loc to easily slice time-series data, such as pulling out all sales records from the first quarter of the year.

    import pandas as pd

    # Monthly sales data for 2023
    # 'ME' is the month-end alias in pandas 2.2 and later; older versions use 'M'
    dates = pd.date_range('2023-01-01', periods=6, freq='ME')
    sales = pd.DataFrame({
        'amount': [12000, 15000, 18000, 17000, 19000, 22000]
    }, index=dates)

    # Extract Q1 sales data
    q1_sales = sales.loc['2023-01-31':'2023-03-31']
    print(q1_sales)

This example showcases how to filter time-series data. It starts by creating a DatetimeIndex using pd.date_range(), which generates six month-end dates to serve as the DataFrame's index.

The core of the operation is the slice sales.loc['2023-01-31':'2023-03-31'].
Since the index consists of dates, you can use date strings to define the start and end points of your selection.
The .loc indexer includes both the start and end labels, so this expression effectively pulls all records within that three-month window.
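With a DatetimeIndex, .loc also accepts partial date strings, so you don't have to spell out exact month-end days. The sketch below builds the same six dates explicitly with `pd.to_datetime` (a choice made here to keep the example independent of frequency aliases) and slices by month:

```python
import pandas as pd

# The same six month-end dates, written out explicitly for this sketch
dates = pd.to_datetime(['2023-01-31', '2023-02-28', '2023-03-31',
                        '2023-04-30', '2023-05-31', '2023-06-30'])
sales = pd.DataFrame({
    'amount': [12000, 15000, 18000, 17000, 19000, 22000]
}, index=dates)

# A 'YYYY-MM' string covers the whole month, so this spans January through March
q1_sales = sales.loc['2023-01':'2023-03']

# A single partial string selects every row in that month
march = sales.loc['2023-03']

print(q1_sales)
print(march)
```

This form keeps working if the records fall on different days of the month.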

Using boolean indexing and `.loc[]` for customer segmentation

You can combine multiple conditions with boolean indexing and .loc[] to perform targeted customer segmentation, like isolating high-value members for a promotion.

    import pandas as pd

    # Customer purchase data
    customers = pd.DataFrame({
        'customer_id': [101, 102, 103, 104, 105],
        'purchases': [5, 10, 3, 20, 7],
        'membership': ['Silver', 'Gold', 'Bronze', 'Gold', 'Silver']
    })

    # Find high-value Gold members for premium promotion
    premium_targets = customers.loc[(customers['membership'] == 'Gold') &
                                    (customers['purchases'] > 5)]
    print(premium_targets)

This snippet demonstrates how to chain multiple conditions to filter a DataFrame. The logic inside .loc[] builds a boolean mask to identify specific rows.

The first condition, customers['membership'] == 'Gold', finds all Gold members.
The second, customers['purchases'] > 5, identifies customers with more than five purchases.

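The & operator ensures only rows satisfying both criteria are returned. Notice each condition is wrapped in parentheses. This is required because & binds more tightly than comparison operators such as == and > in Python, so without the parentheses the expression would be grouped incorrectly.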
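One way to sidestep the parentheses is to give each condition its own name first. The sketch below uses illustrative mask names (`is_gold`, `is_frequent`) and also passes a column list to .loc so only the relevant fields come back:

```python
import pandas as pd

customers = pd.DataFrame({
    'customer_id': [101, 102, 103, 104, 105],
    'purchases': [5, 10, 3, 20, 7],
    'membership': ['Silver', 'Gold', 'Bronze', 'Gold', 'Silver']
})

# Each condition becomes a named boolean mask
is_gold = customers['membership'] == 'Gold'
is_frequent = customers['purchases'] > 5

# Combine the masks for rows and list the columns to keep
premium_targets = customers.loc[is_gold & is_frequent,
                                ['customer_id', 'purchases']]
print(premium_targets)
```

Named masks read more like the business rule they encode and can be reused in other filters.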

Get started with Replit

Turn your new indexing skills into a real tool. Tell Replit Agent: "Build a dashboard to filter sales data by date" or "Create a tool to find high-value customers based on purchase history."

Replit Agent writes the code, tests for errors, and deploys your app. Start building with Replit.
