How to iterate through a dataframe in Python

Explore various ways to iterate through a Python DataFrame. Get tips, see real-world examples, and learn to debug common iteration errors.

Published on: Tue, Feb 24, 2026
Updated on: Mon, Apr 6, 2026
The Replit Team

Iteration through a pandas DataFrame is a fundamental skill for data analysis in Python. It lets you access and process data row by row, a crucial step for many tasks.

In this article, we'll explore various iteration techniques, from basic loops to optimized methods like itertuples(). You'll get performance tips, see real-world applications, and receive practical advice to debug your code effectively.

Basic iteration with iterrows()

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

for index, row in df.iterrows():
    print(f"Index: {index}, Name: {row['Name']}, Age: {row['Age']}")

Output:
Index: 0, Name: Alice, Age: 25
Index: 1, Name: Bob, Age: 30
Index: 2, Name: Charlie, Age: 35

The iterrows() method is a generator that iterates over DataFrame rows, returning each as a tuple containing the index and the row's data. This is why the for loop unpacks two variables, index and row, for each step of the iteration.

  • The index is the row's label.
  • The row is a pandas Series object containing the actual data, which lets you access columns by name, like row['Name'].

While intuitive for simple loops, it's worth noting that iterrows() can be inefficient on large datasets because it creates a new Series object for every single row.

Common iteration methods

Beyond iterrows(), pandas offers more efficient methods like itertuples() for rows, items() for columns, and apply() for complex row-wise operations.

Using itertuples() for better performance

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

for row in df.itertuples():
    print(f"Index: {row.Index}, Name: {row.Name}, Age: {row.Age}")

Output:
Index: 0, Name: Alice, Age: 25
Index: 1, Name: Bob, Age: 30
Index: 2, Name: Charlie, Age: 35

The itertuples() method is a faster alternative to iterrows(). Instead of creating a new Series object for each row, it yields a lightweight namedtuple. This makes it much more memory-efficient, especially for large datasets.

  • You access data using dot notation, like row.Name, which is often cleaner and faster than dictionary-style lookups.
  • The row's index is conveniently available as row.Index.
  • Because it avoids the overhead of creating Series objects, itertuples() generally offers significantly better performance.
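As a small illustration using the same toy DataFrame, itertuples() also accepts index and name parameters; passing index=False and name=None yields plain tuples, which is marginally faster still when you don't need field names:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35]})

# name=None yields plain tuples instead of namedtuples;
# index=False drops the index from each tuple
rows = list(df.itertuples(index=False, name=None))
print(rows)  # [('Alice', 25), ('Bob', 30), ('Charlie', 35)]
```

This trades the readability of dot notation for a bit of extra speed, so it's best reserved for hot loops.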

Iterating through columns with items()

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'],
                   'Age': [25, 30]})

for column_name, column_data in df.items():
    print(f"Column: {column_name}")
    print(column_data.values)

Output:
Column: Name
['Alice' 'Bob']
Column: Age
[25 30]

When you need to work with columns instead of rows, the items() method is your go-to. It iterates through the DataFrame one column at a time, yielding a tuple for each.

  • The first item in the tuple is the column_name, which is a string.
  • The second is the column_data, a pandas Series containing all values for that column.

This approach is perfect for tasks where you need to process or analyze entire columns sequentially, rather than individual rows.
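For instance, here is a minimal sketch (with made-up Height and Weight columns) that uses items() to build a per-column summary in one pass:

```python
import pandas as pd

df = pd.DataFrame({'Height': [160, 175, 180],
                   'Weight': [55, 70, 82]})

# Summarize each column without touching individual rows
summary = {}
for column_name, column_data in df.items():
    # column_data is a Series, so Series methods are available
    summary[column_name] = (int(column_data.min()), int(column_data.max()))

print(summary)  # {'Height': (160, 180), 'Weight': (55, 82)}
```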

Using apply() function for row-wise operations

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6]})

result = df.apply(lambda row: row['A'] + row['B'], axis=1)
print(result)

Output:
0    5
1    7
2    9
dtype: int64

The apply() function is a powerful tool for executing custom logic on every row or column. When you set axis=1, you're telling pandas to perform an operation row by row. It's especially useful for complex tasks that can't be handled by standard vectorized methods.

  • The function you provide—in this case, a lambda—is called for each row.
  • Each row is passed as a pandas Series, so you can access its data by column name, like row['A'].
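The callable doesn't have to be a lambda; any function that accepts a row works. A small sketch using a named function (the function name here is our own invention) on the same data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6]})

# Any callable works with apply(), not just a lambda
def describe_row(row):
    return f"{row['A']} + {row['B']} = {row['A'] + row['B']}"

labels = df.apply(describe_row, axis=1)
print(labels.tolist())  # ['1 + 4 = 5', '2 + 5 = 7', '3 + 6 = 9']
```

Named functions keep complex row logic testable and reusable across DataFrames.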

Advanced iteration techniques

While direct iteration has its place, you can often gain significant performance by avoiding loops in favor of vectorized operations or targeted methods like .loc and groupby().

Vectorized operations instead of iteration

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6]})

# Instead of iterating to multiply columns
df['C'] = df['A'] * df['B']
print(df)

Output:
   A  B   C
0  1  4   4
1  2  5  10
2  3  6  18

Vectorization is a powerful pandas feature that lets you apply operations across entire columns simultaneously. Instead of looping through each row, you can work with whole datasets at once. This approach is not only cleaner but also significantly faster because it leverages optimized, low-level code.

  • The expression df['A'] * df['B'] multiplies the two columns element-wise in a single, efficient step.
  • This method avoids the overhead of Python loops, making your code more concise and much easier to read.
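Vectorization isn't limited to arithmetic. As one more sketch, NumPy's where() applies an element-wise conditional across a whole column without any loop (the Parity column name is just for illustration):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [4, 5, 6]})

# Element-wise if/else across the whole column, no loop needed
df['Parity'] = np.where(df['A'] % 2 == 1, 'odd', 'even')
print(df['Parity'].tolist())  # ['odd', 'even', 'odd']
```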

Using .loc for conditional updates

import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Score': [85, 65, 90]})

df.loc[df['Score'] >= 80, 'Grade'] = 'A'
df.loc[df['Score'] < 80, 'Grade'] = 'B'
print(df)

Output:
      Name  Score Grade
0    Alice     85     A
1      Bob     65     B
2  Charlie     90     A

When you need to modify data based on a condition, .loc is your best friend. It lets you skip slow loops by selecting and updating rows in a single, optimized step. This method is far more efficient than iterating row by row with an if statement, especially on larger datasets.

  • The first argument, like df['Score'] >= 80, acts as a filter, identifying exactly which rows to modify.
  • The second argument, 'Grade', specifies the column you want to update, and the new value is assigned directly to the filtered selection.
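When you have more than two grade bands, chaining .loc assignments gets verbose. One common alternative (a sketch, not the only option) is numpy.select(), which checks a list of conditions in order and falls back to a default:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'Score': [85, 65, 90, 72]})

# Conditions are checked top-down; the first match wins
conditions = [df['Score'] >= 80, df['Score'] >= 70]
choices = ['A', 'B']
df['Grade'] = np.select(conditions, choices, default='C')
print(df['Grade'].tolist())  # ['A', 'C', 'A', 'B']
```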

Efficient batch processing with groupby()

import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'B'],
    'Value': [10, 20, 15, 25, 30]
})

group_sums = df.groupby('Category')['Value'].sum()
print(group_sums)

Output:
Category
A    30
B    70
Name: Value, dtype: int64

The groupby() method is a cornerstone of the "split-apply-combine" strategy. Instead of iterating, you can split your data into groups based on a column's values and then apply a function to each group simultaneously. This is incredibly efficient for batch processing tasks.

  • First, df.groupby('Category') partitions the DataFrame into separate groups for 'A' and 'B'.
  • Next, an aggregation function like .sum() is applied to the Value column within each of those groups.
  • Finally, pandas combines these results into a new, concise Series.
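The same split-apply-combine pattern extends to multiple aggregations at once via .agg(). A quick sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'B'],
    'Value': [10, 20, 15, 25, 30]
})

# Several aggregations per group in a single pass
stats = df.groupby('Category')['Value'].agg(['sum', 'mean', 'count'])
print(stats)
```

Each aggregation becomes a column in the result, so you get sums, means, and counts per group without writing a single loop.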

Move faster with Replit

Knowing how to use pandas is one thing, but building a complete application is another. Replit is an AI-powered development platform designed to bridge that gap. It comes with all Python dependencies pre-installed, so you can skip setup and start coding instantly. From there, Agent 4 can take your idea and build a working product by handling the code, databases, and deployment from a simple description.

Instead of just piecing together functions, you can build a finished tool. For example, you could create:

  • A dashboard that automatically processes a CSV of sales data, using groupby() to calculate and display total revenue per product category.
  • A grading utility that takes a list of student scores and uses conditional logic with .loc to assign a pass or fail status in a new column.
  • A data migration script that iterates through user profiles with itertuples() to reformat names and addresses into a new, standardized structure.

Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.

Common errors and challenges

Even with the right methods, iterating through DataFrames can lead to unexpected errors and performance issues if you're not careful.

Modifying DataFrames during iteration with iterrows()

A common mistake is trying to modify a DataFrame while iterating with iterrows(). It's a trap because iterrows() returns a copy of each row, not a direct view into the DataFrame. Any changes you make to the row variable inside the loop won't actually update the original data, leading to bugs that can be tricky to spot.

If you need to update values, it's much safer and more effective to use accessor methods like .loc or .at to target and change the data directly.

Performance pitfalls when using iterrows() for calculations

While iterrows() is easy to understand, it can become a major performance bottleneck. The method creates a new pandas Series object for every single row, and this overhead quickly adds up on large datasets. Your code might run fine on a small sample, but it won't scale efficiently.

For calculations, vectorized operations are almost always the superior choice. They process entire columns at once, leveraging highly optimized C code under the hood instead of a slow Python loop.

Handling NaN values while iterating

Missing data, represented by NaN (Not a Number) values, can also cause headaches during iteration. Standard comparisons won't work as expected: a check like row['value'] == np.nan always evaluates to False, because NaN is never equal to anything, not even itself. This can break your conditional logic in subtle ways.

To properly handle these cases, use pandas' dedicated functions like pd.isna() to detect missing values. An even better practice is to clean your data beforehand with methods like fillna() or dropna(), which often removes the need to check for NaN inside a loop at all. For more complex data cleaning workflows, AI coding with Python can help automate the detection and handling of various data quality issues.
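As a minimal sketch of the clean-first approach (column name and fill value are just for illustration):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'value': [1.0, np.nan, 3.0]})

# Clean the column up front so later logic never sees NaN
df['value'] = df['value'].fillna(0)
print(df['value'].tolist())  # [1.0, 0.0, 3.0]
```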

Modifying DataFrames during iteration with iterrows()

Attempting to update a DataFrame inside an iterrows() loop is a classic mistake that often leads to silent failures. Your code runs without any errors, but the original data remains unchanged—a frustrating bug to track down. See this in action below.

import pandas as pd

df = pd.DataFrame({'Value': [1, 2, 3, 4, 5]})

# Trying to double each value by modifying during iteration
for index, row in df.iterrows():
    row['Value'] = row['Value'] * 2

print(df) # The DataFrame remains unchanged

The assignment within the loop, row['Value'] = ..., only updates a temporary copy, which is why the original DataFrame is unchanged. See the correct way to apply this change below.

import pandas as pd

df = pd.DataFrame({'Value': [1, 2, 3, 4, 5]})

# Correctly modify values using df.at
for index, row in df.iterrows():
    df.at[index, 'Value'] = row['Value'] * 2

print(df) # Values are now doubled

The correct approach uses df.at[index, 'Value'] to modify the DataFrame directly. This method targets a specific cell by its index and column name, ensuring the update is saved to the original data. The key is to remember that the row variable in an iterrows() loop is just a temporary copy. Any changes made to it won't persist unless you use an accessor like .at or .loc on the actual DataFrame.
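Worth noting: when the per-row update is plain arithmetic like this, you don't need a loop at all; a vectorized assignment does the same job in one line. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'Value': [1, 2, 3, 4, 5]})

# The whole iterrows() loop collapses to one vectorized assignment
df['Value'] = df['Value'] * 2
print(df['Value'].tolist())  # [2, 4, 6, 8, 10]
```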

Performance pitfalls when using iterrows() for calculations

While iterrows() is straightforward, it's a performance trap for calculations. The overhead of creating a Series for each row is negligible on small datasets but becomes a significant bottleneck as data grows, making your code slow and inefficient at scale.

The code below demonstrates this inefficiency by performing a simple calculation on a moderately sized DataFrame. Notice how a Python loop handles what a single vectorized operation could do instantly.

import pandas as pd

df = pd.DataFrame({'A': range(1000), 'B': range(1000)})

result = 0
for index, row in df.iterrows():
    result += row['A'] * row['B']

print(f"Sum of products: {result}")

Using iterrows() here forces a slow, row-by-row calculation in Python. This approach misses the chance to use pandas' own highly optimized functions for a massive speed boost. See the better way to do it below.

import pandas as pd

df = pd.DataFrame({'A': range(1000), 'B': range(1000)})

# Vectorized calculation is much faster
result = (df['A'] * df['B']).sum()

print(f"Sum of products: {result}")

The vectorized approach is far more efficient. Instead of a slow Python loop, it performs the calculation across entire columns at once. The expression (df['A'] * df['B']) multiplies the columns element-wise, and then .sum() adds up the results in a single, optimized step. This method leverages pandas' fast backend code, so you'll see a massive performance gain on larger datasets. Always favor vectorization for mathematical operations.
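If you want to see the gap on your own machine, a rough timing sketch with the standard library's perf_counter looks like this (absolute numbers vary by hardware, so none are baked in; the point is that both paths produce the same total):

```python
import time
import pandas as pd

df = pd.DataFrame({'A': range(100_000), 'B': range(100_000)})

# Row-by-row loop
start = time.perf_counter()
loop_total = 0
for row in df.itertuples():
    loop_total += row.A * row.B
loop_time = time.perf_counter() - start

# Vectorized equivalent
start = time.perf_counter()
vec_total = int((df['A'] * df['B']).sum())
vec_time = time.perf_counter() - start

print(f"loop: {loop_time:.4f}s, vectorized: {vec_time:.4f}s")
assert loop_total == vec_total  # same answer, very different cost
```

For rigorous benchmarks, the timeit module averages over many runs and is less noisy than a single measurement.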

Handling NaN values while iterating

Missing data, represented as NaN, can trip you up during iteration. Since any mathematical operation involving NaN results in NaN, your calculations can get skewed without raising an explicit error. The code below demonstrates this problem in action.

import pandas as pd
import numpy as np

df = pd.DataFrame({'Value': [1, np.nan, 3]})

for index, row in df.iterrows():
    # Math on NaN doesn't raise an error; it silently yields NaN
    result = row['Value'] * 2
    print(f"Index {index}: {result}")

The multiplication row['Value'] * 2 doesn't raise an error when it encounters the NaN value. It just returns another NaN, silently corrupting your results without any warning. The code below shows the proper way to handle this.

import pandas as pd
import numpy as np

df = pd.DataFrame({'Value': [1, np.nan, 3]})

for index, row in df.iterrows():
    # Check for NaN before operations
    if pd.notna(row['Value']):
        result = row['Value'] * 2
        print(f"Index {index}: {result}")
    else:
        print(f"Index {index}: Missing value")

The solution is to check for missing data before you do anything with it. The pd.notna() function is perfect for this. By wrapping your calculation in an if statement with pd.notna(row['Value']), you ensure you're only working with actual numbers.

This prevents NaN from silently breaking your logic. It's a simple but essential safeguard whenever your dataset might have empty cells.
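Alternatively, if all you need is the arithmetic, a vectorized version handles NaN for you: the operation propagates missing values instead of corrupting anything, and no per-row check is required. A sketch:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'Value': [1, np.nan, 3]})

# Vectorized math propagates NaN safely; no per-row check needed
doubled = df['Value'] * 2
print(doubled.tolist())  # [2.0, nan, 6.0]
```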

Real-world applications

With the common errors in mind, here’s how you can apply these iteration techniques to solve practical data challenges.

Filling missing values with iterrows()

Sometimes, filling missing data isn't as simple as replacing all NaN values with a single number. You might need to apply conditional logic, and that's where iterrows() can be useful despite its performance drawbacks.

Imagine you have a dataset of employees with some missing ages. Instead of using a generic average, you want to impute the age based on their job title. With iterrows(), you can loop through each row, check if the age is missing using pd.isna(), and then assign a specific value based on the row['JobTitle']. This row-by-row approach gives you the flexibility to implement complex, custom rules that vectorized methods can't easily handle.

Processing data for API requests with itertuples()

When you need to send data from a DataFrame to an external service, like an API, efficiency is key. This is a perfect use case for itertuples(), which is significantly faster and more memory-friendly than iterrows(). For rapid prototyping of such data processing scripts, vibe coding can help you quickly build and test your iteration logic.

For example, let's say you need to iterate over a DataFrame of user profiles to send a personalized welcome email to each one. Inside the loop, you're not modifying the DataFrame; you're just reading data to construct a request. Using itertuples() lets you access each user's details with clean dot notation, like row.Name and row.Email, to build a JSON payload for your email API. Because it yields lightweight namedtuples instead of heavy Series objects, your script will run much faster, especially when processing thousands of records.

Filling missing values with iterrows()

A common use case is iterating through a dataset to fill missing product prices, replacing each NaN value with the average price from that item's category.

import pandas as pd
import numpy as np

# Sample dataset with missing values
df = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Keyboard'],
    'Price': [1200, np.nan, 400, np.nan, 80],
    'Category': ['Electronics', 'Electronics', 'Electronics',
                 'Computer Accessories', 'Computer Accessories']
})

# Calculate average prices by category
category_avg = df.groupby('Category')['Price'].mean().to_dict()

# Fill missing prices based on category average
for index, row in df.iterrows():
    if pd.isna(row['Price']):
        df.at[index, 'Price'] = category_avg[row['Category']]

print(df)

This code demonstrates a smart way to fill missing data. First, it calculates the average price for each product category using groupby() and stores the results in a dictionary. This pre-calculation is efficient and sets up the main logic.

Then, it loops through the DataFrame with iterrows(). Inside the loop:

  • It checks for missing prices using pd.isna().
  • If a price is missing, it uses df.at to update the cell directly, pulling the correct average price from the dictionary based on the row's category.

This ensures missing values are replaced with meaningful, context-specific data.
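For this particular rule, pandas can also do the whole job without a loop: groupby().transform('mean') broadcasts each group's average back onto the original rows, and fillna() uses it only where prices are missing. A sketch on the same data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Keyboard'],
    'Price': [1200, np.nan, 400, np.nan, 80],
    'Category': ['Electronics', 'Electronics', 'Electronics',
                 'Computer Accessories', 'Computer Accessories']
})

# transform('mean') returns a Series aligned with the original rows,
# holding each row's category average
df['Price'] = df['Price'].fillna(
    df.groupby('Category')['Price'].transform('mean'))
print(df['Price'].tolist())  # [1200.0, 800.0, 400.0, 80.0, 80.0]
```

The loop version is still the right tool when the fill rule is genuinely custom; reach for transform when the rule is a standard aggregation.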

Processing data for API requests with itertuples()

For example, you can use itertuples() to iterate through a customer DataFrame, call an API for each person, and then compile the results into a new, enriched dataset.

import pandas as pd
import random # For simulating API responses

# Customer data requiring enrichment from an API
customers = pd.DataFrame({
    'customer_id': [101, 102, 103, 104],
    'email': ['alice@example.com', 'bob@example.com',
              'charlie@example.com', 'dave@example.com']
})

# Process each customer with a simulated API call
results = []
for customer in customers.itertuples():
    # Simulate API call to get customer risk score
    risk_score = random.randint(1, 100)
    results.append({'customer_id': customer.customer_id,
                    'email': customer.email,
                    'risk_score': risk_score})

enriched_df = pd.DataFrame(results)
print(enriched_df)

This script demonstrates how to build a new dataset by processing an existing one. It uses the efficient itertuples() method to loop through each row of the initial customers DataFrame.

  • Inside the loop, it simulates fetching external data by generating a random risk_score.
  • It then creates a dictionary containing the original customer details and this new score.
  • Each dictionary is appended to a results list.

Once the loop is complete, this list of dictionaries is used to construct an entirely new DataFrame, which now holds the combined, enriched information. Building up a list of dictionaries and converting it to a DataFrame at the end is a common pattern in data processing workflows.

Get started with Replit

Put your new skills to work on a real project. Tell Replit Agent to “build a tool that cleans a CSV by filling missing values” or “create a dashboard that summarizes sales data by category.”

It will generate the code, handle testing, and deploy your application from that simple description. Start building with Replit.

Build your first app today

Describe what you want to build, and Replit Agent writes the code, handles the infrastructure, and ships it live. Go from idea to real product, all in your browser.
