How to read a CSV file in Python

Learn how to read CSV files in Python with our guide. Discover different methods, tips, real-world applications, and how to debug common errors.

Published on: Thu, Feb 5, 2026
Updated on: Tue, Feb 24, 2026
The Replit Team

Python developers often need to read CSV files for data manipulation. The language offers powerful tools, like the csv module, that simplify how you parse and access tabular data.

You'll learn several techniques to handle CSV data. The article covers practical tips, real-world applications, and common debugging advice to help you master this essential skill.

Reading CSV files with the csv module

import csv

with open('data.csv', 'r') as file:
    csv_reader = csv.reader(file)
    for row in csv_reader:
        print(row)

--OUTPUT--
['Name', 'Age', 'City']
['John', '28', 'New York']
['Mary', '24', 'Boston']

The with statement is a Python best practice that ensures the file is closed automatically once you're done. Inside it, the csv.reader() function creates a reader object, which is an iterator that processes the file one row at a time, making it memory-efficient for large files.

  • When you iterate through the reader, each row is a list of strings.
  • The module doesn't infer data types, so you'll need to manually convert values like numbers or dates.
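As a quick sketch of that manual conversion, the snippet below turns the Age column into integers. It builds its input with io.StringIO so it runs on its own instead of reading the article's data.csv:

```python
import csv
import io

# An in-memory stand-in for data.csv, so the snippet runs on its own
data = io.StringIO("Name,Age,City\nJohn,28,New York\nMary,24,Boston\n")

reader = csv.reader(data)
header = next(reader)  # skip the header row
rows = [[name, int(age), city] for name, age, city in reader]
print(rows)  # ages are now integers, ready for arithmetic
```

The same pattern applies to dates (with datetime.strptime) or floats, depending on your columns.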

Basic CSV handling techniques

While csv.reader() is a great start, you can handle complex files by parsing custom delimiters or accessing columns by name with more advanced tools.

Using pandas to read CSV files

import pandas as pd

df = pd.read_csv('data.csv')
print(df.head())

--OUTPUT--
   Name  Age      City
0  John   28  New York
1  Mary   24    Boston

For more complex data tasks, the pandas library is a go-to tool. The pd.read_csv() function reads your entire file into a DataFrame—a powerful, table-like structure. Unlike the standard csv module, pandas automatically recognizes the first row as column headers, making data access more intuitive.

  • The resulting df object is a DataFrame, which gives you advanced filtering and analysis capabilities.
  • Using df.head() lets you quickly preview the first few rows to confirm everything loaded correctly.
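As a taste of that filtering capability, here is a minimal sketch using the same three-row data, supplied in memory via io.StringIO so the example is self-contained:

```python
import io
import pandas as pd

# In-memory stand-in for data.csv
csv_text = "Name,Age,City\nJohn,28,New York\nMary,24,Boston\n"
df = pd.read_csv(io.StringIO(csv_text))

# Boolean indexing keeps only rows where the condition holds
over_25 = df[df['Age'] > 25]
print(over_25['Name'].tolist())
```

Because pandas inferred Age as a numeric column, the comparison works without any manual conversion.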

Reading CSV with different delimiters

import csv

with open('data.csv', 'r') as file:
    csv_reader = csv.reader(file, delimiter=';')
    for row in csv_reader:
        print(row)

--OUTPUT--
['Name', 'Age', 'City']
['John', '28', 'New York']
['Mary', '24', 'Boston']

Not all CSV files use commas to separate values. You'll often find data separated by semicolons, tabs, or other characters. The csv module handles this with a simple adjustment to the csv.reader() function.

  • By passing the delimiter argument, you can specify the exact character to split on. For example, delimiter=';' correctly parses a semicolon-separated file, making the function flexible for various formats.
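The same trick works for tab-separated files. This short sketch uses io.StringIO for self-contained input; with a real file you would use open() as above:

```python
import csv
import io

# Tab-separated stand-in data; a real file would be opened with open()
tsv = io.StringIO("Name\tAge\tCity\nJohn\t28\tNew York\n")
rows = list(csv.reader(tsv, delimiter='\t'))
print(rows)
```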

Using csv.DictReader for column access

import csv

with open('data.csv', 'r') as file:
    csv_reader = csv.DictReader(file)
    for row in csv_reader:
        print(f"Name: {row['Name']}, City: {row['City']}")

--OUTPUT--
Name: John, City: New York
Name: Mary, City: Boston

When you need to access data by column name, csv.DictReader is the perfect tool. It treats each row as a dictionary, where the keys are automatically pulled from the header row of your file. This makes your code much more readable and robust.

  • Instead of a list, each row is a dictionary, letting you access data with clear keys like row['Name'].
  • Your code is no longer tied to column order, so it won't break if someone rearranges the CSV file.
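DictReader also handles files that lack a header row: pass the column names yourself via the fieldnames parameter. A minimal sketch, with in-memory data standing in for a headerless file:

```python
import csv
import io

# A headerless stand-in file; fieldnames supplies the missing column names
data = io.StringIO("John,28,New York\nMary,24,Boston\n")
reader = csv.DictReader(data, fieldnames=['Name', 'Age', 'City'])
names = [row['Name'] for row in reader]
print(names)
```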

Advanced CSV processing

Moving past basic file reading, you'll learn to tackle common issues like working with specific columns, processing large files efficiently, and handling missing data.

Reading specific columns from CSV

import csv

with open('data.csv', 'r') as file:
    csv_reader = csv.reader(file)
    header = next(csv_reader)
    name_index = header.index('Name')
    for row in csv_reader:
        print(f"Name: {row[name_index]}")

--OUTPUT--
Name: John
Name: Mary

When you only need data from certain columns, you can dynamically find their positions. First, use next(csv_reader) to read the header row separately and advance the reader. This ensures your loop only processes the actual data rows, not the column titles.

  • The header.index('Name') method finds the numerical position of the 'Name' column.
  • You can then use this index to access the correct value in each subsequent row, making your code adaptable even if column orders change.
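If you're already using pandas, the usecols parameter of pd.read_csv() achieves the same result more directly. A sketch with in-memory data standing in for data.csv:

```python
import io
import pandas as pd

# In-memory stand-in for data.csv
csv_text = "Name,Age,City\nJohn,28,New York\nMary,24,Boston\n"

# usecols parses only the listed columns and skips the rest entirely
df = pd.read_csv(io.StringIO(csv_text), usecols=['Name'])
print(df['Name'].tolist())
```

Skipping unneeded columns at parse time also saves memory on wide files.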

Reading large CSV files efficiently

import csv

def read_in_chunks(file_path, chunk_size=1000):
    with open(file_path, 'r') as file:
        reader = csv.reader(file)
        header = next(reader)  # read the header once so chunks contain only data rows
        chunk = []
        for i, row in enumerate(reader):
            if i % chunk_size == 0 and i > 0:
                yield chunk
                chunk = []
            chunk.append(row)
        yield chunk

for chunk in read_in_chunks('large_data.csv'):
    print(f"Processing {len(chunk)} rows...")

--OUTPUT--
Processing 1000 rows...
Processing 1000 rows...
Processing 578 rows...

When a CSV file is too large to fit in memory, you can't just read it all at once. This function solves that by processing the file in smaller batches, or "chunks." It’s a generator function, meaning it uses the yield keyword to produce data on the fly without storing the whole file. This approach is incredibly memory-efficient.

  • The yield keyword sends back the current chunk of rows and pauses execution.
  • The function then resumes to build the next chunk when you ask for it.
  • The modulo operator (%) is used to check when a chunk has reached its specified size.
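pandas offers the same pattern built in: passing chunksize to pd.read_csv() returns an iterator of DataFrames instead of one big one. A small self-contained sketch (real use would pass the path of a large CSV):

```python
import io
import pandas as pd

# Small in-memory file; in practice you would pass the path of a large CSV
csv_text = "Name,Age\n" + "".join(f"user{i},{20 + i}\n" for i in range(5))

total_rows = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total_rows += len(chunk)  # each chunk is a DataFrame of up to 2 rows
print(total_rows)
```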

Handling missing values in CSV files

import pandas as pd

df = pd.read_csv('data_with_missing.csv')
df.fillna({'Name': 'Unknown', 'Age': 0, 'City': 'Not specified'}, inplace=True)
print(df.head())

--OUTPUT--
      Name  Age           City
0     John   28       New York
1     Mary   24         Boston
2  Unknown   35  Not specified

Real-world data often has gaps, but the pandas library makes it easy to handle these missing values. After loading your data, you can use the fillna() method to replace empty cells with sensible defaults. This ensures your dataset is complete and ready for analysis.

  • You can pass a dictionary to fillna() to set different replacement values for each column, like filling missing ages with 0.
  • The inplace=True argument modifies your DataFrame directly, so you don't have to reassign it.
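Before choosing fill values, it helps to see where the gaps actually are. This sketch uses df.isna().sum() on in-memory data standing in for data_with_missing.csv:

```python
import io
import pandas as pd

# Stand-in for data_with_missing.csv: the second row has two empty cells
csv_text = "Name,Age,City\nJohn,28,New York\n,35,\n"
df = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column before choosing fill values
missing_counts = df.isna().sum()
print(missing_counts)
```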

Move faster with Replit

Replit is an AI-powered development platform that transforms natural language into working applications. Describe what you want to build, and Replit Agent creates it—complete with databases, APIs, and deployment.

For the CSV handling techniques we've explored, Replit Agent can turn them into production-ready tools:

  • Build a data cleaning utility that reads a CSV, fills missing values based on your rules, and exports a clean file.
  • Create a simple dashboard that parses a CSV with product inventory and displays it as a searchable web page.
  • Deploy a log analyzer that processes large CSV files in chunks to summarize user activity or system errors.

Describe your app idea, and Replit Agent will write the code, test it, and fix issues automatically, all in your browser.

Common errors and challenges

Even with the right tools, you might run into issues like encoding errors, data type mismatches, or tricky formatting.

  • Fixing UnicodeDecodeError when reading CSV files with special characters
  • Converting string values to numbers in CSV data
  • Handling quoted text in CSV files with the quoting parameter

Fixing UnicodeDecodeError when reading CSV files with special characters

You'll often hit a UnicodeDecodeError when a CSV file includes special characters or non-English text. This error means Python is trying to decode the file with an encoding that doesn't match how the file was saved. When you don't specify one, open() uses your platform's default encoding, which is often UTF-8 on Linux and macOS but cp1252 on Windows.

Attempting to open it without specifying the correct encoding will cause the program to fail, as shown in the following example.

import csv

with open('international_data.csv', 'r') as file:
    csv_reader = csv.reader(file)
    for row in csv_reader:
        print(row)

Without an encoding argument, the open() function falls back to the platform default, so the read fails as soon as it hits bytes that aren't valid in that encoding. See how a small change to the function call solves this.

import csv

with open('international_data.csv', 'r', encoding='utf-8') as file:
    csv_reader = csv.reader(file)
    for row in csv_reader:
        print(row)

The fix is simple: you just need to tell Python which encoding to use. By adding the encoding='utf-8' argument to the open() function, you explicitly define how the file should be read. This prevents the UnicodeDecodeError that occurs when the file contains special characters or non-English text.

  • This ensures Python correctly interprets every character.
  • It's a crucial step when your data comes from diverse sources or includes international symbols.
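When you don't know how a file was saved, one pragmatic pattern is to try a list of likely encodings in order. The helper below is an illustrative sketch, not part of the csv module; the demo writes a Latin-1 file to a temp path so the whole snippet runs on its own:

```python
import os
import tempfile

def read_text_with_fallback(path, encodings=('utf-8', 'latin-1')):
    """Try each candidate encoding until one decodes the file cleanly."""
    for enc in encodings:
        try:
            with open(path, 'r', encoding=enc) as f:
                return f.read(), enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings} could decode {path}")

# Demo: a file saved as Latin-1 fails under UTF-8 but decodes on fallback
with tempfile.NamedTemporaryFile(mode='wb', suffix='.csv', delete=False) as tmp:
    tmp.write('Name,City\nJosé,Málaga\n'.encode('latin-1'))

text, used = read_text_with_fallback(tmp.name)
print(used)
os.unlink(tmp.name)
```

Latin-1 decodes any byte sequence, so place it last: it's a catch-all, not a guarantee the characters are interpreted correctly.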

Converting string values to numbers in CSV data

The csv module reads all data as strings, which means you can't perform mathematical calculations on numerical values directly. Attempting to add a string-based price to a total will raise a TypeError because Python doesn't automatically convert the data types.

See this common error in action in the code below.

import csv

with open('prices.csv', 'r') as file:
    csv_reader = csv.reader(file)
    next(csv_reader)  # Skip header
    total = 0
    for row in csv_reader:
        total += row[1]  # Trying to add price directly

print(f"Total: {total}")

The += operator fails because it's trying to add the string row[1] to the integer total. Python requires you to explicitly convert the string to a number first. See how a simple change fixes this.

import csv

with open('prices.csv', 'r') as file:
    csv_reader = csv.reader(file)
    next(csv_reader)  # Skip header
    total = 0
    for row in csv_reader:
        total += float(row[1])  # Convert string to float before adding

print(f"Total: {total}")

To fix the TypeError, you must manually convert the string value to a number before any calculations. The float() function transforms the text from row[1] into a number that can be used in mathematical operations.

  • This allows the += operator to correctly add the price to your total.
  • Remember to convert data types whenever you perform calculations on values read from a text file, as they are almost always imported as strings.
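Real files often contain cells that can't be converted at all, like 'N/A'. A try/except around the conversion lets you count or log bad cells instead of crashing. A self-contained sketch with hypothetical price data:

```python
import csv
import io

# Stand-in price data containing one non-numeric cell ('N/A')
data = io.StringIO("Item,Price\nwidget,9.99\ngadget,N/A\nsprocket,4.50\n")
reader = csv.reader(data)
next(reader)  # skip header

total = 0.0
skipped = 0
for row in reader:
    try:
        total += float(row[1])
    except ValueError:
        skipped += 1  # count bad cells instead of crashing on them
print(total, skipped)
```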

Handling quoted text in CSV files with the quoting parameter

Sometimes, CSV fields contain commas within quoted text, like "New York, NY". The default csv.reader respects standard double quotes, but files that deviate from that convention, or reader settings that disable quoting, can split such a field incorrectly and create extra, unwanted columns. The following code reads the file with the default settings.

import csv

with open('data_with_quotes.csv', 'r') as file:
    csv_reader = csv.reader(file)
    for row in csv_reader:
        print(row)

The quoting parameter of csv.reader() controls how quote characters are interpreted during parsing. The following code shows one common setting for files where all text fields are quoted.

import csv

with open('data_with_quotes.csv', 'r') as file:
    csv_reader = csv.reader(file, quoting=csv.QUOTE_NONNUMERIC)
    for row in csv_reader:
        print(row)

Setting quoting=csv.QUOTE_NONNUMERIC tells the reader to keep every quoted field as a single text string and to convert every unquoted field to a float. A quoted value like "New York, NY" stays intact even though it contains a comma, while numeric columns come back as numbers instead of strings. Note that this mode raises a ValueError if an unquoted field isn't numeric, so it suits files where all text fields are quoted.

  • Quoting settings matter whenever text fields might contain your delimiter character.
  • The related quotechar parameter handles files that quote with a character other than the standard double quote.
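When you're unsure which delimiter and quoting convention a file uses, csv.Sniffer can guess the dialect from a sample. A self-contained sketch using a hypothetical semicolon-separated sample:

```python
import csv
import io

# A semicolon-separated sample with a quoted field containing the delimiter
sample = 'Name;Age;City\nJohn;28;"New York; NY"\n'

# Sniffer inspects the sample and guesses the dialect (delimiter, quoting)
dialect = csv.Sniffer().sniff(sample, delimiters=';,')
rows = list(csv.reader(io.StringIO(sample), dialect))
print(dialect.delimiter)
print(rows)
```

Constraining the candidates with the delimiters argument makes the guess more reliable; Sniffer can misfire on very short or unusual samples.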

Real-world applications

Now that you've mastered the fundamentals, you can use these skills for practical tasks like calculating sales or merging datasets.

Calculating sales statistics from CSV data

A common real-world task is calculating sales statistics, which you can do by reading your data with csv.DictReader and converting sales figures into numbers for analysis.

import csv

total_sales = 0
count = 0
with open('sales.csv', 'r') as file:
    reader = csv.DictReader(file)
    for row in reader:
        total_sales += float(row['Amount'])
        count += 1

print(f"Total sales: ${total_sales:.2f}")
print(f"Average sale: ${total_sales/count:.2f}")

This script tallies sales figures from a CSV file. By using csv.DictReader, each row becomes a dictionary, so you can reliably grab sales data with row['Amount'] regardless of its column position. This approach makes your logic clearer and more robust.

  • The code iterates through each sale, converting the string value from the 'Amount' column into a number with float().
  • It also increments a count variable to track the total number of entries, which is essential for calculating the average sale later.
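The same loop extends naturally to per-group statistics with a defaultdict. This sketch assumes a hypothetical Region column that isn't in the article's file, and supplies its own in-memory data:

```python
import csv
import io
from collections import defaultdict

# Hypothetical sales data with a Region column (not part of the article's file)
data = io.StringIO("Region,Amount\nEast,100.00\nWest,250.50\nEast,75.25\n")

# Accumulate a running total per region; missing keys start at 0.0
totals = defaultdict(float)
for row in csv.DictReader(data):
    totals[row['Region']] += float(row['Amount'])
print(dict(totals))
```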

Merging data from multiple CSV sources

You can enrich your data by merging information from multiple CSV files, such as combining customer details with their corresponding order histories.

import csv

# Load customer data dictionary
customers = {}
with open('customers.csv', 'r') as file:
    for row in csv.DictReader(file):
        customers[row['id']] = row['name']

# Create enriched order report
with open('orders.csv', 'r') as in_file, open('report.csv', 'w', newline='') as out_file:
    reader = csv.DictReader(in_file)
    writer = csv.writer(out_file)

    writer.writerow(['order_id', 'customer', 'amount'])
    for order in reader:
        customer = customers.get(order['customer_id'], 'Unknown')
        writer.writerow([order['id'], customer, order['amount']])

print("Generated report with customer information")

This script first builds a lookup table by loading customers.csv into a dictionary, which maps each customer ID to a name. This makes retrieving customer information fast and efficient.

  • It then reads orders.csv and writes to a new report.csv file simultaneously.
  • For each order, it uses the get() method to safely find the customer's name using the customer_id, defaulting to 'Unknown' if no match is found.
  • Finally, it writes an enriched row with the order details and customer name to the new report.
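If you prefer pandas, DataFrame.merge does the same join in a few lines. This sketch uses in-memory stand-ins for customers.csv and orders.csv so it runs on its own:

```python
import io
import pandas as pd

# In-memory stand-ins for customers.csv and orders.csv
customers = pd.read_csv(io.StringIO("id,name\n1,Alice\n2,Bob\n"))
orders = pd.read_csv(io.StringIO("id,customer_id,amount\n10,1,99.5\n11,3,20.0\n"))

# A left join keeps every order, even ones without a matching customer
report = orders.merge(customers, left_on='customer_id', right_on='id',
                      how='left', suffixes=('', '_customer'))
report['name'] = report['name'].fillna('Unknown')
print(report[['id', 'name', 'amount']])
```

The how='left' argument mirrors the dictionary lookup with a default: unmatched orders get NaN, which fillna then replaces with 'Unknown'.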

Get started with Replit

Turn your new skills into a real tool. Describe what you want to build for Replit Agent, like “a utility that merges two CSVs into a single report” or “a dashboard that visualizes sales data from an uploaded file.”

Replit Agent writes the code, tests for errors, and handles deployment for you. It turns your ideas into fully functional applications. Start building with Replit.

Get started free

Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.
