How to read a CSV file in Python
Learn how to read CSV files in Python. Explore different methods, tips, real-world applications, and common error debugging.

Python makes it simple to read CSV files, a frequent task for data analysis. The built-in csv module helps you handle structured data from many sources with ease.
In this article, you’ll explore techniques to read CSVs, along with practical tips. You'll find real-world applications and get advice to debug common issues you might face.
Reading CSV files with the csv module
import csv

with open('data.csv', 'r') as file:
    csv_reader = csv.reader(file)
    for row in csv_reader:
        print(row)

--OUTPUT--
['Name', 'Age', 'City']
['John', '28', 'New York']
['Mary', '24', 'Boston']
The with open() statement is a Python best practice that manages file resources, automatically closing your file once the block is exited. Inside, csv.reader() creates a reader object that knows how to interpret the CSV format. This object is an iterator, making it efficient to process one row at a time.
As you loop through the csv_reader, it yields each row as a list of strings. The module handles the parsing for you, splitting the line at each comma. This saves you from manually splitting strings and dealing with edge cases like quoted fields.
Basic CSV handling techniques
Building on the csv.reader(), you can tackle more complex files by using pandas, handling custom delimiters, or accessing columns by name with csv.DictReader.
Using pandas to read CSV files
import pandas as pd

df = pd.read_csv('data.csv')
print(df.head())

--OUTPUT--
   Name  Age      City
0  John   28  New York
1  Mary   24    Boston
For more advanced data work, the pandas library is often the go-to tool. Its pd.read_csv() function loads the entire file into a DataFrame. Think of a DataFrame as a smart spreadsheet inside Python, with rows and columns you can easily work with. This demonstrates why AI coding with Python is so effective for data tasks.
- The function returns a DataFrame, which is stored in the df variable.
- You can then use df.head() to preview the first few rows, a great way to verify that your data has been imported correctly.
Reading CSV with different delimiters
import csv

with open('data.csv', 'r') as file:
    csv_reader = csv.reader(file, delimiter=';')
    for row in csv_reader:
        print(row)

--OUTPUT--
['Name', 'Age', 'City']
['John', '28', 'New York']
['Mary', '24', 'Boston']
While "CSV" stands for "comma-separated values," not all files stick to that rule. You might encounter data separated by semicolons, tabs, or pipes. The csv.reader() function is flexible enough to handle this.
- To specify a different separator, you just need to use the delimiter parameter.
- In the example, delimiter=';' tells the reader to split each row on semicolons instead of the default comma, ensuring your data is parsed correctly.
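If you don't know the delimiter ahead of time, the standard library's csv.Sniffer can often detect it for you. Here's a minimal sketch, using io.StringIO with made-up tab-separated data in place of a real file:

```python
import csv
import io

# Stand-in for a tab-separated file (hypothetical contents)
sample = "Name\tAge\tCity\nJohn\t28\tNew York\nMary\t24\tBoston\n"

# csv.Sniffer inspects a sample of the text and guesses the
# dialect, including the delimiter, so it isn't hard-coded
dialect = csv.Sniffer().sniff(sample)
reader = csv.reader(io.StringIO(sample), dialect)
rows = list(reader)
print(rows[0])  # → ['Name', 'Age', 'City']
```

In practice you would read the first few kilobytes of a real file, sniff that sample, then seek back to the start before parsing the whole file.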
Using csv.DictReader for column access
import csv

with open('data.csv', 'r') as file:
    csv_reader = csv.DictReader(file)
    for row in csv_reader:
        print(f"Name: {row['Name']}, City: {row['City']}")

--OUTPUT--
Name: John, City: New York
Name: Mary, City: Boston
When you need to work with columns by name, csv.DictReader is a more intuitive choice than csv.reader. It reads each row as a dictionary, automatically using the header row from your CSV file as the keys. This means you can access data using meaningful names instead of numerical indices. Once you've processed your data, you might also need techniques for appending to CSV files.
- You can pull specific data like row['Name'], which makes your code clearer and easier to maintain.
- It's especially useful in larger datasets where remembering column order is impractical.
Advanced CSV processing
With the basics down, you're ready to tackle more advanced, real-world scenarios like reading specific columns, handling large files, and cleaning up missing data.
Reading specific columns from CSV
import csv

with open('data.csv', 'r') as file:
    csv_reader = csv.reader(file)
    header = next(csv_reader)
    name_index = header.index('Name')
    for row in csv_reader:
        print(f"Name: {row[name_index]}")

--OUTPUT--
Name: John
Name: Mary
When you don't need the entire dataset, you can selectively pull data from specific columns. This approach is more efficient than loading everything, especially with large files.
- The next(csv_reader) call isolates the header row before the loop begins.
- Then, header.index('Name') finds the numerical position of the 'Name' column.
- The loop uses this index to access only the 'Name' data in each subsequent row, making your code adaptable even if column orders change.
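If pandas is already in your toolbox, it can do the same column selection at read time with the usecols parameter. A sketch, with inline sample data standing in for data.csv:

```python
import io
import pandas as pd

# Inline stand-in for data.csv (hypothetical contents)
csv_text = "Name,Age,City\nJohn,28,New York\nMary,24,Boston\n"

# usecols tells pandas to parse only the listed columns,
# skipping the rest of each row entirely
df = pd.read_csv(io.StringIO(csv_text), usecols=['Name'])
print(df['Name'].tolist())  # → ['John', 'Mary']
```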
Reading large CSV files efficiently
import csv

def read_in_chunks(file_path, chunk_size=1000):
    with open(file_path, 'r') as file:
        reader = csv.reader(file)
        header = next(reader)
        chunk = []
        for i, row in enumerate(reader):
            if i % chunk_size == 0 and i > 0:
                yield chunk
                chunk = []
            chunk.append(row)
        yield chunk

for chunk in read_in_chunks('large_data.csv'):
    print(f"Processing {len(chunk)} rows...")

--OUTPUT--
Processing 1000 rows...
Processing 1000 rows...
Processing 578 rows...
Loading a massive CSV file at once can exhaust your system's memory. This code provides a memory-efficient solution by processing the file in smaller batches. The read_in_chunks function is a generator that reads the data piece by piece instead of all at once.
- The yield keyword is key; it returns a chunk of rows and pauses, so the function doesn't have to store the entire file.
- When you ask for the next chunk, the function resumes, reads another batch defined by chunk_size, and yields it. This keeps memory usage consistently low. For more comprehensive strategies, explore additional techniques for handling large datasets.
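pandas offers a similar pattern: passing chunksize to pd.read_csv returns an iterator of DataFrames instead of loading everything at once. A sketch, with a small inline stand-in for large_data.csv:

```python
import io
import pandas as pd

# Small inline stand-in for large_data.csv (hypothetical contents)
csv_text = "Name,Amount\n" + "".join(f"row{i},{i}\n" for i in range(5))

# With chunksize set, read_csv yields DataFrames of at most
# chunksize rows each, keeping memory usage low
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    total += chunk['Amount'].sum()
print(total)  # → 10
```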
Handling missing values in CSV files
import pandas as pd

df = pd.read_csv('data_with_missing.csv')
df.fillna({'Name': 'Unknown', 'Age': 0, 'City': 'Not specified'}, inplace=True)
print(df.head())

--OUTPUT--
      Name  Age           City
0     John   28       New York
1     Mary   24         Boston
2  Unknown   35  Not specified
It's common to find missing values in real-world datasets. The pandas library provides a straightforward way to clean this up using the fillna() method. This function replaces empty cells with default values you specify, ensuring your data is complete before analysis.
- You can pass a dictionary to fillna() to set different replacements for each column, like 'Unknown' for a missing Name.
- The inplace=True argument modifies your DataFrame directly, saving you from having to reassign it to a new variable.
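pandas also lets you reassign the result instead of using inplace=True, a style many codebases prefer. A sketch, with inline data standing in for data_with_missing.csv:

```python
import io
import pandas as pd

# Inline stand-in for data_with_missing.csv; empty fields load as NaN
csv_text = "Name,Age,City\nJohn,28,New York\n,35,\n"

df = pd.read_csv(io.StringIO(csv_text))
# Reassigning the result is equivalent to inplace=True here
df = df.fillna({'Name': 'Unknown', 'City': 'Not specified'})
print(df.loc[1, 'Name'])  # → Unknown
```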
Move faster with Replit
Replit is an AI-powered development platform with all Python dependencies pre-installed, so you can skip setup and start coding instantly. Instead of just learning individual techniques, you can use Agent 4 to build complete applications from a simple description. It handles the code, databases, APIs, and deployment for you.
Instead of piecing together functions, you can describe the app you want to build, and Agent will take it from an idea to a working product. For example, you could build:
- A data cleanup utility that reads CSVs with missing information, fills in the blanks with default values, and prepares the data for analysis.
- A custom report generator that pulls specific columns, like 'Name' and 'City', from a large dataset and formats them for display.
- A log processing tool that can handle files with custom delimiters, like semicolons or tabs, and process them in manageable chunks to avoid memory errors.
Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.
Common errors and challenges
Even with Python's powerful tools, you might hit a few snags when reading CSV files, but they're usually simple to fix.
Fixing UnicodeDecodeError when reading CSV files with special characters
A UnicodeDecodeError typically pops up when your file contains special characters (like accents or symbols) and isn't saved with the default encoding Python expects. To fix this, you just need to tell Python which encoding to use by adding the encoding parameter to your open() function. Using encoding='utf-8' is a common and effective solution, as UTF-8 supports a vast range of characters.
Converting string values to numbers in CSV data
The csv module reads all data as strings, which is fine until you need to perform mathematical operations. You can't add the string '28' to another number without an error. You'll need to explicitly convert these string values into numerical types using functions like int() for whole numbers or float() for decimals before you can use them in calculations.
Handling quoted text in CSV files with the quoting parameter
Sometimes, data fields in a CSV file contain the delimiter character itself, like a comma in a city name (e.g., "New York, NY"). To prevent confusion, these fields are often enclosed in quotes. The quoting parameter in csv.reader() gives you control over how to interpret these quotes.
- csv.QUOTE_MINIMAL: The default. It expects quotes only around fields that contain the delimiter or other special characters.
- csv.QUOTE_ALL: Tells the reader that every single field in the file is wrapped in quotes.
- csv.QUOTE_NONNUMERIC: Tells the reader that unquoted fields are numbers, converting them to floats, while quoted fields stay strings.
Fixing UnicodeDecodeError when reading CSV files with special characters
A UnicodeDecodeError pops up when your CSV file has special characters, like accents or symbols, that Python's default decoder can't handle. This mismatch stops your script cold. The following code triggers this exact error by reading an international dataset.
import csv

with open('international_data.csv', 'r') as file:
    csv_reader = csv.reader(file)
    for row in csv_reader:
        print(row)
By default, the open() function uses a system-specific encoding. When it encounters special characters in international_data.csv that it doesn't recognize, the script fails. The fix involves one simple addition to the code below.
import csv

with open('international_data.csv', 'r', encoding='utf-8') as file:
    csv_reader = csv.reader(file)
    for row in csv_reader:
        print(row)
The solution is to specify the file's encoding. By adding encoding='utf-8' to the open() function, you're telling Python how to correctly interpret special characters, preventing the script from failing.
It's a good habit to use this parameter whenever you're working with data that might contain non-English text, accents, or symbols from various sources.
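When the encoding truly is unknown, one defensive pattern is to try UTF-8 first and fall back to Latin-1, which accepts any byte sequence. A sketch; the demo file is created here just so the example is self-contained:

```python
import csv
import os
import tempfile

def read_rows(path, encodings=('utf-8', 'latin-1')):
    """Try each encoding in turn; latin-1 never raises, so it's a safe last resort."""
    for encoding in encodings:
        try:
            with open(path, 'r', encoding=encoding, newline='') as file:
                return list(csv.reader(file))
        except UnicodeDecodeError:
            continue  # decoding failed; try the next encoding

# Demo: write a UTF-8 file with accented characters, then read it back
with tempfile.NamedTemporaryFile('w', suffix='.csv', delete=False,
                                 encoding='utf-8') as tmp:
    tmp.write("Name,City\nJosé,São Paulo\n")
    path = tmp.name

rows = read_rows(path)
print(rows)  # → [['Name', 'City'], ['José', 'São Paulo']]
os.remove(path)
```

Note that Latin-1 will decode anything, but it may silently misread text that was actually saved in another encoding, so it's a fallback rather than a fix.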
Converting string values to numbers in CSV data
The csv module reads all data as strings, which means you can't perform math on numerical values directly. Trying to add a column of prices, for example, will fail because you're actually trying to add strings. The code below demonstrates this common TypeError.
import csv

with open('prices.csv', 'r') as file:
    csv_reader = csv.reader(file)
    next(csv_reader)  # Skip header
    total = 0
    for row in csv_reader:
        total += row[1]  # Trying to add price directly
    print(f"Total: {total}")
The script fails because the += operator attempts to add a string from row[1] to the numerical total. Python doesn't allow this mix of types. The corrected code below shows how to fix it.
import csv

with open('prices.csv', 'r') as file:
    csv_reader = csv.reader(file)
    next(csv_reader)  # Skip header
    total = 0
    for row in csv_reader:
        total += float(row[1])  # Convert string to float before adding
    print(f"Total: {total}")
The solution is to explicitly convert the string value to a number before adding it to your total. By wrapping row[1] in the float() function, you’re telling Python to treat the text as a decimal number. This simple step resolves the TypeError and allows mathematical operations to proceed correctly. Remember to do this whenever you pull numerical data from a CSV that you plan to use in calculations.
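Real files sometimes contain values that can't be converted at all, like 'n/a'. Wrapping the conversion in try/except lets the script skip them instead of crashing. A sketch, with hypothetical inline data in place of prices.csv:

```python
import csv
import io

# Inline stand-in for prices.csv; one price is malformed on purpose
csv_text = "Item,Price\nWidget,19.99\nGadget,n/a\nDoohickey,5.00\n"

total = 0.0
reader = csv.reader(io.StringIO(csv_text))
next(reader)  # skip header
for row in reader:
    try:
        total += float(row[1])
    except ValueError:
        continue  # skip values that aren't valid numbers
print(f"Total: {total:.2f}")  # → Total: 24.99
```

Whether skipping, logging, or substituting a default is right depends on the dataset; skipping is just the simplest option to show.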
Handling quoted text in CSV files with the quoting parameter
CSV files often use quotes to wrap fields that contain the delimiter, like a comma inside a city name (e.g., "New York, NY"). The csv.reader honors standard quoting by default, but every value it returns, including the numbers, comes back as a string. The following code shows that default behavior.
import csv

with open('data_with_quotes.csv', 'r') as file:
    csv_reader = csv.reader(file)
    for row in csv_reader:
        print(row)
With the default setting (csv.QUOTE_MINIMAL), quoted fields stay intact, but numeric columns arrive as strings that you'd have to convert yourself. The corrected code below uses the quoting parameter to change that.
import csv

with open('data_with_quotes.csv', 'r') as file:
    csv_reader = csv.reader(file, quoting=csv.QUOTE_NONNUMERIC)
    for row in csv_reader:
        print(row)
Setting quoting=csv.QUOTE_NONNUMERIC tells the csv.reader that every unquoted field is a number, so it converts those fields to floats automatically, while quoted text fields, embedded commas and all, stay intact as strings. This works only if the file consistently quotes its text fields; an unquoted text value will raise a ValueError. When your CSV follows that convention, it saves you a manual conversion step.
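A quick way to see the effect is to feed the reader a couple of rows directly with io.StringIO: quoted text survives intact, and unquoted numbers come back as floats.

```python
import csv
import io

# A quoted text field containing a comma, plus unquoted numeric fields
data = '"New York, NY",28\n"Boston",24\n'

reader = csv.reader(io.StringIO(data), quoting=csv.QUOTE_NONNUMERIC)
rows = list(reader)
print(rows)  # → [['New York, NY', 28.0], ['Boston', 24.0]]
```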
Real-world applications
Now that you've learned to navigate common errors, you can use these skills for practical tasks like calculating sales and merging datasets.
Calculating sales statistics from csv data
By combining csv.DictReader with numeric conversion, you can quickly process a sales file to calculate key metrics like total revenue and average sale value.
import csv

total_sales = 0
count = 0
with open('sales.csv', 'r') as file:
    reader = csv.DictReader(file)
    for row in reader:
        total_sales += float(row['Amount'])
        count += 1

print(f"Total sales: ${total_sales:.2f}")
print(f"Average sale: ${total_sales/count:.2f}")
This script processes a sales file to calculate key metrics. It iterates through each row using csv.DictReader, which conveniently turns every row into a dictionary. This lets you access data with readable keys like row['Amount'] instead of using indexes.
- The script converts the 'Amount' from a string to a number with float(), a necessary step for any calculations.
- It also keeps a running count of rows to figure out the average sale value.
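For comparison, pandas computes the same metrics in a few lines, since it parses numeric columns automatically. A sketch, with inline sample amounts standing in for sales.csv:

```python
import io
import pandas as pd

# Inline stand-in for sales.csv (hypothetical amounts)
sales = pd.read_csv(io.StringIO("Amount\n19.99\n5.00\n25.01\n"))

# The Amount column is already float64, so no manual conversion is needed
print(f"Total sales: ${sales['Amount'].sum():.2f}")    # → Total sales: $50.00
print(f"Average sale: ${sales['Amount'].mean():.2f}")  # → Average sale: $16.67
```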
Merging data from multiple csv sources
You can merge data from multiple CSV files to enrich one dataset with information from another, like adding customer names to an orders report.
import csv

# Load customer data into a lookup dictionary
customers = {}
with open('customers.csv', 'r') as file:
    for row in csv.DictReader(file):
        customers[row['id']] = row['name']

# Create the enriched order report
with open('orders.csv', 'r') as in_file, open('report.csv', 'w', newline='') as out_file:
    reader = csv.DictReader(in_file)
    writer = csv.writer(out_file)
    writer.writerow(['order_id', 'customer', 'amount'])
    for order in reader:
        customer = customers.get(order['customer_id'], 'Unknown')
        writer.writerow([order['id'], customer, order['amount']])

print("Generated report with customer information")
This script efficiently joins data from two files. It first loads customers.csv into a dictionary, creating a fast lookup table that maps each customer’s ID to their name.
- The script then opens orders.csv and a new report.csv for writing.
- For each order, it uses .get() to find the customer's name from the dictionary, defaulting to 'Unknown' if the ID is missing.
- Finally, it writes the combined order details into report.csv, creating a new file with the enriched information. For more comprehensive approaches to merging CSV files, explore additional techniques and strategies.
Get started with Replit
Turn your knowledge into a tool. Give Replit Agent a prompt like "build a sales dashboard from a CSV" or "create a script to clean and merge two datasets."
It will write the code, test for errors, and deploy your application for you. Start building with Replit.
Describe what you want to build, and Replit Agent writes the code, handles the infrastructure, and ships it live. Go from idea to real product, all in your browser.



