How to read all files in a directory in Python
Discover how you can read all files in a directory with Python. Get tips, see real-world uses, and learn to debug common errors.
The ability to read all files in a directory is a core skill for Python developers who manage data or automate workflows. Python provides powerful modules to simplify this common task.
In this article, we'll cover several techniques to list and read files from a directory. We'll also share practical tips, real-world applications, and essential debugging advice to help you confidently master directory operations.
Using os.listdir() to get all files
```python
import os

directory = "sample_dir"
files = os.listdir(directory)

for file in files:
    print(file)
```

Output:

```
file1.txt
file2.csv
data.json
image.png
subdirectory
```
The os.listdir() function is a straightforward way to get a list of all entries within a given directory. It returns a Python list containing the names of everything inside, but it's important to note a few things:
- It includes both files (like `file1.txt`) and subdirectories (like `subdirectory`).
- The list is unsorted and returns only the base names, not the full paths.
Because you only get the names, you'll need to join the original directory path to each item before you can read or modify it.
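For example, here is a minimal self-contained sketch of that join step (it builds its own temporary directory with a hypothetical `file1.txt`, so the names are illustrative):

```python
import os
import tempfile

# Self-contained setup: a throwaway directory containing one file
directory = tempfile.mkdtemp()
open(os.path.join(directory, "file1.txt"), "w").close()

# os.listdir() returns bare names; join each to the directory before opening
paths = [os.path.join(directory, name) for name in os.listdir(directory)]
print(paths)
```

Each entry in `paths` is now a full path that `open()` can resolve regardless of the script's working directory.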
Common approaches for directory traversal
While os.listdir() is useful for simple listings, Python’s os.walk(), glob, and pathlib modules offer more powerful ways to navigate directories and find specific files.
Using os.walk() for recursive traversal
```python
import os

directory = "sample_dir"

for root, dirs, files in os.walk(directory):
    for file in files:
        file_path = os.path.join(root, file)
        print(file_path)
```

Output:

```
sample_dir/file1.txt
sample_dir/file2.csv
sample_dir/data.json
sample_dir/image.png
sample_dir/subdirectory/nested_file.txt
```
The os.walk() function is your go-to for recursively exploring a directory. It generates a tuple for each directory it enters, including the top one. This makes it perfect for finding all files, even those nested in subdirectories. For each level, it provides:
- `root`: The path to the current directory.
- `dirs`: A list of subdirectories within it.
- `files`: A list of files within it.
By combining root and a file name with os.path.join(), you get the full path needed to access each file directly.
Using glob module for pattern matching
```python
import glob
import os

directory = "sample_dir"
file_pattern = os.path.join(directory, "*.*")
files = glob.glob(file_pattern)

for file in files:
    print(file)
```

Output:

```
sample_dir/file1.txt
sample_dir/file2.csv
sample_dir/data.json
sample_dir/image.png
```
The glob module is your best bet for finding files that match a specific pattern, using familiar Unix-style wildcards. The glob.glob() function takes a pattern and returns a list of full paths, so you don't need to join them manually.
- The `*` wildcard matches any sequence of characters. In the example, `"*.*"` finds all files with an extension.
- This approach is more direct than `os.walk()` if you don't need to search subdirectories, making it ideal for simple filtering.
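If you do need to reach into subdirectories, glob can handle that too: passing `recursive=True` makes the `**` wildcard descend into nested folders. A small self-contained sketch (the directory and file names here are illustrative):

```python
import glob
import os
import tempfile

# Self-contained setup: a directory with one nested .txt file
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "subdirectory"))
open(os.path.join(base, "subdirectory", "nested_file.txt"), "w").close()

# With recursive=True, ** matches any number of directory levels (including zero)
matches = glob.glob(os.path.join(base, "**", "*.txt"), recursive=True)
print(matches)
```

This gives you `os.walk()`-style depth with glob-style pattern filtering in a single call.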
Using pathlib for modern path handling
```python
from pathlib import Path

directory = Path("sample_dir")

for file_path in directory.iterdir():
    if file_path.is_file():
        print(file_path)
```

Output:

```
sample_dir/file1.txt
sample_dir/file2.csv
sample_dir/data.json
sample_dir/image.png
```
The pathlib module offers a modern, object-oriented way to handle filesystem paths. Instead of working with strings, you create Path objects that have useful methods, which can make your code cleaner and more intuitive.
- The `iterdir()` method iterates over all items in the directory.
- Each item is a `Path` object, so you can call methods like `is_file()` directly on it to filter out subdirectories.
- This approach avoids manual string joining and often leads to more readable code than older methods.
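pathlib also has its own pattern matching: `Path.glob()` filters by a wildcard, and `Path.rglob()` does the same recursively. A brief self-contained sketch with illustrative file names:

```python
from pathlib import Path
import tempfile

# Self-contained setup: one .txt file and one .csv file
base = Path(tempfile.mkdtemp())
(base / "notes.txt").write_text("hello")
(base / "table.csv").write_text("a,b")

# Path.glob() yields matching Path objects; rglob() would also search subfolders
txt_files = sorted(base.glob("*.txt"))
print([p.name for p in txt_files])
```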
Advanced techniques and optimizations
Now that you know how to find files, you can build on those skills to read their contents safely, filter them by type, and process them efficiently.
Reading file contents with context managers
```python
import os

directory = "sample_dir"

for filename in os.listdir(directory):
    file_path = os.path.join(directory, filename)
    if os.path.isfile(file_path):
        # Note: binary files (like .png) may raise UnicodeDecodeError in text mode
        with open(file_path, 'r') as file:
            content = file.read()
            print(f"{filename}: {content[:20]}...")
```

Output:

```
file1.txt: This is the content...
file2.csv: name,age,city...
data.json: {"key": "value"...
image.png: [binary content]...
```
Once you have a file path, reading its contents safely is the next step. Using a with open(...) statement, known as a context manager, is the standard way to handle files in Python. It’s a crucial practice because it automatically closes the file for you, which prevents errors and resource leaks.
- Before reading, it's wise to confirm the path points to a file using `os.path.isfile()`.
- Inside the `with` block, you can use methods like `.read()` to access the file's content without worrying about cleanup.
Filtering files by extension
```python
import os

directory = "sample_dir"
extension = ".txt"
txt_files = [f for f in os.listdir(directory)
             if f.endswith(extension)]

print(f"Found {len(txt_files)} {extension} files:")
for file in txt_files:
    print(file)
```

Output:

```
Found 1 .txt files:
file1.txt
```
Often, you'll only need files of a specific type. A list comprehension offers a concise way to filter the results from os.listdir(). This technique builds a new list containing only the items that meet your criteria.
- The key is the `endswith()` string method, which checks if a filename ends with a certain suffix, like `".txt"`.
- This approach is efficient and highly readable, making it a popular choice for simple filtering tasks without needing to import other modules.
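A handy detail: `endswith()` also accepts a tuple of suffixes, so a single check can match several extensions at once. A self-contained sketch with illustrative file names:

```python
import os
import tempfile

# Self-contained setup: three files with different extensions
base = tempfile.mkdtemp()
for name in ("a.txt", "b.csv", "c.png"):
    open(os.path.join(base, name), "w").close()

# One endswith() call covers every suffix in the tuple
wanted = (".txt", ".csv")
matches = sorted(f for f in os.listdir(base) if f.endswith(wanted))
print(matches)
```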
Using generators for memory-efficient processing
```python
import os

def file_reader(directory):
    for filename in os.listdir(directory):
        filepath = os.path.join(directory, filename)
        if os.path.isfile(filepath):
            yield filepath, filename

for filepath, filename in file_reader("sample_dir"):
    print(f"Processing {filename}")
```

Output:

```
Processing file1.txt
Processing file2.csv
Processing data.json
Processing image.png
```
When you're dealing with a large number of files, creating a list of all file paths at once can consume a lot of memory. A generator function offers a more efficient solution. Instead of returning a complete list, it uses the yield keyword to produce one item at a time, right when you need it.
- The function pauses its execution after each `yield` and resumes from the same spot on the next iteration.
- This approach is ideal for large directories because it processes files one by one, keeping memory usage low.
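If you want lazy iteration without writing your own generator, the standard library's `os.scandir()` already returns one. It yields `DirEntry` objects whose cached type information usually lets `is_file()` avoid an extra `stat()` call per entry. A self-contained sketch:

```python
import os
import tempfile

# Self-contained setup: one file and one subdirectory
base = tempfile.mkdtemp()
open(os.path.join(base, "file1.txt"), "w").close()
os.mkdir(os.path.join(base, "subdirectory"))

# os.scandir() is an iterator of DirEntry objects; use it as a context
# manager so the underlying directory handle is closed promptly
names = []
with os.scandir(base) as entries:
    for entry in entries:
        if entry.is_file():
            names.append(entry.name)
print(names)
```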
Move faster with Replit
Replit is an AI-powered development platform that transforms natural language into working applications. You can describe what you want to build, and its AI capabilities help bring your idea to life—complete with a user interface, backend logic, and deployment.
The file handling techniques you've learned can be the foundation for powerful tools. With Replit Agent, you can turn these concepts into production-ready applications simply by describing them. The agent can build, test, and deploy entire projects from a single prompt.
For example, you could use Replit Agent to build:
- A log file analyzer that uses `os.walk()` to recursively scan a project directory, read all `.log` files, and generate a consolidated error report.
- A batch data processor that uses `glob` to find all CSV files in a directory, reads their contents, and aggregates the data into a summary dashboard.
- A digital asset manager that uses `pathlib` to organize files into subdirectories based on their extension or creation date.
Describe your next project, and Replit Agent will write the code, handle dependencies, and deploy it for you, all within your browser.
Common errors and challenges
Navigating directories can sometimes lead to common pitfalls like permission errors, missing folders, or incorrect file path references.
You might encounter a PermissionError when your script tries to access a directory it doesn't have read permissions for, which is common in protected system folders. To prevent your program from crashing, you can wrap your directory traversal code in a try...except PermissionError block. This allows your script to gracefully skip the inaccessible directory and continue its work.
Attempting to list files in a directory that doesn't exist will raise a FileNotFoundError. You can handle this by wrapping your os.listdir() call in a try...except FileNotFoundError block. A more direct approach is to first check if the path is valid with os.path.exists(), ensuring you only try to read from directories that actually exist.
A frequent mistake is trying to open a file using only the name returned by os.listdir(). Since this function provides just the base filename and not the full path, your script will fail unless it's running from inside that same directory.
- The problem: Calling `open('report.txt')` will look for the file in the current working directory, not the target directory.
- The solution: Always construct the full path by joining the directory path with the filename using `os.path.join()` before attempting to open or process it.
Handling permission errors when traversing directories
Traversing system directories often triggers a PermissionError because your script lacks the required access rights. This is a common issue when working with protected locations like /var/log. The following code demonstrates what happens when you run into this problem.
```python
import os

directory = "/var/log"

for root, dirs, files in os.walk(directory):
    for file in files:
        file_path = os.path.join(root, file)
        with open(file_path, 'r') as f:
            print(f"Content: {f.read()[:10]}")
```
This code fails because it tries to open() and read every file, including protected system logs. The operating system denies access, which raises a PermissionError and halts the script. The following example demonstrates how to prevent this crash.
```python
import os

directory = "/var/log"

for root, dirs, files in os.walk(directory):
    for file in files:
        file_path = os.path.join(root, file)
        try:
            with open(file_path, 'r') as f:
                print(f"Content: {f.read()[:10]}")
        except PermissionError:
            print(f"Permission denied: {file_path}")
```
By wrapping the file operation within a try...except PermissionError block, the script can safely handle protected files. If an error occurs, the except block catches it and prints a warning, allowing the loop to continue to the next file. This prevents a single inaccessible file from halting your entire program. It's a crucial pattern to use when scanning directories where you don't control all the permissions, such as system folders or user-generated content.
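One related subtlety: the `try...except` above guards `open()`, but errors raised while listing a directory itself are silently swallowed by `os.walk()` unless you pass its `onerror` callback, which lets you log them instead. A minimal sketch that triggers the callback with a path we assume does not exist (the same hook fires for permission failures while listing a directory):

```python
import os

failed = []

def log_error(error):
    # os.walk() hands the OSError to this callback instead of raising it,
    # so a bad directory is recorded and the traversal simply moves on
    failed.append(error.filename)

# Walking a nonexistent top directory yields nothing and invokes the callback once
results = list(os.walk("no_such_directory_xyz", onerror=log_error))
print(results, failed)
```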
Dealing with non-existent directories using os.listdir()
A common mistake is assuming a directory path is always valid. If you pass a non-existent path to os.listdir(), your program will crash with a FileNotFoundError. This often happens when dealing with user input. The following code demonstrates this exact issue.
```python
import os

user_input = input("Enter directory to list: ")
files = os.listdir(user_input)

for file in files:
    print(file)
```
This code directly passes the user's raw input to os.listdir(). If the entered path doesn't exist, the program crashes. The following example demonstrates how to handle this gracefully by first validating the path.
```python
import os

user_input = input("Enter directory to list: ")

try:
    files = os.listdir(user_input)
    for file in files:
        print(file)
except FileNotFoundError:
    print(f"Directory does not exist: {user_input}")
```
By wrapping the os.listdir() call in a try...except FileNotFoundError block, you can catch the error when a directory doesn't exist. This prevents the program from crashing. Instead of stopping, the except block runs, printing a user-friendly message. This is especially useful when the directory path comes from user input or a configuration file, where you can't guarantee its validity beforehand.
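The validate-first alternative mentioned earlier can be wrapped in a small helper. A sketch (the helper name and the deliberately missing path are hypothetical):

```python
import os

def list_dir_safely(path):
    # Look before you leap: check the path instead of catching the exception
    if not os.path.isdir(path):
        print(f"Directory does not exist: {path}")
        return []
    return os.listdir(path)

print(list_dir_safely("no_such_directory_xyz"))
```

Note that a pre-check can race with another process deleting the directory, so the `try...except` version remains the more airtight option; the check-first style simply reads more directly.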
Using incorrect file paths with os.listdir()
A classic mistake is using the filenames from os.listdir() directly. This function only returns the name, not the full path, causing an error if your script isn't in the same directory. The code below shows exactly what goes wrong.
```python
import os

directory = "data/logs"
files = os.listdir(directory)

for file in files:
    with open(file, 'r') as f:
        content = f.read()
        print(f"{file}: {content[:10]}")
```
The open() call searches for the file in the script's current directory, not inside the data/logs folder. This path mismatch causes a FileNotFoundError. The following example demonstrates the correct approach.
```python
import os

directory = "data/logs"
files = os.listdir(directory)

for file in files:
    file_path = os.path.join(directory, file)
    with open(file_path, 'r') as f:
        content = f.read()
        print(f"{file}: {content[:10]}")
```
The fix is to always build the full path. By using os.path.join(directory, file), you combine the directory path with the filename returned by os.listdir(). This creates a complete, correct path that open() can find. This simple step prevents FileNotFoundError and ensures your script reliably accesses files, no matter where it's run from. It's a crucial habit to adopt whenever you're iterating over directory contents.
Real-world applications
With the fundamentals and error handling covered, you can apply these skills to practical tasks like finding large files and organizing directories.
Finding large files with os.walk() for disk cleanup
By pairing os.walk() with os.path.getsize(), you can create a simple script to locate large files hidden anywhere in a directory tree, making cleanup much easier.
```python
import os

def find_large_files(directory, threshold_mb=10):
    large_files = []
    for root, _, files in os.walk(directory):
        for file in files:
            file_path = os.path.join(root, file)
            size_mb = os.path.getsize(file_path) / (1024 * 1024)
            if size_mb > threshold_mb:
                large_files.append((file_path, size_mb))
    return large_files

results = find_large_files("sample_dir", 5)
for path, size in results:
    print(f"{path}: {size:.2f} MB")
```
This function, find_large_files, recursively scans a directory to find files exceeding a specified size. It combines a few key operations to get the job done efficiently.
- It uses `os.walk()` to traverse the entire directory tree, ensuring no file is missed.
- For each file, `os.path.getsize()` retrieves its size in bytes, which the code then converts to megabytes.
- A simple comparison checks if the file's size is greater than the `threshold_mb` you provide.
Finally, it returns a list containing the path and size of every file that meets the criteria.
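Because the function returns plain `(path, size_mb)` tuples, ranking the results by size is a one-liner with `sorted()`. A sketch using hypothetical data in that same shape:

```python
# Hypothetical (path, size_mb) tuples in the shape find_large_files() returns
results = [("logs/app.log", 42.5), ("media/video.mp4", 310.0), ("db/dump.sql", 95.2)]

# Sort on the size element, largest first, so the biggest cleanup wins come up top
ranked = sorted(results, key=lambda item: item[1], reverse=True)
for path, size in ranked:
    print(f"{path}: {size:.2f} MB")
```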
Organizing files by type with shutil and os.makedirs()
You can bring order to a messy directory by using os.makedirs() to create subfolders for different file types and the shutil module to copy each file into its correct place.
```python
import os
import shutil

for file in os.listdir("sample_dir"):
    file_path = os.path.join("sample_dir", file)
    ext = os.path.splitext(file)[1].lower()

    # Determine category based on extension
    category = None
    if ext in ['.jpg', '.png', '.gif']:
        category = 'images'
    elif ext in ['.pdf', '.txt', '.csv']:
        category = 'documents'

    if category:
        os.makedirs(f"organized/{category}", exist_ok=True)
        shutil.copy2(file_path, f"organized/{category}/{file}")
        print(f"Copied {file} to {category} folder")
```
This script automates file organization by looping through a directory. For each file, it uses os.path.splitext() to isolate the extension and then assigns it to a category like 'images' or 'documents'.
- The `os.makedirs()` function creates a destination folder for the category. Using `exist_ok=True` prevents the script from crashing if the folder already exists.
- Finally, `shutil.copy2()` copies the file into its new home, preserving metadata such as the original modification time.
Get started with Replit
Now, turn these concepts into a real tool. Tell Replit Agent to “build a script that finds all JPGs over 10MB” or “create a dashboard that aggregates sales data from all CSV files in a directory.”
The agent writes the code, handles testing, and deploys the app for you. It turns your description into a finished product. Start building with Replit.
Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.