How to read a binary file in Python
Learn how to read binary files in Python. This guide covers various methods, tips, real-world applications, and common error debugging.

Reading binary files in Python is a crucial skill for handling non-text data like images, audio, or executables. Python's built-in functions make this straightforward once you use the correct file mode, such as 'rb'.
Here, you'll learn techniques to interpret binary data correctly. You will get practical tips, explore real-world applications, and receive clear debugging advice to help you handle any binary file with confidence.
Reading a binary file using open() with 'rb' mode
with open('sample.bin', 'rb') as file:
    binary_data = file.read()
    print(f"Read {len(binary_data)} bytes")
    print(binary_data[:10])  # Display first 10 bytes

--OUTPUT--
Read 1024 bytes
b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\x09'
The with statement ensures your file is closed automatically, which is a good practice for resource management. The crucial part is the 'rb' mode, which tells Python to read the file in binary format.
- This mode reads the raw bytes directly from the file.
- It prevents Python from trying to decode the data as text, which would corrupt the contents of a non-text file.
The file.read() method pulls the entire file content into memory as a single bytes object. You can see this in the output, where the leading b indicates you're working with a bytes literal, not a regular string.
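A quick way to see the distinction: indexing a bytes object yields an integer, while slicing yields another bytes object. This minimal sketch writes a few bytes to a temporary file (the filename is generated here, not taken from the article) and inspects what 'rb' mode returns:

```python
import os
import tempfile

# Write a few raw bytes to a temporary file so the sketch is self-contained
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'\x00\x01\x02\x03')

with open(path, 'rb') as f:
    data = f.read()
os.remove(path)

print(type(data))   # <class 'bytes'>
print(data[0])      # 0 -- indexing a bytes object gives an int, not a str
print(data[:2])     # b'\x00\x01' -- slicing gives another bytes object
```

This is a common surprise for anyone used to text files, where indexing returns a one-character string.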
Basic binary file handling techniques
While file.read() is great for small files, you'll need more advanced techniques to efficiently manage memory when working with larger binary data sets.
Reading binary files in chunks with read()
with open('sample.bin', 'rb') as file:
    chunk1 = file.read(4)  # Read first 4 bytes
    chunk2 = file.read(4)  # Read next 4 bytes
    print(f"First chunk: {chunk1}")
    print(f"Second chunk: {chunk2}")

--OUTPUT--
First chunk: b'\x00\x01\x02\x03'
Second chunk: b'\x04\x05\x06\x07'
Instead of reading the entire file at once, you can pass an integer to the read() method. This tells Python exactly how many bytes to read. This approach is much more memory-efficient for large files, as you only process small pieces at a time.
- Each call to read() advances the file's internal pointer.
- As shown in the example, calling read(4) twice reads two consecutive 4-byte chunks from the file.
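For truly large files, the usual pattern is a loop that keeps calling read() until it returns an empty bytes object at end of file. A minimal sketch of this pattern (it creates its own small sample.bin so it runs as-is):

```python
# Create a small sample file so the sketch is self-contained
with open('sample.bin', 'wb') as f:
    f.write(bytes(range(10)))

CHUNK_SIZE = 4
total = 0
with open('sample.bin', 'rb') as f:
    while True:
        chunk = f.read(CHUNK_SIZE)
        if not chunk:   # read() returns b'' at end of file
            break
        total += len(chunk)   # process each chunk here

print(f"Processed {total} bytes")   # Processed 10 bytes
```

In real code CHUNK_SIZE is usually much larger (64 KB or more) so the loop overhead stays negligible.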
Using readinto() for direct buffer reading
import array

with open('sample.bin', 'rb') as file:
    buffer = array.array('B', [0] * 10)  # Create a byte array
    bytes_read = file.readinto(buffer)  # Read directly into buffer
    print(f"Bytes read: {bytes_read}")
    print(f"Buffer content: {buffer}")

--OUTPUT--
Bytes read: 10
Buffer content: array('B', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
The readinto() method offers a high-performance way to handle binary data. Instead of creating a new bytes object with each call, it reads data directly into a pre-allocated, mutable buffer—like the array.array in the example. This approach is more memory-efficient because you reuse the same buffer, avoiding repeated memory allocations.
- It modifies the buffer in place, filling it with bytes from the file's current position.
- The function returns the number of bytes it successfully read, which can be less than the buffer size if you're near the end of the file.
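readinto() also accepts a plain bytearray, which is often simpler than array.array, and the reuse pattern pairs naturally with a loop. A sketch, again creating its own sample file:

```python
# Create a small sample file so the sketch is self-contained
with open('sample.bin', 'wb') as f:
    f.write(bytes(range(10)))

buffer = bytearray(4)   # pre-allocated, reusable buffer
chunks = []
with open('sample.bin', 'rb') as f:
    while (n := f.readinto(buffer)):   # returns 0 at end of file
        chunks.append(bytes(buffer[:n]))   # only the first n bytes are valid

print(chunks)   # [b'\x00\x01\x02\x03', b'\x04\x05\x06\x07', b'\x08\t']
```

Note the `buffer[:n]` slice: on the final iteration the buffer is only partially filled, so the trailing bytes from the previous read must be ignored.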
Memory-mapped binary files with mmap
import mmap

with open('sample.bin', 'rb') as file:
    with mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as mmapped_file:
        print(f"File size: {len(mmapped_file)} bytes")
        print(f"First 5 bytes: {mmapped_file[:5]}")

--OUTPUT--
File size: 1024 bytes
First 5 bytes: b'\x00\x01\x02\x03\x04'
For maximum efficiency with large files, you can use the mmap module. It creates a memory-mapped file, treating the file on disk as if it were an object in memory. The OS manages loading data as needed, so you don't use up RAM by reading the whole file at once.
- The resulting object acts like a bytes object, letting you use standard slicing like mmapped_file[:5] to access any part of the file instantly.
- You create it using mmap.mmap(), which needs the file's descriptor from file.fileno().
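Because the mapped object also supports search methods like find(), you can locate a byte pattern anywhere in a large file without reading it into memory first. A sketch that builds its own sample file:

```python
import mmap

# Create a small sample file so the sketch is self-contained
with open('sample.bin', 'wb') as f:
    f.write(bytes(range(16)))

with open('sample.bin', 'rb') as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        offset = mm.find(b'\x05\x06')   # byte search, like bytes.find()
        print(offset)   # 5
```

The OS only pages in the parts of the file the search actually touches, which is what makes this efficient for multi-gigabyte files.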
Advanced binary file operations
Beyond just reading raw bytes, you'll often need to interpret structured data, which is where powerful tools like the struct module, NumPy, and the io module come in.
Parsing binary data with struct
import struct

with open('numbers.bin', 'rb') as file:
    # Read 2 integers (8 bytes) in big-endian format
    data = file.read(8)
    numbers = struct.unpack('>ii', data)
    print(f"Unpacked integers: {numbers}")

--OUTPUT--
Unpacked integers: (16909060, 84281096)
The struct module is your tool for decoding bytes that follow a specific, fixed structure. It's especially useful for reading data created by other languages. The struct.unpack() function takes a format string and a byte string, converting the raw bytes into a tuple of Python values.
- The format string—here, '>ii'—is your blueprint for the data.
- The > character sets the byte order to big-endian, ensuring numbers are interpreted correctly. Each i tells the function to read a 4-byte integer.
As a result, you can precisely transform a chunk of binary data into usable numbers.
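When a file holds many records of the same fixed layout, struct.iter_unpack() decodes them all without a manual read() loop, and struct.calcsize() tells you how many bytes each record occupies. A small sketch using in-memory data packed with struct.pack (the values are made up for illustration):

```python
import struct

# Pack four big-endian 4-byte integers to simulate a fixed-record file
raw = struct.pack('>4i', 10, 20, 30, 40)

# iter_unpack yields one tuple per record of the given format
values = [v for (v,) in struct.iter_unpack('>i', raw)]
print(values)                  # [10, 20, 30, 40]
print(struct.calcsize('>i'))   # 4 -- bytes per record
```

iter_unpack requires the buffer length to be an exact multiple of the record size, which doubles as a sanity check that you read a whole number of records.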
Reading binary files with NumPy
import numpy as np
# Read binary file directly into a NumPy array
data = np.fromfile('matrix.bin', dtype=np.float32)
# Reshape if it represents a matrix (e.g., 3x3)
matrix = data.reshape((3, 3))
print(matrix)

--OUTPUT--
[[1.1 2.2 3.3]
 [4.4 5.5 6.6]
 [7.7 8.8 9.9]]
When you're working with numerical data, like from scientific instruments or machine learning models, NumPy is your best bet. The np.fromfile() function is incredibly efficient because it reads the binary file directly into a NumPy array.
- You must specify the dtype, like np.float32, which tells NumPy how to interpret the raw bytes. This ensures the data is converted into the correct numerical format.
- Since the data is read as a flat list of numbers, you can use methods like reshape() to organize it into a more useful structure, such as a matrix.
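np.fromfile() also accepts count and offset arguments (offset was added in NumPy 1.17), which are handy for skipping a header or reading only part of a file. A sketch that writes its own matrix.bin so it is runnable on its own:

```python
import numpy as np

# Write nine float32 values so the example runs standalone
np.arange(1, 10, dtype=np.float32).tofile('matrix.bin')

# Read only the first 6 values, then skip the first 3 (12 bytes = 3 * 4)
head = np.fromfile('matrix.bin', dtype=np.float32, count=6)
tail = np.fromfile('matrix.bin', dtype=np.float32, offset=12)
print(head)   # [1. 2. 3. 4. 5. 6.]
print(tail)   # [4. 5. 6. 7. 8. 9.]
```

The offset is always in bytes, not elements, so multiply by the dtype's item size when skipping records.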
Using the io module for binary data
import io
# Create a binary stream from bytes
binary_data = b'\x00\x01\x02\x03\x04\x05'
binary_stream = io.BytesIO(binary_data)
print(binary_stream.read(2))
binary_stream.seek(4) # Move to position 4
print(binary_stream.read(2))

--OUTPUT--
b'\x00\x01'
b'\x04\x05'
The io module lets you treat binary data in memory as if it were a file. Using io.BytesIO, you can wrap a bytes object to create an in-memory binary stream. This is incredibly useful when you're working with APIs that expect a file object, but you only have the data in a variable. You can then use standard file operations on this stream.
- The read() method works just like it does with a regular file, letting you pull out a specific number of bytes.
- With seek(), you can move the stream's internal pointer to any byte position, allowing for non-sequential data access.
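A concrete case of "an API that expects a file object": the zipfile module, covered later in this guide, accepts any file-like object, so a BytesIO buffer can stand in for a file on disk. A sketch that builds and reads a ZIP archive entirely in memory (the entry name is made up for illustration):

```python
import io
import zipfile

# Build a ZIP archive in memory -- zipfile only needs a file-like object
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as zf:
    zf.writestr('hello.bin', b'\x01\x02\x03')

# Rewind and hand the same buffer back to an API that expects a file
buffer.seek(0)
with zipfile.ZipFile(buffer, 'r') as zf:
    data = zf.read('hello.bin')
print(data)   # b'\x01\x02\x03'
```

The seek(0) call is easy to forget: after writing, the stream's pointer sits at the end, so a reader would otherwise see an empty stream.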
Move faster with Replit
Replit is an AI-powered development platform that transforms natural language into working applications. You can describe what you want to build, and its AI capabilities help bring your idea to life.
For the binary file techniques we've explored, Replit Agent can turn them into production-ready tools. It builds complete applications—with databases, APIs, and deployment—directly from your descriptions.
- Build an image metadata utility that uses the struct module to parse file headers and extract dimensions and color information.
- Create a scientific data dashboard that reads large binary datasets with numpy.fromfile and generates interactive visualizations.
- Deploy a custom log converter that processes proprietary binary files into a structured format like JSON.
You can turn these concepts into fully functional applications without getting bogged down in boilerplate. Describe your tool, and Replit Agent will write the code, test it, and deploy it automatically.
Common errors and challenges
Even with the right tools, you might run into a few common roadblocks when reading binary files, but they're easy to sidestep once you know what to look for.
Handling file not found errors with try-except
One of the most frequent issues is the FileNotFoundError. This happens if the file you're trying to open doesn't exist at the specified path. To prevent your program from crashing, you can wrap your file-opening logic in a try-except block. This allows you to catch the error gracefully and handle it, perhaps by notifying the user or creating a default file.
Debugging endianness issues with struct
Endianness issues can be tricky to debug. This refers to the byte order—big-endian or little-endian—used to store multi-byte numbers. If you read data created on a system with a different byte order, your numbers will be scrambled. When using the struct module, you can solve this by explicitly setting the byte order in your format string with > for big-endian or < for little-endian, ensuring your data is interpreted correctly regardless of its origin.
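You can see the effect directly by unpacking the same four bytes under both byte orders; this in-memory sketch uses a hand-written byte literal rather than a file:

```python
import struct

raw = b'\x00\x00\x00\x01'            # the same four bytes...
big = struct.unpack('>i', raw)[0]    # ...read most-significant byte first
little = struct.unpack('<i', raw)[0] # ...read least-significant byte first
print(big)      # 1
print(little)   # 16777216 (0x01000000)
```

The two results differ by a factor of millions, which is why scrambled numbers are the telltale symptom of an endianness bug.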
Avoiding string vs bytes confusion in binary mode
A classic mistake is confusing bytes with str objects. When you open a file in binary mode ('rb'), Python gives you raw bytes, not a text string. Trying to use string-specific methods on a bytes object will result in a TypeError. If you know a portion of the binary data represents text, you must explicitly convert it using the decode() method with the correct encoding, such as 'utf-8'.
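For example, if the tail of a binary record happens to be UTF-8 text, you slice the bytes and decode just that part. The record layout in this sketch is made up for illustration:

```python
# A made-up binary record: a 2-byte flag field followed by UTF-8 text
record = b'\x00\x2aHello'

flag = record[:2]                  # still bytes -- fine for byte-level work
text = record[2:].decode('utf-8')  # explicit conversion from bytes to str
print(text)   # Hello
```

Calling a str-only operation on `record` directly, such as concatenating it with a string, would raise the TypeError described above.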
Handling file not found errors with try-except
Without a try-except block, your script is vulnerable to crashing if a file is missing. This is a fragile way to write code, as even a simple typo in a filename can bring everything to a halt. See what happens below.
# This code will crash when the file doesn't exist
file = open('missing_file.bin', 'rb')
binary_data = file.read()
file.close()
print(f"Read {len(binary_data)} bytes")
The script fails because it directly calls open() on a missing file, triggering an unhandled FileNotFoundError that halts the program. The next example shows how to manage this gracefully.
# Using try-except to handle file not found errors
try:
    with open('missing_file.bin', 'rb') as file:
        binary_data = file.read()
        print(f"Read {len(binary_data)} bytes")
except FileNotFoundError:
    print("Error: File not found")
except PermissionError:
    print("Error: Permission denied")
By wrapping the file operation in a try-except block, you can gracefully manage errors. The code inside the try block is executed, but if a FileNotFoundError occurs, the program doesn't crash. Instead, it jumps to the corresponding except block and prints a user-friendly message. This approach also allows you to catch other common issues, like PermissionError, making your script much more resilient and predictable when dealing with file system interactions.
Debugging endianness issues with struct
Endianness refers to the byte order of multi-byte data types. If you read data created on a system with a different byte order—like network data, which is often big-endian—your numbers will be scrambled if you don't specify the correct format.
The following code shows what happens when you use struct.unpack with its default setting on data that requires a specific byte order, leading to incorrect results.
import struct

with open('network_data.bin', 'rb') as file:
    data = file.read(4)
    value = struct.unpack('i', data)[0]  # Using default endianness
    print(f"Value: {value}")  # May produce unexpected results
The format string 'i' defaults to your system's native byte order, which can misinterpret data from other systems. If the file uses a different endianness, the resulting number will be incorrect. See how to fix this below.
import struct

with open('network_data.bin', 'rb') as file:
    data = file.read(4)
    # Explicitly specify endianness ('>i' for big-endian, '<i' for little-endian)
    value = struct.unpack('>i', data)[0]
    print(f"Value: {value}")
The fix is to explicitly tell struct.unpack() which byte order to use in the format string. Adding > for big-endian or < for little-endian before the type character, as in '>i', ensures the bytes are interpreted correctly. This is essential when you're reading data from network protocols or files created on different systems, as they often rely on a standard byte order to work universally.
Avoiding string vs bytes confusion in binary mode
Python makes a sharp distinction between text (str) and raw data (bytes). If you open a file in binary write mode ('wb'), you can't write a string to it directly. Doing so raises a TypeError, as the code below shows.
# This will raise a TypeError
with open('output.bin', 'wb') as file:
file.write("Hello, binary world!") # String, not bytes
The code triggers a TypeError because the write() method, when used with a binary file, can't process a standard string. It needs raw bytes. See how to provide the correct data format in the example below.
# Correctly writing bytes to a binary file
with open('output.bin', 'wb') as file:
    file.write(b"Hello, binary world!")  # Bytes literal
    # Alternative: file.write("Hello, binary world!".encode('utf-8'))
To fix the TypeError, you must provide the write() method with a bytes object, not a string. You can do this by creating a bytes literal directly—like b"Hello"—or by converting a string into bytes using the .encode() method. This is crucial whenever you open a file in binary write mode ('wb') to ensure the data format is correct and avoid errors.
Real-world applications
Putting these techniques into practice lets you solve real-world challenges, from parsing PNG image headers to processing sensor data in ZIP archives.
Extracting dimensions from PNG images with struct.unpack()
Since PNG files have a standardized structure, you can use the seek() method to navigate to the exact location of the image dimensions and then use struct.unpack() to read them.
import struct

with open('image.png', 'rb') as f:
    f.seek(16)  # Skip PNG signature and chunk info
    width_height = f.read(8)
    width, height = struct.unpack('>II', width_height)
    print(f"Image dimensions: {width}x{height} pixels")
This code efficiently extracts a PNG's dimensions by directly accessing the file's binary data. It bypasses the initial file signature and header information with f.seek(16), positioning the reader right where the dimension data is stored.
- The f.read(8) call then grabs the next 8 bytes, which contain the width and height.
- Finally, struct.unpack('>II', ...) decodes these bytes. The format string specifies two 4-byte unsigned integers (II) in big-endian order (>), converting the raw data into usable numbers.
Processing sensor data from ZIP archives using zipfile
You can use the zipfile module to read binary files directly from a ZIP archive, an efficient method for processing bundled data like sensor readings without extracting them to disk.
import zipfile
import struct
with zipfile.ZipFile('data_archive.zip', 'r') as zip_ref:
    with zip_ref.open('sensor_readings.bin') as bin_file:
        # Read a series of timestamp-temperature pairs
        for _ in range(3):
            data = bin_file.read(8)
            timestamp, temperature = struct.unpack('ff', data)
            print(f"Time: {timestamp:.1f}s, Temperature: {temperature:.1f}°C")
This script reads a binary file directly from a ZIP archive, which is great for handling bundled data without extracting it first. The zipfile.ZipFile context manager opens the archive, and zip_ref.open() accesses the specific binary file inside.
- The code iterates, reading 8-byte chunks from the file stream.
- struct.unpack('ff', data) then interprets each chunk as two 4-byte floating-point numbers, representing a timestamp and temperature.
This approach lets you process structured binary data on the fly.
Get started with Replit
Now, turn these techniques into a real application with Replit Agent. Just describe your goal, like “build a tool that reads a PNG file and tells me its dimensions” or “create an app that visualizes sensor data from a binary log”.
The agent writes the code, tests for errors, and deploys your application from your description. Start building with Replit.
Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.