How to hash a string in Python
A guide to hashing strings in Python. Learn various methods, see real-world applications, and get tips for debugging common errors.
Python's hashlib library makes it simple to hash a string for data security and integrity. This process transforms text into a fixed-size value that serves as a practically unique fingerprint, essential for password storage and data verification.
In this article, you'll explore techniques to hash strings and see real-world applications. You'll also find practical tips and advice to debug your code, which helps you implement secure hashes.
Using hashlib.md5 for basic hashing
import hashlib
string_to_hash = "Hello, World!"
hashed_string = hashlib.md5(string_to_hash.encode()).hexdigest()
print(hashed_string)

Output: 65a8e27d8879283831b664bd8b7f0ad4
The MD5 algorithm provides a quick way to generate a hash. The process involves a couple of key steps before you get the final hexadecimal string.
- First, you must call .encode() on your string. Hashing functions in Python don't work on strings directly; they require a sequence of bytes.
- Next, the .hexdigest() method converts the resulting binary hash into a more common and readable hexadecimal format.
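Beyond .hexdigest(), hash objects also expose .digest(), which returns the raw bytes. A quick sketch of how the two relate:

```python
import hashlib

data = "Hello, World!".encode()

# .digest() returns the raw 128-bit hash as a bytes object
raw = hashlib.md5(data).digest()

# .hexdigest() returns the same hash encoded as a hexadecimal string
hex_str = hashlib.md5(data).hexdigest()

print(len(raw))              # 16 bytes
print(len(hex_str))          # 32 hex characters
print(hex_str == raw.hex())  # True: hexdigest is just the hex encoding
```

The bytes form is useful when you store hashes compactly or feed them into another function; the hex form is easier to log and compare by eye.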
Common hashing algorithms
While md5 is a common starting point, the hashlib library provides more robust algorithms, each offering different trade-offs between security and performance.
Using hashlib.sha1 for improved hashing
import hashlib
string_to_hash = "Hello, World!"
sha1_hash = hashlib.sha1(string_to_hash.encode()).hexdigest()
print(sha1_hash)

Output: 0a0a9f2a6772942557ab5355d76af442f8f65e01
The SHA-1 algorithm, accessed with hashlib.sha1(), offers a more secure alternative to MD5. The implementation is nearly identical, but the underlying algorithm is stronger.
- The key difference is the output. SHA-1 produces a 160-bit hash (40 hexadecimal characters), which is longer than MD5's 128 bits and more resistant to collisions.
Although SHA-1 is an improvement, security standards have evolved, and it's now considered weak for many uses. It serves as a good bridge to understanding even stronger algorithms.
Using hashlib.sha256 for stronger security
import hashlib
string_to_hash = "Hello, World!"
sha256_hash = hashlib.sha256(string_to_hash.encode()).hexdigest()
print(sha256_hash)

Output: dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
For strong security, you'll want to use hashlib.sha256(). It's a modern standard from the widely trusted SHA-2 family, used for everything from digital signatures to data integrity checks. While the implementation looks familiar, the output is significantly more secure.
- It produces a 256-bit hash, which is 64 hexadecimal characters long. This increased length makes it far more resistant to attacks than older algorithms like SHA-1.
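The size differences across algorithms are easy to inspect, since every hash object carries a digest_size attribute (in bytes). A small sketch using hashlib.new(), which looks algorithms up by name:

```python
import hashlib

# digest_size is in bytes; multiply by 8 for bits, by 2 for hex characters
for name in ("md5", "sha1", "sha256"):
    h = hashlib.new(name, b"Hello, World!")
    print(f"{name}: {h.digest_size * 8}-bit hash, {len(h.hexdigest())} hex chars")
```

This prints 128, 160, and 256 bits respectively, matching the lengths discussed above.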
Using hashlib.blake2b for high-performance hashing
import hashlib
string_to_hash = "Hello, World!"
blake2b_hash = hashlib.blake2b(string_to_hash.encode()).hexdigest()
print(blake2b_hash)

Output: e4cfa39a3d37be31c59609e807970799caa68a19bfaa15135f165085e01d41a65ba1e1b146aeb6bd0092b49eac214c103ccfa3a365954bbbe52f74a2b3620c94
When speed is a top priority, hashlib.blake2b() is an excellent choice. It's a modern algorithm designed for high performance without sacrificing security. In many scenarios, it's even faster than sha256.
- The blake2b function generates a 512-bit hash by default, which is significantly longer and more secure than many alternatives.
This combination of speed and strength makes it ideal for applications like file verification or large-scale data integrity checks where efficiency is critical.
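One blake2b feature worth knowing: unlike the SHA family, it accepts a digest_size parameter (in bytes, up to 64), so you can shorten the output without switching algorithms. A minimal sketch:

```python
import hashlib

data = "Hello, World!".encode()

# digest_size is in bytes, up to 64; shorter digests are cheaper to store
short_hash = hashlib.blake2b(data, digest_size=16).hexdigest()
full_hash = hashlib.blake2b(data).hexdigest()

print(len(short_hash))  # 32 hex characters (128-bit hash)
print(len(full_hash))   # 128 hex characters (512-bit hash)
```

Note that changing digest_size changes the whole hash, not just its length; a truncated digest is not a prefix of the full one.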
Advanced hashing techniques
Moving past standard hashing functions, you can build more robust solutions for tasks like message authentication with hmac or concurrent data processing.
Creating keyed hashes with hmac
import hmac
import hashlib
key = b"secret_key"
message = "Hello, World!"
hmac_digest = hmac.new(key, message.encode(), hashlib.sha256).hexdigest()
print(hmac_digest)

Output: 4e61e7dec9833252395a9bd245d1fe8aafa6cbc334f120ab0e221c9ca1b084e4
The hmac module adds a layer of authenticity. It creates a Hash-based Message Authentication Code. Unlike a simple hash, an HMAC uses a secret key to verify both the integrity and the source of a message. This is crucial for secure communication, like validating API requests.
- The hmac.new() function combines the key, your encoded message, and a standard hashing algorithm like hashlib.sha256.
- Because the hash depends on the secret key, only someone who possesses the key can generate a valid signature.
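Putting this to work, here's a sketch of signing and then verifying a message; the sign and verify helpers are illustrative names, not part of the hmac API:

```python
import hmac
import hashlib

key = b"secret_key"

def sign(message: str) -> str:
    # Illustrative helper: produce an HMAC-SHA256 signature for a message
    return hmac.new(key, message.encode(), hashlib.sha256).hexdigest()

def verify(message: str, signature: str) -> bool:
    # compare_digest avoids leaking information through comparison timing
    return hmac.compare_digest(sign(message), signature)

tag = sign("Hello, World!")
print(verify("Hello, World!", tag))  # True
print(verify("Hello, World?", tag))  # False
```

Even a one-character change in the message invalidates the signature, which is exactly what makes HMAC useful for API request validation.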
Building a simple custom hash function
def custom_hash(text, prime=31):
hash_value = 0
for char in text:
hash_value = hash_value * prime + ord(char)
return hash_value % (10**10)
print(custom_hash("Hello, World!"))

Output: 1762102304
Building your own hash function helps you understand the core principles. This custom_hash function isn't for production security, but it shows how a string can be reduced to a numerical value through a simple algorithm.
- The function processes each character, converting it to an integer with ord().
- It updates the total hash by multiplying the current value by a prime number before adding the next character's value, which helps distribute the results.
- The final modulo (%) operation constrains the result to a predictable size.
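To see why the modulo step matters in practice, here's a toy sketch (with custom_hash repeated so it runs standalone) that assigns strings to a fixed number of buckets, the way a simple hash table would; bucket_for is an illustrative helper:

```python
def custom_hash(text, prime=31):
    hash_value = 0
    for char in text:
        hash_value = hash_value * prime + ord(char)
    return hash_value % (10**10)

def bucket_for(text, num_buckets=8):
    # Illustrative helper: reduce the hash to a small bucket index
    return custom_hash(text) % num_buckets

for word in ["apple", "banana", "cherry", "date"]:
    print(word, "->", bucket_for(word))
```

The same string always lands in the same bucket, while different strings tend to spread out, which is the property hash tables rely on.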
Implementing concurrent hashing for large data
import hashlib
from concurrent.futures import ThreadPoolExecutor
def hash_chunk(chunk):
return hashlib.sha256(chunk.encode()).hexdigest()
text = "Hello, World!" * 1000
chunks = [text[i:i+1000] for i in range(0, len(text), 1000)]
with ThreadPoolExecutor(max_workers=4) as executor:
hash_results = list(executor.map(hash_chunk, chunks))
print(f"Generated {len(hash_results)} chunk hashes. First hash: {hash_results[0][:10]}...")

Output: Generated 13 chunk hashes. First hash: f5b9ebd05f...
When you're hashing large files or datasets, processing everything sequentially can be a bottleneck. This approach uses concurrency to speed things up by breaking the work into smaller, parallel tasks.
- First, the code splits the large string into manageable chunks.
- A ThreadPoolExecutor then distributes these chunks across multiple worker threads.
- Finally, executor.map() applies the hash_chunk function to each chunk at the same time, generating the hashes much more efficiently than a one-by-one approach.
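The per-chunk hashes can then be folded into a single fingerprint for the whole input. One simple option, sketched below, is to hash the concatenation of the chunk hashes, a simplified stand-in for a full Merkle tree:

```python
import hashlib

def hash_chunk(chunk):
    return hashlib.sha256(chunk.encode()).hexdigest()

text = "Hello, World!" * 1000
chunks = [text[i:i + 1000] for i in range(0, len(text), 1000)]
chunk_hashes = [hash_chunk(c) for c in chunks]

# Hashing the joined chunk hashes yields one deterministic fingerprint
combined = hashlib.sha256("".join(chunk_hashes).encode()).hexdigest()
print(f"{len(chunk_hashes)} chunk hashes -> 1 combined hash: {combined[:10]}...")
```

A structure like this also lets you detect which chunk changed without rehashing everything, since only the affected chunk hash differs.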
Move faster with Replit
Replit is an AI-powered development platform that comes with all Python dependencies pre-installed, so you can skip setup and start coding instantly. This helps you move from learning individual techniques, like the hashing methods covered here, to building complete applications faster.
Instead of piecing together functions, describe the app you want to build and Agent 4 will take it from idea to working product:
- A file integrity checker that uses sha256 to generate unique hashes for uploads, ensuring data isn't corrupted or tampered with.
- A secure API endpoint that validates requests using hmac and a secret key to confirm both the message's integrity and authenticity.
- A data anonymization script that replaces sensitive user information with irreversible hashes to protect privacy in datasets.
Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.
Common errors and challenges
When hashing strings, you might encounter a few common issues, from type errors to security vulnerabilities and performance bottlenecks.
Fixing the TypeError when hashing strings with hashlib
A frequent stumbling block is the TypeError that occurs when you pass a string directly to a hashlib function. Hashing algorithms operate on bytes, not text characters, so you must first encode your string.
- The fix is to call the .encode() method on your string before hashing it. This converts the string into the byte sequence the function expects and resolves the TypeError (worded "Unicode-objects must be encoded before hashing" on older Python versions and "Strings must be encoded before hashing" on Python 3.9+).
Using constant-time comparison for hash verification
When checking if two hashes match, using the standard equality operator (==) can expose your application to timing attacks. This operator can return False the moment it finds a mismatch, and the time it takes to do so can leak information about the hash's contents.
- For secure verification, use the hmac.compare_digest() function. It's designed to take the same amount of time regardless of where the first difference is, making timing attacks ineffective.
Using incremental updates for hashing large files
Trying to hash a large file by loading it all into memory at once is inefficient and can easily lead to memory errors. A much better approach is to process the file incrementally.
- Read the file in smaller, manageable chunks and feed each one to the hash object using its .update() method. After processing all the chunks, you can retrieve the final result with .hexdigest().
Fixing the TypeError when hashing strings with hashlib
A frequent mistake with hashlib is passing a string directly to a hashing function. These algorithms expect bytes, not text, and this mismatch triggers a TypeError. The code below demonstrates a typical scenario where this error occurs when hashing user input.
import hashlib
user_input = "password123"
# This will cause a TypeError
hashed_password = hashlib.sha256(user_input).hexdigest()
print(hashed_password)
The code attempts to hash the user_input string directly. Since hashlib.sha256() can't operate on text characters, this action triggers the TypeError. The corrected snippet below shows how to properly prepare the data for hashing.
import hashlib
user_input = "password123"
# Properly encode the string to bytes before hashing
hashed_password = hashlib.sha256(user_input.encode()).hexdigest()
print(hashed_password)
The corrected code works because user_input.encode() converts the string into the byte sequence that hashlib.sha256() requires. Hashing functions can't operate on raw text, which causes the initial TypeError. You'll need to remember this step whenever you're working with string-based data, especially from sources like user input, to ensure it's properly formatted for hashing. This simple conversion is the key to avoiding the error.
Using constant-time comparison for hash verification
A standard equality check with == seems straightforward for verifying hashes, but it opens the door to timing attacks. Because it can exit early on a mismatch, the comparison time can leak data. The following code shows this vulnerable approach in practice.
import hashlib
def verify_password(stored_hash, password):
calculated_hash = hashlib.sha256(password.encode()).hexdigest()
return calculated_hash == stored_hash
stored = "8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92"
print(verify_password(stored, "123456"))
The function’s use of the == operator creates a security risk. Because it stops comparing at the first mismatch, the execution time can leak information about the hash's contents. See the corrected implementation below for a secure alternative.
import hashlib
import hmac
def verify_password(stored_hash, password):
calculated_hash = hashlib.sha256(password.encode()).hexdigest()
return hmac.compare_digest(calculated_hash, stored_hash)
stored = "8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92"
print(verify_password(stored, "123456"))
The corrected code swaps the vulnerable == operator for hmac.compare_digest(). This function is designed to prevent timing attacks by taking the same amount of time to compare two hashes, whether they match or not. This constant-time comparison is essential for security. Always use hmac.compare_digest() when verifying user-provided secrets, like passwords or session tokens, against a stored hash. It stops attackers from analyzing response times to guess the correct value.
Using incremental updates for hashing large files
When you're hashing a large file, reading its entire contents into memory at once with f.read() is a common but risky approach. It's inefficient and can easily exhaust your system's memory, causing your program to crash. The code below demonstrates this exact problem.
import hashlib
def hash_large_file(filename):
with open(filename, 'rb') as f:
file_hash = hashlib.sha256(f.read()).hexdigest()
return file_hash
print(hash_large_file("large_file.txt"))
The code reads the entire file into memory using f.read(), which is inefficient for large datasets and can cause the program to fail. Check out the corrected implementation below for a more memory-safe approach.
import hashlib
def hash_large_file(filename):
hash_obj = hashlib.sha256()
with open(filename, 'rb') as f:
for chunk in iter(lambda: f.read(4096), b''):
hash_obj.update(chunk)
return hash_obj.hexdigest()
print(hash_large_file("large_file.txt"))
The corrected code avoids memory errors by processing the file incrementally. Instead of loading everything at once, it reads the file in small chunks. Each chunk is then fed to the hash object using its .update() method. After all chunks are processed, you get the final hash. It's a memory-safe approach you'll want to use whenever hashing files that could be large, as it prevents your application from crashing due to insufficient memory.
Real-world applications
Beyond fixing errors, these hashing skills let you build powerful applications, from file integrity checkers to simple blockchains.
Using hashlib for file integrity verification
Hashing a file with hashlib creates a unique digital fingerprint, making it easy to confirm whether the file has been modified or corrupted.
import hashlib
def get_file_hash(filename):
with open(filename, 'rb') as file:
file_hash = hashlib.sha256(file.read()).hexdigest()
return file_hash
# Create a sample file
with open("test_file.txt", "w") as f:
f.write("This is test content")
print(get_file_hash("test_file.txt"))
The get_file_hash function calculates a SHA-256 hash for a file's contents. It opens the file in binary read mode ('rb') because hashing algorithms require byte data, not text. The function then uses file.read() to load the file's entire contents into memory before passing it to the hashing function, which is fine for small files like this one; for large files, prefer the incremental .update() approach covered earlier.
- The file's byte content is processed by hashlib.sha256() to generate the hash.
- The .hexdigest() method converts the binary hash into a readable hexadecimal string.
Finally, the script creates a sample file and prints its calculated hash value to the console.
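To see the fingerprint catch a change, the sketch below records a baseline hash, appends to the file, and confirms the hash no longer matches:

```python
import hashlib

def get_file_hash(filename):
    with open(filename, 'rb') as file:
        return hashlib.sha256(file.read()).hexdigest()

with open("test_file.txt", "w") as f:
    f.write("This is test content")
original = get_file_hash("test_file.txt")

# Simulate tampering by appending to the file
with open("test_file.txt", "a") as f:
    f.write(" (modified)")
modified = get_file_hash("test_file.txt")

print(original != modified)  # True: any change produces a different hash
```

In a real integrity checker you'd store the baseline hash somewhere the file can't reach, then recompute and compare on demand.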
Creating a simple blockchain with hashlib
With hashlib, you can link data blocks together cryptographically, forming a simple blockchain where each block's integrity is tied to the one before it.
import hashlib
import time
class Block:
def __init__(self, index, data, previous_hash):
self.index = index
self.timestamp = time.time()
self.data = data
self.previous_hash = previous_hash
self.hash = self.calculate_hash()
def calculate_hash(self):
block_content = str(self.index) + str(self.timestamp) + str(self.data) + str(self.previous_hash)
return hashlib.sha256(block_content.encode()).hexdigest()
# Create a small blockchain
genesis_block = Block(0, "Genesis Block", "0")
second_block = Block(1, "Transaction Data", genesis_block.hash)
print(f"Block 1: {genesis_block.hash[:15]}...")
print(f"Block 2: {second_block.hash[:15]}...")
This code defines a Block class, the blueprint for each element in the blockchain. When a new block is created, its __init__ method captures its data, index, and the hash of the block that came before it, creating a historical link.
- The calculate_hash function generates a unique SHA-256 hash by combining all the block's information, including its timestamp.
- The script then creates a chain. The second_block is explicitly linked to the first by using genesis_block.hash as its previous_hash, demonstrating the core principle of a blockchain.
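To confirm the chain hasn't been tampered with, you can recompute every block's hash and re-check the links. Below is a sketch building on the same Block class; is_chain_valid is an illustrative helper, not part of any library:

```python
import hashlib
import time

class Block:
    def __init__(self, index, data, previous_hash):
        self.index = index
        self.timestamp = time.time()
        self.data = data
        self.previous_hash = previous_hash
        self.hash = self.calculate_hash()

    def calculate_hash(self):
        content = str(self.index) + str(self.timestamp) + str(self.data) + str(self.previous_hash)
        return hashlib.sha256(content.encode()).hexdigest()

def is_chain_valid(chain):
    for i, block in enumerate(chain):
        # Each block's stored hash must match a fresh recomputation
        if block.hash != block.calculate_hash():
            return False
        # Each block must point at the hash of its predecessor
        if i > 0 and block.previous_hash != chain[i - 1].hash:
            return False
    return True

genesis = Block(0, "Genesis Block", "0")
second = Block(1, "Transaction Data", genesis.hash)
chain = [genesis, second]
print(is_chain_valid(chain))  # True

second.data = "Altered Data"  # tampering breaks validation
print(is_chain_valid(chain))  # False
```

Because each block's hash covers its data and the previous hash, editing any block invalidates it and every block after it.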
Get started with Replit
Turn your knowledge into a real tool with Replit Agent. Describe what you want to build, like “a file integrity checker using sha256” or “a simple API that validates requests with hmac.”
Replit Agent writes the code, tests for errors, and deploys your application. Start building with Replit.
Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.