How to remove duplicates from a string in Python
Learn how to remove duplicate characters from a string in Python. Explore various methods, tips, real-world uses, and debugging techniques.

Removing duplicate characters from a string is a common task in data cleaning and text processing. Python offers several efficient methods to handle this with concise, readable code.
In this article, we'll cover several techniques and their real-world applications. We'll also explore practical tips and debugging advice to help you write cleaner, more efficient Python code for any project.
Basic loop to remove duplicates
```python
def remove_duplicates(s):
    result = ""
    for char in s:
        if char not in result:
            result += char
    return result

print(remove_duplicates("hello world"))  # Output: helo wrd
```
The remove_duplicates function offers a straightforward approach by iterating through the input string and building a new result string. This method ensures each character is added only once, preserving the original order of appearance.
The core of this logic is the if char not in result check. While effective, keep in mind that this check can become less efficient on very long strings because it has to scan the result string with each iteration. For smaller inputs, it’s a perfectly clear and readable solution.
Common data structures for duplicate removal
To improve on the loop's performance, especially with long strings, you can turn to Python's built-in data structures.
Using set() (order not preserved)
```python
def remove_duplicates(s):
    return ''.join(set(s))

print(remove_duplicates("hello world"))  # Output order is arbitrary, e.g. "dehlorw " (the space is kept too)
```
This one-liner is a more Pythonic and efficient approach. It works by converting the string into a set, a data structure that automatically handles uniqueness by only storing one of each element. Then, ''.join() is used to combine the characters from the set back into a string.
- This is a very fast method for duplicate removal.
- The main trade-off is that sets are unordered, so the character sequence from the original string is lost.
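If you need a deterministic result but don't care about the original sequence, one common workaround (a small variation, not covered above) is to sort the set before joining:

```python
def remove_duplicates_sorted(s):
    # sorted() turns the unordered set into a predictable, alphabetical sequence
    return ''.join(sorted(set(s)))

print(remove_duplicates_sorted("hello world"))  # " dehlorw" (the space sorts first)
```

This still loses the original order, but the output is now the same on every run, which matters if you compare or cache results.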
Using dict.fromkeys() to preserve order
```python
def remove_duplicates(s):
    return ''.join(dict.fromkeys(s))

print(remove_duplicates("hello world"))  # Output: helo wrd
```
This method offers the best of both worlds—the efficiency of a hash-based data structure and the preservation of order. The dict.fromkeys() function creates a dictionary where each character from the string becomes a key. Since dictionary keys must be unique, duplicates are automatically dropped.
- It’s a neat trick that relies on a key feature: since Python 3.7, dictionaries remember insertion order. This is why the original character sequence is maintained.
- Finally, `''.join()` converts the dictionary keys back into a string, giving you a duplicate-free result.
Using OrderedDict for older Python versions
```python
from collections import OrderedDict

def remove_duplicates(s):
    return ''.join(OrderedDict.fromkeys(s))

print(remove_duplicates("hello world"))  # Output: helo wrd
```
For Python versions before 3.7, standard dictionaries didn't preserve insertion order, meaning you couldn't rely on them to keep your characters in sequence. This is where OrderedDict from the collections module comes in handy. It’s a dictionary subclass that was specifically designed to remember the order in which items were added.
- Using `OrderedDict.fromkeys()` provides the same efficient duplicate removal while guaranteeing that the original character order is maintained, making your code backward-compatible.
Advanced techniques for character deduplication
While data structures offer clean solutions, you can gain more fine-grained control or explore functional patterns with generators, functools.reduce(), and string slicing.
Using a generator function
```python
def unique_chars(s):
    seen = set()
    for char in s:
        if char not in seen:
            seen.add(char)
            yield char

print(''.join(unique_chars("hello world")))  # Output: helo wrd
```
A generator function like unique_chars offers a memory-efficient way to handle this task. Instead of creating a new data structure upfront, it uses the yield keyword to produce characters one by one as they're processed. This is especially useful for very large strings where memory usage is a concern.
- The function maintains a `seen` set to track which characters have already been yielded, preventing duplicates.
- Because it processes the string sequentially, the original character order is preserved.
- Finally, `''.join()` is used to collect all the yielded characters into the resulting string.
Using functools.reduce() for functional approach
```python
from functools import reduce

def remove_duplicates(s):
    return reduce(lambda result, char: result + char if char not in result else result, s, "")

print(remove_duplicates("hello world"))  # Output: helo wrd
```
For a functional programming twist, you can use functools.reduce(). This function cumulatively applies a lambda function to each character, effectively “reducing” the entire string into a single output.
- The `lambda` checks if a character is already present in the accumulated `result` string.
- If the character is new, it's added; otherwise, the `result` is passed along unchanged.
- This process preserves the original character order.
It’s a clever one-liner, but be aware that its performance is similar to the basic loop due to the in check on a growing string.
Using string slicing with index checks
```python
def remove_duplicates(s):
    return ''.join(char for i, char in enumerate(s) if s.find(char) == i)

print(remove_duplicates("hello world"))  # Output: helo wrd
```
This clever one-liner uses a generator expression to compare a character's current index with its first appearance. The enumerate() function gets each character and its index i, while s.find(char) gets the index of that character's first occurrence.
- The condition `s.find(char) == i` is only true the first time a character is seen.
- This approach preserves the original order because it processes the string sequentially.
- Finally, `''.join()` stitches the unique characters back into a final string.
Move faster with Replit
Replit is an AI-powered development platform that comes with all Python dependencies pre-installed, so you can skip setup and start coding instantly.
Instead of piecing together techniques, you can use Agent 4 to build complete applications from a simple description. It handles the code, databases, APIs, and deployment, letting you create practical tools like:
- A username validator that ensures all characters in a new username are unique.
- A text-cleaning tool that processes raw user input by removing duplicate characters to create clean keyword lists.
- A unique code generator for promotional campaigns, ensuring each character in a code is used only once.
Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.
Common errors and challenges
While removing duplicates seems simple, a few common pitfalls can trip you up, from performance bottlenecks to unexpected case-sensitivity issues.
First, remember that Python strings are immutable—you can't change them in place. This is why solutions build a new string instead of modifying the original. While you won't get a mutation error with strings, trying to modify other data types like lists while iterating over them can cause bugs, so building a new result is a safe and reliable habit.
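A quick sketch of what immutability means in practice: item assignment on a string raises a `TypeError`, so the only option is to build a new string.

```python
s = "hello"
try:
    s[0] = "H"  # strings don't support item assignment
except TypeError as e:
    print(f"TypeError: {e}")

# The workaround is always to construct a new string instead:
s = "H" + s[1:]
print(s)  # Hello
```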
The in operator's performance is also a crucial detail. When you check if char not in result_string, Python scans the string from the beginning each time, which gets very slow with longer inputs. Using a set for lookups is much faster because sets are optimized for this exact task, making the check nearly instantaneous regardless of size.
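You can verify the difference yourself with the standard library's `timeit` module. This is a minimal benchmark sketch; absolute timings depend on your machine, but the set lookup should win by a wide margin because `'z'` never appears, forcing the string check to scan all 100,000 characters every time.

```python
import timeit

# Compare a full string scan against a hashed set lookup for a missing character.
setup = "text = 'a' * 100_000; chars = set(text)"
string_time = timeit.timeit("'z' in text", setup=setup, number=1_000)
set_time = timeit.timeit("'z' in chars", setup=setup, number=1_000)
print(f"string scan: {string_time:.4f}s  set lookup: {set_time:.4f}s")
```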
Finally, don't forget about case sensitivity. By default, 'A' and 'a' are treated as different characters. If you want to remove duplicates regardless of case, you need to standardize the string first. Simply call the lower() method on your input string before processing it to ensure characters like 'H' and 'h' are treated as the same.
Avoiding mutation errors when using for loops
It’s tempting to modify a string or list while looping over it, but this often leads to bugs. As you remove elements, the sequence's length shrinks, causing the loop to skip items unexpectedly. The code below shows what happens when this goes wrong.
```python
def remove_adjacent_duplicates(s):
    for i in range(len(s) - 1):
        if s[i] == s[i + 1]:
            s = s[:i] + s[i+1:]
    return s

print(remove_adjacent_duplicates("hello"))  # Raises IndexError
```
The loop's range is fixed at the start. When s is reassigned, its length shrinks mid-iteration, so the original range runs past the end of the new string and raises an IndexError. The corrected code below shows a safer pattern.
```python
def remove_adjacent_duplicates(s):
    result = s
    i = 0
    while i < len(result) - 1:
        if result[i] == result[i + 1]:
            result = result[:i] + result[i+1:]
        else:
            i += 1
    return result

print(remove_adjacent_duplicates("hello"))  # helo
```
The corrected code uses a while loop, which is safer for modifying a sequence you're iterating over. Unlike a for loop's fixed range, the while loop's condition re-evaluates the string's length on each pass. The index i is only incremented when no characters are removed. This clever adjustment ensures you don't skip over adjacent duplicates that shift into place after a removal, preventing unexpected bugs in your logic.
Optimizing the in operator for better performance
The in operator is convenient, but its performance degrades significantly when used on long strings. Each check forces Python to scan the growing result string from the start, creating a bottleneck. The code below demonstrates this slowdown with a long, repetitive string.
```python
def remove_duplicates_slow(s):
    result = ""
    for char in s:
        if char not in result:
            result += char
    return result

long_string = "abcdefghijklmnopqrstuvwxyz" * 100
print(len(remove_duplicates_slow(long_string)))  # 26
```
This function's performance degrades because the result string grows with each loop. The in operator must scan more characters every time, causing a major slowdown with large inputs. The code below shows a much faster solution.
```python
def remove_duplicates_fast(s):
    seen = set()
    result = []
    for char in s:
        if char not in seen:
            seen.add(char)
            result.append(char)
    return ''.join(result)

long_string = "abcdefghijklmnopqrstuvwxyz" * 100
print(len(remove_duplicates_fast(long_string)))  # 26
```
The remove_duplicates_fast function solves the performance issue by using a set for lookups. Checking if char not in seen is nearly instant, no matter how large the set gets, which avoids the bottleneck of scanning a string repeatedly.
A separate list, result, collects the unique characters to preserve their original order. Finally, ''.join(result) efficiently combines them into the final string. This pattern is ideal for processing large inputs where performance is critical.
Handling case sensitivity with the lower() method
Python's default case sensitivity means that even efficient methods like dict.fromkeys() will treat uppercase and lowercase letters as unique. As a result, characters you might consider duplicates are kept. The following code shows this in action with the string "Hello World".
```python
def remove_duplicates(s):
    return ''.join(dict.fromkeys(s))

print(remove_duplicates("Hello World"))
```
The output is 'Helo Wrd'. The function keeps both 'H' and 'W' because they are distinct from their lowercase versions, leaving the string only partially deduplicated. The corrected code below shows how to handle this.
```python
def remove_duplicates_case_insensitive(s):
    seen = {}
    result = []
    for char in s:
        if char.lower() not in seen:
            seen[char.lower()] = True
            result.append(char)
    return ''.join(result)

print(remove_duplicates_case_insensitive("Hello World"))  # Helo Wrd
```
The remove_duplicates_case_insensitive function uses a seen dictionary to track lowercase versions of characters. It adds the original character to a result list only if its lowercase version hasn't been seen before. It's a clever trick that ensures case-insensitive uniqueness while preserving the original casing of the first character encountered. Keep this pattern in mind when processing user input or text where capitalization varies but the underlying character is the same.
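The dictionary above is only used for its keys; a plain set does the same job and is arguably the more idiomatic container for this kind of "seen" tracking. Here's that small variation as a sketch:

```python
def remove_duplicates_case_insensitive(s):
    seen = set()
    result = []
    for char in s:
        if char.lower() not in seen:
            seen.add(char.lower())   # track the normalized (lowercase) form
            result.append(char)      # keep the original casing
    return ''.join(result)

print(remove_duplicates_case_insensitive("Hello World"))  # Helo Wrd
```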
Real-world applications
With these techniques and error-handling patterns in your toolkit, you can build robust features for real-world applications.
Normalizing usernames by removing repeated characters
Normalizing usernames by removing repeated characters is a practical way to ensure consistency and prevent user confusion, effectively turning an input like “JJohhnDoe” into “johnde”.
```python
def normalize_username(username):
    return ''.join(dict.fromkeys(username.lower()))

print(normalize_username("JJohhnDoe"))    # johnde
print(normalize_username("marrysmith"))  # marysith
```
This function is a compact way to standardize user input. It works by first making the username lowercase and then removing any duplicate letters.
- The `username.lower()` method ensures that characters like 'J' and 'j' are treated as the same.
- Then, `dict.fromkeys()` uses the characters as dictionary keys. This step automatically discards duplicates while preserving the order of the first appearance.
- Finally, `''.join()` stitches the unique characters back into a clean string, turning an input like "marrysmith" into "marysith".
Creating unique ID codes from product descriptions
These deduplication techniques can also generate unique ID codes from product descriptions, turning long text into a concise identifier.
```python
def generate_product_id(description, max_length=10):
    unique_chars = ''.join(dict.fromkeys(description.lower().replace(" ", "")))
    return unique_chars[:max_length]

products = ["Blue Cotton T-shirt", "Red Wool Sweater"]
for product in products:
    print(f"{product} -> {generate_product_id(product)}")
```
The generate_product_id function creates a short identifier by chaining together several operations. It first standardizes the description by converting it to lowercase and removing all spaces with replace(" ", ""). Next, it filters out duplicate characters using dict.fromkeys() while preserving the order of each character's first appearance.
Finally, the function truncates the result to a default of 10 characters using slicing. This creates a concise and predictable ID from a longer string, making it useful for generating unique product codes on the fly.
Get started with Replit
Turn these techniques into a real tool. Describe what you want to build to Replit Agent, like “a tool that generates unique coupon codes from product names” or “a utility that cleans keyword lists by removing duplicate characters”.
Replit Agent writes the code, tests for errors, and deploys your application automatically. Start building with Replit.
Create and deploy websites, automations, internal tools, data pipelines and more in any programming language without setup, downloads or extra tools. All in a single cloud workspace with AI built in.



