How to use 'defaultdict' in Python

Master Python's defaultdict. This guide covers usage, tips, real-world applications, and how to debug common errors.

Published on: Tue, Feb 24, 2026
Updated on: Mon, Apr 6, 2026
The Replit Team

Python's defaultdict is a powerful container from the collections module. It simplifies dictionary management by automatically assigning a default value for missing keys, which helps you avoid KeyError exceptions.

In this article, we'll cover several techniques for using defaultdict with different data types. You'll find real-world applications, practical tips, and debugging advice to help you write cleaner, more efficient Python code.

Basic usage of defaultdict

from collections import defaultdict

# Create a defaultdict with int as default factory
word_count = defaultdict(int)
for word in ["apple", "banana", "apple", "orange"]:
    word_count[word] += 1

print(dict(word_count))

Output:
{'apple': 2, 'banana': 1, 'orange': 1}

The example uses defaultdict(int) to count word frequencies. When a key is accessed for the first time, the int factory is called without arguments, which returns its default value of 0. This is why you can immediately use the += operator on any key.

Without defaultdict, you'd need to handle the initial key creation yourself, often with a conditional check or the get() method. This approach simplifies the logic by removing that boilerplate code, making your counter implementation cleaner and more direct.
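
For comparison, here is a minimal sketch of the same counter written three ways: a plain dict with a conditional check, a plain dict with get(), and a defaultdict. The variable names are illustrative and not part of the original example.

```python
from collections import defaultdict

words = ["apple", "banana", "apple", "orange"]

# Plain dict with an explicit membership check
counts_check = {}
for word in words:
    if word not in counts_check:
        counts_check[word] = 0
    counts_check[word] += 1

# Plain dict using get() with a fallback of 0
counts_get = {}
for word in words:
    counts_get[word] = counts_get.get(word, 0) + 1

# defaultdict: no special handling for the first occurrence
counts_default = defaultdict(int)
for word in words:
    counts_default[word] += 1

# All three approaches produce the same counts
print(counts_check == counts_get == dict(counts_default))
```

All three loops produce identical counts; the defaultdict version simply moves the "missing key" handling into the container.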

Basic and intermediate defaultdict techniques

While the integer counter is a great start, defaultdict truly shines when you use it with other data types for frequency analysis and grouping related items.

Creating defaultdict with different default types

from collections import defaultdict

int_dict = defaultdict(int) # Default: 0
list_dict = defaultdict(list) # Default: []
str_dict = defaultdict(str) # Default: ''
set_dict = defaultdict(set) # Default: set()

print(f"int: {int_dict['key']}, list: {list_dict['key']}, str: {repr(str_dict['key'])}, set: {set_dict['key']}")

Output:
int: 0, list: [], str: '', set: set()

You can initialize a defaultdict with different "factories" like int, list, str, or set. The factory you choose determines the default value assigned to a key the first time it's accessed. This is incredibly useful for different data aggregation tasks.

  • defaultdict(list) creates an empty list ([]) for new keys, perfect for grouping items.
  • defaultdict(set) creates an empty set (set()) to store unique items per key.
  • defaultdict(str) provides an empty string ('') as the default.

This flexibility lets you build complex data structures without writing extra boilerplate code to handle missing keys.
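
As a small illustration of the set factory, the sketch below collects unique tags per user. The events data and the name tags_by_user are hypothetical, chosen only for this example.

```python
from collections import defaultdict

# Hypothetical sample data: (user, tag) pairs, including one repeated pair
events = [
    ("alice", "python"),
    ("bob", "go"),
    ("alice", "python"),
    ("alice", "rust"),
]

tags_by_user = defaultdict(set)
for user, tag in events:
    tags_by_user[user].add(tag)  # the set silently ignores the duplicate

print(sorted(tags_by_user["alice"]))  # ['python', 'rust']
print(sorted(tags_by_user["bob"]))    # ['go']
```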

Using defaultdict for frequency analysis

from collections import defaultdict

words = ["apple", "banana", "apple", "orange", "banana", "apple"]
freq = defaultdict(int)
for word in words:
    freq[word] += 1

total = len(words)
percentages = {word: f"{(count/total)*100:.1f}%" for word, count in freq.items()}
print(percentages)

Output:
{'apple': '50.0%', 'banana': '33.3%', 'orange': '16.7%'}

Frequency analysis is a classic use case for defaultdict. As the code loops through the list, freq[word] += 1 tallies each word's count. You don't need any extra logic to handle the first time a word appears, which keeps the loop clean and efficient.

  • The real power here is turning raw counts into useful insights.
  • After counting, a dictionary comprehension transforms the counts into percentages.
  • The final result is a clean dictionary that maps each word to its frequency, formatted as a string.
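
One possible next step, sketched here as an assumption rather than part of the original example, is to rank the words by count before reporting them. The name ranked is illustrative.

```python
from collections import defaultdict

words = ["apple", "banana", "apple", "orange", "banana", "apple"]
freq = defaultdict(int)
for word in words:
    freq[word] += 1

total = len(words)

# Sort (word, count) pairs from most to least frequent
ranked = sorted(freq.items(), key=lambda item: item[1], reverse=True)

for word, count in ranked:
    print(f"{word}: {count} ({count / total:.1%})")
```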

Grouping data with defaultdict

from collections import defaultdict

names = ["Alice", "Bob", "Charlie", "Andrew", "Barbara", "Carl"]
grouped_names = defaultdict(list)
for name in names:
    grouped_names[name[0]].append(name)

print(dict(grouped_names))

Output:
{'A': ['Alice', 'Andrew'], 'B': ['Bob', 'Barbara'], 'C': ['Charlie', 'Carl']}

This example showcases how to group related items into lists. By initializing with defaultdict(list), you can immediately start appending values to keys, even if those keys haven't been seen before. The code iterates through the names and uses the first letter of each name as the dictionary key.

  • The line grouped_names[name[0]].append(name) is the core of the operation.
  • It automatically creates an empty list for a new initial letter and then adds the current name to it.

This pattern is far cleaner than manually checking if a key exists before creating a new list.
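
To make that comparison concrete, the sketch below groups the same names with a manual check and with dict.setdefault(), then confirms both match the defaultdict result. The variable names are illustrative.

```python
from collections import defaultdict

names = ["Alice", "Bob", "Charlie", "Andrew", "Barbara", "Carl"]

# Manual approach: create the list the first time a letter appears
manual = {}
for name in names:
    if name[0] not in manual:
        manual[name[0]] = []
    manual[name[0]].append(name)

# dict.setdefault() is shorter, but it builds a new empty list on every
# call, even when the key already exists
with_setdefault = {}
for name in names:
    with_setdefault.setdefault(name[0], []).append(name)

# defaultdict: the factory runs only when the key is missing
grouped = defaultdict(list)
for name in names:
    grouped[name[0]].append(name)

print(manual == with_setdefault == dict(grouped))
```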

Advanced defaultdict techniques

With the fundamentals covered, you can now push defaultdict further by using custom factories, creating nested structures, and combining it with other collections.

Using custom factory functions with defaultdict

from collections import defaultdict

def default_greeting():
    return "Hello, stranger!"

greetings = defaultdict(default_greeting)
greetings["Alice"] = "Hello, Alice!"
print(greetings["Alice"])
print(greetings["Bob"])

Output:
Hello, Alice!
Hello, stranger!

You can supply your own custom functions as factories for defaultdict. Here, the default_greeting function is passed as the factory. When the code tries to access a key that doesn't exist, like "Bob", the defaultdict automatically calls the function and uses its return value—"Hello, stranger!"—as the default for that key.

  • This technique lets you define more complex or meaningful default values beyond simple empty types.
  • The factory function must be a callable that accepts no arguments.
  • Existing keys, like "Alice", are unaffected and return their assigned values.
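
The sketch below shows one detail worth knowing about factories: the returned value is stored under the missing key, so the factory runs only once per key. The call_log list is a hypothetical helper added purely to observe the calls.

```python
from collections import defaultdict

call_log = []  # hypothetical helper that records each factory call

def default_greeting():
    call_log.append("called")
    return "Hello, stranger!"

greetings = defaultdict(default_greeting)

first = greetings["Bob"]   # key is missing: factory runs, result is stored
second = greetings["Bob"]  # key now exists: factory is not called again

print(len(call_log))       # 1
print("Bob" in greetings)  # True
```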

Nested defaultdict structures

from collections import defaultdict

def nested_dict():
    return defaultdict(int)

scores = defaultdict(nested_dict)
scores["Alice"]["Math"] = 90
scores["Alice"]["Science"] = 95
scores["Bob"]["Math"] = 85

print(f"Alice Math: {scores['Alice']['Math']}")
print(f"Bob Science: {scores['Bob']['Science']}") # Default 0

Output:
Alice Math: 90
Bob Science: 0

You can create nested dictionaries by providing a factory function that returns another defaultdict. In this example, the outer defaultdict uses the nested_dict function as its factory, which in turn creates a defaultdict(int) for any new key.

  • When you first access a key like scores["Alice"], the nested_dict function is called, automatically creating a new defaultdict(int) as the value.
  • This structure lets you chain key assignments like scores["Alice"]["Math"] = 90 without raising a KeyError.
  • If you access a key that doesn't exist in an inner dictionary, like scores["Bob"]["Science"], it defaults to 0 because the inner factory is int.
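
The same idea can be extended to arbitrary depth with a factory that returns another instance of itself. This is a minimal sketch; the names tree, to_plain_dict, and config are hypothetical and not part of the example above.

```python
from collections import defaultdict

def tree():
    """Return a defaultdict whose missing keys are themselves trees."""
    return defaultdict(tree)

def to_plain_dict(node):
    """Recursively convert nested defaultdicts to regular dicts for display."""
    if isinstance(node, defaultdict):
        return {key: to_plain_dict(value) for key, value in node.items()}
    return node

config = tree()
config["server"]["http"]["port"] = 8080
config["server"]["http"]["host"] = "localhost"
config["database"]["name"] = "app"

print(to_plain_dict(config))
```

Converting back to plain dicts before printing or serializing avoids the defaultdict wrapper appearing in the output and stops later reads from creating keys by accident.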

Combining defaultdict with other collections

from collections import defaultdict, Counter

text = "the quick brown fox jumps over the lazy dog"
word_positions = defaultdict(Counter)

for pos, word in enumerate(text.split()):
    word_positions[word][pos] = 1

print(dict(word_positions["the"]))
print(dict(word_positions["fox"]))

Output:
{0: 1, 6: 1}
{3: 1}

You can pair defaultdict with other powerful tools from the collections module, like Counter. This example builds a simple search index by initializing with defaultdict(Counter), which assigns an empty Counter object to any new word. As the code loops through the text, it populates each word's Counter with its positions.

  • This structure effectively maps each word to the indices where it appears.
  • For instance, the Counter for the word "the" ends up tracking its occurrences at positions 0 and 6.
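
Another way to use the same pairing, sketched here with hypothetical data, is to keep a separate tally per group. Each new author gets an empty Counter, and Counter itself returns 0 for items it has not seen.

```python
from collections import Counter, defaultdict

# Hypothetical sample data: (author, word) pairs
observations = [
    ("alice", "python"),
    ("alice", "python"),
    ("alice", "rust"),
    ("bob", "go"),
    ("bob", "python"),
]

words_by_author = defaultdict(Counter)
for author, word in observations:
    words_by_author[author][word] += 1

print(words_by_author["alice"].most_common(1))  # [('python', 2)]
print(words_by_author["bob"]["rust"])           # 0, and "rust" is not added
```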

Move faster with Replit

Replit is an AI-powered development platform where all Python dependencies come pre-installed, so you can skip setup and start coding instantly. Instead of piecing together techniques, you can use Agent 4 to build complete applications directly from a description.

Agent handles everything from writing the code to connecting databases and deploying your app. You can describe the final product you want, and it will build it for you. For example, you could build:

  • A data-grouping utility that organizes a list of sales leads by region.
  • A text analysis tool that counts keyword frequencies from user feedback to build a simple dashboard.
  • A multi-level inventory system that tracks product stock across different warehouses.

Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.

Common errors and challenges

While defaultdict simplifies your code, a few common pitfalls can lead to unexpected bugs if you're not careful.

  • Forgetting the factory callable: A frequent mistake is passing the result of a function call, like list(), instead of the function itself. The factory must be a callable—such as list or a function name—that defaultdict can execute to create a new default value. Providing a static value like an empty list will cause a TypeError.
  • Shared mutable defaults: Be cautious with factories that return the same mutable object. For example, a factory like lambda: my_list gives every new key a reference to the exact same list. When you modify the list for one key, you inadvertently change it for all other keys using that default.
  • Unintentional key creation: Accessing a key with square brackets (d['key']) automatically creates it if it's missing. This can bloat your dictionary if you only meant to check for the key's existence. To avoid this, use the in keyword ('key' in d) to test for presence without triggering the default factory.

Forgetting to pass a callable as the default factory

A frequent mistake is passing a static value like list() instead of the callable list itself. The defaultdict constructor needs a function it can call to create default values, not a pre-made object. This subtle difference leads directly to a TypeError.

from collections import defaultdict

# Bug: list() is evaluated first, so defaultdict receives an empty list, not a factory
sentences = defaultdict(list())  # TypeError is raised here

for word in ["apple", "banana"]:
    sentences[word].append("example")  # Never reached

The code raises a TypeError on the line that builds the defaultdict, before the loop ever runs. The constructor requires its first argument to be callable or None, and list() evaluates to an empty list, which is neither. Let's look at the correct approach.

from collections import defaultdict

# Correctly passing the list type (which is callable)
sentences = defaultdict(list)
# Alternative: sentences = defaultdict(lambda: [])

for word in ["apple", "banana"]:
    sentences[word].append("example") # Works correctly

The correct approach is to pass the list type itself as the factory. Because list is a callable, defaultdict can execute it to create a new, empty list each time a missing key is accessed. This ensures every key gets its own unique list, avoiding the TypeError. This mistake is common when you're new to defaultdict and forget its argument must be a function it can call, not a static value.
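
If you want to see the failure without crashing a script, a minimal sketch like the one below catches the exception and prints its message. The variable error_message is illustrative, and the exact wording of the message may vary between Python implementations.

```python
from collections import defaultdict

error_message = None
try:
    # list() produces an empty list, which is not callable
    sentences = defaultdict(list())
except TypeError as exc:
    error_message = str(exc)
    print(f"TypeError: {error_message}")

# Passing None is allowed: the result behaves like a regular dict
# and raises KeyError for missing keys
no_factory = defaultdict(None)
try:
    no_factory["missing"]
except KeyError:
    print("KeyError: no default factory is set")
```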

Accidentally modifying shared mutable default objects

A subtle but dangerous bug occurs when your factory returns the same mutable object, like a list, for every new key. When you modify this shared object through one key, you unexpectedly change it for all others. The following code shows this in action.

from collections import defaultdict

# Bug: the lambda returns the same pre-built list for every key
template = [0, 0]
data = defaultdict(lambda: template)
data["A"][0] = 10
data["B"][1] = 20

print(data["A"]) # Unexpected: [10, 20]
print(data["C"]) # Already contains: [10, 20]

The lambda returns the one list bound to template on every call, so all new keys share it. Modifying the list for key "A" also changes it for "B" and any future keys like "C". Note that a list literal written inside the lambda, such as lambda: [0, 0], does not have this problem, because the literal is evaluated again on each call and produces a new list every time. The following code demonstrates the correct approach.

from collections import defaultdict

# Fixed: create a new list instance for each key
template = [0, 0]
data = defaultdict(lambda: template.copy())
data["A"][0] = 10
data["B"][1] = 20

print(data["A"]) # Now as expected: [10, 0]
print(data["C"]) # Unaffected: [0, 0]

The fix is to ensure your factory function creates a new, independent object for each key. By calling template.copy(), the lambda now returns a fresh copy of the list every time a new key is accessed. This isolates changes, so modifying the list for one key doesn't affect any others. You should watch for this issue whenever your factory returns an existing mutable object, like a list or dict defined outside the factory, to prevent unintended side effects.
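
A quick way to check whether keys share an object is an identity test with the is operator. The sketch below is illustrative only; the names shared, template, aliased, and independent are assumptions made for this example.

```python
from collections import defaultdict

shared = [0, 0]    # hypothetical pre-built list handed out to every key
template = (0, 0)  # immutable template that is copied into a new list

aliased = defaultdict(lambda: shared)              # same object for every key
independent = defaultdict(lambda: list(template))  # new list for every key

print(aliased["A"] is aliased["B"])          # True
print(independent["A"] is independent["B"])  # False

aliased["A"][0] = 10
independent["A"][0] = 10

print(aliased["B"])      # [10, 0]: the change leaked to another key
print(independent["B"])  # [0, 0]: other keys are unaffected
```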

Unintentionally adding keys when checking values

It's easy to accidentally add new keys to a defaultdict when you only intend to check if they exist. Using square brackets like d['key'] automatically creates the key with its default value, which can bloat your dictionary. See how this happens below.

from collections import defaultdict

word_counts = defaultdict(int)
word_counts["apple"] = 5
word_counts["banana"] = 3

print(word_counts["missing"]) # Adds 'missing' key with value 0
print(dict(word_counts)) # Now includes the unwanted key

Accessing word_counts["missing"] does more than just check for a key—it adds 'missing' to your dictionary with a value of 0. This happens because it triggers the default factory. See the correct approach below.

from collections import defaultdict

word_counts = defaultdict(int)
word_counts["apple"] = 5
word_counts["banana"] = 3

# Check if key exists before accessing
if "missing" in word_counts:
    print(word_counts["missing"])
else:
    print("Key not found, default would be", int()) # Shows 0

print(dict(word_counts)) # No unwanted keys added

The correct way to check for a key without adding it is to use the in keyword. This lets you test for a key's presence without triggering the default factory, so no unwanted keys are created. Your dictionary remains clean, containing only the keys you've explicitly added. You should always use this method when you only want to read data without accidentally modifying the dictionary's structure.
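
Two related options are sketched below as a supplement to the in check: the get() method, which reads a value without calling the factory, and setting default_factory to None once the dictionary is built, which restores the usual KeyError for missing keys.

```python
from collections import defaultdict

word_counts = defaultdict(int)
word_counts["apple"] = 5

# get() returns the fallback without calling the factory or adding the key
print(word_counts.get("missing", 0))  # 0
print("missing" in word_counts)       # False

# Disabling the factory makes missing keys raise KeyError again
word_counts.default_factory = None
try:
    word_counts["missing"]
except KeyError:
    print("KeyError raised; no key was added")

print(dict(word_counts))  # {'apple': 5}
```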

Real-world applications

With the common pitfalls covered, you're ready to apply defaultdict to real-world challenges like processing log data and analyzing file structures.

Processing log data with defaultdict

Processing log data becomes much simpler when you use defaultdict to group entries by severity, such as separating errors from informational messages.

from collections import defaultdict

# Sample log lines
logs = [
    "ERROR: Database connection failed",
    "INFO: User login successful",
    "ERROR: Authentication failed",
    "INFO: File uploaded successfully"
]

# Group logs by severity
log_groups = defaultdict(list)
for log in logs:
    severity = log.split(":")[0]
    log_groups[severity].append(log)

print(f"ERROR logs count: {len(log_groups['ERROR'])}")
print(f"First ERROR log: {log_groups['ERROR'][0]}")

The code uses a defaultdict(list) to organize log lines by severity. This structure automatically provides an empty list for any new key, which simplifies the code significantly.

  • Inside the loop, each log string is parsed to extract its prefix, such as ERROR, which serves as the key.
  • The full log message is then appended to the list for that key using log_groups[severity].append(log).

This approach neatly buckets all related messages together without needing extra conditional logic to handle the first time a key appears.
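
Real log files often contain lines that do not follow the expected format. The sketch below is one possible way to handle them, using str.partition() to detect a missing separator. The "UNKNOWN" bucket and the summary dictionary are hypothetical additions for this example.

```python
from collections import defaultdict

# Sample log lines, including one without a severity prefix
logs = [
    "ERROR: Database connection failed",
    "INFO: User login successful",
    "malformed line without a separator",
    "ERROR: Authentication failed",
]

log_groups = defaultdict(list)
for log in logs:
    severity, separator, message = log.partition(":")
    if not separator:
        # No colon found: file the whole line under a catch-all bucket
        log_groups["UNKNOWN"].append(log)
    else:
        log_groups[severity.strip()].append(message.strip())

# Count how many entries landed in each bucket
summary = {severity: len(entries) for severity, entries in log_groups.items()}
print(summary)
```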

Analyzing file structure with defaultdict

You can use a defaultdict with a custom factory to analyze file metadata, such as grouping files by extension and calculating their total and average sizes.

from collections import defaultdict

# Create a mapping of file extensions to sizes
file_stats = defaultdict(lambda: {"count": 0, "total_size": 0})

# Sample data (in real usage, this would come from os.walk)
files = [
    ("document.txt", 10240),
    ("image.jpg", 153600),
    ("notes.txt", 5120),
    ("photo.jpg", 204800)
]

# Analyze files by extension
for filename, size in files:
    ext = filename.split(".")[-1] if "." in filename else "no_extension"
    file_stats[ext]["count"] += 1
    file_stats[ext]["total_size"] += size

for ext, stats in file_stats.items():
    avg_size = stats["total_size"] / stats["count"]
    print(f"{ext}: {stats['count']} files, {avg_size/1024:.1f} KB average")

This example demonstrates how a lambda function can serve as a powerful factory for defaultdict.

  • The lambda provides a blueprint—a dictionary with count and total_size—for each new file extension.
  • This creates a nested structure on the fly, letting you update file_stats[ext]["count"] without manual initialization.

The code efficiently groups files by their extension, simultaneously tracking how many files of each type exist and their combined size. This pattern is perfect for building multi-faceted summaries from raw data without complex setup.
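
The sample above notes that real data would come from os.walk. The sketch below shows one way that might look, run against a temporary directory so it works anywhere. The function name collect_stats and the demo file names are hypothetical, and os.path.splitext() is used here in place of the split-based extension logic.

```python
import os
import tempfile
from collections import defaultdict

def collect_stats(root):
    """Group files under root by extension, tracking count and total size."""
    stats = defaultdict(lambda: {"count": 0, "total_size": 0})
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            # splitext returns ("name", ".ext"); drop the dot, fall back if empty
            ext = os.path.splitext(filename)[1].lstrip(".") or "no_extension"
            size = os.path.getsize(os.path.join(dirpath, filename))
            stats[ext]["count"] += 1
            stats[ext]["total_size"] += size
    return stats

# Demo on a throwaway directory with three small files
with tempfile.TemporaryDirectory() as root:
    demo_files = [("a.txt", 100), ("b.txt", 300), ("README", 50)]
    for name, size in demo_files:
        with open(os.path.join(root, name), "wb") as handle:
            handle.write(b"x" * size)
    file_stats = collect_stats(root)

for ext, stats in sorted(file_stats.items()):
    avg_size = stats["total_size"] / stats["count"]
    print(f"{ext}: {stats['count']} files, {avg_size:.0f} bytes average")
```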

Get started with Replit

Put your defaultdict skills to use by building a real tool. Describe what you want to Replit Agent, like “a script that analyzes a text file and outputs word frequencies” or “a tool that groups sales data by city.”

The Agent writes the code, tests for errors, and deploys your application. Start building with Replit.

Build your first app today

Describe what you want to build, and Replit Agent writes the code, handles the infrastructure, and ships it live. Go from idea to real product, all in your browser.
