How to build a knowledge graph in Python
Learn how to build a knowledge graph in Python. Explore different methods, tips, real-world applications, and common debugging techniques.

Constructing a knowledge graph in Python is a powerful way to model complex data relationships. This structure helps you uncover insights and connections that are otherwise hidden from view.
In this article, you'll learn the core techniques to build your own graph from the ground up. We'll cover practical tips, explore real-world applications, and provide debugging advice for a smooth development process.
Building a simple knowledge graph with dictionaries
knowledge_graph = {
    "Alice": {"knows": ["Bob", "Charlie"], "likes": ["Pizza", "Coding"]},
    "Bob": {"knows": ["Alice"], "works_at": ["TechCorp"]},
    "Charlie": {"knows": ["Alice"], "lives_in": ["New York"]}
}
print(knowledge_graph["Alice"]["knows"])
Output:
['Bob', 'Charlie']
A nested dictionary provides a simple yet powerful foundation for a knowledge graph. This approach maps entities and their connections using Python's built-in tools. Here’s how it breaks down:
- The outer keys (e.g., "Alice") are the nodes of the graph.
- The inner dictionaries define the edges, with keys like "knows" representing the type of relationship.
- The values are lists of connected nodes or attributes.
This structure makes querying straightforward. A lookup such as knowledge_graph["Alice"]["knows"] retrieves related entities in a single step without complex logic, which makes this a lightweight, dependency-free choice for smaller datasets.
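A forward lookup is a single dictionary access, but the reverse question (which nodes point at a given node?) needs a scan over the whole dictionary. The following is a minimal sketch under that assumption; the helper name find_sources is hypothetical and not part of any library.

```python
knowledge_graph = {
    "Alice": {"knows": ["Bob", "Charlie"], "likes": ["Pizza", "Coding"]},
    "Bob": {"knows": ["Alice"], "works_at": ["TechCorp"]},
    "Charlie": {"knows": ["Alice"], "lives_in": ["New York"]},
}

def find_sources(graph, relation, target):
    """Return every node that has an edge of type `relation` pointing at `target`."""
    return [
        node
        for node, edges in graph.items()
        if target in edges.get(relation, [])
    ]

print(find_sources(knowledge_graph, "knows", "Alice"))  # ['Bob', 'Charlie']
```

Because every reverse query walks all nodes, this stays practical only while the graph is small, which is one reason to move to a graph library as the data grows.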
Core techniques for knowledge graphs
While dictionaries get you started, libraries like networkx and rdflib provide more robust tools for building your knowledge graph, and matplotlib lets you visualize it.
Using networkx for graph representation
import networkx as nx
G = nx.DiGraph()
G.add_node("Alice", type="Person")
G.add_node("Bob", type="Person")
G.add_edge("Alice", "Bob", relation="knows")
G.add_edge("Bob", "TechCorp", relation="works_at")
print(list(G.edges(data=True)))
Output:
[('Alice', 'Bob', {'relation': 'knows'}), ('Bob', 'TechCorp', {'relation': 'works_at'})]
The networkx library provides a more structured approach than dictionaries. You begin by initializing a directed graph with nx.DiGraph(), where relationships have a specific direction. This object serves as the container for your graph's structure.
- You add entities using add_node(), which also lets you assign attributes like type="Person".
- Relationships are created with add_edge(), allowing you to define the connection with properties like relation="knows".
This method allows for richer, more explicit modeling of your data than a simple dictionary, because nodes and edges can each carry their own attributes.
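Once attributes are attached, they can drive simple queries. The sketch below rebuilds the same small graph and filters it by edge and node attributes. It assumes networkx is installed, and the variable names are illustrative only.

```python
import networkx as nx

G = nx.DiGraph()
G.add_node("Alice", type="Person")
G.add_node("Bob", type="Person")
G.add_edge("Alice", "Bob", relation="knows")
G.add_edge("Bob", "TechCorp", relation="works_at")

# Keep only the edges whose "relation" attribute is "knows"
knows_edges = [
    (u, v) for u, v, data in G.edges(data=True) if data.get("relation") == "knows"
]

# Nodes explicitly tagged as Person; "TechCorp" was created implicitly
# by add_edge and therefore has no "type" attribute
people = [n for n, data in G.nodes(data=True) if data.get("type") == "Person"]

# Direct outgoing neighbours of a node in a directed graph
bob_targets = list(G.successors("Bob"))

print(knows_edges)  # [('Alice', 'Bob')]
print(people)       # ['Alice', 'Bob']
print(bob_targets)  # ['TechCorp']
```

Using data.get() rather than data["..."] keeps the filters safe for nodes or edges that were added without the attribute.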
Adding relationships with rdflib
from rdflib import Graph, Literal, RDF, URIRef
from rdflib.namespace import FOAF
g = Graph()
alice = URIRef("http://example.org/alice")
bob = URIRef("http://example.org/bob")
g.add((alice, RDF.type, FOAF.Person))
g.add((alice, FOAF.knows, bob))
print(g.serialize(format="turtle"))
Output:
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

<http://example.org/alice> a foaf:Person ;
    foaf:knows <http://example.org/bob> .
For more formal knowledge representation, rdflib implements the RDF standard. It structures data in "triples"—a subject, predicate, and object format. You start by creating a Graph(), which holds all your data triples and allows for standardized querying.
- Entities are defined with URIRef, creating unique web-like identifiers for your data points.
- You add relationships using g.add(), passing a tuple that represents a triple, such as (alice, FOAF.knows, bob).
- Vocabularies like FOAF provide standardized terms, ensuring your graph is interoperable.
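To make the triple idea concrete without any dependencies, here is a plain-Python stand-in: a set of (subject, predicate, object) tuples with wildcard matching. This is an illustration of the data model only, not rdflib's API. The short string identifiers, the extra "bob" type triple, and the match helper are assumptions made for the example.

```python
# Plain-Python stand-in for an RDF graph: a set of (subject, predicate, object) tuples
triples = {
    ("alice", "type", "Person"),
    ("alice", "knows", "bob"),
    ("bob", "type", "Person"),
}

def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        (s, p, o)
        for s, p, o in store
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    )

print(match(triples, predicate="type", obj="Person"))
# [('alice', 'type', 'Person'), ('bob', 'type', 'Person')]
print(match(triples, subject="alice"))
# [('alice', 'knows', 'bob'), ('alice', 'type', 'Person')]
```

rdflib's Graph offers comparable pattern matching through its triples() method, where None plays the same wildcard role, and adds URIs, namespaces, and serialization on top.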
Visualizing knowledge graphs with matplotlib
import networkx as nx
import matplotlib.pyplot as plt
G = nx.DiGraph()
G.add_edges_from([("Alice", "Bob"), ("Alice", "Charlie"), ("Bob", "TechCorp")])
pos = nx.spring_layout(G)
nx.draw(G, pos, with_labels=True, node_color="lightblue", edge_color="gray")
plt.title("Knowledge Graph Visualization")
plt.show()
Output:
[Knowledge graph visualization displayed]
Once your graph is built, matplotlib lets you visualize its structure. This turns abstract connections into an intuitive map you can actually see. The key is to first define the layout of your nodes before drawing them.
- The nx.spring_layout(G) function arranges nodes using a physics-based algorithm for a clean, readable layout.
- With positions set, nx.draw() renders the graph, letting you customize elements like node labels and colors.
- Finally, plt.show() presents the complete visualization.
Advanced approaches and integrations
Now that you've built a graph, you can take it further by learning to query it efficiently, populate it automatically, and scale it with specialized databases.
Querying knowledge graphs with SPARQL
from rdflib import Graph, Namespace, FOAF
from rdflib.plugins.sparql import prepareQuery
g = Graph()
g.parse("data.ttl", format="turtle")
query = prepareQuery("""
SELECT ?person WHERE { ?person a foaf:Person . }
""", initNs={"foaf": FOAF})
results = g.query(query)
for row in results:
    print(f"Found person: {row.person}")
Output:
Found person: http://example.org/alice
Found person: http://example.org/bob
SPARQL is the standard language for querying RDF data, much like SQL is for databases. After loading your graph from a file like data.ttl with g.parse(), you can ask it specific questions.
- You define your query using SPARQL syntax and prepare it with prepareQuery(). The example SELECT ?person WHERE { ?person a foaf:Person . } asks for all entities of the type foaf:Person.
- Executing the search with g.query() returns an iterable object, letting you loop through the results to find what you need.
Using spaCy for automatic extraction
import spacy
nlp = spacy.load("en_core_web_sm")
text = "Apple was founded by Steve Jobs in California."
doc = nlp(text)
triples = []
for entity in doc.ents:
    if entity.label_ == "ORG":
        for token in doc:
            if token.dep_ == "ROOT" and token.head == token:
                triples.append((entity.text, token.lemma_, "?"))
print(triples)
Output:
[('Apple', 'found', '?')]
Manually populating a graph is tedious, so you can use the spaCy library to automatically extract relationships from plain text. After loading a language model like en_core_web_sm, you process your text to analyze its grammatical structure and identify key components.
- The code iterates through doc.ents to find named entities, filtering for specific types like organizations.
- It then identifies the main action by finding the token with the ROOT dependency label.
- Finally, it combines these elements to form a basic triple, automating the process of populating your graph.
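Extracted triples still need to land in a graph. As a rough sketch, the helper below merges a list of (subject, relation, object) tuples into the nested-dictionary structure from the start of this article. The add_triples name and the sample triples (including the "located_in" one) are hypothetical and only stand in for real extractor output.

```python
def add_triples(graph, triples):
    """Merge (subject, relation, object) triples into a nested-dict graph."""
    for subject, relation, obj in triples:
        targets = graph.setdefault(subject, {}).setdefault(relation, [])
        if obj not in targets:  # skip edges that are already present
            targets.append(obj)
    return graph

# Hypothetical extractor output; the second triple is invented for illustration
extracted = [
    ("Apple", "found", "?"),
    ("Apple", "located_in", "California"),
    ("Apple", "found", "?"),  # duplicate, ignored by add_triples
]

graph = add_triples({}, extracted)
print(graph)  # {'Apple': {'found': ['?'], 'located_in': ['California']}}
```

The setdefault() calls create missing nodes and relation lists on the fly, so the same function works for an empty graph and for one that already holds data.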
Integrating with graph databases using neo4j
from neo4j import GraphDatabase
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    session.run("CREATE (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person {name: 'Bob'})")
    result = session.run("MATCH (a:Person)-[r:KNOWS]->(b:Person) RETURN a.name, b.name")
    for record in result:
        print(f"{record['a.name']} knows {record['b.name']}")
driver.close()
Output:
Alice knows Bob
When your knowledge graph becomes too large to manage in memory, a dedicated graph database like Neo4j is the next step. The neo4j library connects your Python application to the database server, allowing you to handle massive datasets efficiently.
- You start by creating a driver object to establish the connection to your database.
- All database interactions are handled within a session, where you use session.run() to execute queries.
- These queries use Cypher, Neo4j's query language, to define and find data patterns like (a:Person)-[:KNOWS]->(b:Person).
Move faster with Replit
Replit is an AI-powered development platform that comes with all Python dependencies pre-installed, so you can skip setup and start coding instantly. Instead of piecing together techniques from libraries like networkx and rdflib, you can describe the app you want to build and let Agent 4 take it from an idea to a working product.
- A relationship mapper that uses spaCy to parse articles and automatically build a knowledge graph of people, places, and organizations.
- An interactive dashboard that visualizes a social network from your data, using networkx and matplotlib to draw nodes and connections.
- A company organization chart tool that connects to a neo4j database and allows you to query for reporting structures and team relationships.
Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.
Common errors and challenges
Building a knowledge graph often involves navigating common pitfalls, from handling missing data to untangling complex structural issues like cycles.
Handling missing nodes with the .get() method
When using a dictionary, trying to access a node that doesn't exist will raise a KeyError and halt your program. A simple fix is to use the .get() method instead of direct key access. This approach safely checks for a node and returns None if it's missing, which keeps your code running smoothly without unexpected interruptions.
Fixing path finding errors in networkx graphs
Pathfinding functions in networkx can throw errors like NodeNotFound or NetworkXNoPath if a node is missing or no connection exists. You can prevent this by first checking if a node is in the graph with G.has_node(). It's also good practice to wrap pathfinding calls in a try...except block to gracefully handle situations where no route is found.
Detecting cycles in knowledge graphs
Cycles—paths that loop back to their starting point—can trap algorithms in an infinite loop. Fortunately, you don't have to find them manually. The networkx library provides a handy function called nx.simple_cycles() that detects and lists all cyclical paths, so you can manage them before they cause problems in your analysis.
Handling missing nodes with the .get() method
Directly accessing a dictionary key that doesn't exist is a surefire way to trigger a KeyError and crash your program. It's a common misstep when querying a graph for a node that hasn't been added yet. The code below demonstrates this exact scenario.
knowledge_graph = {
    "Alice": {"knows": ["Bob", "Charlie"]},
    "Bob": {"knows": ["Alice"]}
}
# This will raise a KeyError
connections = knowledge_graph["Dave"]["knows"]
print(f"Dave knows: {connections}")
The program crashes because it tries to look up "Dave", a key that doesn't exist in the dictionary, which triggers a KeyError. The next snippet shows how to handle this gracefully.
knowledge_graph = {
    "Alice": {"knows": ["Bob", "Charlie"]},
    "Bob": {"knows": ["Alice"]}
}
# Using get() with a default value to avoid KeyError
connections = knowledge_graph.get("Dave", {}).get("knows", [])
print(f"Dave knows: {connections}")
The solution is to chain the .get() method. First, knowledge_graph.get("Dave", {}) safely looks for "Dave". Since it isn't found, it returns a default empty dictionary {}. The second .get("knows", []) is then called on that empty dictionary, which returns an empty list []. This technique prevents a KeyError and is crucial when you're not sure if a node or its connections exist in your graph.
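If the chained lookup appears in many places, it can be wrapped once. This is a small sketch; the neighbors helper is a hypothetical name, not a built-in.

```python
def neighbors(graph, node, relation):
    """Return the targets of `relation` for `node`, or [] if either is missing."""
    return graph.get(node, {}).get(relation, [])

knowledge_graph = {
    "Alice": {"knows": ["Bob", "Charlie"]},
    "Bob": {"knows": ["Alice"]},
}

print(neighbors(knowledge_graph, "Alice", "knows"))  # ['Bob', 'Charlie']
print(neighbors(knowledge_graph, "Dave", "knows"))   # []
print(neighbors(knowledge_graph, "Bob", "likes"))    # []
```

One trade-off to keep in mind: an empty list no longer tells you whether the node is missing or simply has no such connections, so check membership with `node in graph` when that distinction matters.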
Fixing path finding errors in networkx graphs
Finding a path between two nodes is a common task, but it can easily crash your program. If you ask networkx to find a route to a node that doesn't exist, functions like nx.shortest_path() will raise an error. The following snippet shows what happens when you try to find a path to a node that isn't in the graph.
import networkx as nx
G = nx.DiGraph()
G.add_edge("Alice", "Bob", relation="knows")
G.add_edge("Bob", "Charlie", relation="knows")
# "Dave" is not in the graph, so this raises nx.NodeNotFound
path = nx.shortest_path(G, "Alice", "Dave")
print(f"Path from Alice to Dave: {path}")
Since the node "Dave" was never added to the graph, the call to nx.shortest_path() raises nx.NodeNotFound. The next snippet shows how you can manage this gracefully to prevent a crash.
import networkx as nx
G = nx.DiGraph()
G.add_edge("Alice", "Bob", relation="knows")
G.add_edge("Bob", "Charlie", relation="knows")
try:
    path = nx.shortest_path(G, "Alice", "Dave")
    print(f"Path from Alice to Dave: {path}")
except nx.NodeNotFound:
    # Raised when the source or target node is not in the graph
    print("Alice or Dave is not in the graph")
except nx.NetworkXNoPath:
    # Raised when both nodes exist but no route connects them
    print("No path exists between Alice and Dave")
The solution is to wrap the pathfinding logic in a try...except block. This lets you catch nx.NodeNotFound, which networkx raises when a node such as "Dave" is missing, and nx.NetworkXNoPath, which it raises when both nodes exist but no route connects them. Instead of crashing, your program can then execute alternative code, like printing a helpful message. It's a robust way to handle queries where you can't guarantee a connection.
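Both safeguards, the G.has_node() membership check and the try...except around the pathfinding call, can live in one reusable function. The sketch below assumes networkx is installed; safe_shortest_path is a hypothetical helper name.

```python
import networkx as nx

def safe_shortest_path(G, source, target):
    """Return the shortest path as a list of nodes, or None if it cannot be computed."""
    # Check membership up front so a missing node is reported distinctly
    for node in (source, target):
        if not G.has_node(node):
            print(f"Node {node!r} is not in the graph")
            return None
    try:
        return nx.shortest_path(G, source, target)
    except nx.NetworkXNoPath:
        print(f"No path exists between {source!r} and {target!r}")
        return None

G = nx.DiGraph()
G.add_edge("Alice", "Bob", relation="knows")
G.add_edge("Bob", "Charlie", relation="knows")

print(safe_shortest_path(G, "Alice", "Charlie"))  # ['Alice', 'Bob', 'Charlie']
print(safe_shortest_path(G, "Alice", "Dave"))     # None, "Dave" is missing
print(safe_shortest_path(G, "Charlie", "Alice"))  # None, edges only run forward
```

The last call shows why direction matters in a DiGraph: Charlie is reachable from Alice, but no edge leads back.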
Detecting cycles in knowledge graphs
A cycle is a path that loops back to its starting node. This structure is a classic trap for pathfinding algorithms: a recursive traversal that doesn't track visited nodes can follow the loop endlessly until Python raises a RecursionError. The find_path example below walks a cyclic graph and skips any node already in path to avoid that fate.
def find_path(graph, start, end, path=[]):
    path = path + [start]
    if start == end:
        return path
    for node in graph.get(start, []):
        if node not in path:
            new_path = find_path(graph, node, end, path)
            if new_path:
                return new_path
    return None

graph = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(find_path(graph, "A", "C"))
The find_path function traverses a graph where node C points back to A, creating a loop. The node not in path check stops the recursion from re-entering the cycle, so this call returns ['A', 'B', 'C']. The function still carries a subtler risk in its path=[] default argument, which the next snippet removes.
def find_path(graph, start, end, path=None):
    if path is None:
        path = []
    path = path + [start]
    if start == end:
        return path
    for node in graph.get(start, []):
        if node not in path:
            new_path = find_path(graph, node, end, path)
            if new_path:
                return new_path
    return None

graph = {"A": ["B"], "B": ["C"], "C": ["A"]}
print(find_path(graph, "A", "C"))
The solution removes a classic Python pitfall involving mutable default arguments.
- Instead of using path=[] as the default, the function now uses path=None.
- A new list is created inside the function whenever no path is passed in, so separate top-level calls never share one default list.
Because path = path + [start] builds a new list rather than mutating the default, both versions return the same result here. The None default matters as soon as the function is changed to modify path in place (for example with path.append(start)), since a shared default list would then leak state across calls. It's a fix worth remembering whenever you use a list or dictionary as a default argument.
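For graphs held in networkx, cycles can be listed directly with nx.simple_cycles() instead of being discovered during traversal. The sketch below assumes networkx is installed and adds a node "D" to the example graph purely for illustration.

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])

# A directed graph that contains any cycle is not a DAG
has_cycle = not nx.is_directed_acyclic_graph(G)

# simple_cycles yields each elementary cycle as a list of nodes
cycles = list(nx.simple_cycles(G))

print(has_cycle)          # True
print(len(cycles))        # 1
print(sorted(cycles[0]))  # ['A', 'B', 'C']
```

The cycle is sorted before printing because the node a reported cycle starts from is not something to rely on.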
Real-world applications
Knowledge graphs are more than just theory; they're the backbone of practical tools such as movie recommendation engines and automated question-answering systems.
Building a movie recommendation system with set operations
You can create a simple movie recommendation system by using Python’s set operations to compare user preferences and find new content.
# Simple movie recommendation system
movie_preferences = {
    "User1": ["Matrix", "Inception"],
    "User2": ["Inception", "Dark Knight"],
    "User3": ["Matrix", "Dark Knight"]
}

def get_recommendations(user):
    recommendations = []
    user_movies = set(movie_preferences[user])
    for other_user, movies in movie_preferences.items():
        if other_user != user and set(movies) & user_movies:
            recommendations.extend([m for m in movies if m not in user_movies])
    return list(set(recommendations))

print(get_recommendations("User1"))
The get_recommendations function suggests new movies by finding users with similar tastes. It operates on a dictionary of movie preferences and uses Python sets to identify recommendations based on shared interests.
- It first converts the target user's movie list into a set for fast lookups.
- The function then iterates through other users, using the set intersection operator (&) to find anyone who shares at least one movie preference.
- If an overlap is found, it adds movies the other user likes, but the target user hasn't seen, to a list of recommendations.
Finally, it removes any duplicates from the suggestions before returning the final list.
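Removing duplicates also throws away a useful signal: how many similar users liked each movie. One possible variation, sketched below, counts those votes with collections.Counter and returns a ranked list. The ranked_recommendations name and the extra "Interstellar" entry are assumptions added for the example.

```python
from collections import Counter

movie_preferences = {
    "User1": ["Matrix", "Inception"],
    "User2": ["Inception", "Dark Knight", "Interstellar"],
    "User3": ["Matrix", "Dark Knight"],
}

def ranked_recommendations(user, preferences):
    """Rank unseen movies by how many overlapping users like them."""
    seen = set(preferences[user])
    counts = Counter()
    for other, movies in preferences.items():
        # Only users who share at least one movie with the target user vote
        if other != user and seen & set(movies):
            counts.update(m for m in set(movies) if m not in seen)
    # Sort by vote count (descending), then title, so ties are deterministic
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

print(ranked_recommendations("User1", movie_preferences))
# [('Dark Knight', 2), ('Interstellar', 1)]
```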
Implementing a simple question-answering system with knowledge graphs
A knowledge graph can also act as a simple brain for a question-answering system, allowing you to parse questions and retrieve specific facts.
import re

# Simple question-answering with knowledge graphs
knowledge_base = {
    "Tesla": {"invented": ["AC Motor", "Tesla Coil"]},
    "Einstein": {"developed": ["Theory of Relativity"], "born_in": "Germany"}
}

def answer(question):
    # Split case-insensitively so that "Who invented ..." is recognized
    parts = re.split("who invented", question, flags=re.IGNORECASE)
    if len(parts) > 1:
        item = parts[-1].strip("? ")
        for person, facts in knowledge_base.items():
            if "invented" in facts and item in facts["invented"]:
                return f"{person} invented {item}."
    return "I don't know."

print(answer("Who invented AC Motor?"))
The answer() function demonstrates a basic way to query the knowledge_base. It's built to handle a specific question format—"Who invented...?"—and relies on simple string manipulation to understand the query.
- First, it isolates the invention by splitting the question string and cleaning the result with .strip().
- It then iterates through the dictionary, checking each person's invented list for a match.
If a match is found, the function returns a formatted answer. If not, it returns a default message.
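The same idea can be stretched to cover any relation stored in the graph rather than only "invented". The sketch below is one possible approach, assuming questions follow the pattern "Who <relation> <item>?" and that every relation maps to a list of values; the regular expression and the trimmed knowledge_base are illustrative choices.

```python
import re

knowledge_base = {
    "Tesla": {"invented": ["AC Motor", "Tesla Coil"]},
    "Einstein": {"developed": ["Theory of Relativity"]},
}

def answer(question):
    """Answer 'Who <relation> <item>?' for any relation in the knowledge base."""
    match = re.match(r"who (\w+) (.+?)\??$", question.strip(), flags=re.IGNORECASE)
    if not match:
        return "I don't know."
    relation, item = match.group(1).lower(), match.group(2).strip()
    for person, facts in knowledge_base.items():
        if item in facts.get(relation, []):
            return f"{person} {relation} {item}."
    return "I don't know."

print(answer("Who invented AC Motor?"))               # Tesla invented AC Motor.
print(answer("who developed Theory of Relativity?"))  # Einstein developed Theory of Relativity.
print(answer("Who invented the telephone?"))          # I don't know.
```

The relation word is taken straight from the question, so new relation types work as soon as they appear in knowledge_base, with no extra branches in the function.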
Get started with Replit
Now, turn your knowledge into a real tool. Describe what you want to Replit Agent, like “a tool that maps relationships in an article” or “a dashboard that visualizes a social network.”
Replit Agent writes the code, tests for errors, and deploys your app. Start building with Replit.
Describe what you want to build, and Replit Agent writes the code, handles the infrastructure, and ships it live. Go from idea to real product, all in your browser.



