How to build a knowledge graph in Python

Learn how to build a knowledge graph in Python. Explore different methods, tips, real-world applications, and common debugging techniques.

Published on: Tue, Feb 24, 2026

Updated on: Mon, Apr 6, 2026

The Replit Team

Constructing a knowledge graph in Python is a powerful way to model complex data relationships. This structure helps you uncover insights and connections that are otherwise hidden from view.

In this article, you'll learn the core techniques to build your own graph from the ground up. We'll cover practical tips, explore real-world applications, and provide advice to debug your code for a smooth development process.

Building a simple knowledge graph with dictionaries

    knowledge_graph = {
        "Alice": {"knows": ["Bob", "Charlie"], "likes": ["Pizza", "Coding"]},
        "Bob": {"knows": ["Alice"], "works_at": ["TechCorp"]},
        "Charlie": {"knows": ["Alice"], "lives_in": ["New York"]},
    }
    print(knowledge_graph["Alice"]["knows"])

Output:

    ['Bob', 'Charlie']

A nested dictionary provides a simple yet powerful foundation for a knowledge graph. This approach maps entities and their connections using Python's built-in tools. Here’s how it breaks down:

The outer keys (e.g., "Alice") are the nodes of the graph.
The inner dictionaries define the edges, with keys like "knows" representing the type of relationship.
The values are lists of connected nodes or attributes.

This structure makes querying straightforward. A lookup such as knowledge_graph["Alice"]["knows"] retrieves the related entities in a single step, with no traversal logic and no extra dependencies, which makes it a lightweight choice for smaller datasets.
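As a minimal sketch of how such a dictionary could be built up one fact at a time instead of being typed out by hand, the helpers below use collections.defaultdict. The names add_fact and who are hypothetical and not part of the original example. The second helper shows a reverse lookup, which the nested-dictionary layout does not provide directly.

```python
from collections import defaultdict

# node -> relation -> list of targets, created on first use
knowledge_graph = defaultdict(lambda: defaultdict(list))

def add_fact(graph, subject, relation, target):
    """Record one edge, skipping exact duplicates."""
    if target not in graph[subject][relation]:
        graph[subject][relation].append(target)

def who(graph, relation, target):
    """Reverse lookup: every subject whose `relation` points at `target`."""
    return sorted(s for s, rels in graph.items() if target in rels.get(relation, []))

add_fact(knowledge_graph, "Alice", "knows", "Bob")
add_fact(knowledge_graph, "Alice", "knows", "Charlie")
add_fact(knowledge_graph, "Charlie", "knows", "Bob")
add_fact(knowledge_graph, "Bob", "works_at", "TechCorp")

print(knowledge_graph["Alice"]["knows"])     # ['Bob', 'Charlie']
print(who(knowledge_graph, "knows", "Bob"))  # ['Alice', 'Charlie']
```

Note that the reverse lookup scans every node, so it is only practical while the graph stays small.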

Core techniques for knowledge graphs

While dictionaries get you started, libraries like networkx and rdflib provide more robust tools for building your knowledge graph, and matplotlib lets you visualize the result.

Using `networkx` for graph representation

    import networkx as nx

    G = nx.DiGraph()
    G.add_node("Alice", type="Person")
    G.add_node("Bob", type="Person")
    G.add_edge("Alice", "Bob", relation="knows")
    G.add_edge("Bob", "TechCorp", relation="works_at")

    print(list(G.edges(data=True)))

Output:

    [('Alice', 'Bob', {'relation': 'knows'}), ('Bob', 'TechCorp', {'relation': 'works_at'})]

The networkx library provides a more structured approach than dictionaries. You begin by initializing a directed graph with nx.DiGraph(), where relationships have a specific direction. This object serves as the container for your graph's structure.

You add entities using add_node(), which also lets you assign attributes like type="Person".
Relationships are created with add_edge(), allowing you to define the connection with properties like relation="knows".

This method allows for richer, more explicit modeling of your data than a simple dictionary, because nodes and edges can each carry their own attributes.
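Once attributes are attached, they can be read back and used as filters. The sketch below assumes the same small graph; the "Company" type on TechCorp is an addition for illustration and does not appear in the original snippet.

```python
import networkx as nx

G = nx.DiGraph()
G.add_node("Alice", type="Person")
G.add_node("Bob", type="Person")
G.add_node("TechCorp", type="Company")  # assumed attribute, for illustration
G.add_edge("Alice", "Bob", relation="knows")
G.add_edge("Bob", "TechCorp", relation="works_at")

# Node attributes are stored in a dict per node
print(G.nodes["Alice"]["type"])   # Person

# Outgoing neighbours of a node in a directed graph
print(list(G.successors("Bob")))  # ['TechCorp']

# Keep only the edges whose "relation" attribute matches
knows_edges = [(u, v) for u, v, rel in G.edges(data="relation") if rel == "knows"]
print(knows_edges)                # [('Alice', 'Bob')]
```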

Adding relationships with `rdflib`

    from rdflib import Graph, RDF, URIRef
    from rdflib.namespace import FOAF

    g = Graph()
    alice = URIRef("http://example.org/alice")
    bob = URIRef("http://example.org/bob")

    g.add((alice, RDF.type, FOAF.Person))
    g.add((alice, FOAF.knows, bob))

    print(g.serialize(format="turtle"))

Output (the exact prefix lines can vary with the rdflib version):

    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

    <http://example.org/alice> a foaf:Person ;
        foaf:knows <http://example.org/bob> .

For more formal knowledge representation, rdflib implements the RDF standard. It structures data as "triples": statements made of a subject, a predicate, and an object. You start by creating a Graph(), which holds all your triples and allows for standardized querying.

Entities are defined with URIRef, creating unique web-like identifiers for your data points.
You add relationships using g.add(), passing a tuple that represents a triple, such as (alice, FOAF.knows, bob).
Vocabularies like FOAF provide standardized terms, ensuring your graph is interoperable.
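To make the triple model concrete, here is a toy triple store in plain Python. It is not rdflib and uses bare strings instead of URIs; the match helper is a hypothetical name. It only illustrates the idea of matching a subject-predicate-object pattern with wildcards, which rdflib exposes on real RDF terms through Graph.triples().

```python
# A set of (subject, predicate, object) tuples stands in for an RDF graph
triples = {
    ("alice", "type", "Person"),
    ("bob", "type", "Person"),
    ("alice", "knows", "bob"),
}

def match(store, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )

print(match(triples, predicate="type", obj="Person"))
# [('alice', 'type', 'Person'), ('bob', 'type', 'Person')]
print(match(triples, subject="alice"))
# [('alice', 'knows', 'bob'), ('alice', 'type', 'Person')]
```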

Visualizing knowledge graphs with `matplotlib`

    import networkx as nx
    import matplotlib.pyplot as plt

    G = nx.DiGraph()
    G.add_edges_from([("Alice", "Bob"), ("Alice", "Charlie"), ("Bob", "TechCorp")])

    pos = nx.spring_layout(G)
    nx.draw(G, pos, with_labels=True, node_color="lightblue", edge_color="gray")
    plt.title("Knowledge Graph Visualization")
    plt.show()

Output:

    [Knowledge graph visualization displayed]

Once your graph is built, matplotlib lets you visualize its structure. This turns abstract connections into an intuitive map you can actually see. The key is to first define the layout of your nodes before drawing them.

The nx.spring_layout(G) function arranges nodes using a physics-based algorithm for a clean, readable layout.
With positions set, nx.draw() renders the graph, letting you customize elements like node labels and colors.
Finally, plt.show() presents the complete visualization.

Advanced approaches and integrations

Now that you've built a graph, you can take it further by learning to query it efficiently, populate it automatically, and scale it with specialized databases.

Querying knowledge graphs with SPARQL

    from rdflib import Graph
    from rdflib.namespace import FOAF
    from rdflib.plugins.sparql import prepareQuery

    g = Graph()
    g.parse("data.ttl", format="turtle")  # assumes data.ttl exists next to the script

    query = prepareQuery(
        """
        SELECT ?person WHERE {
            ?person a foaf:Person .
        }
        """,
        initNs={"foaf": FOAF},
    )

    results = g.query(query)
    for row in results:
        print(f"Found person: {row.person}")

Output (assuming data.ttl types both alice and bob as foaf:Person):

    Found person: http://example.org/alice
    Found person: http://example.org/bob

SPARQL is the standard language for querying RDF data, much like SQL is for relational databases. After loading your graph from a file like data.ttl with g.parse(), you can ask it specific questions.

You define your query using SPARQL syntax and prepare it with prepareQuery(). The example SELECT ?person WHERE { ?person a foaf:Person . } asks for all entities of the type foaf:Person.
Executing the search with g.query() returns an iterable object, letting you loop through the results to find what you need.

Using `spaCy` for automatic extraction

    import spacy

    nlp = spacy.load("en_core_web_sm")
    text = "Apple was founded by Steve Jobs in California."
    doc = nlp(text)

    triples = []
    for entity in doc.ents:
        if entity.label_ == "ORG":
            for token in doc:
                if token.dep_ == "ROOT" and token.head == token:
                    triples.append((entity.text, token.lemma_, "?"))

    print(triples)

Output:

    [('Apple', 'found', '?')]

Manually populating a graph is tedious, so you can use the spaCy library to automatically extract relationships from plain text. After loading a language model like en_core_web_sm, you process your text to analyze its grammatical structure and identify key components.

The code iterates through doc.ents to find named entities, filtering for specific types like organizations.
It then identifies the main action by finding the token with the ROOT dependency label.
Finally, it combines these elements into a partial triple. The object is left as a "?" placeholder, so further parsing is needed before the triple can be added to your graph.
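One way to bridge extraction and graph building is sketched below in plain Python, without spaCy. The load_triples helper and the second and third tuples are hypothetical; only the first tuple matches the output shown above. Incomplete triples are set aside instead of being written into the graph.

```python
from collections import defaultdict

def load_triples(triples, placeholder="?"):
    """Build a nested-dict graph from (subject, relation, object) tuples.

    Triples whose object is still the placeholder are returned separately
    so they can be reviewed or completed later.
    """
    graph = defaultdict(lambda: defaultdict(list))
    incomplete = []
    for subject, relation, obj in triples:
        if obj == placeholder:
            incomplete.append((subject, relation, obj))
        else:
            graph[subject][relation].append(obj)
    return graph, incomplete

# Hypothetical extractor output
extracted = [
    ("Apple", "found", "?"),
    ("Steve Jobs", "found", "Apple"),
    ("Apple", "located_in", "California"),
]
graph, incomplete = load_triples(extracted)
print(dict(graph["Apple"]))  # {'located_in': ['California']}
print(incomplete)            # [('Apple', 'found', '?')]
```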

Integrating with graph databases using `neo4j`

    from neo4j import GraphDatabase

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

    with driver.session() as session:
        session.run("CREATE (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person {name: 'Bob'})")
        result = session.run("MATCH (a:Person)-[r:KNOWS]->(b:Person) RETURN a.name, b.name")
        for record in result:
            print(f"{record['a.name']} knows {record['b.name']}")

    driver.close()

Output:

    Alice knows Bob

When your knowledge graph becomes too large to manage in memory, a dedicated graph database like Neo4j is the next step. The neo4j library connects your Python application to the database server, allowing you to handle massive datasets efficiently.

You start by creating a driver object to establish the connection to your database.
All database interactions are handled within a session, where you use session.run() to execute queries.
These queries use Cypher—Neo4j's query language—to define and find data patterns like (a:Person)-[:KNOWS]->(b:Person).
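The CREATE statement above adds a new pair of nodes every time the script runs, and it embeds the names directly in the query text. A possible refinement, sketched here without a database connection, is to build a MERGE statement with $-parameters. The knows_query helper is a hypothetical name.

```python
def knows_query(person, friend):
    """Return a Cypher statement and its parameters for one KNOWS edge.

    MERGE reuses existing nodes instead of duplicating them, and
    $-parameters keep user-supplied names out of the query text.
    """
    query = (
        "MERGE (a:Person {name: $person}) "
        "MERGE (b:Person {name: $friend}) "
        "MERGE (a)-[:KNOWS]->(b)"
    )
    return query, {"person": person, "friend": friend}

query, params = knows_query("Alice", "Bob")
print(params)  # {'person': 'Alice', 'friend': 'Bob'}

# With an open session from the neo4j driver, this would be executed as:
#     session.run(query, params)
```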

Move faster with Replit

Replit is an AI-powered development platform that comes with all Python dependencies pre-installed, so you can skip setup and start coding instantly. Instead of piecing together techniques from libraries like networkx and rdflib, you can describe the app you want to build and let Agent 4 take it from an idea to a working product.

A relationship mapper that uses spaCy to parse articles and automatically build a knowledge graph of people, places, and organizations.
An interactive dashboard that visualizes a social network from your data, using networkx and matplotlib to draw nodes and connections.
A company organization chart tool that connects to a neo4j database and allows you to query for reporting structures and team relationships.

Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.

Common errors and challenges

Building a knowledge graph often involves navigating common pitfalls, from handling missing data to untangling complex structural issues like cycles.

Handling missing nodes with the `.get()` method

When using a dictionary, trying to access a node that doesn't exist will raise a KeyError and halt your program. A simple fix is to use the .get() method instead of direct key access. This approach safely checks for a node and returns None if it's missing, which keeps your code running smoothly without unexpected interruptions.

Fixing path finding errors in `networkx` graphs

Pathfinding functions in networkx can throw errors like NodeNotFound or NetworkXNoPath if a node is missing or no connection exists. You can prevent this by first checking if a node is in the graph with G.has_node(). It's also good practice to wrap pathfinding calls in a try...except block to gracefully handle situations where no route is found.

Detecting cycles in knowledge graphs

Cycles—paths that loop back to their starting point—can trap algorithms in an infinite loop. Fortunately, you don't have to find them manually. The networkx library provides a handy function called nx.simple_cycles() that detects and lists all cyclical paths, so you can manage them before they cause problems in your analysis.
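A short sketch of that function on a small directed graph follows. The node names are illustrative.

```python
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])

# simple_cycles returns a generator; each cycle is a list of nodes
cycles = list(nx.simple_cycles(G))
print(cycles)  # one cycle through A, B and C (the starting node may vary)

# A quick yes/no check for directed graphs
print(nx.is_directed_acyclic_graph(G))  # False
```

On large, densely connected graphs the number of simple cycles can grow very quickly, so the yes/no check is often the cheaper first step.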

Handling missing nodes with the `.get()` method

Directly accessing a dictionary key that doesn't exist is a surefire way to trigger a KeyError and crash your program. It's a common misstep when querying a graph for a node that hasn't been added yet. The code below demonstrates this exact scenario.

    knowledge_graph = {
        "Alice": {"knows": ["Bob", "Charlie"]},
        "Bob": {"knows": ["Alice"]},
    }

    # This will raise a KeyError
    connections = knowledge_graph["Dave"]["knows"]
    print(f"Dave knows: {connections}")

The program crashes because it tries to look up "Dave", a key that doesn't exist in the dictionary, which triggers a KeyError. See how to handle this gracefully in the next snippet.

    knowledge_graph = {
        "Alice": {"knows": ["Bob", "Charlie"]},
        "Bob": {"knows": ["Alice"]},
    }

    # Using get() with a default value to avoid KeyError
    connections = knowledge_graph.get("Dave", {}).get("knows", [])
    print(f"Dave knows: {connections}")

The solution is to chain the .get() method. First, knowledge_graph.get("Dave", {}) safely looks for "Dave". Since it isn't found, it returns a default empty dictionary {}. The second .get("knows", []) is then called on that empty dictionary, which returns an empty list []. This technique prevents a KeyError and is crucial when you're not sure if a node or its connections exist in your graph.
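If the same lookup appears in many places, the chained call can be wrapped once. The neighbors helper below is a hypothetical name, shown as a minimal sketch.

```python
def neighbors(graph, node, relation):
    """Return the targets of `relation` for `node`, or [] if either is missing."""
    return graph.get(node, {}).get(relation, [])

knowledge_graph = {
    "Alice": {"knows": ["Bob", "Charlie"]},
    "Bob": {"knows": ["Alice"]},
}

print(neighbors(knowledge_graph, "Alice", "knows"))   # ['Bob', 'Charlie']
print(neighbors(knowledge_graph, "Dave", "knows"))    # []
print(neighbors(knowledge_graph, "Bob", "works_at"))  # []

# A membership test distinguishes "unknown node" from "known node, no edges"
print("Dave" in knowledge_graph)                      # False
```

An empty list is convenient for loops, but it hides whether the node exists at all, so keep the membership test when that difference matters.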

Fixing path finding errors in `networkx` graphs

Finding a path between two nodes is a common task, but it can easily crash your program. If you ask networkx to find a route to a node that doesn't exist, functions like nx.shortest_path() will raise an error. The following snippet shows what happens when you try to find a path to a node that isn't in the graph.

    import networkx as nx

    G = nx.DiGraph()
    G.add_edge("Alice", "Bob", relation="knows")
    G.add_edge("Bob", "Charlie", relation="knows")

    # This raises nx.NodeNotFound because "Dave" is not in the graph
    path = nx.shortest_path(G, "Alice", "Dave")
    print(f"Path from Alice to Dave: {path}")

Since the node "Dave" was never added to the graph, the call to nx.shortest_path() raises nx.NodeNotFound. The next snippet shows how you can manage this gracefully to prevent a crash.

    import networkx as nx

    G = nx.DiGraph()
    G.add_edge("Alice", "Bob", relation="knows")
    G.add_edge("Bob", "Charlie", relation="knows")

    try:
        path = nx.shortest_path(G, "Alice", "Dave")
        print(f"Path from Alice to Dave: {path}")
    except nx.NodeNotFound:
        # Raised when the source or target is not in the graph
        print("Alice or Dave is not in the graph")
    except nx.NetworkXNoPath:
        # Raised when both nodes exist but no route connects them
        print("No path exists between Alice and Dave")

The solution is to wrap the pathfinding logic in a try...except block that handles both failure modes. networkx raises nx.NodeNotFound when the source or target is missing from the graph, which is the case here, and nx.NetworkXNoPath when both nodes exist but no route connects them. Catching only nx.NetworkXNoPath would still let the missing-node case crash the program. With both handlers in place, your program can run alternative code, like printing a helpful message, whenever you can't guarantee a connection.
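The same checks can be collected in one reusable function. The sketch below combines G.has_node() with a try...except block; safe_shortest_path is a hypothetical name, and returning None on failure is one design choice among several.

```python
import networkx as nx

def safe_shortest_path(G, source, target):
    """Return the shortest path as a list, or None if it cannot be computed."""
    # Check membership first so a missing node is reported distinctly
    for node in (source, target):
        if not G.has_node(node):
            print(f"{node!r} is not in the graph")
            return None
    try:
        return nx.shortest_path(G, source, target)
    except nx.NetworkXNoPath:
        print(f"No path from {source!r} to {target!r}")
        return None

G = nx.DiGraph()
G.add_edge("Alice", "Bob", relation="knows")
G.add_edge("Bob", "Charlie", relation="knows")

print(safe_shortest_path(G, "Alice", "Charlie"))  # ['Alice', 'Bob', 'Charlie']
print(safe_shortest_path(G, "Alice", "Dave"))     # None, Dave is missing
print(safe_shortest_path(G, "Charlie", "Alice"))  # None, edges point the other way
```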

Detecting cycles in knowledge graphs

A cycle is a path that loops back to its starting node. This structure is a classic trap for pathfinding algorithms: a recursive traversal that does not track the nodes it has already visited will follow the loop until Python raises a RecursionError. The find_path example below walks a graph that contains such a cycle.

    def find_path(graph, start, end, path=[]):
        path = path + [start]
        if start == end:
            return path
        for node in graph.get(start, []):
            if node not in path:
                new_path = find_path(graph, node, end, path)
                if new_path:
                    return new_path
        return None

    graph = {"A": ["B"], "B": ["C"], "C": ["A"]}
    print(find_path(graph, "A", "C"))

Output:

    ['A', 'B', 'C']

The find_path function traverses a graph where node C points back to A, creating a loop. The node not in path check is what keeps the recursion from revisiting A, so the call terminates. The function still has a weak spot: the mutable default argument path=[]. It is harmless here only because path = path + [start] builds a new list instead of modifying the default in place. The next snippet removes that risk.

    def find_path(graph, start, end, path=None):
        if path is None:
            path = []
        path = path + [start]
        if start == end:
            return path
        for node in graph.get(start, []):
            if node not in path:
                new_path = find_path(graph, node, end, path)
                if new_path:
                    return new_path
        return None

    graph = {"A": ["B"], "B": ["C"], "C": ["A"]}
    print(find_path(graph, "A", "C"))

This version follows the usual idiom for avoiding Python's mutable default argument pitfall.

Instead of using path=[] as the default, the function now uses path=None.
A new list is created inside the function, so no call can ever share the default list with another.

Both versions return the same path for this graph. The second stays correct even if the body is later changed to modify path in place, for example with path.append(start). It's a habit worth keeping whenever you use a list or dictionary as a default argument, as it prevents unexpected behavior across function calls.
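The mutable-default pitfall is easiest to see in a function that modifies its default in place. The two functions below are illustrative and not part of the pathfinding example.

```python
def add_tag_shared(tag, tags=[]):
    # The default list is created once, when the function is defined,
    # so every call without an explicit `tags` appends to the same object.
    tags.append(tag)
    return tags

def add_tag_fresh(tag, tags=None):
    # A new list is created on each call that omits `tags`.
    if tags is None:
        tags = []
    tags.append(tag)
    return tags

print(add_tag_shared("person"))   # ['person']
print(add_tag_shared("company"))  # ['person', 'company']  <- state leaked

print(add_tag_fresh("person"))    # ['person']
print(add_tag_fresh("company"))   # ['company']
```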

Real-world applications

Knowledge graphs are more than just theory; they're the backbone of practical tools such as movie recommendation engines and automated question-answering systems.

Building a movie recommendation system with `set` operations

You can create a simple movie recommendation system by using Python’s set operations to compare user preferences and find new content.

    # Simple movie recommendation system
    movie_preferences = {
        "User1": ["Matrix", "Inception"],
        "User2": ["Inception", "Dark Knight"],
        "User3": ["Matrix", "Dark Knight"],
    }

    def get_recommendations(user):
        recommendations = []
        user_movies = set(movie_preferences[user])
        for other_user, movies in movie_preferences.items():
            if other_user != user and set(movies) & user_movies:
                recommendations.extend([m for m in movies if m not in user_movies])
        return list(set(recommendations))

    print(get_recommendations("User1"))

Output:

    ['Dark Knight']

The get_recommendations function suggests new movies by finding users with similar tastes. It operates on a dictionary of movie preferences and uses Python sets to identify recommendations from shared interests.

It first converts the target user's movie list into a set for fast lookups.
The function then iterates through other users, using the set intersection operator (&) to find anyone who shares at least one movie preference.
If an overlap is found, it adds movies the other user likes—but the target user hasn't seen—to a list of recommendations.

Finally, it removes any duplicates from the suggestions before returning the final list.
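Removing duplicates also discards how often a movie was suggested. A possible variation, sketched below, counts the suggestions with collections.Counter and ranks by that count. The ranked_recommendations name and the extra "Interstellar" entry are assumptions added for illustration.

```python
from collections import Counter

movie_preferences = {
    "User1": ["Matrix", "Inception"],
    "User2": ["Inception", "Dark Knight", "Interstellar"],
    "User3": ["Matrix", "Dark Knight"],
}

def ranked_recommendations(user, preferences):
    """Rank unseen movies by how many overlapping users liked them."""
    seen = set(preferences[user])
    counts = Counter()
    for other, movies in preferences.items():
        if other != user and seen & set(movies):
            counts.update(set(movies) - seen)
    # Sort by count (descending), then title, so the order is deterministic
    return sorted(counts, key=lambda m: (-counts[m], m))

print(ranked_recommendations("User1", movie_preferences))
# ['Dark Knight', 'Interstellar']
```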

Implementing a simple question-answering system with knowledge graphs

A knowledge graph can also act as a simple brain for a question-answering system, allowing you to parse questions and retrieve specific facts.

    # Simple question-answering with knowledge graphs
    import re

    knowledge_base = {
        "Tesla": {"invented": ["AC Motor", "Tesla Coil"]},
        "Einstein": {"developed": ["Theory of Relativity"], "born_in": ["Germany"]},
    }

    def answer(question):
        # Compare in lowercase so "Who invented" and "who invented" both match
        if "who invented" in question.lower():
            item = re.split("who invented", question, flags=re.IGNORECASE)[-1].strip("? ")
            for person, facts in knowledge_base.items():
                if "invented" in facts and item in facts["invented"]:
                    return f"{person} invented {item}."
        return "I don't know."

    print(answer("Who invented AC Motor?"))

Output:

    Tesla invented AC Motor.

The answer() function demonstrates a basic way to query the knowledge_base. It's built to handle a specific question format—"Who invented...?"—and relies on simple string manipulation to understand the query.

First, it isolates the invention by splitting the question string and cleaning the result with .strip().
It then iterates through the dictionary, checking each person's invented list for a match.

If a match is found, the function returns a formatted answer. If not, it returns a default message.
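To support more than one question shape, the phrase-to-relation mapping can be moved into data. The sketch below is one possible extension under that assumption; the PATTERNS table and the case-insensitive comparison are additions, not part of the original function.

```python
import re

knowledge_base = {
    "Tesla": {"invented": ["AC Motor", "Tesla Coil"]},
    "Einstein": {"developed": ["Theory of Relativity"], "born_in": ["Germany"]},
}

# Each pattern maps a question shape to a relation in the knowledge base
PATTERNS = [
    (re.compile(r"who invented (.+?)\??$", re.IGNORECASE), "invented"),
    (re.compile(r"who developed (.+?)\??$", re.IGNORECASE), "developed"),
]

def answer(question):
    for pattern, relation in PATTERNS:
        found = pattern.match(question.strip())
        if not found:
            continue
        item = found.group(1).strip()
        for person, facts in knowledge_base.items():
            # Compare case-insensitively so "ac motor" still matches
            for known in facts.get(relation, []):
                if known.lower() == item.lower():
                    return f"{person} {relation} {known}."
    return "I don't know."

print(answer("Who invented AC Motor?"))               # Tesla invented AC Motor.
print(answer("who developed theory of relativity?"))  # Einstein developed Theory of Relativity.
print(answer("Who painted the Mona Lisa?"))           # I don't know.
```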

Get started with Replit

Now, turn your knowledge into a real tool. Describe what you want to Replit Agent, like “a tool that maps relationships in an article” or “a dashboard that visualizes a social network.”

Replit Agent writes the code, tests for errors, and deploys your app. Start building with Replit.

Build your first app today

Describe what you want to build, and Replit Agent writes the code, handles the infrastructure, and ships it live. Go from idea to real product, all in your browser.

Get started free
