How to read an XML file in Python

Learn how to read XML files in Python with our guide. We cover various methods, tips, real-world applications, and common error debugging.

How to read an XML file in Python
Published on: 
Fri
Feb 20, 2026
Updated on: 
Mon
Apr 6, 2026
The Replit Team

XML files are a standard for data exchange and configuration. To work with them, Python provides robust libraries that help you parse and extract information efficiently for various software applications.

In this article, you'll explore several techniques to read XML files. You'll find practical tips, see real-world applications, and get advice to debug common issues you might face.

Basic parsing with ElementTree

import xml.etree.ElementTree as ET

tree = ET.parse('sample.xml')
root = tree.getroot()
for child in root:
   print(f"{child.tag}: {child.text}")--OUTPUT--item: First item
item: Second item

Python's built-in xml.etree.ElementTree module offers a direct way to handle XML. The ET.parse() function reads your file and converts it into a tree-like object in memory, which you can then navigate programmatically, similar to how you approach reading CSV files with structured data parsing.

By calling tree.getroot(), you access the document's top-level element. The code then iterates through each direct child of this root. For each child, it prints the element's tag with child.tag and its text content with child.text. This approach is great for quickly extracting simple data from the first level of your XML structure.

Common XML parsing approaches

While ElementTree is a solid starting point, Python’s standard library and third-party packages offer more specialized tools for different XML parsing scenarios.

Using xml.dom.minidom for DOM parsing

from xml.dom import minidom

xmldoc = minidom.parse('sample.xml')
items = xmldoc.getElementsByTagName('item')
for item in items:
   print(item.firstChild.data)--OUTPUT--First item
Second item

The xml.dom.minidom module implements the Document Object Model (DOM) standard. This approach loads the entire XML file into memory, letting you navigate the full tree structure.

  • The getElementsByTagName() method is particularly useful. It finds all elements with a specific tag, like 'item', no matter how deeply they're nested.
  • Once you have an element, you can access its contents by navigating its child nodes, such as with item.firstChild.data to get the text.

Using xml.sax for event-based parsing

import xml.sax

class XMLHandler(xml.sax.ContentHandler):
   def characters(self, content): self.content = content
   def endElement(self, tag):
       if tag == "item": print(self.content)

parser = xml.sax.make_parser()
parser.setContentHandler(XMLHandler())
parser.parse('sample.xml')--OUTPUT--First item
Second item

The xml.sax module offers a memory-efficient, event-driven approach. It reads the file sequentially, triggering events as it encounters elements rather than loading the entire document at once. This makes it ideal for parsing very large files, similar to how iterating through lists processes elements one by one.

  • You define a custom ContentHandler to tell the parser what to do for each event.
  • In this code, the characters() method captures text content. The endElement() method then acts when it finds a closing </item> tag, printing the content it just saved.

Reading XML with the lxml library

from lxml import etree

tree = etree.parse('sample.xml')
root = tree.getroot()
for element in root.findall('.//item'):
   print(element.text)--OUTPUT--First item
Second item

The lxml library is a popular third-party alternative that combines the speed of C libraries with a Pythonic interface. It’s often faster and more feature-rich than the standard library's ElementTree, making it a go-to for performance-critical tasks.

  • The code uses findall('.//item') to locate elements. This is an XPath expression where .// tells the parser to find all item elements anywhere in the tree, not just direct children of the root.

This powerful feature makes lxml excellent for navigating complex XML structures with deeply nested data.

Advanced XML processing techniques

Beyond the basics, Python offers powerful tools for tackling more demanding XML challenges, such as processing huge files, running complex queries, and ensuring data integrity. AI coding with Python can help automate these complex parsing workflows.

Working with large XML files using iterparse

import xml.etree.ElementTree as ET

context = ET.iterparse('large_sample.xml', events=('end',))
for event, elem in context:
   if elem.tag == 'item':
       print(elem.text)
       elem.clear()  # Free memory--OUTPUT--First item
Second item
...

When you're working with massive XML files, loading the entire document into memory isn't practical. The iterparse() function offers a memory-efficient solution by parsing the file incrementally, letting you process elements as they're read without consuming all your RAM.

  • By setting events=('end',), you instruct the parser to act on elements only after they are fully read, giving you access to their content.
  • The key to managing memory is the elem.clear() call. It discards the element after you're done, which keeps memory usage low and prevents your application from slowing down or crashing.

Using XPath queries to navigate XML

from lxml import etree

tree = etree.parse('sample.xml')
# Find all items with specific text
items = tree.xpath('//item[contains(text(), "Second")]')
for item in items:
   print(f"Found matching item: {item.text}")--OUTPUT--Found matching item: Second item

XPath is a powerful language for selecting specific nodes from an XML document. With lxml, you can use the xpath() method to run these queries, giving you far more flexibility than just searching by tag name.

  • The expression '//item' tells the parser to find all item elements, no matter where they are in the document tree.
  • The predicate [contains(text(), "Second")] acts as a filter, selecting only those elements whose text content includes the word "Second".

This approach is incredibly efficient for pinpointing exact data within complex or deeply nested XML structures.

Validating XML against a schema

from lxml import etree

schema_root = etree.parse('schema.xsd')
schema = etree.XMLSchema(schema_root)
parser = etree.XMLParser(schema=schema)
try:
   tree = etree.parse('sample.xml', parser)
   print("XML is valid according to the schema")
except etree.XMLSyntaxError as e:
   print(f"XML validation error: {e}")--OUTPUT--XML is valid according to the schema

Validating your XML against a schema ensures its structure and content are correct, which is vital for data integrity. A schema, often an XSD file, acts as a blueprint for your XML document. This process uses lxml to check if your file follows the rules you've defined.

  • First, you load the schema and create a validator object with etree.XMLSchema().
  • You then create a special parser using etree.XMLParser(schema=schema) that incorporates these rules.
  • When etree.parse() runs with this parser, it automatically validates the XML. If the file doesn't conform, it raises an etree.XMLSyntaxError.

Move faster with Replit

Replit is an AI-powered development platform that comes with all Python dependencies pre-installed, so you can skip setup and start coding instantly. Instead of just learning techniques, you can use Agent 4 to build a complete application from a simple description, handling everything from code and databases to APIs and deployment.

Instead of piecing together XML parsing methods, you can describe the app you want to build and let the Agent take it from concept to a working product:

  • A configuration utility that parses an XML settings file and extracts specific values, like database credentials or API keys, based on user queries.
  • A data migration tool that reads a large XML export from an old system, using an incremental parser like iterparse, and converts the records into a new format.
  • A content dashboard that pulls product information from an XML feed, validates it against a schema, and displays the items and their attributes in a clean interface.

Simply describe your app, and Replit will write the code, test it, and fix issues automatically, all within your browser.

Common errors and challenges

Even with the right tools, you might encounter issues like missing elements, namespace conflicts, or encoding errors when parsing XML files.

  • Handling missing elements: When a method like find() can't locate an element, it returns None. If you try to use this result, your script will raise an AttributeError. To avoid this, always check if the result is None before you try to access its attributes or text, ensuring your code handles missing data gracefully.
  • Resolving namespace issues: Namespaces can make element selection tricky because they change an element's official name. If a simple tag search fails, it’s likely because of a namespace. You'll need to define a dictionary that maps a prefix to the namespace URI and then use that prefix in your search queries to target elements correctly.
  • Fixing encoding problems: A UnicodeDecodeError usually signals a mismatch between the file's encoding and the one your parser is using. Most XML files declare their encoding in the first line. If your parser doesn't pick it up automatically, you may need to specify it when opening the file to prevent errors.

Handling missing elements with find() method

When the find() method fails to locate an element, it returns None. Attempting to access an attribute like .text on this result will trigger an AttributeError, crashing your script. This is a common issue with inconsistent XML data. The code below shows this error in action.

import xml.etree.ElementTree as ET

tree = ET.parse('users.xml')
root = tree.getroot()
for user in root.findall('user'):
   name = user.find('name').text
   email = user.find('email').text
   print(f"User: {name}, Email: {email}")

The script crashes if a user element is missing an email tag, as user.find('email') returns None. Attempting to access .text on this result causes the error. The following code demonstrates a safe way to proceed.

import xml.etree.ElementTree as ET

tree = ET.parse('users.xml')
root = tree.getroot()
for user in root.findall('user'):
   name = user.find('name')
   email = user.find('email')
   name_text = name.text if name is not None else "Unknown"
   email_text = email.text if email is not None else "No email"
   print(f"User: {name_text}, Email: {email_text}")

The corrected code prevents crashes by first checking if the element returned by find() exists. It uses a conditional expression—name.text if name is not None else "Unknown"—to safely access the .text attribute only when the element isn't None.

If the element is missing, it assigns a default string instead. This defensive check is crucial when parsing XML from sources where data consistency isn't guaranteed, preventing unexpected errors in your application.

Resolving namespace issues in XML parsing

Namespaces in XML prevent naming conflicts but can complicate parsing. When an element belongs to a namespace, a simple search like findall('element') won't work because the tag’s name is technically different. The following code demonstrates this common pitfall.

import xml.etree.ElementTree as ET

tree = ET.parse('namespace_data.xml')
root = tree.getroot()
elements = root.findall('element')
for elem in elements:
   print(elem.text)

This script produces no output because findall('element') returns an empty list. The parser doesn't recognize the elements without their namespace. The following code shows how to correctly target them.

import xml.etree.ElementTree as ET

tree = ET.parse('namespace_data.xml')
root = tree.getroot()
ns = {'ns': 'http://example.org/namespace'}
elements = root.findall('.//{http://example.org/namespace}element')
# Or alternatively:
# elements = root.findall('.//ns:element', ns)
for elem in elements:
   print(elem.text)

The corrected code works because it tells the parser how to handle the namespace. You can either include the full namespace URI directly in your query, like '{http://example.org/namespace}element', or define a namespace map. Using a map with a prefix, such as 'ns:element', often makes your queries cleaner. Understanding creating dictionaries is essential for building these namespace mappings effectively. This issue is common when you're parsing XML from web services or industry-standard formats, so always check for a namespace declaration at the top of the file.

Fixing encoding problems when parsing XML files

A UnicodeDecodeError is a classic sign that your parser is reading a file with the wrong text encoding. XML files typically declare their encoding, but if it’s not detected, the parser defaults to UTF-8, which can’t handle certain characters.

The code below triggers this error when trying to parse a file containing special characters without the correct encoding specified.

import xml.etree.ElementTree as ET

tree = ET.parse('special_chars.xml')
root = tree.getroot()
for elem in root:
   print(elem.text)

The ET.parse() function fails because its default encoding can't process the file's characters, triggering the error. The corrected code below shows how to properly instruct the parser and prevent this crash.

import xml.etree.ElementTree as ET

parser = ET.XMLParser(encoding="utf-8")
try:
   tree = ET.parse('special_chars.xml', parser=parser)
   root = tree.getroot()
   for elem in root:
       print(elem.text)
except ET.ParseError as e:
   print(f"XML parsing error: {e}")

The corrected code prevents this crash by explicitly telling the parser which encoding to use. You create a custom parser with ET.XMLParser(encoding="utf-8") and pass it to the ET.parse() function. This ensures the file is read correctly, even with special characters. For more robust error handling, consider handling multiple exceptions that may occur during XML parsing. It's a crucial step when you're dealing with XML from different systems or international sources, where encoding isn't always standard.

Real-world applications

Beyond troubleshooting, these parsing methods are essential for everyday tasks like extracting information from RSS feeds or converting XML data into JSON format. With vibe coding, you can quickly prototype these data transformation utilities.

Extracting RSS news with ElementTree.parse()

The ElementTree.parse() function can also process data directly from a web request, which is ideal for parsing live content like an RSS news feed.

import xml.etree.ElementTree as ET
import urllib.request

url = 'http://feeds.bbci.co.uk/news/rss.xml'
response = urllib.request.urlopen(url)
tree = ET.parse(response)
root = tree.getroot()
for item in root.findall('./channel/item')[:2]:
   print(f"Headline: {item.find('title').text}")

This script shows how to target specific information within an online RSS feed. The key is the findall('./channel/item') method, which uses a path to navigate the XML tree and isolate only the news articles.

  • It targets <item> elements nested inside a <channel> tag.
  • The [:2] slice then limits the loop to just the first two articles, making the operation efficient.

For each article, the code finds the corresponding <title> tag and prints its text content as a headline.

Transforming XML to JSON with findall() and json.dumps()

Another practical application is converting XML to JSON, which you can do by using findall() to structure your data into a list of dictionaries and then passing it to json.dumps() for the final conversion.

import xml.etree.ElementTree as ET
import json

tree = ET.parse('customers.xml')
root = tree.getroot()
customers = []

for customer in root.findall('./customer'):
   customer_data = {
       'id': customer.get('id'),
       'name': customer.find('name').text,
       'email': customer.find('email').text
   }
   customers.append(customer_data)

print(json.dumps(customers, indent=2))

This script systematically extracts customer details from an XML file by looping through each <customer> element found with root.findall(). It then organizes the data into a Python list of dictionaries, making it easy to work with.

  • The customer.get('id') method pulls the value from an element's attribute.
  • The customer.find() method retrieves nested elements like name and email, while .text accesses their content.

Finally, json.dumps() formats this list into a clean, readable string for printing.

Get started with Replit

Now, turn theory into practice. Describe a tool to Replit Agent, like “build a utility to parse an RSS feed and display headlines” or “convert our product XML data into a JSON API.”

The Agent will write the code, test for errors, and deploy your application for you. Start building with Replit.

Build your first app today

Describe what you want to build, and Replit Agent writes the code, handles the infrastructure, and ships it live. Go from idea to real product, all in your browser.

Build your first app today

Describe what you want to build, and Replit Agent writes the code, handles the infrastructure, and ships it live. Go from idea to real product, all in your browser.