How to Parse XML Documents Using Xerces2 in Java

Xerces2 Java is the Apache XML parser for Java. It is a fully conforming XML Schema 1.0 processor, with XML Schema 1.1 support available in a separate distribution, and it provides robust support for parsing, validating, and manipulating XML data. This guide demonstrates how to set up Apache Xerces2 and use its Document Object Model (DOM) and Simple API for XML (SAX) parsers in Java.

Setting Up Apache Xerces2
To use Xerces2, add the xercesImpl.jar and xml-apis.jar libraries to your Java classpath.

Maven Dependency
If you use Maven, add the following dependency to your pom.xml:
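A minimal sketch of that dependency, assuming the xerces:xercesImpl coordinates published on Maven Central. The version shown is an assumption, so check for the current release before copying it:

```xml
<dependency>
    <groupId>xerces</groupId>
    <artifactId>xercesImpl</artifactId>
    <!-- Version is an assumption; use the latest available release -->
    <version>2.12.2</version>
</dependency>
```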

1. DOM Parsing (Tree-Based)
The DOM parser loads the complete XML file into memory as a tree structure. This approach is best for small to medium XML files where you need to navigate back and forth or modify the data.

Sample XML File (books.xml)
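    <?xml version="1.0" encoding="UTF-8"?>

The examples that follow assume that books.xml contains book elements, each with an id attribute and with title and author child elements.

Java Implementation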
    import org.apache.xerces.parsers.DOMParser;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;
    import org.xml.sax.SAXException;

    import java.io.IOException;

    public class XercesDOMParserExample {
        public static void main(String[] args) {
            try {
                // Instantiate the Xerces DOM parser
                DOMParser parser = new DOMParser();

                // Parse the XML document (the whole file is read into memory)
                parser.parse("books.xml");

                // Get the resulting Document object
                Document document = parser.getDocument();

                // Visit every <book> element in document order
                NodeList bookList = document.getElementsByTagName("book");
                for (int i = 0; i < bookList.getLength(); i++) {
                    Element book = (Element) bookList.item(i);

                    // Assumed: the ID is stored in the book element's "id" attribute
                    String id = book.getAttribute("id");
                    String title = book.getElementsByTagName("title").item(0).getTextContent();
                    String author = book.getElementsByTagName("author").item(0).getTextContent();

                    System.out.println("Book ID: " + id);
                    System.out.println("Title: " + title);
                    System.out.println("Author: " + author);
                }
            } catch (SAXException | IOException e) {
                e.printStackTrace();
            }
        }
    }

2. SAX Parsing (Event-Driven)
The SAX parser reads the XML file sequentially and triggers event callbacks when it encounters tags, attributes, or text. This approach uses minimal memory, making it ideal for massive XML files.

Java Implementation
    import org.apache.xerces.parsers.SAXParser;
    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    import java.io.IOException;

    public class XercesSAXParserExample {
        public static void main(String[] args) {
            try {
                // Instantiate the Xerces SAX parser
                SAXParser parser = new SAXParser();

                // Define event handlers
                DefaultHandler handler = new DefaultHandler() {
                    // Flags that record which element the parser is currently inside
                    boolean bTitle = false;
                    boolean bAuthor = false;

                    @Override
                    public void startElement(String uri, String localName, String qName,
                                             Attributes attributes) throws SAXException {
                        // XML names are case-sensitive, so compare exactly
                        if ("book".equals(qName)) {
                            // Assumed: the ID is stored in the book element's "id" attribute
                            String id = attributes.getValue("id");
                            System.out.println("Book ID: " + id);
                        } else if ("title".equals(qName)) {
                            bTitle = true;
                        } else if ("author".equals(qName)) {
                            bAuthor = true;
                        }
                    }

                    @Override
                    public void characters(char[] ch, int start, int length) throws SAXException {
                        // Note: a parser may deliver long text in several chunks;
                        // this simple handler prints only the first chunk of each element
                        if (bTitle) {
                            System.out.println("Title: " + new String(ch, start, length));
                            bTitle = false;
                        } else if (bAuthor) {
                            System.out.println("Author: " + new String(ch, start, length));
                            bAuthor = false;
                        }
                    }
                };

                // Register the handler and parse
                parser.setContentHandler(handler);
                parser.parse("books.xml");
            } catch (SAXException | IOException e) {
                e.printStackTrace();
            }
        }
    }

3. Enforcing XML Schema (XSD) Validation
Xerces2 excels at validating XML files against a schema during the parsing process. You can configure validation features directly on the parser instance.
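    import org.apache.xerces.parsers.DOMParser;
    import org.xml.sax.SAXException;
    import org.xml.sax.SAXParseException;
    import org.xml.sax.helpers.DefaultHandler;

    import java.io.IOException;

    public class XercesValidationExample {
        public static void main(String[] args) {
            try {
                DOMParser parser = new DOMParser();

                // Standard SAX feature that turns validation on
                parser.setFeature("http://xml.org/sax/features/validation", true);
                // Xerces feature that enables XML Schema validation
                parser.setFeature("http://apache.org/xml/features/validation/schema", true);

                // Set the external schema location property
                // (assumes books.xsd declares no target namespace)
                parser.setProperty(
                    "http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation",
                    "books.xsd"
                );

                // Validation errors are recoverable and do not throw by default,
                // so rethrow them to make an invalid document fail the parse
                parser.setErrorHandler(new DefaultHandler() {
                    @Override
                    public void error(SAXParseException e) throws SAXException {
                        throw e;
                    }
                });

                // Parse and validate simultaneously
                parser.parse("books.xml");
                System.out.println("XML Document is valid.");
            } catch (SAXException | IOException e) {
                System.out.println("Validation failed: " + e.getMessage());
            }
        }
    }

Choosing Between DOM and SAX in Xerces2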
Use DOM if: You need to modify the XML structure, sort elements, or access specific nodes repeatedly and in any order.
Use SAX if: Your XML files exceed available system memory, or you only need to read the data sequentially a single time.
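
The trade-off above can be sketched side by side. This is a minimal illustration, not the Xerces-specific API used earlier: it goes through the standard JAXP factories (javax.xml.parsers), which typically resolve to Xerces when xercesImpl.jar is on the classpath and otherwise fall back to the JDK's built-in parser. The class name DomVersusSaxSketch, the helper methods renameFirstTitle and collectBookIds, and the inline sample XML are all hypothetical and invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DomVersusSaxSketch {

    // Hypothetical sample data, invented for this sketch
    static final String XML =
        "<library>"
      + "<book id=\"b1\"><title>First Title</title></book>"
      + "<book id=\"b2\"><title>Second Title</title></book>"
      + "</library>";

    // DOM: the whole tree is in memory, so a node can be changed and read back
    static String renameFirstTitle(String xml, String newTitle) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element title = (Element) doc.getElementsByTagName("title").item(0);
        title.setTextContent(newTitle);
        // Re-query the tree to show that the modification is visible
        return doc.getElementsByTagName("title").item(0).getTextContent();
    }

    // SAX: a single forward pass; only the collected ids are kept in memory
    static List<String> collectBookIds(String xml) throws Exception {
        List<String> ids = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                if ("book".equals(qName)) {
                    ids.add(attributes.getValue("id"));
                }
            }
        };
        SAXParserFactory.newInstance()
                .newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return ids;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(renameFirstTitle(XML, "Renamed Title")); // prints "Renamed Title"
        System.out.println(collectBookIds(XML));                    // prints "[b1, b2]"
    }
}
```

The DOM helper needs the full document before it can change anything, while the SAX helper never holds more than the list of ids, which is why SAX scales to files that DOM cannot load.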