XML Processing APIs Across Programming Languages

Programming languages offer a range of APIs and libraries for XML processing, each with its own strengths, features, and performance characteristics. This guide covers widely used XML processing options in Java, Python, C#, JavaScript, PHP, Go, and Rust.

Knowing which APIs your language offers helps you pick the right tool for the job, whether you are reading small configuration files or processing large data streams.

Java XML APIs

JAXP (Java API for XML Processing)

// DOM Processing
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse("example.xml");

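// SAX Processing
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

SAXParserFactory saxFactory = SAXParserFactory.newInstance();
SAXParser saxParser = saxFactory.newSAXParser();
// CustomHandler is an application-defined subclass of org.xml.sax.helpers.DefaultHandler
saxParser.parse("example.xml", new CustomHandler());

// StAX Processing
import java.io.FileInputStream;
import javax.xml.stream.*;

XMLInputFactory inputFactory = XMLInputFactory.newInstance();
XMLStreamReader reader = inputFactory.createXMLStreamReader(new FileInputStream("example.xml"));
// Pull events until the document ends
while (reader.hasNext()) {
    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
        String name = reader.getLocalName();
        // Process element
    }
}
reader.close();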

JAXB (Java Architecture for XML Binding)

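// Define XML-bound classes
// FIELD access lets JAXB bind the annotated private fields directly;
// without it, annotated fields plus public getters are reported as duplicate properties
@XmlRootElement(name = "library")
@XmlAccessorType(XmlAccessType.FIELD)
public class Library {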
    @XmlElement(name = "book")
    private List<Book> books;
    
    // getters and setters
}

@XmlRootElement(name = "book")
@XmlAccessorType(XmlAccessType.FIELD)
public class Book {
    @XmlAttribute
    private String id;
    
    @XmlElement
    private String title;
    
    @XmlElement
    private String author;
    
    @XmlElement
    private BigDecimal price;
    
    // constructors, getters, setters
}

// Marshalling (Object to XML)
JAXBContext context = JAXBContext.newInstance(Library.class);
Marshaller marshaller = context.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
marshaller.marshal(library, new File("output.xml"));

// Unmarshalling (XML to Object)
Unmarshaller unmarshaller = context.createUnmarshaller();
Library library = (Library) unmarshaller.unmarshal(new File("input.xml"));

XPath in Java

import javax.xml.xpath.*;

XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xpath = xPathFactory.newXPath();

// Evaluate XPath expressions
String title = (String) xpath.evaluate("//book[@id='1']/title/text()", document, XPathConstants.STRING);
NodeList expensiveBooks = (NodeList) xpath.evaluate("//book[price > 30]", document, XPathConstants.NODESET);

// Compiled XPath for performance
XPathExpression expr = xpath.compile("//book[author='Jane Doe']");
NodeList result = (NodeList) expr.evaluate(document, XPathConstants.NODESET);

Dom4j (Third-party Library)

import org.dom4j.*;
import org.dom4j.io.SAXReader;

// Reading XML
SAXReader reader = new SAXReader();
Document document = reader.read("example.xml");
Element root = document.getRootElement();

// Navigate elements
List<Element> books = root.elements("book");
for (Element book : books) {
    String id = book.attributeValue("id");
    String title = book.elementText("title");
    String author = book.elementText("author");
}

// XPath with Dom4j
List<Node> nodes = document.selectNodes("//book[@id='1']");
Node titleNode = document.selectSingleNode("//book[@id='1']/title");

Python XML APIs

xml.etree.ElementTree (Built-in)

import xml.etree.ElementTree as ET

# Parsing XML
tree = ET.parse('library.xml')
root = tree.getroot()

# Finding elements
for book in root.findall('book'):
    book_id = book.get('id')
    title = book.find('title').text
    author = book.find('author').text
    price = book.find('price').text
    
    print(f"Book {book_id}: {title} by {author} - ${price}")

# XPath-like operations
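# ElementTree's XPath subset has no numeric comparisons, so filter in Python
expensive_books = [b for b in root.findall('.//book')
                   if float(b.findtext('price', default='0')) > 30]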

# Creating XML
library = ET.Element('library')
book = ET.SubElement(library, 'book', id='1')
title = ET.SubElement(book, 'title')
title.text = 'Learning XML'

# Writing XML
tree = ET.ElementTree(library)
tree.write('output.xml', encoding='utf-8', xml_declaration=True)
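
When the output is destined for a string rather than a file (for example in a test or an HTTP response), `ET.tostring` can serialize the same kind of tree in memory. The following is a minimal sketch; the `build_library` helper and the sample book values are illustrative assumptions, not part of the example above.

```python
import xml.etree.ElementTree as ET


def build_library(books):
    """Build a <library> tree from (id, title, price) tuples (hypothetical helper)."""
    library = ET.Element('library')
    for book_id, title, price in books:
        book = ET.SubElement(library, 'book', id=book_id)
        ET.SubElement(book, 'title').text = title
        ET.SubElement(book, 'price').text = price
    return library


library = build_library([('1', 'Learning XML', '29.99')])

# encoding='unicode' returns a str instead of bytes and omits the XML declaration
xml_text = ET.tostring(library, encoding='unicode')
print(xml_text)

# Parsing the string back yields an equivalent tree
round_trip = ET.fromstring(xml_text)
print(round_trip.find('book').get('id'))
```

Keeping tree construction in one function makes it straightforward to check the serialized form without touching the filesystem.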

lxml (Third-party Library)

from lxml import etree, objectify

# Parsing with lxml
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse('library.xml', parser)
root = tree.getroot()

# XPath support
books = root.xpath('//book[@id="1"]')
titles = root.xpath('//book/title/text()')
expensive_books = root.xpath('//book[price > 30]')

# Object-oriented approach with objectify
obj_root = objectify.parse('library.xml').getroot()
for book in obj_root.book:
    print(f"Title: {book.title}")
    print(f"Author: {book.author}")
    print(f"Price: ${book.price}")

# Schema validation
schema_doc = etree.parse('schema.xsd')
schema = etree.XMLSchema(schema_doc)
if schema.validate(tree):
    print("Document is valid")
else:
    print("Validation errors:", schema.error_log)

BeautifulSoup (HTML/XML Parser)

from bs4 import BeautifulSoup

# Parse XML with BeautifulSoup
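# Open in binary mode so the parser can honor the encoding declared in the XML
with open('library.xml', 'rb') as file:
    content = file.read()

# The 'xml' parser requires lxml to be installed
soup = BeautifulSoup(content, 'xml')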

# Find elements
books = soup.find_all('book')
for book in books:
    book_id = book.get('id')
    title = book.find('title').string
    author = book.find('author').string
    price = book.find('price').string
    
    print(f"Book {book_id}: {title} by {author} - ${price}")

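# CSS selectors
titles = soup.select('book > title')

# Filter with a function: find_all calls it once per tag
def is_expensive_book(tag):
    price = tag.find('price')
    return tag.name == 'book' and price is not None and float(price.string) > 30

expensive_books = soup.find_all(is_expensive_book)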

C# XML APIs

System.Xml (Built-in)

using System.Xml;

// XmlDocument (DOM-like)
XmlDocument doc = new XmlDocument();
doc.Load("library.xml");

XmlNodeList books = doc.SelectNodes("//book");
foreach (XmlNode book in books)
{
    string id = book.Attributes["id"].Value;
    string title = book.SelectSingleNode("title").InnerText;
    string author = book.SelectSingleNode("author").InnerText;
    string price = book.SelectSingleNode("price").InnerText;
    
    Console.WriteLine($"Book {id}: {title} by {author} - ${price}");
}

// XmlReader (Forward-only)
using (XmlReader reader = XmlReader.Create("library.xml"))
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "book")
        {
            string id = reader.GetAttribute("id");
            // Process book element
        }
    }
}

// XPath
XmlNode titleNode = doc.SelectSingleNode("//book[@id='1']/title");
XmlNodeList expensiveBooks = doc.SelectNodes("//book[price > 30]");

LINQ to XML

using System.Xml.Linq;

// Load and parse XML
XDocument doc = XDocument.Load("library.xml");

// LINQ queries
var books = from book in doc.Descendants("book")
           select new
           {
               Id = book.Attribute("id").Value,
               Title = book.Element("title").Value,
               Author = book.Element("author").Value,
               Price = decimal.Parse(book.Element("price").Value)
           };

foreach (var book in books)
{
    Console.WriteLine($"Book {book.Id}: {book.Title} by {book.Author} - ${book.Price}");
}

// Filtering with LINQ
var expensiveBooks = from book in doc.Descendants("book")
                    where decimal.Parse(book.Element("price").Value) > 30
                    select book;

// Creating XML
XDocument newDoc = new XDocument(
    new XElement("library",
        new XElement("book", new XAttribute("id", "1"),
            new XElement("title", "Learning XML"),
            new XElement("author", "Jane Doe"),
            new XElement("price", "29.99")
        )
    )
);

newDoc.Save("output.xml");

XML Serialization

using System.Xml.Serialization;

[XmlRoot("library")]
public class Library
{
    [XmlElement("book")]
    public List<Book> Books { get; set; }
}

public class Book
{
    [XmlAttribute("id")]
    public string Id { get; set; }
    
    [XmlElement("title")]
    public string Title { get; set; }
    
    [XmlElement("author")]
    public string Author { get; set; }
    
    [XmlElement("price")]
    public decimal Price { get; set; }
}

// Serialization
XmlSerializer serializer = new XmlSerializer(typeof(Library));
using (FileStream fs = new FileStream("output.xml", FileMode.Create))
{
    serializer.Serialize(fs, library);
}

// Deserialization
using (FileStream fs = new FileStream("input.xml", FileMode.Open))
{
    Library library = (Library)serializer.Deserialize(fs);
}

JavaScript XML APIs

Browser DOM API

// Parse XML string
const parser = new DOMParser();
const xmlDoc = parser.parseFromString(xmlString, 'text/xml');

// Check for parsing errors
const parserError = xmlDoc.getElementsByTagName('parsererror');
if (parserError.length > 0) {
    throw new Error('XML parsing error: ' + parserError[0].textContent);
}

// Navigate DOM
const books = xmlDoc.getElementsByTagName('book');
for (let i = 0; i < books.length; i++) {
    const book = books[i];
    const id = book.getAttribute('id');
    const title = book.getElementsByTagName('title')[0].textContent;
    const author = book.getElementsByTagName('author')[0].textContent;
    const price = book.getElementsByTagName('price')[0].textContent;
    
    console.log(`Book ${id}: ${title} by ${author} - $${price}`);
}

// XPath in browsers
const xpath = "//book[@id='1']/title/text()";
const result = xmlDoc.evaluate(xpath, xmlDoc, null, XPathResult.STRING_TYPE, null);
const title = result.stringValue;

Node.js XML Libraries

// Using xml2js
const xml2js = require('xml2js');
const fs = require('fs');

// Parse XML to JavaScript object
const parser = new xml2js.Parser();
fs.readFile('library.xml', (err, data) => {
    if (err) throw err;
    parser.parseString(data, (parseErr, result) => {
        if (parseErr) throw parseErr;
        const books = result.library.book;
        books.forEach(book => {
            console.log(`Book ${book.$.id}: ${book.title[0]} by ${book.author[0]} - $${book.price[0]}`);
        });
    });
});

// Convert JavaScript object to XML
const builder = new xml2js.Builder();
const obj = {
    library: {
        book: [
            {
                $: { id: '1' },
                title: ['Learning XML'],
                author: ['Jane Doe'],
                price: ['29.99']
            }
        ]
    }
};
const xml = builder.buildObject(obj);

// Using fast-xml-parser
const { XMLParser, XMLBuilder } = require('fast-xml-parser');

// Distinct names avoid redeclaring the xml2js parser and builder above
const fastParser = new XMLParser({
    ignoreAttributes: false,
    attributeNamePrefix: '@_'
});

// xmlData holds the XML source as a string or Buffer
const jsonObj = fastParser.parse(xmlData);
console.log(jsonObj.library.book);

// Building XML
const fastBuilder = new XMLBuilder({
    ignoreAttributes: false,
    attributeNamePrefix: '@_'
});
const xmlContent = fastBuilder.build(jsonObj);

PHP XML APIs

SimpleXML

// Load and parse XML
$xml = simplexml_load_file('library.xml');

// Access elements
foreach ($xml->book as $book) {
    $id = (string)$book['id'];
    $title = (string)$book->title;
    $author = (string)$book->author;
    $price = (string)$book->price;
    
    echo "Book $id: $title by $author - $$price\n";
}

// XPath
$expensive_books = $xml->xpath('//book[price > 30]');
foreach ($expensive_books as $book) {
    echo "Expensive book: " . $book->title . "\n";
}

// Creating XML
$xml = new SimpleXMLElement('<library></library>');
$book = $xml->addChild('book');
$book->addAttribute('id', '1');
$book->addChild('title', 'Learning XML');
$book->addChild('author', 'Jane Doe');
$book->addChild('price', '29.99');

echo $xml->asXML();

DOMDocument

// Load XML
$dom = new DOMDocument();
$dom->load('library.xml');

// XPath
$xpath = new DOMXPath($dom);
$books = $xpath->query('//book');

foreach ($books as $book) {
    $id = $book->getAttribute('id');
    $title = $xpath->query('title', $book)->item(0)->nodeValue;
    $author = $xpath->query('author', $book)->item(0)->nodeValue;
    $price = $xpath->query('price', $book)->item(0)->nodeValue;
    
    echo "Book $id: $title by $author - $$price\n";
}

// Creating elements
$newBook = $dom->createElement('book');
$newBook->setAttribute('id', '3');

$titleElement = $dom->createElement('title', 'Advanced XML');
$newBook->appendChild($titleElement);

$root = $dom->documentElement;
$root->appendChild($newBook);

echo $dom->saveXML();

Go XML APIs

encoding/xml (Built-in)

package main

import (
    "encoding/xml"
    "fmt"
    "os"
)

type Library struct {
    XMLName xml.Name `xml:"library"`
    Books   []Book   `xml:"book"`
}

type Book struct {
    ID     string `xml:"id,attr"`
    Title  string `xml:"title"`
    Author string `xml:"author"`
    Price  string `xml:"price"`
}

func main() {
    // Read XML file (os.ReadFile replaces ioutil.ReadFile, deprecated since Go 1.16)
    data, err := os.ReadFile("library.xml")
    if err != nil {
        panic(err)
    }
    
    // Unmarshal XML
    var library Library
    err = xml.Unmarshal(data, &library)
    if err != nil {
        panic(err)
    }
    
    // Process data
    for _, book := range library.Books {
        fmt.Printf("Book %s: %s by %s - $%s\n", 
                   book.ID, book.Title, book.Author, book.Price)
    }
    
    // Marshal to XML
    output, err := xml.MarshalIndent(library, "", "  ")
    if err != nil {
        panic(err)
    }
    
    fmt.Println(string(output))
}

Rust XML APIs

serde-xml-rs

use serde::{Deserialize, Serialize};
use std::fs;

#[derive(Debug, Deserialize, Serialize)]
struct Library {
    #[serde(rename = "book")]
    books: Vec<Book>,
}

#[derive(Debug, Deserialize, Serialize)]
struct Book {
    #[serde(rename = "@id")]
    id: String,
    title: String,
    author: String,
    price: String,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read XML file
    let xml_content = fs::read_to_string("library.xml")?;
    
    // Deserialize
    let library: Library = serde_xml_rs::from_str(&xml_content)?;
    
    // Process data
    for book in &library.books {
        println!("Book {}: {} by {} - ${}", 
                 book.id, book.title, book.author, book.price);
    }
    
    // Serialize
    let xml_output = serde_xml_rs::to_string(&library)?;
    println!("{}", xml_output);
    
    Ok(())
}

Performance Comparison

Benchmarking Different APIs

public class XMLPerformanceBenchmark {
    public void benchmarkParsers(String filePath, int iterations) {
        // DOM parsing
        long domTime = benchmarkDOM(filePath, iterations);
        
        // SAX parsing  
        long saxTime = benchmarkSAX(filePath, iterations);
        
        // StAX parsing
        long staxTime = benchmarkStAX(filePath, iterations);
        
        System.out.println("Performance Results:");
        System.out.printf("DOM:  %d ms%n", domTime);
        System.out.printf("SAX:  %d ms%n", saxTime);
        System.out.printf("StAX: %d ms%n", staxTime);
    }
    
    private long benchmarkDOM(String filePath, int iterations) {
        // nanoTime is monotonic, which makes it better suited to elapsed-time measurement
        long start = System.nanoTime();
        
        for (int i = 0; i < iterations; i++) {
            try {
                DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
                Document doc = builder.parse(filePath);
                // Process document
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        
        // Convert elapsed nanoseconds to milliseconds
        return (System.nanoTime() - start) / 1_000_000;
    }
    
    // Similar benchmark methods for SAX and StAX...
}
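
The same measurement pattern can be sketched in Python with only the standard library. This is an illustrative harness under stated assumptions, not a rigorous benchmark: the `time_parser`, `count_with_tree`, and `count_with_stream` names and the synthetic data are invented for the example, and absolute timings depend on the machine and input.

```python
import io
import time
import xml.etree.ElementTree as ET


def make_sample(n_books):
    """Generate a synthetic <library> document as bytes (illustrative data)."""
    parts = ['<library>']
    for i in range(n_books):
        parts.append(
            f'<book id="{i}"><title>Title {i}</title><price>{10 + i % 50}</price></book>'
        )
    parts.append('</library>')
    return ''.join(parts).encode('utf-8')


def time_parser(parse, data, iterations):
    """Return (elapsed_seconds, last_result) for repeated calls of parse(data)."""
    result = None
    start = time.perf_counter()
    for _ in range(iterations):
        result = parse(data)
    return time.perf_counter() - start, result


def count_with_tree(data):
    """Tree-style parse: build the whole document in memory, then count."""
    return len(ET.fromstring(data).findall('book'))


def count_with_stream(data):
    """Streaming parse: count each <book> as it completes, then discard it."""
    count = 0
    for _event, elem in ET.iterparse(io.BytesIO(data), events=('end',)):
        if elem.tag == 'book':
            count += 1
            elem.clear()
    return count


data = make_sample(1000)
tree_seconds, tree_count = time_parser(count_with_tree, data, iterations=5)
stream_seconds, stream_count = time_parser(count_with_stream, data, iterations=5)
print(f"tree:   {tree_seconds * 1000:.1f} ms ({tree_count} books)")
print(f"stream: {stream_seconds * 1000:.1f} ms ({stream_count} books)")
```

Passing the parser in as a function keeps the timing loop identical for every API under test, so differences in the results come from the parsers rather than the harness.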

API Selection Guidelines

Choosing the Right API

Use Case	Recommended API	Reason
Small XML files (< 10MB)	DOM-based APIs	Easy to use, full document access
Large XML files (> 100MB)	SAX/StAX APIs	Memory efficient
Data binding	JAXB, XML Serialization	Type safety, easy mapping
Simple parsing	ElementTree, SimpleXML	Lightweight, easy syntax
Complex queries	XPath-enabled APIs	Powerful querying
High performance	Native streaming APIs	Minimal overhead
Web applications	Browser DOM API	Built-in browser support
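
To make the "memory efficient" row concrete, here is a hedged Python sketch of streaming with `xml.etree.ElementTree.iterparse`. The `iter_books` generator and the tiny temporary file are assumptions made for illustration; a real input in this category would be far larger.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def iter_books(path):
    """Yield one dict per <book> without building the full document tree first."""
    for _event, elem in ET.iterparse(path, events=('end',)):
        if elem.tag == 'book':
            yield {
                'id': elem.get('id'),
                'title': elem.findtext('title'),
                'price': float(elem.findtext('price', default='0')),
            }
            elem.clear()  # drop the element's children and text once processed


# Small stand-in file used only to exercise the generator
sample = (
    '<library>'
    '<book id="1"><title>Learning XML</title><price>29.99</price></book>'
    '<book id="2"><title>Advanced XML</title><price>45.00</price></book>'
    '</library>'
)
with tempfile.NamedTemporaryFile('w', suffix='.xml', delete=False, encoding='utf-8') as tmp:
    tmp.write(sample)
    sample_path = tmp.name

expensive = [book['title'] for book in iter_books(sample_path) if book['price'] > 30]
os.remove(sample_path)
print(expensive)
```

Because each book is handled as soon as its closing tag arrives, peak memory tracks the size of one record rather than the size of the file.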

Best Practices by Language

Java:

Use JAXP for standard processing
Consider JAXB for object mapping
Use StAX for large file processing

Python:

ElementTree for simple tasks
lxml for complex operations and validation
BeautifulSoup for forgiving parsing

C#:

LINQ to XML for modern applications
XmlDocument for legacy compatibility
XmlReader for memory-constrained scenarios

JavaScript:

Native DOM API in browsers
xml2js or fast-xml-parser in Node.js
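
As an illustration of the "simple tasks" case from the Python recommendations, the sketch below reads a few settings from a configuration-style document using only ElementTree. The element names, defaults, and the `load_settings` helper are hypothetical and chosen only for the example.

```python
import xml.etree.ElementTree as ET

CONFIG_XML = """<config>
    <server host="localhost" port="8080"/>
    <logging>
        <level>INFO</level>
    </logging>
</config>"""


def load_settings(xml_text):
    """Read a few settings, falling back to defaults for anything missing."""
    root = ET.fromstring(xml_text)
    server = root.find('server')
    return {
        'host': server.get('host', '127.0.0.1') if server is not None else '127.0.0.1',
        'port': int(server.get('port', '80')) if server is not None else 80,
        # findtext returns the default when the path matches nothing
        'log_level': root.findtext('logging/level', default='WARNING'),
        'timeout': float(root.findtext('timeout', default='30')),
    }


settings = load_settings(CONFIG_XML)
print(settings)
```

Using `get` and `findtext` with defaults avoids the `AttributeError` that `find(...).text` raises when an optional element is absent.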

