Language APIs
Most programming languages offer several APIs and libraries for XML processing, each with its own strengths, features, and performance characteristics. This guide covers widely used XML processing options in Java, Python, C#, JavaScript, PHP, Go, and Rust.
Knowing which APIs your language offers helps you pick the right tool for the job, whether you are reading a small configuration file or processing a large data stream.
Java XML APIs
JAXP (Java API for XML Processing)
// DOM Processing
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse("example.xml");
// SAX Processing
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
SAXParserFactory saxFactory = SAXParserFactory.newInstance();
SAXParser saxParser = saxFactory.newSAXParser();
// CustomHandler is an application-defined subclass of org.xml.sax.helpers.DefaultHandler
saxParser.parse("example.xml", new CustomHandler());
// StAX Processing
import java.io.FileInputStream;
import javax.xml.stream.*;
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
XMLStreamReader reader = inputFactory.createXMLStreamReader(new FileInputStream("example.xml"));
JAXB (Java Architecture for XML Binding)
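// Define XML-bound classes
// JAXB is a separate dependency since Java 11; JAXB 3+ uses jakarta.xml.bind instead of javax.xml.bind
import java.io.File;
import java.math.BigDecimal;
import java.util.List;
import javax.xml.bind.*;
import javax.xml.bind.annotation.*;

@XmlRootElement(name = "library")
@XmlAccessorType(XmlAccessType.FIELD) // bind fields directly; avoids duplicate-property errors with getters
public class Library {
    @XmlElement(name = "book")
    private List<Book> books;
    // getters and setters
}

@XmlRootElement(name = "book")
@XmlAccessorType(XmlAccessType.FIELD)
public class Book {
    @XmlAttribute
    private String id;
    @XmlElement
    private String title;
    @XmlElement
    private String author;
    @XmlElement
    private BigDecimal price;
    // constructors, getters, setters
}

// Marshalling (Object to XML); library is an existing Library instance
JAXBContext context = JAXBContext.newInstance(Library.class);
Marshaller marshaller = context.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
marshaller.marshal(library, new File("output.xml"));

// Unmarshalling (XML to Object)
Unmarshaller unmarshaller = context.createUnmarshaller();
Library loaded = (Library) unmarshaller.unmarshal(new File("input.xml"));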
XPath in Java
import javax.xml.xpath.*;
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xpath = xPathFactory.newXPath();
// Evaluate XPath expressions
String title = (String) xpath.evaluate("//book[@id='1']/title/text()", document, XPathConstants.STRING);
NodeList expensiveBooks = (NodeList) xpath.evaluate("//book[price > 30]", document, XPathConstants.NODESET);
// Compiled XPath for performance
XPathExpression expr = xpath.compile("//book[author='Jane Doe']");
NodeList result = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
Dom4j (Third-party Library)
import org.dom4j.*;
import org.dom4j.io.SAXReader;
// Reading XML
SAXReader reader = new SAXReader();
Document document = reader.read("example.xml");
Element root = document.getRootElement();
// Navigate elements
List<Element> books = root.elements("book");
for (Element book : books) {
    String id = book.attributeValue("id");
    String title = book.elementText("title");
    String author = book.elementText("author");
}
// XPath with Dom4j
List<Node> nodes = document.selectNodes("//book[@id='1']");
Node titleNode = document.selectSingleNode("//book[@id='1']/title");
Python XML APIs
xml.etree.ElementTree (Built-in)
import xml.etree.ElementTree as ET
# Parsing XML
tree = ET.parse('library.xml')
root = tree.getroot()
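# Finding elements
for book in root.findall('book'):
    book_id = book.get('id')
    title = book.find('title').text
    author = book.find('author').text
    price = book.find('price').text
    print(f"Book {book_id}: {title} by {author} - ${price}")

# XPath-like operations
# ElementTree's XPath subset has no numeric comparisons such as [price>30]
# (findall raises SyntaxError for that predicate), so filter in Python instead.
expensive_books = [
    book for book in root.findall('.//book')
    if float(book.findtext('price', default='0')) > 30
]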
# Creating XML
library = ET.Element('library')
book = ET.SubElement(library, 'book', id='1')
title = ET.SubElement(book, 'title')
title.text = 'Learning XML'
# Writing XML
tree = ET.ElementTree(library)
tree.write('output.xml', encoding='utf-8', xml_declaration=True)
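The snippets above read from a `library.xml` file that this guide does not show. As a self-contained sketch, the example below parses an assumed sample document from a string, tolerates missing child elements, and filters by price in Python. The sample data and the helper names `summarize` and `expensive` are illustrative assumptions, not part of ElementTree.

```python
import xml.etree.ElementTree as ET

# Assumed sample document; the guide's library.xml is not shown.
SAMPLE = """<library>
  <book id="1"><title>Learning XML</title><author>Jane Doe</author><price>29.99</price></book>
  <book id="2"><title>XML in Depth</title><author>John Roe</author><price>45.00</price></book>
  <book id="3"><title>Untitled Draft</title></book>
</library>"""


def summarize(xml_text):
    """Return (id, title, author, price) tuples, tolerating missing children."""
    root = ET.fromstring(xml_text)
    rows = []
    for book in root.findall("book"):
        price_text = book.findtext("price")  # None when <price> is absent
        rows.append((
            book.get("id"),
            book.findtext("title", default=""),
            book.findtext("author", default=""),
            float(price_text) if price_text is not None else None,
        ))
    return rows


def expensive(rows, threshold=30.0):
    """Filter in Python, since ElementTree paths cannot compare numbers."""
    return [row for row in rows if row[3] is not None and row[3] > threshold]


rows = summarize(SAMPLE)
print(expensive(rows))
```

Using `findtext` with a default avoids the `AttributeError` that `book.find('title').text` raises when an element is missing.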
lxml (Third-party Library)
from lxml import etree, objectify
# Parsing with lxml
parser = etree.XMLParser(remove_blank_text=True)
tree = etree.parse('library.xml', parser)
root = tree.getroot()
# XPath support
books = root.xpath('//book[@id="1"]')
titles = root.xpath('//book/title/text()')
expensive_books = root.xpath('//book[price > 30]')
# Object-oriented approach with objectify
obj_root = objectify.parse('library.xml').getroot()
for book in obj_root.book:
    print(f"Title: {book.title}")
    print(f"Author: {book.author}")
    print(f"Price: ${book.price}")

# Schema validation
schema_doc = etree.parse('schema.xsd')
schema = etree.XMLSchema(schema_doc)
if schema.validate(tree):
    print("Document is valid")
else:
    print("Validation errors:", schema.error_log)
BeautifulSoup (HTML/XML Parser)
from bs4 import BeautifulSoup
# Parse XML with BeautifulSoup (the 'xml' parser requires lxml to be installed)
with open('library.xml', 'r') as file:
    content = file.read()
soup = BeautifulSoup(content, 'xml')

# Find elements
books = soup.find_all('book')
for book in books:
    book_id = book.get('id')
    title = book.find('title').string
    author = book.find('author').string
    price = book.find('price').string
    print(f"Book {book_id}: {title} by {author} - ${price}")

# CSS selectors
titles = soup.select('book > title')

# Filter with a function: pass it as the first argument so it receives each tag
expensive_books = soup.find_all(
    lambda tag: tag.name == 'book'
    and tag.find('price') is not None
    and float(tag.find('price').string) > 30
)
C# XML APIs
System.Xml (Built-in)
using System.Xml;
// XmlDocument (DOM-like)
XmlDocument doc = new XmlDocument();
doc.Load("library.xml");
XmlNodeList books = doc.SelectNodes("//book");
foreach (XmlNode book in books)
{
    string id = book.Attributes["id"].Value;
    string title = book.SelectSingleNode("title").InnerText;
    string author = book.SelectSingleNode("author").InnerText;
    string price = book.SelectSingleNode("price").InnerText;
    Console.WriteLine($"Book {id}: {title} by {author} - ${price}");
}

// XmlReader (Forward-only)
using (XmlReader reader = XmlReader.Create("library.xml"))
{
    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element && reader.Name == "book")
        {
            string id = reader.GetAttribute("id");
            // Process book element
        }
    }
}
// XPath
XmlNode titleNode = doc.SelectSingleNode("//book[@id='1']/title");
XmlNodeList expensiveBooks = doc.SelectNodes("//book[price > 30]");
LINQ to XML
using System.Xml.Linq;
// Load and parse XML
XDocument doc = XDocument.Load("library.xml");
// LINQ queries
var books = from book in doc.Descendants("book")
            select new
            {
                Id = book.Attribute("id").Value,
                Title = book.Element("title").Value,
                Author = book.Element("author").Value,
                // The explicit XElement-to-decimal conversion parses culture-invariantly,
                // unlike decimal.Parse, which depends on the current culture
                Price = (decimal)book.Element("price")
            };
foreach (var book in books)
{
    Console.WriteLine($"Book {book.Id}: {book.Title} by {book.Author} - ${book.Price}");
}

// Filtering with LINQ
var expensiveBooks = from book in doc.Descendants("book")
                     where (decimal)book.Element("price") > 30
                     select book;

// Creating XML
XDocument newDoc = new XDocument(
    new XElement("library",
        new XElement("book", new XAttribute("id", "1"),
            new XElement("title", "Learning XML"),
            new XElement("author", "Jane Doe"),
            new XElement("price", "29.99")
        )
    )
);
newDoc.Save("output.xml");
XML Serialization
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

[XmlRoot("library")]
public class Library
{
    [XmlElement("book")]
    public List<Book> Books { get; set; }
}

public class Book
{
    [XmlAttribute("id")]
    public string Id { get; set; }
    [XmlElement("title")]
    public string Title { get; set; }
    [XmlElement("author")]
    public string Author { get; set; }
    [XmlElement("price")]
    public decimal Price { get; set; }
}

// Serialization; library is an existing Library instance
XmlSerializer serializer = new XmlSerializer(typeof(Library));
using (FileStream fs = new FileStream("output.xml", FileMode.Create))
{
    serializer.Serialize(fs, library);
}

// Deserialization
using (FileStream fs = new FileStream("input.xml", FileMode.Open))
{
    Library loaded = (Library)serializer.Deserialize(fs);
}
JavaScript XML APIs
Browser DOM API
// Parse XML string
const parser = new DOMParser();
const xmlDoc = parser.parseFromString(xmlString, 'text/xml');
// Check for parsing errors (parseFromString does not throw; it returns a <parsererror> document)
const parserError = xmlDoc.getElementsByTagName('parsererror');
if (parserError.length > 0) {
    // A bare `return` is only valid inside a function, so throw instead
    throw new Error('XML parsing error: ' + parserError[0].textContent);
}

// Navigate DOM
const books = xmlDoc.getElementsByTagName('book');
for (let i = 0; i < books.length; i++) {
    const book = books[i];
    const id = book.getAttribute('id');
    const title = book.getElementsByTagName('title')[0].textContent;
    const author = book.getElementsByTagName('author')[0].textContent;
    const price = book.getElementsByTagName('price')[0].textContent;
    console.log(`Book ${id}: ${title} by ${author} - $${price}`);
}
// XPath in browsers
const xpath = "//book[@id='1']/title/text()";
const result = xmlDoc.evaluate(xpath, xmlDoc, null, XPathResult.STRING_TYPE, null);
const title = result.stringValue;
Node.js XML Libraries
// Using xml2js
const xml2js = require('xml2js');
const fs = require('fs');

// Parse XML to JavaScript object
const parser = new xml2js.Parser();
fs.readFile('library.xml', (readErr, data) => {
    if (readErr) throw readErr;
    parser.parseString(data, (parseErr, result) => {
        if (parseErr) throw parseErr;
        const books = result.library.book;
        books.forEach(book => {
            // Attributes live under `$`; child elements are arrays by default
            console.log(`Book ${book.$.id}: ${book.title[0]} by ${book.author[0]} - $${book.price[0]}`);
        });
    });
});

// Convert JavaScript object to XML
const builder = new xml2js.Builder();
const obj = {
    library: {
        book: [
            {
                $: { id: '1' },
                title: ['Learning XML'],
                author: ['Jane Doe'],
                price: ['29.99']
            }
        ]
    }
};
const xml = builder.buildObject(obj);

// Using fast-xml-parser
const { XMLParser, XMLBuilder } = require('fast-xml-parser');
const fxParser = new XMLParser({
    ignoreAttributes: false,
    attributeNamePrefix: '@_'
});
// xmlData is an XML string loaded elsewhere
const jsonObj = fxParser.parse(xmlData);
console.log(jsonObj.library.book);

// Building XML
const fxBuilder = new XMLBuilder({
    ignoreAttributes: false,
    attributeNamePrefix: '@_'
});
const xmlContent = fxBuilder.build(jsonObj);
PHP XML APIs
SimpleXML
// Load and parse XML
$xml = simplexml_load_file('library.xml');
if ($xml === false) {
    die("Failed to load library.xml\n");
}

// Access elements
foreach ($xml->book as $book) {
    $id = (string)$book['id'];
    $title = (string)$book->title;
    $author = (string)$book->author;
    $price = (string)$book->price;
    // \$ prints a literal dollar sign before the interpolated price
    echo "Book $id: $title by $author - \$$price\n";
}

// XPath
$expensive_books = $xml->xpath('//book[price > 30]');
foreach ($expensive_books as $book) {
    echo "Expensive book: " . $book->title . "\n";
}
// Creating XML
$xml = new SimpleXMLElement('<library></library>');
$book = $xml->addChild('book');
$book->addAttribute('id', '1');
$book->addChild('title', 'Learning XML');
$book->addChild('author', 'Jane Doe');
$book->addChild('price', '29.99');
echo $xml->asXML();
DOMDocument
// Load XML
$dom = new DOMDocument();
$dom->load('library.xml');
// XPath
$xpath = new DOMXPath($dom);
$books = $xpath->query('//book');
foreach ($books as $book) {
    $id = $book->getAttribute('id');
    $title = $xpath->query('title', $book)->item(0)->nodeValue;
    $author = $xpath->query('author', $book)->item(0)->nodeValue;
    $price = $xpath->query('price', $book)->item(0)->nodeValue;
    echo "Book $id: $title by $author - \$$price\n";
}
// Creating elements
$newBook = $dom->createElement('book');
$newBook->setAttribute('id', '3');
$titleElement = $dom->createElement('title', 'Advanced XML');
$newBook->appendChild($titleElement);
$root = $dom->documentElement;
$root->appendChild($newBook);
echo $dom->saveXML();
Go XML APIs
encoding/xml (Built-in)
package main

import (
	"encoding/xml"
	"fmt"
	"os"
)

type Library struct {
	XMLName xml.Name `xml:"library"`
	Books   []Book   `xml:"book"`
}

type Book struct {
	ID     string `xml:"id,attr"`
	Title  string `xml:"title"`
	Author string `xml:"author"`
	Price  string `xml:"price"`
}

func main() {
	// Read XML file (os.ReadFile replaces the deprecated ioutil.ReadFile since Go 1.16)
	data, err := os.ReadFile("library.xml")
	if err != nil {
		panic(err)
	}

	// Unmarshal XML
	var library Library
	err = xml.Unmarshal(data, &library)
	if err != nil {
		panic(err)
	}

	// Process data
	for _, book := range library.Books {
		fmt.Printf("Book %s: %s by %s - $%s\n",
			book.ID, book.Title, book.Author, book.Price)
	}

	// Marshal to XML
	output, err := xml.MarshalIndent(library, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(output))
}
Rust XML APIs
serde-xml-rs
use serde::{Deserialize, Serialize};
use std::fs;

#[derive(Debug, Deserialize, Serialize)]
struct Library {
    #[serde(rename = "book")]
    books: Vec<Book>,
}

#[derive(Debug, Deserialize, Serialize)]
struct Book {
    #[serde(rename = "@id")]
    id: String,
    title: String,
    author: String,
    price: String,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read XML file
    let xml_content = fs::read_to_string("library.xml")?;

    // Deserialize
    let library: Library = serde_xml_rs::from_str(&xml_content)?;

    // Process data
    for book in &library.books {
        println!("Book {}: {} by {} - ${}",
            book.id, book.title, book.author, book.price);
    }

    // Serialize
    let xml_output = serde_xml_rs::to_string(&library)?;
    println!("{}", xml_output);
    Ok(())
}
Performance Comparison
Benchmarking Different APIs
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XMLPerformanceBenchmark {
    public void benchmarkParsers(String filePath, int iterations) {
        // DOM parsing
        long domTime = benchmarkDOM(filePath, iterations);
        // SAX parsing
        long saxTime = benchmarkSAX(filePath, iterations);
        // StAX parsing
        long staxTime = benchmarkStAX(filePath, iterations);
        System.out.println("Performance Results:");
        System.out.printf("DOM: %d ms%n", domTime);
        System.out.printf("SAX: %d ms%n", saxTime);
        System.out.printf("StAX: %d ms%n", staxTime);
    }

    private long benchmarkDOM(String filePath, int iterations) {
        // nanoTime is monotonic, so it suits elapsed-time measurement better than currentTimeMillis
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            try {
                DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
                Document doc = builder.parse(filePath);
                // Process document
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        return (System.nanoTime() - start) / 1_000_000; // nanoseconds to milliseconds
    }
    // Similar benchmark methods for SAX and StAX...
}
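The same measurement idea can be sketched in Python with only the standard library. This is a rough illustration rather than a rigorous benchmark: the synthetic document, the helper names (`make_document`, `count_with_tree`, `count_with_stream`, `time_it`), and the iteration count are all assumptions chosen for the example. It compares building a full tree with `ET.fromstring` against streaming with `ET.iterparse`.

```python
import io
import time
import xml.etree.ElementTree as ET


def make_document(n_books):
    """Build a synthetic <library> document in memory (assumed test data)."""
    parts = ["<library>"]
    for i in range(n_books):
        parts.append(
            f'<book id="{i}"><title>Book {i}</title>'
            f"<author>Author {i}</author><price>{10 + i % 50}</price></book>"
        )
    parts.append("</library>")
    return "".join(parts).encode("utf-8")


def count_with_tree(data):
    """Tree parsing: the whole document is held in memory at once."""
    root = ET.fromstring(data)
    return len(root.findall("book"))


def count_with_stream(data):
    """Streaming: each element is handled as soon as its end tag is seen."""
    count = 0
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "book":
            count += 1
            elem.clear()  # drop the book's children and text to limit memory growth
    return count


def time_it(func, data, iterations):
    """Return (result, average milliseconds per call) using a monotonic clock."""
    func(data)  # warm-up call, excluded from the measurement
    start = time.perf_counter()
    for _ in range(iterations):
        result = func(data)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms / iterations


data = make_document(2000)
for name, func in (("tree", count_with_tree), ("stream", count_with_stream)):
    result, avg_ms = time_it(func, data, iterations=5)
    print(f"{name}: {result} books, {avg_ms:.2f} ms per parse")
```

Timings vary with machine, document shape, and library version, so treat the numbers as relative indicators and measure with your own documents before choosing an API.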
API Selection Guidelines
Choosing the Right API
Use Case | Recommended API | Reason |
---|---|---|
Small XML files (< 10MB) | DOM-based APIs | Easy to use, full document access |
Large XML files (> 100MB) | SAX/StAX APIs | Memory efficient |
Data binding | JAXB, XML Serialization | Type safety, easy mapping |
Simple parsing | ElementTree, SimpleXML | Lightweight, easy syntax |
Complex queries | XPath-enabled APIs | Powerful querying |
High throughput | Streaming APIs (SAX, StAX, XmlReader) | No in-memory tree to build |
Web applications | Browser DOM API | Built-in browser support |
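To make the "memory efficient" row more concrete, here is a minimal sketch of event-driven (SAX-style) processing using Python's built-in `xml.sax` module. The handler class `PriceTotalHandler` and the sample data are assumptions made for illustration. The handler sums prices as parse events arrive, so no document tree is ever built.

```python
import xml.sax


class PriceTotalHandler(xml.sax.ContentHandler):
    """Count books and sum <price> values as events arrive."""

    def __init__(self):
        super().__init__()
        self.book_count = 0
        self.total = 0.0
        self._in_price = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "book":
            self.book_count += 1
        elif name == "price":
            self._in_price = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect until the end tag.
        if self._in_price:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "price":
            self.total += float("".join(self._buffer))
            self._in_price = False


# Assumed sample data standing in for a much larger file.
SAMPLE = b"""<library>
  <book id="1"><title>Learning XML</title><price>29.99</price></book>
  <book id="2"><title>XML in Depth</title><price>45.00</price></book>
</library>"""

handler = PriceTotalHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.book_count, round(handler.total, 2))
```

For a real file, `xml.sax.parse(path, handler)` reads incrementally, which is what keeps memory use roughly constant regardless of document size. The trade-off is that the handler must track its own state, as the `_in_price` flag does here.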
Best Practices by Language
Java:
- Use JAXP for standard processing
- Consider JAXB for object mapping (a separate dependency since Java 11)
- Use StAX for large file processing
Python:
- ElementTree for simple tasks
- lxml for complex operations and validation
- BeautifulSoup for forgiving parsing
C#:
- LINQ to XML for modern applications
- XmlDocument for legacy compatibility
- XmlReader for memory-constrained scenarios
JavaScript:
- Native DOM API in browsers
- xml2js or fast-xml-parser in Node.js
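For the Python recommendations above, one common pattern is to prefer lxml when it is installed and fall back to the standard library otherwise. The sketch below assumes your code only uses the API subset the two share (`fromstring`, `findall`, `findtext`); the `BACKEND` name and the sample string are illustrative.

```python
# Prefer lxml when it is installed; otherwise fall back to the standard library.
try:
    from lxml import etree as ET  # third-party, optional
    BACKEND = "lxml"
except ImportError:
    import xml.etree.ElementTree as ET
    BACKEND = "xml.etree.ElementTree"

# Assumed sample data for illustration.
SAMPLE = "<library><book id='1'><title>Learning XML</title></book></library>"

root = ET.fromstring(SAMPLE)
titles = [book.findtext("title") for book in root.findall("book")]
print(BACKEND, titles)
```

This keeps simple tasks dependency-free while allowing lxml to take over where it is available. Features that only lxml provides, such as full XPath or schema validation, still need an explicit lxml import.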