StAX Parsing
StAX (Streaming API for XML) is a pull-based XML processing API that combines the memory efficiency of SAX with greater control over the parsing process. Unlike SAX's push model where events are fired to your application, StAX uses a pull model where your application controls when to advance to the next parsing event.
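StAX provides two main APIs: the Cursor API (XMLStreamReader and XMLStreamWriter) for low-level control, and the Iterator API (XMLEventReader and XMLEventWriter), which represents each event as an object. Neither API builds an in-memory tree, so StAX uses far less memory than DOM on large documents, and because your code drives the parse loop it is more flexible than SAX callbacks.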
StAX Processing Model
StAX parsing follows a pull-based approach:
- Application Control: Your code decides when to advance to the next event
- Streaming Processing: Documents are processed incrementally
- Read and Write: Provides APIs for both reading and writing XML (each stream is forward-only)
- Memory Efficient: Only current parsing context is kept in memory
- Event-Based: Similar events to SAX but pulled rather than pushed
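The "application control" point can be illustrated with a short sketch: because the application pulls events, it can simply stop pulling once it has what it needs. The class name FirstTitleDemo, the method firstTitle, and the sample document are assumptions made for illustration only.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: returns the text of the first <title> element and stops there.
class FirstTitleDemo {

    static String firstTitle(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "title".equals(reader.getLocalName())) {
                        // Returning here means no further events are pulled from the reader.
                        // getElementText() reads the text of a text-only element.
                        return reader.getElementText();
                    }
                }
                return null; // no <title> element found
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<library><book id=\"1\"><title>Learning XML</title></book>"
                + "<book id=\"2\"><title>Advanced XML</title></book></library>";
        System.out.println(firstTitle(xml));
    }
}
```

With a SAX parser the same early exit usually requires throwing an exception from a callback; here it is an ordinary return statement.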
StAX Events
Common StAX events include:
- START_DOCUMENT: Beginning of document
- START_ELEMENT: Opening XML tag
- CHARACTERS: Text content
- END_ELEMENT: Closing XML tag
- END_DOCUMENT: End of document
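- ATTRIBUTE: Element attributes (when parsing a document these are normally read from the START_ELEMENT event rather than delivered as a separate event)
- NAMESPACE: Namespace declarations (likewise normally read from the START_ELEMENT event)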
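To see which of these events a reader actually delivers, the following sketch records the event sequence for a small document. EventTraceDemo, name, and trace are hypothetical names chosen for this illustration; the event constants come from XMLStreamConstants.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: lists the events produced for a document, in order.
class EventTraceDemo {

    // Maps the integer event codes used by the Cursor API to readable names.
    static String name(int eventType) {
        switch (eventType) {
            case XMLStreamConstants.START_DOCUMENT: return "START_DOCUMENT";
            case XMLStreamConstants.START_ELEMENT:  return "START_ELEMENT";
            case XMLStreamConstants.CHARACTERS:     return "CHARACTERS";
            case XMLStreamConstants.END_ELEMENT:    return "END_ELEMENT";
            case XMLStreamConstants.END_DOCUMENT:   return "END_DOCUMENT";
            default:                                return "OTHER(" + eventType + ")";
        }
    }

    static List<String> trace(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                // The reader's state before the first call to next()
                names.add(name(reader.getEventType()));
                while (reader.hasNext()) {
                    names.add(name(reader.next()));
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(trace("<book id=\"1\">Learning XML</book>"));
    }
}
```

Even though the sample element carries an id attribute, the trace contains no separate attribute entry: the attribute is available through the reader while it is positioned on START_ELEMENT.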
Java StAX Implementation
Cursor API Example
import javax.xml.stream.*;
import java.io.FileInputStream;
public class StAXCursorExample {
public void parseXMLWithCursor(String filePath) {
XMLInputFactory factory = XMLInputFactory.newInstance();
try (FileInputStream fis = new FileInputStream(filePath)) {
XMLStreamReader reader = factory.createXMLStreamReader(fis);
// A new XMLStreamReader is already positioned on START_DOCUMENT,
// so next() never returns that event; report it before the loop.
System.out.println("Starting document parsing");
String currentBookId = null;
StringBuilder content = new StringBuilder();
while (reader.hasNext()) {
int event = reader.next();
switch (event) {
case XMLStreamConstants.START_ELEMENT:
handleStartElement(reader, currentBookId, content);
if ("book".equals(reader.getLocalName())) {
currentBookId = reader.getAttributeValue(null, "id");
System.out.println("\n--- Book ID: " + currentBookId + " ---");
}
break;
case XMLStreamConstants.CHARACTERS:
if (!reader.isWhiteSpace()) {
content.append(reader.getText());
}
break;
case XMLStreamConstants.END_ELEMENT:
handleEndElement(reader, content);
break;
case XMLStreamConstants.END_DOCUMENT:
System.out.println("\nDocument parsing completed");
break;
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
private void handleStartElement(XMLStreamReader reader, String currentBookId, StringBuilder content) {
content.setLength(0); // Clear content buffer
// Process attributes if needed
int attributeCount = reader.getAttributeCount();
for (int i = 0; i < attributeCount; i++) {
String attrName = reader.getAttributeLocalName(i);
String attrValue = reader.getAttributeValue(i);
// Process attribute as needed
}
}
private void handleEndElement(XMLStreamReader reader, StringBuilder content) {
String elementName = reader.getLocalName();
String text = content.toString().trim();
switch (elementName) {
case "title":
System.out.println("Title: " + text);
break;
case "author":
System.out.println("Author: " + text);
break;
case "price":
System.out.println("Price: $" + text);
break;
}
}
}
Iterator API Example
import javax.xml.namespace.QName;
import javax.xml.stream.*;
import javax.xml.stream.events.*;
import java.io.FileInputStream;
import java.util.Iterator;
public class StAXIteratorExample {
public void parseXMLWithIterator(String filePath) {
XMLInputFactory factory = XMLInputFactory.newInstance();
try (FileInputStream fis = new FileInputStream(filePath)) {
XMLEventReader eventReader = factory.createXMLEventReader(fis);
String currentBookId = null;
StringBuilder content = new StringBuilder();
while (eventReader.hasNext()) {
XMLEvent event = eventReader.nextEvent();
if (event.isStartDocument()) {
System.out.println("Starting document parsing");
} else if (event.isStartElement()) {
StartElement startElement = event.asStartElement();
handleStartElement(startElement, currentBookId, content);
if ("book".equals(startElement.getName().getLocalPart())) {
Attribute idAttr = startElement.getAttributeByName(new QName("id"));
if (idAttr != null) {
currentBookId = idAttr.getValue();
System.out.println("\n--- Book ID: " + currentBookId + " ---");
}
}
} else if (event.isCharacters()) {
Characters characters = event.asCharacters();
if (!characters.isWhiteSpace()) {
content.append(characters.getData());
}
} else if (event.isEndElement()) {
EndElement endElement = event.asEndElement();
handleEndElement(endElement, content);
} else if (event.isEndDocument()) {
System.out.println("\nDocument parsing completed");
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
private void handleStartElement(StartElement startElement, String currentBookId, StringBuilder content) {
content.setLength(0);
// Process attributes
Iterator<Attribute> attributes = startElement.getAttributes();
while (attributes.hasNext()) {
Attribute attr = attributes.next();
String attrName = attr.getName().getLocalPart();
String attrValue = attr.getValue();
// Process attribute as needed
}
}
private void handleEndElement(EndElement endElement, StringBuilder content) {
String elementName = endElement.getName().getLocalPart();
String text = content.toString().trim();
switch (elementName) {
case "title":
System.out.println("Title: " + text);
break;
case "author":
System.out.println("Author: " + text);
break;
case "price":
System.out.println("Price: $" + text);
break;
}
}
}
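One capability of the Iterator API that the Cursor API lacks is look-ahead: XMLEventReader.peek() returns the next event without consuming it. The sketch below uses it to pair each element with the text that immediately follows its start tag. PeekDemo and leafText are hypothetical names, and the sketch assumes short text that arrives as a single Characters event.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class: maps element names to the text directly following their start tag.
class PeekDemo {

    static Map<String, String> leafText(String xml) {
        Map<String, String> result = new LinkedHashMap<>();
        try {
            XMLEventReader events =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            try {
                while (events.hasNext()) {
                    XMLEvent event = events.nextEvent();
                    if (!event.isStartElement()) {
                        continue;
                    }
                    // Look at the upcoming event without removing it from the stream
                    XMLEvent upcoming = events.peek();
                    if (upcoming != null && upcoming.isCharacters()
                            && !upcoming.asCharacters().isWhiteSpace()) {
                        result.put(event.asStartElement().getName().getLocalPart(),
                                upcoming.asCharacters().getData());
                    }
                }
            } finally {
                events.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(leafText(
                "<book><title>Learning XML</title><price>29.99</price></book>"));
    }
}
```

Because peek() does not advance the reader, the main loop still sees every event exactly once.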
Advanced StAX Features
Namespace Handling
public void handleNamespaces(XMLStreamReader reader) {
if (reader.getEventType() == XMLStreamConstants.START_ELEMENT) {
// Get element namespace
String namespaceURI = reader.getNamespaceURI();
String localName = reader.getLocalName();
String prefix = reader.getPrefix();
System.out.println("Element: " + localName);
System.out.println("Namespace: " + namespaceURI);
System.out.println("Prefix: " + prefix);
// Get namespace declarations
int namespaceCount = reader.getNamespaceCount();
for (int i = 0; i < namespaceCount; i++) {
String nsPrefix = reader.getNamespacePrefix(i);
String nsURI = reader.getNamespaceURI(i);
System.out.println("Namespace declaration: " + nsPrefix + " -> " + nsURI);
}
}
}
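In practice, namespace-aware code usually matches elements by namespace URI and local name rather than by prefix, since the prefix is only a document-local alias. A minimal sketch follows; NamespaceMatchDemo, countElements, and the urn:example:* URIs are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: counts elements with a given namespace URI and local name.
class NamespaceMatchDemo {

    static int countElements(String xml, String namespaceUri, String localName) {
        int count = 0;
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && namespaceUri.equals(reader.getNamespaceURI())
                            && localName.equals(reader.getLocalName())) {
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return count;
    }

    public static void main(String[] args) {
        // Two vocabularies that both define a "book" element (example URIs only)
        String xml = "<b:library xmlns:b='urn:example:books' xmlns:o='urn:example:other'>"
                + "<b:book/><o:book/><b:book/></b:library>";
        System.out.println(countElements(xml, "urn:example:books", "book"));
    }
}
```

The same check keeps working if a document binds the namespace to a different prefix or declares it as the default namespace.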
Filtering Events
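import javax.xml.stream.StreamFilter;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
public class BookElementFilter implements StreamFilter {
@Override
public boolean accept(XMLStreamReader reader) {
// Drop START_ELEMENT events for elements outside the book vocabulary;
// every other event type (text, end tags, and so on) passes through unchanged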
if (reader.getEventType() == XMLStreamConstants.START_ELEMENT) {
String localName = reader.getLocalName();
return "book".equals(localName) ||
"title".equals(localName) ||
"author".equals(localName) ||
"price".equals(localName);
}
return true;
}
}
// Usage
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader reader = factory.createXMLStreamReader(inputStream);
XMLStreamReader filteredReader = factory.createFilteredReader(reader, new BookElementFilter());
StAX Writing (XML Generation)
Writing XML with Cursor API
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import java.io.FileOutputStream;
public class StAXWriterExample {
public void writeXML(String filePath) {
XMLOutputFactory factory = XMLOutputFactory.newInstance();
try (FileOutputStream fos = new FileOutputStream(filePath)) {
XMLStreamWriter writer = factory.createXMLStreamWriter(fos, "UTF-8");
// Write XML declaration
writer.writeStartDocument("UTF-8", "1.0");
writer.writeCharacters("\n");
// Write root element
writer.writeStartElement("library");
writer.writeCharacters("\n ");
// Write first book
writeBook(writer, "1", "Learning XML", "Jane Doe", "29.99");
writer.writeCharacters("\n ");
// Write second book
writeBook(writer, "2", "Advanced XML", "John Smith", "39.99");
writer.writeCharacters("\n");
// Close root element
writer.writeEndElement(); // library
writer.writeEndDocument();
writer.flush();
writer.close();
System.out.println("XML file written successfully");
} catch (Exception e) {
e.printStackTrace();
}
}
private void writeBook(XMLStreamWriter writer, String id, String title, String author, String price)
throws XMLStreamException {
writer.writeStartElement("book");
writer.writeAttribute("id", id);
writer.writeCharacters("\n ");
writer.writeStartElement("title");
writer.writeCharacters(title);
writer.writeEndElement();
writer.writeCharacters("\n ");
writer.writeStartElement("author");
writer.writeCharacters(author);
writer.writeEndElement();
writer.writeCharacters("\n ");
writer.writeStartElement("price");
writer.writeCharacters(price);
writer.writeEndElement();
writer.writeCharacters("\n ");
writer.writeEndElement(); // book
}
}
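One practical reason to generate XML with XMLStreamWriter rather than string concatenation is that writeCharacters escapes markup characters in text. The sketch below writes to a StringWriter so the result can be inspected; WriterEscapingDemo and bookXml are hypothetical names used only for this illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical demo class: builds a small <book> document in memory.
class WriterEscapingDemo {

    static String bookXml(String id, String title) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartDocument();
            writer.writeStartElement("book");
            writer.writeAttribute("id", id);
            writer.writeStartElement("title");
            writer.writeCharacters(title); // '&' and '<' in the title are escaped by the writer
            writer.writeEndElement(); // title
            writer.writeEndElement(); // book
            writer.writeEndDocument();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(bookXml("1", "Tom & Jerry <uncut>"));
    }
}
```

The title is passed in as raw text; the caller does not need to pre-escape it, and doing so would result in double escaping.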
Pretty Printing XML
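import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
public class PrettyPrintStAXWriter {
private final XMLStreamWriter writer;
private int indentLevel = 0;
private static final String INDENT = " ";
// True once a child element has been closed inside the current element
private boolean hasChildElements = false;
public PrettyPrintStAXWriter(XMLStreamWriter writer) {
this.writer = writer;
}
public void writeStartElement(String localName) throws XMLStreamException {
writeIndent();
writer.writeStartElement(localName);
indentLevel++;
hasChildElements = false;
}
public void writeEndElement() throws XMLStreamException {
indentLevel--;
// Indent the closing tag only after child elements, so text-only elements keep their exact content
if (hasChildElements) {
writeIndent();
}
writer.writeEndElement();
hasChildElements = true;
}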
public void writeCharacters(String text) throws XMLStreamException {
writer.writeCharacters(text);
}
private void writeIndent() throws XMLStreamException {
writer.writeCharacters("\n");
for (int i = 0; i < indentLevel; i++) {
writer.writeCharacters(INDENT);
}
}
}
Performance Optimization
Buffered Reading
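import java.io.BufferedInputStream;
import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
public class OptimizedStAXReader {
private static final int BUFFER_SIZE = 8192;
public void parseOptimized(String filePath) {
XMLInputFactory factory = XMLInputFactory.newInstance();
// Turn off features this parser does not need
factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE); // text may arrive in several CHARACTERS events
factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
factory.setProperty(XMLInputFactory.IS_VALIDATING, Boolean.FALSE);
factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE); // also blocks DTD-based attacks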
try (BufferedInputStream bis = new BufferedInputStream(
new FileInputStream(filePath), BUFFER_SIZE)) {
XMLStreamReader reader = factory.createXMLStreamReader(bis);
// Process with optimized settings
processEvents(reader);
} catch (Exception e) {
e.printStackTrace();
}
}
private void processEvents(XMLStreamReader reader) throws XMLStreamException {
while (reader.hasNext()) {
int event = reader.next();
// Optimized event processing
switch (event) {
case XMLStreamConstants.START_ELEMENT:
// Process start element efficiently
break;
case XMLStreamConstants.CHARACTERS:
// Handle characters without creating unnecessary strings
if (!reader.isWhiteSpace()) {
// Process text content
}
break;
case XMLStreamConstants.END_ELEMENT:
// Process end element
break;
}
}
}
}
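Turning IS_COALESCING off has a visible consequence: the text of one element may be delivered as several events, so code must accumulate it (as the StringBuilder in the earlier examples does). The sketch below compares both settings on a document that mixes plain text and a CDATA section. CoalescingDemo and textEvents are hypothetical names; how a non-coalescing parser splits the text is implementation-dependent.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: collects the text carried by each text event.
class CoalescingDemo {

    static List<String> textEvents(String xml, boolean coalescing) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, coalescing);
        List<String> chunks = new ArrayList<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    // Without coalescing, a CDATA section may be reported as its own event type
                    if (event == XMLStreamConstants.CHARACTERS
                            || event == XMLStreamConstants.CDATA) {
                        chunks.add(reader.getText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        String xml = "<a>one<![CDATA[two]]>three</a>";
        System.out.println("coalescing:     " + textEvents(xml, true));
        System.out.println("non-coalescing: " + textEvents(xml, false));
    }
}
```

Either way the concatenated text is the same; coalescing trades some extra buffering inside the parser for simpler application code.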
Memory Management
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
public class MemoryEfficientStAXProcessor {
private static final int MAX_STRING_SIZE = 1024 * 1024; // 1M characters
public void processLargeDocument(InputStream inputStream) {
XMLInputFactory factory = XMLInputFactory.newInstance();
try {
XMLStreamReader reader = factory.createXMLStreamReader(inputStream);
while (reader.hasNext()) {
int event = reader.next();
if (event == XMLStreamConstants.CHARACTERS) {
// Check the length first: calling getText() would already copy the whole text into a String
if (reader.getTextLength() > MAX_STRING_SIZE) {
System.err.println("Warning: Large text content detected, processing in chunks");
processLargeTextContent(reader);
} else {
processNormalTextContent(reader.getText());
}
}
}
} catch (XMLStreamException e) {
System.err.println("Error processing XML: " + e.getMessage());
}
}
private void processLargeTextContent(XMLStreamReader reader) {
// Handle large text content in chunks to avoid memory issues
System.out.println("Processing large text content in chunks...");
}
private void processNormalTextContent(String text) {
// Regular text content processing
if (!text.trim().isEmpty()) {
System.out.println("Text: " + text.trim());
}
}
}
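One way to process text in chunks is the four-argument XMLStreamReader.getTextCharacters method, which copies part of the current event's text into a caller-supplied buffer. The sketch below copies text through a small buffer into a Writer. ChunkedTextDemo and copyText are hypothetical names, and a StringWriter stands in for whatever sink a real application would stream to.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: copies all character data to a Writer in fixed-size chunks.
class ChunkedTextDemo {

    static String copyText(String xml, int chunkSize) {
        StringWriter out = new StringWriter(); // stand-in for a file or network writer
        char[] buffer = new char[chunkSize];
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() != XMLStreamConstants.CHARACTERS) {
                        continue;
                    }
                    int total = reader.getTextLength();
                    int position = 0;
                    while (position < total) {
                        int wanted = Math.min(buffer.length, total - position);
                        // Copies up to 'wanted' characters starting at 'position' into the buffer
                        int copied = reader.getTextCharacters(position, buffer, 0, wanted);
                        if (copied <= 0) {
                            break;
                        }
                        out.write(buffer, 0, copied);
                        position += copied;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(copyText("<a>hello world</a>", 4));
    }
}
```

This avoids allocating one large String per text event. The parser may still hold the event's text in its own internal buffer, so the saving is the extra copy, not the parser's working memory.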
Error Handling and Validation
Robust Error Handling
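import java.io.FileInputStream;
import javax.xml.stream.Location;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
public class RobustStAXParser {
public void parseWithErrorHandling(String filePath) {
XMLInputFactory factory = XMLInputFactory.newInstance();
try (FileInputStream fis = new FileInputStream(filePath)) {
XMLStreamReader reader = factory.createXMLStreamReader(fis);
while (reader.hasNext()) {
// A well-formedness error thrown by next() is fatal: the parser cannot resume,
// so it propagates to the outer catch instead of being retried
int event = reader.next();
try {
processEvent(reader, event);
} catch (XMLStreamException e) {
// Application-level validation failure: report it and continue with the next event
handleParsingError(e, reader);
}
}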
} catch (Exception e) {
System.err.println("Fatal error: " + e.getMessage());
}
}
private void processEvent(XMLStreamReader reader, int event) throws XMLStreamException {
switch (event) {
case XMLStreamConstants.START_ELEMENT:
validateElement(reader);
break;
case XMLStreamConstants.CHARACTERS:
validateCharacters(reader);
break;
// Other event processing
}
}
private void validateElement(XMLStreamReader reader) throws XMLStreamException {
String localName = reader.getLocalName();
// Custom validation logic
if (localName == null || localName.trim().isEmpty()) {
throw new XMLStreamException("Invalid element name at line " + reader.getLocation().getLineNumber());
}
}
private void validateCharacters(XMLStreamReader reader) throws XMLStreamException {
String text = reader.getText();
// Custom character validation
if (text != null && text.contains("\0")) {
throw new XMLStreamException("Invalid character content at line " + reader.getLocation().getLineNumber());
}
}
private void handleParsingError(XMLStreamException e, XMLStreamReader reader) {
Location location = reader.getLocation();
System.err.printf("Parsing error at line %d, column %d: %s%n",
location.getLineNumber(),
location.getColumnNumber(),
e.getMessage());
}
}
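For errors raised by the parser itself, XMLStreamException can carry a Location describing where the problem was detected. The sketch below parses a string to the end and reports the first error, if any. ErrorLocationDemo and firstError are hypothetical names, and the exact message text depends on the StAX implementation.

```java
import java.io.StringReader;
import javax.xml.stream.Location;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: returns a description of the first parse error, or null if none.
class ErrorLocationDemo {

    static String firstError(String xml) {
        XMLStreamReader reader = null;
        try {
            reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                reader.next(); // throws XMLStreamException on malformed input
            }
            return null; // document was well-formed
        } catch (XMLStreamException e) {
            Location location = e.getLocation(); // may be null if the position is unknown
            String message = String.valueOf(e.getMessage());
            if (location == null) {
                return message;
            }
            return "line " + location.getLineNumber()
                    + ", column " + location.getColumnNumber() + ": " + message;
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // Nothing useful can be done if closing fails
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(firstError("<a><b></a>"));   // mismatched tags
        System.out.println(firstError("<a><b/></a>"));  // well-formed, so null
    }
}
```

After such an exception the reader should be treated as unusable; the sketch stops at the first error rather than trying to continue.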
StAX Best Practices
Configuration for Performance
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
public class StAXConfiguration {
public static XMLInputFactory createOptimizedInputFactory() {
XMLInputFactory factory = XMLInputFactory.newInstance();
// Skip work the application does not need
factory.setProperty(XMLInputFactory.IS_COALESCING, false);
factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
factory.setProperty(XMLInputFactory.IS_VALIDATING, false);
// Disabling DTDs and external entities is primarily a security measure (it blocks XXE attacks)
factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
return factory;
}
public static XMLOutputFactory createOptimizedOutputFactory() {
XMLOutputFactory factory = XMLOutputFactory.newInstance();
// Namespace repairing makes the writer emit missing namespace declarations automatically;
// it is a convenience and correctness setting, not a speed-up
factory.setProperty(XMLOutputFactory.IS_REPAIRING_NAMESPACES, true);
return factory;
}
}
Thread Safety
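import java.io.FileInputStream;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamReader;
public class ThreadSafeStAXProcessor {
private final XMLInputFactory inputFactory;
private final XMLOutputFactory outputFactory;
public ThreadSafeStAXProcessor() {
// Create and configure the factories once, before any thread uses them.
// Whether a factory may be shared across threads depends on the StAX implementation, so check its documentation.
this.inputFactory = XMLInputFactory.newInstance();
this.outputFactory = XMLOutputFactory.newInstance();
}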
public void processInParallel(List<String> filePaths) {
filePaths.parallelStream().forEach(this::processFile);
}
private void processFile(String filePath) {
// Each thread gets its own reader instance
try (FileInputStream fis = new FileInputStream(filePath)) {
XMLStreamReader reader = inputFactory.createXMLStreamReader(fis);
// Process the file
while (reader.hasNext()) {
int event = reader.next();
// Process events
}
} catch (Exception e) {
System.err.println("Error processing " + filePath + ": " + e.getMessage());
}
}
}
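If it is unclear whether a particular StAX implementation allows a factory to be shared, a conservative alternative is to give each thread its own factory. The sketch below does this with a ThreadLocal. PerThreadFactoryDemo and countElements are hypothetical names, and whether the extra factories are needed at all depends on the implementation in use.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class: each worker thread lazily creates and reuses its own factory.
class PerThreadFactoryDemo {

    private static final ThreadLocal<XMLInputFactory> FACTORY =
            ThreadLocal.withInitial(XMLInputFactory::newInstance);

    static int countElements(String xml) {
        int count = 0;
        try {
            // FACTORY.get() returns the factory that belongs to the calling thread
            XMLStreamReader reader = FACTORY.get().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> documents = Arrays.asList("<a><b/></a>", "<x/>", "<p><q/><r/></p>");
        int total = documents.parallelStream()
                .mapToInt(PerThreadFactoryDemo::countElements)
                .sum();
        System.out.println(total);
    }
}
```

Readers and writers are never shared between threads in either design; only the factory handling differs.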
Advantages of StAX
Benefits
- Pull-based Control: Application controls parsing flow
- Memory Efficient: Minimal memory usage like SAX
- Read and Write: Supports both reading and writing XML
- Easy to Use: Simpler than SAX for complex parsing logic
- Good Performance: Efficient for large documents
- Flexible: Can pause and resume parsing
Use Cases
- Large XML document processing
- XML transformation and filtering
- Streaming XML data processing
- XML generation and serialization
- Parse-time validation and processing
Comparison with Other Parsing Methods
Feature | StAX | DOM | SAX |
---|---|---|---|
Memory Usage | Low | High | Very Low |
Processing Control | Full | Full | Limited |
Document Size | Any | Limited by memory | Any |
Access Pattern | Pull-based | Random | Push-based |
Read/Write Support | Read and write | Read, modify, and write | Read-only |
Complexity | Moderate | Low | Moderate |
Performance | Very Good | Moderate | Excellent |
When to Use StAX
- Processing large XML files with selective data extraction
- Reading and writing XML with a single streaming API
- Controlling the parsing flow from application code
- Implementing complex parsing logic that would be difficult with SAX
- Working in memory-constrained environments where DOM is too heavy