StAX Parsing

StAX (Streaming API for XML) is a pull-based XML processing API that combines the memory efficiency of SAX with greater control over the parsing process. Unlike SAX's push model where events are fired to your application, StAX uses a pull model where your application controls when to advance to the next parsing event.

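StAX provides two main APIs: the Cursor API (XMLStreamReader and XMLStreamWriter) for low-level control and the Iterator API (XMLEventReader and XMLEventWriter) for object-oriented event handling. Because neither API builds an in-memory tree, StAX needs far less memory than DOM for large documents, and because your code drives the parser, it is more flexible than SAX.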

StAX Processing Model

StAX parsing follows a pull-based approach:

  1. Application Control: Your code decides when to advance to the next event
  2. Streaming Processing: Documents are processed incrementally
  3. Read and Write: Provides APIs for both reading and writing XML (each stream is still forward-only)
  4. Memory Efficient: Only current parsing context is kept in memory
  5. Event-Based: Similar events to SAX but pulled rather than pushed
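The points above can be illustrated with a minimal sketch. The class name PullModelSketch and the method firstTitle are hypothetical, and the input is assumed to be a small document held in a String. The loop advances only when the code calls next(), and it stops as soon as the first title has been read, so the rest of the document is never parsed.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;

// Hypothetical helper, for illustration only.
final class PullModelSketch {
    /** Returns the text of the first title element, or null if there is none. */
    static String firstTitle(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                // The application decides when to pull the next event.
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "title".equals(reader.getLocalName())) {
                        // Reads the text-only content and stops on END_ELEMENT.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }
}
```

Returning from inside the loop is the pull model's main practical advantage: with SAX the parser would keep firing callbacks until the end of the document unless the handler threw an exception.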

StAX Events

Common StAX events include:

  • START_DOCUMENT: Beginning of document
  • START_ELEMENT: Opening XML tag
  • CHARACTERS: Text content
  • END_ELEMENT: Closing XML tag
  • END_DOCUMENT: End of document
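  • ATTRIBUTE: Element attributes (when parsing a document these are normally read from START_ELEMENT rather than delivered as a separate event)
  • NAMESPACE: Namespace declarations (likewise normally read from START_ELEMENT)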
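To see which events a small document produces, the following sketch records each event as a string. EventTraceSketch, trace, and describe are hypothetical names chosen for this illustration. It assumes a short input, so that the text arrives as a single CHARACTERS event; a parser is free to split longer text across several events.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class EventTraceSketch {
    static List<String> trace(String xml) {
        List<String> events = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                // A new reader is already positioned on START_DOCUMENT.
                events.add(describe(reader, reader.getEventType()));
                while (reader.hasNext()) {
                    events.add(describe(reader, reader.next()));
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return events;
    }

    private static String describe(XMLStreamReader reader, int eventType) {
        switch (eventType) {
            case XMLStreamConstants.START_DOCUMENT:
                return "START_DOCUMENT";
            case XMLStreamConstants.START_ELEMENT:
                // Attributes are read here, not from separate ATTRIBUTE events.
                return "START_ELEMENT " + reader.getLocalName()
                        + " attributes=" + reader.getAttributeCount();
            case XMLStreamConstants.CHARACTERS:
                return "CHARACTERS " + reader.getText();
            case XMLStreamConstants.END_ELEMENT:
                return "END_ELEMENT " + reader.getLocalName();
            case XMLStreamConstants.END_DOCUMENT:
                return "END_DOCUMENT";
            default:
                return "OTHER(" + eventType + ")";
        }
    }
}
```

Calling EventTraceSketch.trace on a one-element document such as a book element with an id attribute shows the attribute counted on START_ELEMENT and no ATTRIBUTE event in the list.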

Java StAX Implementation

Cursor API Example

import javax.xml.stream.*;
import java.io.FileInputStream;

public class StAXCursorExample {
    public void parseXMLWithCursor(String filePath) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        
        try (FileInputStream fis = new FileInputStream(filePath)) {
            XMLStreamReader reader = factory.createXMLStreamReader(fis);
            
            String currentBookId = null;
            StringBuilder content = new StringBuilder();
            
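            // A new reader is already positioned on START_DOCUMENT, so next()
            // never returns that event; report it before entering the loop.
            if (reader.getEventType() == XMLStreamConstants.START_DOCUMENT) {
                System.out.println("Starting document parsing");
            }
            
            while (reader.hasNext()) {
                int event = reader.next();
                
                switch (event) {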
                    case XMLStreamConstants.START_ELEMENT:
                        handleStartElement(reader, currentBookId, content);
                        if ("book".equals(reader.getLocalName())) {
                            currentBookId = reader.getAttributeValue(null, "id");
                            System.out.println("\n--- Book ID: " + currentBookId + " ---");
                        }
                        break;
                        
                    case XMLStreamConstants.CHARACTERS:
                        if (!reader.isWhiteSpace()) {
                            content.append(reader.getText());
                        }
                        break;
                        
                    case XMLStreamConstants.END_ELEMENT:
                        handleEndElement(reader, content);
                        break;
                        
                    case XMLStreamConstants.END_DOCUMENT:
                        System.out.println("\nDocument parsing completed");
                        break;
                }
            }
            
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    
    private void handleStartElement(XMLStreamReader reader, String currentBookId, StringBuilder content) {
        content.setLength(0); // Clear content buffer
        
        // Process attributes if needed
        int attributeCount = reader.getAttributeCount();
        for (int i = 0; i < attributeCount; i++) {
            String attrName = reader.getAttributeLocalName(i);
            String attrValue = reader.getAttributeValue(i);
            // Process attribute as needed
        }
    }
    
    private void handleEndElement(XMLStreamReader reader, StringBuilder content) {
        String elementName = reader.getLocalName();
        String text = content.toString().trim();
        
        switch (elementName) {
            case "title":
                System.out.println("Title: " + text);
                break;
            case "author":
                System.out.println("Author: " + text);
                break;
            case "price":
                System.out.println("Price: $" + text);
                break;
        }
    }
}

Iterator API Example

import javax.xml.namespace.QName;
import javax.xml.stream.*;
import javax.xml.stream.events.*;
import java.io.FileInputStream;
import java.util.Iterator;

public class StAXIteratorExample {
    public void parseXMLWithIterator(String filePath) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        
        try (FileInputStream fis = new FileInputStream(filePath)) {
            XMLEventReader eventReader = factory.createXMLEventReader(fis);
            
            String currentBookId = null;
            StringBuilder content = new StringBuilder();
            
            while (eventReader.hasNext()) {
                XMLEvent event = eventReader.nextEvent();
                
                if (event.isStartDocument()) {
                    System.out.println("Starting document parsing");
                    
                } else if (event.isStartElement()) {
                    StartElement startElement = event.asStartElement();
                    handleStartElement(startElement, currentBookId, content);
                    
                    if ("book".equals(startElement.getName().getLocalPart())) {
                        Attribute idAttr = startElement.getAttributeByName(new QName("id"));
                        if (idAttr != null) {
                            currentBookId = idAttr.getValue();
                            System.out.println("\n--- Book ID: " + currentBookId + " ---");
                        }
                    }
                    
                } else if (event.isCharacters()) {
                    Characters characters = event.asCharacters();
                    if (!characters.isWhiteSpace()) {
                        content.append(characters.getData());
                    }
                    
                } else if (event.isEndElement()) {
                    EndElement endElement = event.asEndElement();
                    handleEndElement(endElement, content);
                    
                } else if (event.isEndDocument()) {
                    System.out.println("\nDocument parsing completed");
                }
            }
            
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    
    private void handleStartElement(StartElement startElement, String currentBookId, StringBuilder content) {
        content.setLength(0);
        
        // Process attributes
        Iterator<Attribute> attributes = startElement.getAttributes();
        while (attributes.hasNext()) {
            Attribute attr = attributes.next();
            String attrName = attr.getName().getLocalPart();
            String attrValue = attr.getValue();
            // Process attribute as needed
        }
    }
    
    private void handleEndElement(EndElement endElement, StringBuilder content) {
        String elementName = endElement.getName().getLocalPart();
        String text = content.toString().trim();
        
        switch (elementName) {
            case "title":
                System.out.println("Title: " + text);
                break;
            case "author":
                System.out.println("Author: " + text);
                break;
            case "price":
                System.out.println("Price: $" + text);
                break;
        }
    }
}
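One practical difference between the two APIs is that XMLEventReader offers peek(), which returns the next event without consuming it. The sketch below uses it to find elements whose start tag is immediately followed by their end tag. PeekSketch and emptyElements are hypothetical names, and the input is assumed to be a small well-formed document held in a String.

```java
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class PeekSketch {
    /** Returns the local names of elements that have no content at all. */
    static List<String> emptyElements(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLEventReader events =
                    XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
            try {
                while (events.hasNext()) {
                    XMLEvent event = events.nextEvent();
                    // peek() looks at the following event without consuming it.
                    if (event.isStartElement() && events.peek().isEndElement()) {
                        names.add(event.asStartElement().getName().getLocalPart());
                    }
                }
            } finally {
                events.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return names;
    }
}
```

The Cursor API has no equivalent lookahead, so the same check there would need a flag carried from one loop iteration to the next.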

Advanced StAX Features

Namespace Handling

public void handleNamespaces(XMLStreamReader reader) {
    if (reader.getEventType() == XMLStreamConstants.START_ELEMENT) {
        // Get element namespace
        String namespaceURI = reader.getNamespaceURI();
        String localName = reader.getLocalName();
        String prefix = reader.getPrefix();
        
        System.out.println("Element: " + localName);
        System.out.println("Namespace: " + namespaceURI);
        System.out.println("Prefix: " + prefix);
        
        // Get namespace declarations
        int namespaceCount = reader.getNamespaceCount();
        for (int i = 0; i < namespaceCount; i++) {
            String nsPrefix = reader.getNamespacePrefix(i);
            String nsURI = reader.getNamespaceURI(i);
            System.out.println("Namespace declaration: " + nsPrefix + " -> " + nsURI);
        }
    }
}

Filtering Events

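import javax.xml.stream.StreamFilter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;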

public class BookElementFilter implements StreamFilter {
    @Override
    public boolean accept(XMLStreamReader reader) {
        // Only accept book-related elements
        if (reader.getEventType() == XMLStreamConstants.START_ELEMENT) {
            String localName = reader.getLocalName();
            return "book".equals(localName) || 
                   "title".equals(localName) || 
                   "author".equals(localName) || 
                   "price".equals(localName);
        }
        return true;
    }
}

// Usage
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader reader = factory.createXMLStreamReader(inputStream);
XMLStreamReader filteredReader = factory.createFilteredReader(reader, new BookElementFilter());
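A filter is also a convenient way to drop whitespace-only text so the main loop does not have to check for it. The following is a minimal sketch under that assumption; WhitespaceFilterSketch and textChunks are hypothetical names, and the input is assumed to fit in a String. The filter accepts every event except blank character data, so the reader's initial START_DOCUMENT state is accepted as well.

```java
import javax.xml.stream.StreamFilter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
final class WhitespaceFilterSketch {
    /** Returns every text chunk that is not purely whitespace. */
    static List<String> textChunks(String xml) {
        List<String> chunks = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader raw = factory.createXMLStreamReader(new StringReader(xml));
            // StreamFilter declares a single accept() method, so a lambda works.
            StreamFilter noBlankText = r -> !(r.isCharacters() && r.isWhiteSpace());
            XMLStreamReader reader = factory.createFilteredReader(raw, noBlankText);
            try {
                while (reader.hasNext()) {
                    // No isWhiteSpace() check needed: blank text never gets here.
                    if (reader.next() == XMLStreamConstants.CHARACTERS) {
                        chunks.add(reader.getText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return chunks;
    }
}
```

Note that a filter sees one event at a time. Rejecting a START_ELEMENT does not skip that element's text or its END_ELEMENT, so filters that hide whole elements need to track nesting themselves.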

StAX Writing (XML Generation)

Writing XML with Cursor API

import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import java.io.FileOutputStream;

public class StAXWriterExample {
    public void writeXML(String filePath) {
        XMLOutputFactory factory = XMLOutputFactory.newInstance();
        
        try (FileOutputStream fos = new FileOutputStream(filePath)) {
            XMLStreamWriter writer = factory.createXMLStreamWriter(fos, "UTF-8");
            
            // Write XML declaration
            writer.writeStartDocument("UTF-8", "1.0");
            writer.writeCharacters("\n");
            
            // Write root element
            writer.writeStartElement("library");
            writer.writeCharacters("\n  ");
            
            // Write first book
            writeBook(writer, "1", "Learning XML", "Jane Doe", "29.99");
            writer.writeCharacters("\n  ");
            
            // Write second book
            writeBook(writer, "2", "Advanced XML", "John Smith", "39.99");
            writer.writeCharacters("\n");
            
            // Close root element
            writer.writeEndElement(); // library
            
            writer.writeEndDocument();
            writer.flush();
            writer.close();
            
            System.out.println("XML file written successfully");
            
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    
    private void writeBook(XMLStreamWriter writer, String id, String title, String author, String price) 
            throws XMLStreamException {
        
        writer.writeStartElement("book");
        writer.writeAttribute("id", id);
        writer.writeCharacters("\n    ");
        
        writer.writeStartElement("title");
        writer.writeCharacters(title);
        writer.writeEndElement();
        writer.writeCharacters("\n    ");
        
        writer.writeStartElement("author");
        writer.writeCharacters(author);
        writer.writeEndElement();
        writer.writeCharacters("\n    ");
        
        writer.writeStartElement("price");
        writer.writeCharacters(price);
        writer.writeEndElement();
        writer.writeCharacters("\n  ");
        
        writer.writeEndElement(); // book
    }
}
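One detail the example above relies on is that XMLStreamWriter escapes markup characters in text for you. The sketch below writes to a StringWriter so the result can be inspected. EscapingSketch and bookXml are hypothetical names, and the element names simply mirror the library example.

```java
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import java.io.StringWriter;

// Hypothetical helper, for illustration only.
final class EscapingSketch {
    /** Serializes one book element and returns it as a String. */
    static String bookXml(String id, String title) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("book");
            writer.writeAttribute("id", id);
            writer.writeStartElement("title");
            // Pass raw text: the writer escapes '&' and '<' itself.
            writer.writeCharacters(title);
            writer.writeEndElement(); // title
            writer.writeEndElement(); // book
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }
}
```

For example, passing the title "Tom & Jerry" yields a title element containing "Tom &amp;amp; Jerry" in its serialized form. Escaping the text yourself before calling writeCharacters would therefore escape it twice.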

Pretty Printing XML

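import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class PrettyPrintStAXWriter {
    private static final String INDENT = "  ";
    
    private final XMLStreamWriter writer;
    private int indentLevel = 0;
    // True when the last thing written was text. The closing tag must then
    // stay on the same line, or the indent would become part of the content.
    private boolean lastWasText = false;
    
    public PrettyPrintStAXWriter(XMLStreamWriter writer) {
        this.writer = writer;
    }
    
    public void writeStartElement(String localName) throws XMLStreamException {
        writeIndent();
        writer.writeStartElement(localName);
        indentLevel++;
        lastWasText = false;
    }
    
    public void writeEndElement() throws XMLStreamException {
        indentLevel--;
        if (!lastWasText) {
            writeIndent();
        }
        writer.writeEndElement();
        lastWasText = false;
    }
    
    public void writeCharacters(String text) throws XMLStreamException {
        writer.writeCharacters(text);
        lastWasText = true;
    }
    
    // Starts a new line and indents it to the current nesting depth
    private void writeIndent() throws XMLStreamException {
        writer.writeCharacters("\n");
        for (int i = 0; i < indentLevel; i++) {
            writer.writeCharacters(INDENT);
        }
    }
}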

Performance Optimization

Buffered Reading

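import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.BufferedInputStream;
import java.io.FileInputStream;

public class OptimizedStAXReader {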
    private static final int BUFFER_SIZE = 8192;
    
    public void parseOptimized(String filePath) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        
        // Configure factory for performance
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
        factory.setProperty(XMLInputFactory.IS_VALIDATING, Boolean.FALSE);
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
        
        try (BufferedInputStream bis = new BufferedInputStream(
                new FileInputStream(filePath), BUFFER_SIZE)) {
            
            XMLStreamReader reader = factory.createXMLStreamReader(bis);
            
            // Process with optimized settings
            processEvents(reader);
            
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
    
    private void processEvents(XMLStreamReader reader) throws XMLStreamException {
        while (reader.hasNext()) {
            int event = reader.next();
            
            // Optimized event processing
            switch (event) {
                case XMLStreamConstants.START_ELEMENT:
                    // Process start element efficiently
                    break;
                case XMLStreamConstants.CHARACTERS:
                    // Handle characters without creating unnecessary strings
                    if (!reader.isWhiteSpace()) {
                        // Process text content
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    // Process end element
                    break;
            }
        }
    }
}

Memory Management

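import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.InputStream;

public class MemoryEfficientStAXProcessor {
    private static final int MAX_STRING_SIZE = 1024 * 1024; // 1M characters
    private static final int CHUNK_SIZE = 8192;
    
    public void processLargeDocument(InputStream inputStream) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(inputStream);
            
            while (reader.hasNext()) {
                int event = reader.next();
                
                if (event == XMLStreamConstants.CHARACTERS) {
                    // Check the length first: calling getText() would already
                    // copy the whole text into a new String.
                    if (reader.getTextLength() > MAX_STRING_SIZE) {
                        System.err.println("Warning: Large text content detected, processing in chunks");
                        processLargeTextContent(reader);
                    } else {
                        processNormalTextContent(reader.getText());
                    }
                }
            }
            
        } catch (XMLStreamException e) {
            System.err.println("Error processing XML: " + e.getMessage());
        }
    }
    
    private void processLargeTextContent(XMLStreamReader reader) throws XMLStreamException {
        // Copy the current text event out in fixed-size chunks
        System.out.println("Processing large text content in chunks...");
        char[] buffer = new char[CHUNK_SIZE];
        int total = reader.getTextLength();
        int offset = 0;
        while (offset < total) {
            int copied = reader.getTextCharacters(
                    offset, buffer, 0, Math.min(CHUNK_SIZE, total - offset));
            if (copied <= 0) {
                break;
            }
            // Process buffer[0..copied) here
            offset += copied;
        }
    }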
    
    private void processNormalTextContent(String text) {
        // Regular text content processing
        if (!text.trim().isEmpty()) {
            System.out.println("Text: " + text.trim());
        }
    }
}

Error Handling and Validation

Robust Error Handling

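import javax.xml.stream.Location;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;

public class RobustStAXParser {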
    public void parseWithErrorHandling(String filePath) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        
        try (FileInputStream fis = new FileInputStream(filePath)) {
            XMLStreamReader reader = factory.createXMLStreamReader(fis);
            
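            while (reader.hasNext()) {
                int event;
                try {
                    event = reader.next();
                } catch (XMLStreamException e) {
                    // A well-formedness error leaves the reader in an undefined
                    // state, and retrying next() can loop forever, so stop here.
                    handleParsingError(e, reader);
                    break;
                }
                
                try {
                    processEvent(reader, event);
                } catch (XMLStreamException e) {
                    // Errors raised by our own validation do not affect the
                    // reader, so it is safe to report them and continue.
                    handleParsingError(e, reader);
                }
            }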
            
        } catch (Exception e) {
            System.err.println("Fatal error: " + e.getMessage());
        }
    }
    
    private void processEvent(XMLStreamReader reader, int event) throws XMLStreamException {
        switch (event) {
            case XMLStreamConstants.START_ELEMENT:
                validateElement(reader);
                break;
            case XMLStreamConstants.CHARACTERS:
                validateCharacters(reader);
                break;
            // Other event processing
        }
    }
    
    private void validateElement(XMLStreamReader reader) throws XMLStreamException {
        String localName = reader.getLocalName();
        
        // Custom validation logic
        if (localName == null || localName.trim().isEmpty()) {
            throw new XMLStreamException("Invalid element name at line " + reader.getLocation().getLineNumber());
        }
    }
    
    private void validateCharacters(XMLStreamReader reader) throws XMLStreamException {
        String text = reader.getText();
        
        // Custom character validation
        if (text != null && text.contains("\0")) {
            throw new XMLStreamException("Invalid character content at line " + reader.getLocation().getLineNumber());
        }
    }
    
    private void handleParsingError(XMLStreamException e, XMLStreamReader reader) {
        Location location = reader.getLocation();
        System.err.printf("Parsing error at line %d, column %d: %s%n",
                         location.getLineNumber(),
                         location.getColumnNumber(),
                         e.getMessage());
    }
}

StAX Best Practices

Configuration for Performance

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;

public class StAXConfiguration {
    public static XMLInputFactory createOptimizedInputFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        
        // Skip coalescing and validation to keep parsing cheap
        factory.setProperty(XMLInputFactory.IS_COALESCING, false);
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, true);
        factory.setProperty(XMLInputFactory.IS_VALIDATING, false);
        
        // Disable DTDs and external entities; besides saving work, this
        // guards against XML external entity (XXE) attacks
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        factory.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
        
        return factory;
    }
    
    public static XMLOutputFactory createOptimizedOutputFactory() {
        XMLOutputFactory factory = XMLOutputFactory.newInstance();
        
        // Namespace repairing is a convenience rather than an optimization:
        // the writer adds missing namespace declarations at some extra cost
        factory.setProperty(XMLOutputFactory.IS_REPAIRING_NAMESPACES, true);
        
        return factory;
    }
}

Thread Safety

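import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;
import java.util.List;

public class ThreadSafeStAXProcessor {
    private final XMLInputFactory inputFactory;
    private final XMLOutputFactory outputFactory;
    
    public ThreadSafeStAXProcessor() {
        // Create and configure the factories once, before any thread uses them.
        // The StAX API does not guarantee that factories are thread-safe, so
        // check your implementation's documentation before sharing them.
        // Readers and writers must never be shared between threads.
        this.inputFactory = XMLInputFactory.newInstance();
        this.outputFactory = XMLOutputFactory.newInstance();
    }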
    
    public void processInParallel(List<String> filePaths) {
        filePaths.parallelStream().forEach(this::processFile);
    }
    
    private void processFile(String filePath) {
        // Each thread gets its own reader instance
        try (FileInputStream fis = new FileInputStream(filePath)) {
            XMLStreamReader reader = inputFactory.createXMLStreamReader(fis);
            
            // Process the file
            while (reader.hasNext()) {
                int event = reader.next();
                // Process events
            }
            
        } catch (Exception e) {
            System.err.println("Error processing " + filePath + ": " + e.getMessage());
        }
    }
}

Advantages of StAX

Benefits

  • Pull-based Control: Application controls parsing flow
  • Memory Efficient: Minimal memory usage like SAX
  • Read and Write: Supports both reading and writing (streams remain forward-only)
  • Easy to Use: Simpler than SAX for complex parsing logic
  • Good Performance: Efficient for large documents
  • Flexible: Can pause and resume parsing
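The last point, pausing and resuming, can be sketched as a small wrapper that keeps the reader open between calls. ResumableTitleReader and nextTitle are hypothetical names, and the input is assumed to be a String using the title elements from the earlier examples. Each call parses only as far as the next title and then returns control to the caller.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.StringReader;

// Hypothetical helper, for illustration only.
final class ResumableTitleReader implements AutoCloseable {
    private final XMLStreamReader reader;

    ResumableTitleReader(String xml) {
        try {
            reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    /** Advances to the next title element and returns its text, or null at the end. */
    String nextTitle() {
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    return reader.getElementText();
                }
            }
            return null; // document exhausted
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    @Override
    public void close() {
        try {
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not close reader", e);
        }
    }
}
```

Between two calls to nextTitle() the parser does nothing, so the caller can do unrelated work, or stop entirely, without the rest of the document being read.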

Use Cases

  • Large XML document processing
  • XML transformation and filtering
  • Streaming XML data processing
  • XML generation and serialization
  • Parse-time validation and processing

Comparison with Other Parsing Methods

Feature             StAX         DOM         SAX
Memory Usage        Low          High        Very Low
Processing Control  Full         Full        Limited
Document Size       Any          Limited     Any
Access Pattern      Pull-based   Random      Push-based
Modification        Read/Write   Read/Write  Read-only
Complexity          Moderate     Low         Moderate
Performance         Very Good    Moderate    Excellent

When to Use StAX

  • Processing large XML files with selective data extraction
  • Need for both reading and writing capabilities
  • Require control over parsing flow
  • Complex parsing logic that would be difficult with SAX
  • Memory-constrained environments where DOM is too heavy
