XML Transformation Tools and Processors
Choosing the right tools and processors for XML transformation is crucial for performance, compatibility, and development efficiency. This guide covers the major options available across different platforms and use cases.
XSLT Processors
Saxon (Leading XSLT 3.0 Processor)
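Saxon, developed by Saxonica, implements XSLT 3.0, XQuery 3.1, and XPath 3.1. The open-source Home Edition (HE) covers the core languages; streaming and schema awareness require the commercial Enterprise Edition (EE).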
Command Line Usage
# Basic transformation
java -jar saxon-he-12.3.jar -s:input.xml -xsl:transform.xsl -o:output.xml
# With parameters
java -jar saxon-he-12.3.jar -s:input.xml -xsl:transform.xsl -o:output.xml \
param1=value1 param2=value2
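# XSLT 3.0 with streaming (Saxon-EE only; streaming is requested inside the
# stylesheet, e.g. <xsl:mode streamable="yes"/>, not through a command-line switch)
java -jar saxon-ee-12.3.jar -s:large-input.xml -xsl:streaming.xsl -o:output.xml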
# XQuery execution
java -jar saxon-he-12.3.jar -q:query.xq -s:input.xml -o:output.xml
Java Integration
public class SaxonTransformer {
public void performXSLTTransformation(String xmlPath, String xslPath,
String outputPath, Map<String, String> parameters) {
try {
// Create Saxon processor
Processor processor = new Processor(false); // false = HE edition
XsltCompiler compiler = processor.newXsltCompiler();
// Compile stylesheet (wrap paths in File: StreamSource(String) expects a URI, not a file path)
XsltExecutable executable = compiler.compile(new StreamSource(new File(xslPath)));
XsltTransformer transformer = executable.load();
// Set parameters
parameters.forEach((key, value) -> {
transformer.setParameter(new QName(key), new XdmAtomicValue(value));
});
// Set source and destination
transformer.setSource(new StreamSource(new File(xmlPath)));
transformer.setDestination(processor.newSerializer(new File(outputPath)));
// Execute transformation
transformer.transform();
} catch (SaxonApiException e) {
throw new RuntimeException("XSLT transformation failed", e);
}
}
public void performXQueryTransformation(String xmlPath, String xqueryPath,
String outputPath) {
try {
Processor processor = new Processor(false);
XQueryCompiler compiler = processor.newXQueryCompiler();
// Compile XQuery
XQueryExecutable executable = compiler.compile(new File(xqueryPath));
XQueryEvaluator evaluator = executable.load();
// Set context document
DocumentBuilder builder = processor.newDocumentBuilder();
XdmNode contextDoc = builder.build(new File(xmlPath));
evaluator.setContextItem(contextDoc);
// Execute and serialize result
Serializer serializer = processor.newSerializer(new File(outputPath));
evaluator.run(serializer);
} catch (SaxonApiException e) {
throw new RuntimeException("XQuery transformation failed", e);
}
}
public void performStreamingTransformation(String xmlPath, String xslPath,
String outputPath) {
try {
// Streaming requires Saxon-EE and a stylesheet that declares a streamable
// mode, for example <xsl:mode streamable="yes"/>
Processor processor = new Processor(true); // true = licensed edition (EE)
XsltCompiler compiler = processor.newXsltCompiler();
// Optional: compile template rules lazily; this does not by itself enable streaming
compiler.setJustInTimeCompilation(true);
XsltExecutable executable = compiler.compile(new StreamSource(new File(xslPath)));
XsltTransformer transformer = executable.load();
// Pass a StreamSource rather than a pre-built tree so the input can be read incrementally
transformer.setSource(new StreamSource(new File(xmlPath)));
transformer.setDestination(processor.newSerializer(new File(outputPath)));
transformer.transform();
} catch (SaxonApiException e) {
throw new RuntimeException("Streaming transformation failed", e);
}
}
}
Apache Xalan
Apache Xalan-Java is an XSLT 1.0 processor from the Apache XML Project, used through the standard JAXP API.
public class XalanTransformer {
public void transform(String xmlPath, String xslPath, String outputPath)
throws TransformerException {
// Select Xalan explicitly; the property must be set before newInstance() is called
System.setProperty("javax.xml.transform.TransformerFactory",
"org.apache.xalan.processor.TransformerFactoryImpl");
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer(new StreamSource(new File(xslPath)));
// Configure Xalan-specific features
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
transformer.transform(
new StreamSource(new File(xmlPath)),
new StreamResult(new File(outputPath))
);
}
public void transformWithExtensions(String xmlPath, String xslPath, String outputPath)
throws TransformerException {
TransformerFactory factory = TransformerFactory.newInstance();
// Xalan exposes these switches as factory attributes, not as JAXP features
try {
factory.setAttribute("http://xml.apache.org/xalan/features/optimize", Boolean.TRUE);
factory.setAttribute("http://xml.apache.org/xalan/features/incremental", Boolean.TRUE);
} catch (IllegalArgumentException e) {
// The active factory does not recognise the attribute (it may not be Xalan); continue without it
}
Transformer transformer = factory.newTransformer(new StreamSource(xslPath));
transformer.transform(new StreamSource(xmlPath), new StreamResult(outputPath));
}
}
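Because JAXP picks its implementation at run time, it can help to check which factory was actually loaded before relying on Xalan-specific output properties or attributes. The following is a minimal sketch using only the JDK; the class and method names (FactoryProbe, defaultFactoryClass, factoryClassOrDefault) are hypothetical and not part of any library.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;

/** Reports which JAXP TransformerFactory implementation is actually loaded. */
class FactoryProbe {

    /** Returns the fully qualified class name of the default factory. */
    static String defaultFactoryClass() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    /**
     * Tries to load a specific factory by class name. Falls back to the
     * default factory's class name if the requested class is not available.
     */
    static String factoryClassOrDefault(String requestedClassName) {
        try {
            return TransformerFactory.newInstance(requestedClassName, null)
                    .getClass().getName();
        } catch (TransformerFactoryConfigurationError e) {
            // Requested implementation is not on the classpath
            return defaultFactoryClass();
        }
    }

    public static void main(String[] args) {
        System.out.println("Default factory: " + defaultFactoryClass());
        System.out.println("Requested Xalan: "
                + factoryClassOrDefault("org.apache.xalan.processor.TransformerFactoryImpl"));
    }
}
```

Requesting the factory by class name through the two-argument newInstance avoids changing a JVM-wide system property, which may matter when several libraries share one process.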
libxslt (C/C++)
libxslt is the GNOME project's C library for XSLT 1.0, built on libxml2. It is also the engine behind xsltproc and xmlstarlet.
#include <stdio.h>
#include <libxml/parser.h>
#include <libxslt/xslt.h>
#include <libxslt/transform.h>
#include <libxslt/xsltutils.h>
int performTransformation(const char* xmlPath, const char* xslPath, const char* outputPath) {
xmlDocPtr doc, res;
xsltStylesheetPtr style;
// Initialize libxml2
xmlInitParser();
LIBXML_TEST_VERSION
// Parse the stylesheet
style = xsltParseStylesheetFile((const xmlChar*)xslPath);
if (style == NULL) {
fprintf(stderr, "Failed to parse stylesheet\n");
return -1;
}
// Parse the XML document
doc = xmlParseFile(xmlPath);
if (doc == NULL) {
fprintf(stderr, "Failed to parse XML document\n");
xsltFreeStylesheet(style);
return -1;
}
// Apply the transformation
res = xsltApplyStylesheet(style, doc, NULL);
if (res == NULL) {
fprintf(stderr, "Transformation failed\n");
xmlFreeDoc(doc);
xsltFreeStylesheet(style);
return -1;
}
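// Save the result, reporting failure instead of silently returning success
int status = 0;
FILE* output = fopen(outputPath, "w");
if (output == NULL) {
fprintf(stderr, "Failed to open output file\n");
status = -1;
} else {
if (xsltSaveResultToFile(output, res, style) < 0) {
fprintf(stderr, "Failed to write result\n");
status = -1;
}
fclose(output);
}
// Cleanup
xmlFreeDoc(res);
xmlFreeDoc(doc);
xsltFreeStylesheet(style);
xsltCleanupGlobals();
xmlCleanupParser();
return status;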
}
XQuery Processors
BaseX
BaseX is an open-source XML database engine and XQuery 3.1 processor written in Java.
Command Line Usage
# Start BaseX server (in the standalone client, -s sets serialization parameters instead)
basexserver
# Execute XQuery file
basex -i input.xml query.xq
# XQuery with parameters
basex -b var1=value1 -b var2=value2 query.xq
# Create database and run query (RUN evaluates a query file; XQUERY expects the query text itself)
basex -c "CREATE DB mydb input.xml; RUN query.xq"
Java Integration
public class BaseXProcessor {
public void executeXQuery(String xmlPath, String xqueryPath, String outputPath)
throws Exception {
Context context = new Context();
try {
// Create database from XML file
new CreateDB("tempdb", xmlPath).execute(context);
// Read XQuery file
String query = Files.readString(Paths.get(xqueryPath));
// Execute query (recent BaseX releases return the result via value())
try (QueryProcessor processor = new QueryProcessor(query, context)) {
String result = processor.value().serialize().toString();
// Write result to file
Files.writeString(Paths.get(outputPath), result);
}
} finally {
// Cleanup
new DropDB("tempdb").execute(context);
context.close();
}
}
public void executeXQueryWithBinding(String xmlPath, String xqueryPath,
Map<String, String> variables) throws Exception {
Context context = new Context();
try {
new CreateDB("tempdb", xmlPath).execute(context);
String query = Files.readString(Paths.get(xqueryPath));
try (QueryProcessor processor = new QueryProcessor(query, context)) {
// Bind variables
variables.forEach((key, value) -> {
try {
processor.variable(key, value);
} catch (QueryException e) {
throw new RuntimeException(e);
}
});
Iter iter = processor.iter();
StringBuilder result = new StringBuilder();
for (Item item; (item = iter.next()) != null; ) {
result.append(item.serialize().toString()).append("\n");
}
System.out.println(result.toString());
}
} finally {
new DropDB("tempdb").execute(context);
context.close();
}
}
}
eXist-db
eXist-db is an open-source native XML database with XQuery as its query language. The examples below talk to a running server through the XML:DB API.
public class ExistDBProcessor {
private final String serverUrl;
private final String username;
private final String password;
public ExistDBProcessor(String serverUrl, String username, String password) {
this.serverUrl = serverUrl;
this.username = username;
this.password = password;
}
public void executeXQuery(String collection, String xquery) throws Exception {
// Instantiate and register the eXist-db XML:DB driver
Class<?> driver = Class.forName("org.exist.xmldb.DatabaseImpl");
Database database = (Database) driver.getDeclaredConstructor().newInstance();
DatabaseManager.registerDatabase(database);
// Get collection
org.xmldb.api.base.Collection col =
DatabaseManager.getCollection(serverUrl + collection, username, password);
// Create XQuery service
XQueryService xqs = (XQueryService) col.getService("XQueryService", "1.0");
xqs.setProperty("indent", "yes");
// Execute query
ResourceSet result = xqs.query(xquery);
ResourceIterator i = result.getIterator();
while(i.hasMoreResources()) {
Resource r = i.nextResource();
System.out.println(r.getContent());
}
}
public void storeDocument(String collection, String docName, String xmlContent)
throws Exception {
// Instantiate and register the eXist-db XML:DB driver
Class<?> driver = Class.forName("org.exist.xmldb.DatabaseImpl");
Database database = (Database) driver.getDeclaredConstructor().newInstance();
DatabaseManager.registerDatabase(database);
org.xmldb.api.base.Collection col =
DatabaseManager.getCollection(serverUrl + collection, username, password);
XMLResource document = (XMLResource) col.createResource(docName, "XMLResource");
document.setContent(xmlContent);
col.storeResource(document);
}
}
Command-Line Tools
xmlstarlet
xmlstarlet is a command-line toolkit for querying, validating, editing, and transforming XML, built on libxml2 and libxslt.
# XSLT transformation
xmlstarlet tr transform.xsl input.xml > output.xml
# XPath queries
xmlstarlet sel -t -v "//product/@id" input.xml
# XML validation
xmlstarlet val -s schema.xsd input.xml
# XML formatting
xmlstarlet fo input.xml
# XML editing
xmlstarlet ed -u "//price" -v "29.99" input.xml
# Transformation with parameters (given after the stylesheet: -s for a string value, -p for an XPath expression)
xmlstarlet tr transform.xsl -s category=electronics input.xml
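The schema validation step shown with xmlstarlet val has a direct counterpart in the JDK's javax.xml.validation API, which may be convenient when validation has to happen inside a Java application rather than in a shell. This is a sketch under stated assumptions: the one-element price schema and the XsdCheck class are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

/** Validates an XML string against an XSD string using javax.xml.validation. */
class XsdCheck {

    /** Hypothetical one-element schema, used only for this demonstration. */
    static final String PRICE_XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='price' type='xs:decimal'/>"
        + "</xs:schema>";

    /** Compiles the schema; a broken schema is treated as a caller error. */
    static Schema compile(String xsd) {
        try {
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("Schema does not compile", e);
        }
    }

    /** Returns true if the document is valid, false if the validator rejects it. */
    static boolean isValid(Schema schema, String xml) {
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // not well-formed, or violates the schema
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Schema schema = compile(PRICE_XSD);
        System.out.println("29.99 valid? " + isValid(schema, "<price>29.99</price>"));
        System.out.println("cheap valid? " + isValid(schema, "<price>cheap</price>"));
    }
}
```

A compiled Schema object is thread-safe and can be reused across documents, while each Validator should be used by one thread at a time.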
xsltproc
xsltproc is the command-line front end to libxslt, the GNOME project's XSLT 1.0 library.
# Basic transformation
xsltproc transform.xsl input.xml > output.xml
# With parameters
xsltproc --param category "'electronics'" transform.xsl input.xml
# Include path for imports
xsltproc --path /usr/share/xsl transform.xsl input.xml
# Profiling
xsltproc --profile transform.xsl input.xml 2> profile.txt
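# DTD validation before transformation (xsltproc has no validating mode; --novalid only skips DTD loading)
xmllint --valid --noout input.xml && xsltproc transform.xsl input.xml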
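The quoting in --param category "'electronics'" is needed because xsltproc treats the value as an XPath expression. In JAXP, passing a Java String to setParameter supplies a plain string value, so no inner quotes are required. The sketch below illustrates this with the JDK's built-in XSLT 1.0 processor; the stylesheet, the sample catalog, and the ParamDemo class are assumptions made for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Passes a string parameter to a stylesheet through JAXP. */
class ParamDemo {

    /** Hypothetical stylesheet: counts products in the requested category. */
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:param name='category' select=\"'none'\"/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select='count(//product[@category=$category])'/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Hypothetical input document. */
    static final String CATALOG =
          "<catalog>"
        + "<product category='electronics'/>"
        + "<product category='books'/>"
        + "<product category='electronics'/>"
        + "</catalog>";

    /** Runs the stylesheet with the given category and returns its text output. */
    static String countByCategory(String xml, String category) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));
            // A Java String becomes an XSLT string value; no XPath quoting needed
            transformer.setParameter("category", category);
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("electronics: " + countByCategory(CATALOG, "electronics"));
        System.out.println("garden: " + countByCategory(CATALOG, "garden"));
    }
}
```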
XML Pipeline Scripts
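#!/bin/bash
# xml-pipeline.sh - Complex XML processing pipeline
# Stop on the first failing stage, on unset variables, and on pipe errors
set -euo pipefail
if [ "$#" -ne 2 ]; then
echo "Usage: $0 <input.xml> <output.xml>" >&2
exit 1
fi
INPUT_FILE=$1
OUTPUT_FILE=$2
# Create a private temporary directory
TEMP_DIR=$(mktemp -d)
# Cleanup function
cleanup() {
rm -rf "$TEMP_DIR"
}
trap cleanup EXIT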
# Stage 1: Validation
echo "Validating input..."
xmlstarlet val -s schema.xsd "$INPUT_FILE" || {
echo "Validation failed"
exit 1
}
# Stage 2: Normalize
echo "Normalizing..."
xmlstarlet tr normalize.xsl "$INPUT_FILE" > "$TEMP_DIR/normalized.xml"
# Stage 3: Transform
echo "Transforming..."
xsltproc --param timestamp "'$(date -Is)'" \
transform.xsl "$TEMP_DIR/normalized.xml" > "$TEMP_DIR/transformed.xml"
# Stage 4: Enrich
echo "Enriching..."
xmlstarlet tr enrich.xsl "$TEMP_DIR/transformed.xml" > "$TEMP_DIR/enriched.xml"
# Stage 5: Final formatting
echo "Final formatting..."
xmlstarlet fo "$TEMP_DIR/enriched.xml" > "$OUTPUT_FILE"
echo "Pipeline completed successfully: $OUTPUT_FILE"
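The same staged approach can be run in a single Java process, which avoids temporary files and repeated process start-up. The following sketch chains stylesheets by feeding each stage's output to the next stage. It uses the JDK's built-in XSLT 1.0 processor; the two tiny stage stylesheets and the StylesheetChain class are hypothetical stand-ins for normalize.xsl, transform.xsl, and so on.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Applies a list of stylesheets in order, each consuming the previous output. */
class StylesheetChain {

    private static final String HEAD =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>";

    /** Hypothetical stage 1: trims whitespace and renames item to entry. */
    static final String NORMALIZE = HEAD
        + "<xsl:template match='/item'>"
        + "<entry><xsl:value-of select='normalize-space(.)'/></entry>"
        + "</xsl:template></xsl:stylesheet>";

    /** Hypothetical stage 2: upper-cases the letters a, b, c. */
    static final String SHOUT = HEAD
        + "<xsl:template match='/entry'>"
        + "<result><xsl:value-of select=\"translate(., 'abc', 'ABC')\"/></result>"
        + "</xsl:template></xsl:stylesheet>";

    static String runChain(String xml, List<String> stylesheets) {
        TransformerFactory factory = TransformerFactory.newInstance();
        String current = xml;
        int stage = 0;
        try {
            for (String xsl : stylesheets) {
                stage++;
                Transformer t = factory.newTransformer(
                        new StreamSource(new StringReader(xsl)));
                t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                StringWriter out = new StringWriter();
                t.transform(new StreamSource(new StringReader(current)),
                        new StreamResult(out));
                current = out.toString(); // becomes the input of the next stage
            }
            return current;
        } catch (TransformerException e) {
            throw new IllegalStateException("Stage " + stage + " failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runChain("<item>  abc  </item>",
                Arrays.asList(NORMALIZE, SHOUT)));
    }
}
```

Serializing between stages keeps the sketch simple. For large documents, passing a DOM or SAX event stream between stages would avoid re-parsing, at the cost of more setup code.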
Integration Frameworks
Apache Camel XML Processing
public class CamelXMLRoutes extends RouteBuilder {
@Override
public void configure() throws Exception {
// XSLT transformation route
from("file:input?noop=true")
.log("Processing file: ${header.CamelFileName}")
.to("xslt:transform.xsl")
.to("file:output");
// XQuery transformation route
from("file:xquery-input?noop=true")
.to("xquery:transform.xq")
.to("file:xquery-output");
// Conditional transformation based on content
from("file:conditional-input?noop=true")
.choice()
.when(xpath("//product[@category='electronics']"))
.to("xslt:electronics-transform.xsl")
.when(xpath("//product[@category='books']"))
.to("xslt:books-transform.xsl")
.otherwise()
.to("xslt:default-transform.xsl")
.end()
.to("file:conditional-output");
// Pipeline with multiple transformations
from("file:pipeline-input?noop=true")
.pipeline()
.to("xslt:normalize.xsl")
.to("xslt:enrich.xsl")
.to("xslt:format.xsl")
.end()
.to("file:pipeline-output");
// Error handling
from("file:error-input?noop=true")
.onException(Exception.class)
.handled(true)
.log("Transformation failed: ${exception.message}")
.to("file:error-output")
.end()
.to("xslt:risky-transform.xsl")
.to("file:success-output");
}
}
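The content-based route above boils down to evaluating XPath predicates in order and taking the first match. For readers without Camel, the decision logic can be sketched with the JDK's XPath API alone. The stylesheet file names are taken from the route; the StylesheetRouter class itself is a hypothetical illustration, not Camel's implementation.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Chooses a stylesheet name from document content, first matching rule wins. */
class StylesheetRouter {

    static String chooseStylesheet(String xml) {
        try {
            // Parse once so several XPath tests can run against the same tree
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            if (matches(xpath, doc, "//product[@category='electronics']")) {
                return "electronics-transform.xsl";
            }
            if (matches(xpath, doc, "//product[@category='books']")) {
                return "books-transform.xsl";
            }
            return "default-transform.xsl";
        } catch (ParserConfigurationException | SAXException
                | IOException | XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot route document", e);
        }
    }

    /** Evaluates the expression as a boolean: true if any node matches. */
    private static boolean matches(XPath xpath, Document doc, String expression)
            throws XPathExpressionException {
        return (Boolean) xpath.evaluate(expression, doc, XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) {
        System.out.println(chooseStylesheet(
                "<catalog><product category='books'/></catalog>"));
    }
}
```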
Spring Integration XML
<!-- Spring Integration XML Configuration -->
<int:channel id="xmlInputChannel"/>
<int:channel id="xmlOutputChannel"/>
<int:channel id="errorChannel"/>
<!-- File input -->
<int-file:inbound-channel-adapter
id="fileInput"
directory="input"
channel="xmlInputChannel"
auto-startup="true">
<int:poller fixed-delay="5000"/>
</int-file:inbound-channel-adapter>
<!-- XSLT transformer -->
<int-xml:xslt-transformer
id="xsltTransformer"
input-channel="xmlInputChannel"
output-channel="xmlOutputChannel"
xsl-resource="classpath:transform.xsl">
<int-xml:xslt-param name="timestamp" expression="new java.util.Date()"/>
</int-xml:xslt-transformer>
<!-- File output -->
<int-file:outbound-channel-adapter
id="fileOutput"
directory="output"
channel="xmlOutputChannel"/>
<!-- Error handling -->
<int:service-activator
input-channel="errorChannel"
ref="errorHandler"
method="handleError"/>
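@Component
public class ErrorHandler {
// Assumes SLF4J (org.slf4j.Logger / LoggerFactory) is on the classpath
private static final Logger log = LoggerFactory.getLogger(ErrorHandler.class);
public void handleError(Exception exception) {
log.error("XML transformation error: {}", exception.getMessage());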
// Send to dead letter queue, notify administrators, etc.
}
}
Performance Optimization Tools
XML Profiling
public class XMLTransformationProfiler {
public void profileTransformation(String xmlPath, String xslPath) {
long startTime = System.nanoTime();
try {
// Heap in use before the run (a rough indicator only: garbage collection can run at any time)
Runtime runtime = Runtime.getRuntime();
long memoryBefore = runtime.totalMemory() - runtime.freeMemory();
// Perform transformation
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer(new StreamSource(xslPath));
ByteArrayOutputStream output = new ByteArrayOutputStream();
transformer.transform(
new StreamSource(xmlPath),
new StreamResult(output)
);
// Memory usage after
long memoryAfter = runtime.totalMemory() - runtime.freeMemory();
long duration = System.nanoTime() - startTime;
System.out.printf("Transformation completed in %.2f ms%n", duration / 1_000_000.0);
System.out.printf("Memory used: %d KB%n", (memoryAfter - memoryBefore) / 1024);
System.out.printf("Output size: %d bytes%n", output.size());
} catch (Exception e) {
System.err.println("Transformation failed: " + e.getMessage());
}
}
}
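A single timed run like the one above mixes stylesheet compilation, JIT warm-up, and the transformation itself. One way to separate these, sketched here with the JDK's built-in processor, is to time compilation once and then average several runs after a few untimed warm-up passes. StageTimer and its identity stylesheet are hypothetical; for publishable numbers a harness such as JMH remains the better choice.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Times stylesheet compilation separately from the transformation runs. */
class StageTimer {

    /** Hypothetical workload: the XSLT 1.0 identity transform. */
    static final String IDENTITY_XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template></xsl:stylesheet>";

    /** Returns {compileNanos, averageTransformNanos}. */
    static long[] measure(String xsl, String xml, int warmups, int runs) {
        if (runs <= 0) {
            throw new IllegalArgumentException("runs must be positive");
        }
        try {
            // Compilation (includes factory lookup) is timed once
            long compileStart = System.nanoTime();
            Templates templates = TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xsl)));
            long compileNanos = System.nanoTime() - compileStart;

            // Untimed passes let the JVM warm up
            for (int i = 0; i < warmups; i++) {
                runOnce(templates, xml);
            }

            long runStart = System.nanoTime();
            for (int i = 0; i < runs; i++) {
                runOnce(templates, xml);
            }
            long averageNanos = (System.nanoTime() - runStart) / runs;
            return new long[] {compileNanos, averageNanos};
        } catch (TransformerException e) {
            throw new IllegalStateException("Measurement failed", e);
        }
    }

    /** Runs one transformation and returns the output length in characters. */
    private static int runOnce(Templates templates, String xml)
            throws TransformerException {
        StringWriter out = new StringWriter();
        templates.newTransformer().transform(
                new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.getBuffer().length();
    }

    public static void main(String[] args) {
        long[] t = measure(IDENTITY_XSL, "<a><b/><c>text</c></a>", 20, 100);
        System.out.printf("compile: %.2f ms, transform avg: %.3f ms%n",
                t[0] / 1_000_000.0, t[1] / 1_000_000.0);
    }
}
```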
Benchmarking Framework
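// TestData is assumed to be a JMH @State class exposing xslPath, xmlContent, and
// xqueryContent; JMH only accepts @State (or infrastructure) types as benchmark parameters
public class XMLTransformationBenchmark {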
@Benchmark
public void saxonTransformation(TestData data) throws Exception {
Processor processor = new Processor(false);
XsltCompiler compiler = processor.newXsltCompiler();
XsltExecutable executable = compiler.compile(new StreamSource(data.xslPath));
XsltTransformer transformer = executable.load();
transformer.setSource(new StreamSource(new StringReader(data.xmlContent)));
ByteArrayOutputStream output = new ByteArrayOutputStream();
transformer.setDestination(processor.newSerializer(output));
transformer.transform();
}
@Benchmark
public void xalanTransformation(TestData data) throws Exception {
TransformerFactory factory = TransformerFactory.newInstance();
Transformer transformer = factory.newTransformer(new StreamSource(data.xslPath));
ByteArrayOutputStream output = new ByteArrayOutputStream();
transformer.transform(
new StreamSource(new StringReader(data.xmlContent)),
new StreamResult(output)
);
}
@Benchmark
public void baseXQuery(TestData data) throws Exception {
Context context = new Context();
try (QueryProcessor processor = new QueryProcessor(data.xqueryContent, context)) {
processor.value();
} finally {
// Close the context even if the query throws
context.close();
}
}
public static void main(String[] args) throws RunnerException {
Options opt = new OptionsBuilder()
.include(XMLTransformationBenchmark.class.getSimpleName())
.forks(1)
.warmupIterations(3)
.measurementIterations(5)
.mode(Mode.AverageTime)
.timeUnit(TimeUnit.MILLISECONDS)
.build();
new Runner(opt).run();
}
}
Continuous Integration Tools
GitHub Actions XML Pipeline
name: XML Processing Pipeline
on:
  push:
    paths:
      - 'xml-data/**'
      - 'transformations/**'
jobs:
  xml-transformation:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Java
        uses: actions/setup-java@v4
        with:
          java-version: '11'
          distribution: 'temurin'
      - name: Install XML tools
        run: |
          sudo apt-get update
          sudo apt-get install -y xmlstarlet xsltproc
      - name: Download Saxon
        run: |
          wget https://github.com/Saxonica/Saxon-HE/releases/download/SaxonHE12-3/SaxonHE12-3J.zip
          unzip SaxonHE12-3J.zip
      - name: Validate XML files
        run: |
          for file in xml-data/*.xml; do
            xmlstarlet val -s schema.xsd "$file" || exit 1
          done
      - name: Transform XML files
        run: |
          mkdir -p output
          for file in xml-data/*.xml; do
            basename=$(basename "$file" .xml)
            java -jar saxon-he-12.3.jar \
              -s:"$file" \
              -xsl:transformations/main.xsl \
              -o:"output/${basename}-transformed.xml"
          done
      - name: Generate reports
        run: |
          # When -s names a directory, -o must name a directory as well
          mkdir -p output/reports
          java -jar saxon-he-12.3.jar \
            -s:xml-data \
            -xsl:transformations/report.xsl \
            -o:output/reports
      - name: Upload artifacts
        uses: actions/upload-artifact@v4
        with:
          name: transformed-xml
          path: output/
Jenkins XML Pipeline
pipeline {
agent any
environment {
SAXON_JAR = '/opt/saxon/saxon-he-12.3.jar'
XML_SCHEMA = 'schemas/main.xsd'
}
stages {
stage('Checkout') {
steps {
git 'https://github.com/company/xml-data.git'
}
}
stage('Validate') {
steps {
script {
sh '''
for file in data/*.xml; do
echo "Validating $file"
xmlstarlet val -s ${XML_SCHEMA} "$file"
done
'''
}
}
}
stage('Transform') {
parallel {
stage('HTML Output') {
steps {
sh '''
mkdir -p output/html
for file in data/*.xml; do
basename=$(basename "$file" .xml)
java -jar ${SAXON_JAR} \
-s:"$file" \
-xsl:transforms/to-html.xsl \
-o:"output/html/${basename}.html"
done
'''
}
}
stage('JSON Output') {
steps {
sh '''
mkdir -p output/json
for file in data/*.xml; do
basename=$(basename "$file" .xml)
java -jar ${SAXON_JAR} \
-s:"$file" \
-xsl:transforms/to-json.xsl \
-o:"output/json/${basename}.json"
done
'''
}
}
}
}
stage('Quality Check') {
steps {
script {
sh '''
# Check output file sizes
find output -name "*.html" -size 0 | while read file; do
echo "Warning: Empty file $file"
done
# Validate HTML output
for file in output/html/*.html; do
tidy -q -e "$file" || echo "HTML validation warning for $file"
done
'''
}
}
}
stage('Deploy') {
when {
branch 'main'
}
steps {
sh '''
rsync -av output/ deploy@server:/var/www/xml-reports/
'''
}
}
}
post {
always {
archiveArtifacts artifacts: 'output/**/*', fingerprint: true
}
failure {
emailext (
subject: "XML Pipeline Failed: ${env.JOB_NAME} - ${env.BUILD_NUMBER}",
body: "Check console output at ${env.BUILD_URL}",
to: "[email protected]"
)
}
}
}
Cloud-Based Processing
AWS Lambda XML Processor
import json
import urllib.parse

import boto3
import lxml.etree as etree


def lambda_handler(event, context):
    s3 = boto3.client('s3')
    # Get XML file from S3 (object keys arrive URL-encoded in S3 event notifications)
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'])
    try:
        # Download XML and XSLT
        xml_obj = s3.get_object(Bucket=bucket, Key=key)
        xml_content = xml_obj['Body'].read()
        xslt_obj = s3.get_object(Bucket=bucket, Key='transforms/main.xsl')
        xslt_content = xslt_obj['Body'].read()
        # Parse documents
        xml_doc = etree.fromstring(xml_content)
        xslt_doc = etree.fromstring(xslt_content)
        # Create transformer and apply
        transform = etree.XSLT(xslt_doc)
        result = transform(xml_doc)
        # Upload result back to S3
        output_key = key.replace('.xml', '-transformed.html')
        s3.put_object(
            Bucket=bucket,
            Key=f'output/{output_key}',
            Body=str(result),
            ContentType='text/html'
        )
        return {
            'statusCode': 200,
            'body': json.dumps({
                'message': f'Successfully transformed {key}',
                'output': f'output/{output_key}'
            })
        }
    except Exception as e:
        print(f'Error processing {key}: {str(e)}')
        return {
            'statusCode': 500,
            'body': json.dumps({
                'error': str(e)
            })
        }
Tool Selection Guidelines
Performance Comparison
Processor | Language | XSLT/XQuery Version | Performance | Memory Usage | Features |
---|---|---|---|---|---|
Saxon-EE | Java | 3.0 | Excellent | Low | Streaming, optimization |
Saxon-HE | Java | 3.0 | Very Good | Medium | XSLT 3.0 without streaming or schema awareness |
Xalan | Java | 1.0 | Good | High | Extensions, debugging |
libxslt | C | 1.0 | Excellent | Low | Native performance |
BaseX | Java | XQuery 3.1 | Excellent | Low | Database integration |
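The version column can be checked against whatever processor is actually installed, since XSLT defines the system-property() function for exactly this purpose. The sketch below asks the active JAXP factory for its supported XSLT version and vendor. The ProcessorProbe class is hypothetical, and the values printed depend entirely on which processor is on the classpath.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Asks an XSLT processor to describe itself via system-property(). */
class ProcessorProbe {

    private static final String PROBE_XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>"
        + "<xsl:value-of select=\"system-property('xsl:version')\"/>"
        + "<xsl:text> | </xsl:text>"
        + "<xsl:value-of select=\"system-property('xsl:vendor')\"/>"
        + "</xsl:template></xsl:stylesheet>";

    /** Returns "version | vendor" as reported by the given factory's processor. */
    static String describe(TransformerFactory factory) {
        try {
            StringWriter out = new StringWriter();
            factory.newTransformer(new StreamSource(new StringReader(PROBE_XSL)))
                   .transform(new StreamSource(new StringReader("<probe/>")),
                              new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Probe stylesheet failed", e);
        }
    }

    public static void main(String[] args) {
        // Output varies by processor; no particular value is assumed here
        System.out.println(describe(TransformerFactory.newInstance()));
    }
}
```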
Use Case Recommendations
Large Document Processing
- Best: Saxon-EE with streaming
- Alternative: libxslt for C applications
- Avoid: DOM-based processors
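The reason to avoid DOM-based processing for large inputs is that the whole tree must fit in memory before any work starts. As an illustration of the alternative, the sketch below counts elements with the JDK's StAX pull parser, which holds only the current event. Note that this is plain StAX, not XSLT 3.0 streaming as offered by Saxon-EE; the StreamingCount class is a hypothetical example.

```java
import java.io.Reader;
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Counts elements by local name without building an in-memory tree. */
class StreamingCount {

    static long countElements(Reader xml, String localName) {
        XMLStreamReader reader = null;
        try {
            reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
            long count = 0;
            // Pull one event at a time; earlier events are not retained
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && localName.equals(reader.getLocalName())) {
                    count++;
                }
            }
            return count;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (XMLStreamException ignored) {
                    // Nothing useful can be done if closing fails
                }
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<catalog><product/><group><product/></group></catalog>";
        System.out.println(countElements(new StringReader(xml), "product"));
    }
}
```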
Real-time Web Applications
- Best: Saxon-HE with compiled stylesheets
- Alternative: libxslt with mod_transform
- Consider: Caching transformed results
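Compiling a stylesheet is usually far more expensive than running it, so a web application typically compiles each stylesheet once and reuses the compiled form for every request. In JAXP the compiled form is a Templates object, which is safe to share between threads. The following sketch shows one possible cache using the JDK's built-in processor; StylesheetCache and its method names are hypothetical, and Saxon's XsltExecutable plays the equivalent role in the s9api.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Compiles each named stylesheet once and reuses the compiled Templates. */
class StylesheetCache {

    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Templates> compiled = new ConcurrentHashMap<>();
    private final AtomicInteger compilations = new AtomicInteger();

    /** Transforms xml with the stylesheet registered under name, compiling on first use. */
    String transform(String name, String xsl, String xml) {
        Templates templates = compiled.computeIfAbsent(name, n -> compile(xsl));
        try {
            StringWriter out = new StringWriter();
            // Transformer instances are cheap and not thread-safe: one per call
            templates.newTransformer().transform(
                    new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed: " + name, e);
        }
    }

    /** Number of compilations performed so far, exposed for illustration. */
    int compilationCount() {
        return compilations.get();
    }

    // TransformerFactory is not guaranteed to be thread-safe, so serialize compilation
    private synchronized Templates compile(String xsl) {
        try {
            compilations.incrementAndGet();
            return factory.newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Stylesheet does not compile", e);
        }
    }

    public static void main(String[] args) {
        String xsl = "<xsl:stylesheet version='1.0' "
                + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:output method='text'/>"
                + "<xsl:template match='/'>hello</xsl:template></xsl:stylesheet>";
        StylesheetCache cache = new StylesheetCache();
        cache.transform("greet", xsl, "<a/>");
        cache.transform("greet", xsl, "<b/>");
        System.out.println("compilations: " + cache.compilationCount());
    }
}
```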
Batch Processing
- Best: Command-line tools (xsltproc, xmlstarlet)
- Alternative: Saxon command-line
- Consider: Parallel processing
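Parallel batch processing in Java follows from one property of JAXP: a compiled Templates object may be shared across threads, while each Transformer must stay on a single thread. The sketch below compiles once and then transforms a set of documents on the common fork-join pool. It assumes the JDK's built-in processor and in-memory inputs; the ParallelBatch class and the item-counting stylesheet are made up for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Transforms many documents in parallel with one shared compiled stylesheet. */
class ParallelBatch {

    /** Hypothetical stylesheet: outputs the number of item elements. */
    static final String COUNT_ITEMS_XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'><xsl:value-of select='count(//item)'/></xsl:template>"
        + "</xsl:stylesheet>";

    /** Maps each input name to its transformed output. */
    static Map<String, String> transformAll(String xsl, Map<String, String> inputs) {
        Templates templates = compile(xsl); // compiled once, shared by all threads
        return inputs.entrySet().parallelStream()
                .collect(Collectors.toConcurrentMap(
                        Map.Entry::getKey,
                        entry -> apply(templates, entry.getValue())));
    }

    private static Templates compile(String xsl) {
        try {
            return TransformerFactory.newInstance()
                    .newTemplates(new StreamSource(new StringReader(xsl)));
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Stylesheet does not compile", e);
        }
    }

    private static String apply(Templates templates, String xml) {
        try {
            StringWriter out = new StringWriter();
            // A fresh Transformer per document keeps each one on a single thread
            templates.newTransformer().transform(
                    new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> inputs = new LinkedHashMap<>();
        inputs.put("a.xml", "<r><item/><item/></r>");
        inputs.put("b.xml", "<r/>");
        System.out.println(transformAll(COUNT_ITEMS_XSL, inputs));
    }
}
```

For shell-based batches the same idea applies at process level, for example by running several xsltproc invocations concurrently.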
Enterprise Integration
- Best: Apache Camel with Saxon
- Alternative: Spring Integration
- Consider: Error handling and monitoring
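For error handling and monitoring, JAXP lets an application register an ErrorListener so that processor diagnostics reach the application's own logging instead of standard error. The sketch below collects diagnostics while compiling a stylesheet. CompileCheck is a hypothetical class, and which listener callbacks fire for a given problem is processor-specific, so the code also records the exception thrown by the factory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamSource;

/** Collects stylesheet compilation problems instead of printing them. */
class CompileCheck {

    /** A minimal valid stylesheet, used only for the demonstration. */
    static final String GOOD_XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:template match='/'><ok/></xsl:template></xsl:stylesheet>";

    /** Returns every diagnostic reported while compiling the stylesheet. */
    static List<String> problems(String xsl) {
        final List<String> found = new ArrayList<>();
        TransformerFactory factory = TransformerFactory.newInstance();
        factory.setErrorListener(new ErrorListener() {
            @Override public void warning(TransformerException e) {
                found.add("WARNING: " + e.getMessage());
            }
            @Override public void error(TransformerException e) {
                found.add("ERROR: " + e.getMessage());
            }
            @Override public void fatalError(TransformerException e) {
                found.add("FATAL: " + e.getMessage());
            }
        });
        try {
            if (factory.newTemplates(new StreamSource(new StringReader(xsl))) == null) {
                found.add("FAILED: no stylesheet was produced");
            }
        } catch (TransformerException e) {
            // Recorded regardless of which listener callbacks were invoked
            found.add("FAILED: " + e.getMessage());
        }
        return found;
    }

    /** True if compilation produced nothing worse than warnings. */
    static boolean compiles(String xsl) {
        for (String problem : problems(xsl)) {
            if (!problem.startsWith("WARNING")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String broken = "<xsl:stylesheet version='1.0' "
                + "xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
                + "<xsl:template match='/'>"; // deliberately not well-formed
        System.out.println("good compiles: " + compiles(GOOD_XSL));
        System.out.println("broken problems: " + problems(broken).size());
    }
}
```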
Conclusion
Choosing the right XML transformation tools depends on your specific requirements for performance, features, and integration needs. Saxon provides the most advanced XSLT/XQuery support, while command-line tools excel for batch processing. Consider factors like document size, transformation complexity, and deployment environment when making your selection.
Next Steps
- Review XSLT Transformation for stylesheet development
- Study XQuery for functional processing approaches
- Explore Advanced Techniques for complex scenarios