XML Schema (XSD)
XML Schema Definition (XSD) is a World Wide Web Consortium (W3C) recommendation for formally describing the structure and content of XML documents. It is more expressive than DTD (Document Type Definition) and provides a rich set of data types and constraints for XML validation.
Why Use XML Schema?
XSD offers several advantages over DTD:
- Rich Data Types: Built-in types (string, integer, date, etc.) and the ability to create custom types
- Namespace Support: Full integration with XML namespaces
- Content Model Control: Precise control over element order, occurrence, and relationships
- XML Syntax: Written in XML, making it easier to process and manipulate
- Extensibility: Support for inheritance and type derivation
Basic Schema Structure
Every XML Schema document has this basic structure:
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://example.com/book"
xmlns:book="http://example.com/book"
elementFormDefault="qualified">
<!-- Schema content goes here -->
</xs:schema>
Key Attributes Explained:
- xmlns:xs: Declares the XML Schema namespace
- targetNamespace: The namespace this schema defines
- elementFormDefault: Whether locally declared elements must be namespace-qualified
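To make the effect of elementFormDefault concrete, here is a sketch of how an instance document differs under each setting. It assumes the schema above declares a global element named book whose type contains a locally declared title element; both names are hypothetical and are not part of the skeleton shown. The fence holds two separate documents for comparison.

```xml
<!-- With elementFormDefault="qualified": local elements belong to the target namespace -->
<book:book xmlns:book="http://example.com/book">
  <book:title>Sample Title</book:title>
</book:book>

<!-- With elementFormDefault="unqualified" (the default when the attribute is omitted):
     only the global element is qualified; local children are in no namespace -->
<book:book xmlns:book="http://example.com/book">
  <title>Sample Title</title>
</book:book>
```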
Simple Elements
Define basic elements with built-in data types:
<!-- Simple string element -->
<xs:element name="title" type="xs:string"/>
<!-- Element with restrictions -->
<xs:element name="age">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="0"/>
<xs:maxInclusive value="120"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
<!-- Element with default value -->
<xs:element name="country" type="xs:string" default="USA"/>
<!-- Optional element (minOccurs is only allowed on declarations inside a content model, not on top-level elements) -->
<xs:element name="subtitle" type="xs:string" minOccurs="0"/>
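As a rough illustration, the instance fragments below show which values a validator would be expected to accept or reject for the declarations above. The sample values are made up for this sketch.

```xml
<title>XML Fundamentals</title>   <!-- valid: any string -->
<age>42</age>                     <!-- valid: integer within 0..120 -->
<age>130</age>                    <!-- invalid: exceeds maxInclusive -->
<age>forty</age>                  <!-- invalid: not an integer -->
<country/>                        <!-- valid: empty, so the default "USA" applies -->
<country>Canada</country>         <!-- valid: an explicit value replaces the default -->
```

Note that an element default only takes effect when the element is present but empty; it does not insert a missing element.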
Built-in Data Types
XSD provides numerous built-in types:
Primitive Types
<xs:element name="title" type="xs:string"/>
<xs:element name="price" type="xs:decimal"/>
<xs:element name="publishDate" type="xs:date"/>
<xs:element name="available" type="xs:boolean"/>
<xs:element name="rating" type="xs:float"/>
Date and Time Types
<xs:element name="publishDate" type="xs:date"/> <!-- 2023-07-10 -->
<xs:element name="timestamp" type="xs:dateTime"/> <!-- 2023-07-10T14:30:00 -->
<xs:element name="time" type="xs:time"/> <!-- 14:30:00 -->
<xs:element name="year" type="xs:gYear"/> <!-- 2023 -->
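The date and time types follow ISO 8601 lexical forms, which is a frequent source of validation errors. The following sketch lists a few sample values (invented for illustration) against the declarations above.

```xml
<publishDate>2023-07-10</publishDate>            <!-- valid xs:date -->
<publishDate>07/10/2023</publishDate>            <!-- invalid: must be YYYY-MM-DD -->
<timestamp>2023-07-10T14:30:00Z</timestamp>      <!-- valid: UTC time zone -->
<timestamp>2023-07-10T14:30:00+02:00</timestamp> <!-- valid: explicit offset -->
<timestamp>2023-07-10 14:30:00</timestamp>       <!-- invalid: the "T" separator is required -->
<time>14:30:00</time>                            <!-- valid xs:time -->
<time>14:30</time>                               <!-- invalid: seconds are required -->
<year>2023</year>                                <!-- valid xs:gYear -->
```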
Complex Types
Define elements that contain other elements or attributes:
<xs:complexType name="BookType">
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string" maxOccurs="unbounded"/>
<xs:element name="isbn" type="xs:string"/>
<xs:element name="publishDate" type="xs:date"/>
<xs:element name="price" type="xs:decimal" minOccurs="0"/>
</xs:sequence>
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="genre" type="xs:string"/>
</xs:complexType>
<!-- Using the complex type -->
<xs:element name="book" type="BookType"/>
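A possible instance of the book element is sketched below. It assumes the declarations sit in a schema without a target namespace, so no prefixes are needed; the values are placeholders.

```xml
<book id="b001" genre="Fiction">
  <title>Sample Title</title>
  <!-- maxOccurs="unbounded" allows more than one author -->
  <author>First Author</author>
  <author>Second Author</author>
  <isbn>000-0000000000</isbn>
  <publishDate>2020-01-15</publishDate>
  <!-- price is omitted: minOccurs="0" makes it optional -->
</book>
```

Because the children are declared inside xs:sequence, they must appear in exactly this order.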
Content Models
Sequence (Ordered Elements)
<xs:complexType name="AddressType">
<xs:sequence>
<xs:element name="street" type="xs:string"/>
<xs:element name="city" type="xs:string"/>
<xs:element name="state" type="xs:string"/>
<xs:element name="zipCode" type="xs:string"/>
</xs:sequence>
</xs:complexType>
Choice (Alternative Elements)
<xs:complexType name="ContactType">
<xs:choice>
<xs:element name="phone" type="xs:string"/>
<xs:element name="email" type="xs:string"/>
<xs:element name="address" type="AddressType"/>
</xs:choice>
</xs:complexType>
All (Unordered Elements)
<xs:complexType name="PersonType">
<xs:all>
<xs:element name="firstName" type="xs:string"/>
<xs:element name="lastName" type="xs:string"/>
<xs:element name="age" type="xs:integer" minOccurs="0"/>
</xs:all>
</xs:complexType>
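The three content models are easiest to compare through instances. The sketch below assumes hypothetical elements address, contact, and person declared with AddressType, ContactType, and PersonType respectively, in a schema without a target namespace.

```xml
<!-- AddressType (xs:sequence): invalid, because city appears before street -->
<address>
  <city>Springfield</city>
  <street>123 Main Street</street>
  <state>IL</state>
  <zipCode>62701</zipCode>
</address>

<!-- ContactType (xs:choice): valid, exactly one alternative is present -->
<contact>
  <email>reader@example.com</email>
</contact>

<!-- ContactType (xs:choice): invalid, two alternatives at once -->
<contact>
  <phone>555-0100</phone>
  <email>reader@example.com</email>
</contact>

<!-- PersonType (xs:all): valid, children may appear in any order and age is optional -->
<person>
  <lastName>Doe</lastName>
  <firstName>Jane</firstName>
</person>
```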
Restrictions and Facets
Apply constraints to simple types:
String Restrictions
<xs:simpleType name="ISBNType">
<xs:restriction base="xs:string">
<xs:pattern value="[0-9]{3}-[0-9]{10}"/> <!-- Pattern constraint -->
<xs:length value="14"/> <!-- Exact length -->
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="ShortStringType">
<xs:restriction base="xs:string">
<xs:minLength value="1"/>
<xs:maxLength value="50"/>
</xs:restriction>
</xs:simpleType>
Numeric Restrictions
<xs:simpleType name="GradeType">
<xs:restriction base="xs:decimal">
<xs:minInclusive value="0.0"/>
<xs:maxInclusive value="100.0"/>
<xs:fractionDigits value="1"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="QuantityType">
<xs:restriction base="xs:integer">
<xs:minExclusive value="0"/>
<xs:maxInclusive value="1000"/>
</xs:restriction>
</xs:simpleType>
Enumeration
<xs:simpleType name="GenreType">
<xs:restriction base="xs:string">
<xs:enumeration value="Fiction"/>
<xs:enumeration value="Non-Fiction"/>
<xs:enumeration value="Science Fiction"/>
<xs:enumeration value="Mystery"/>
<xs:enumeration value="Romance"/>
</xs:restriction>
</xs:simpleType>
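To see the facets in action, the fragments below pair sample values with the outcome a validator would be expected to report. The element names isbn, grade, quantity, and genre are hypothetical and assumed to use ISBNType, GradeType, QuantityType, and GenreType.

```xml
<isbn>978-0743273565</isbn>   <!-- valid: matches the pattern and has 14 characters -->
<isbn>9780743273565</isbn>    <!-- invalid: hyphen missing -->
<grade>87.5</grade>           <!-- valid -->
<grade>87.55</grade>          <!-- invalid: two fraction digits, only one allowed -->
<grade>100.5</grade>          <!-- invalid: above maxInclusive -->
<quantity>1</quantity>        <!-- valid -->
<quantity>0</quantity>        <!-- invalid: minExclusive excludes 0 itself -->
<genre>Mystery</genre>        <!-- valid -->
<genre>mystery</genre>        <!-- invalid: enumeration values are case-sensitive -->
```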
Attributes
Define and constrain attributes:
<xs:complexType name="BookType">
<xs:sequence>
<xs:element name="title" type="xs:string"/>
</xs:sequence>
<!-- Required attribute -->
<xs:attribute name="id" type="xs:ID" use="required"/>
<!-- Optional attribute with default -->
<xs:attribute name="format" type="xs:string" default="paperback"/>
<!-- Restricted attribute -->
<xs:attribute name="status">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="available"/>
<xs:enumeration value="out-of-print"/>
<xs:enumeration value="coming-soon"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:complexType>
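The following sketch shows one valid and one invalid instance for the attribute declarations above. It assumes a hypothetical book element of this BookType in a schema without a target namespace.

```xml
<!-- Valid: format is omitted, so it defaults to "paperback" -->
<book id="b001" status="available">
  <title>Sample Title</title>
</book>

<!-- Invalid: the required id is missing and status is not an enumerated value -->
<book status="sold-out">
  <title>Sample Title</title>
</book>
```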
Type Derivation
Extension (Adding Content)
<xs:complexType name="PublicationBase">
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="publishDate" type="xs:date"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="BookType">
<xs:complexContent>
<xs:extension base="PublicationBase">
<xs:sequence>
<xs:element name="isbn" type="xs:string"/>
<xs:element name="pages" type="xs:integer"/>
</xs:sequence>
<xs:attribute name="hardcover" type="xs:boolean"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
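An instance of the extended type might look like the sketch below, assuming a hypothetical book element declared with this BookType. With extension, the inherited content always comes first and the added content follows it.

```xml
<book hardcover="true">
  <!-- inherited from PublicationBase -->
  <title>Sample Title</title>
  <publishDate>2020-01-15</publishDate>
  <!-- added by the extension -->
  <isbn>000-0000000000</isbn>
  <pages>320</pages>
</book>
```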
Restriction (Limiting Content)
<xs:complexType name="ShortBookType">
<xs:complexContent>
<xs:restriction base="BookType">
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="publishDate" type="xs:date"/>
<xs:element name="isbn" type="xs:string"/>
<xs:element name="pages">
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:maxInclusive value="200"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
</xs:sequence>
</xs:restriction>
</xs:complexContent>
</xs:complexType>
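As a sketch of what the restriction changes, consider a hypothetical shortBook element declared with ShortBookType. Every valid shortBook is also a valid BookType instance, but not the other way round.

```xml
<!-- Valid: pages does not exceed 200 -->
<shortBook>
  <title>Sample Title</title>
  <publishDate>2020-01-15</publishDate>
  <isbn>000-0000000000</isbn>
  <pages>150</pages>
</shortBook>

<!-- Invalid for ShortBookType, although the same content satisfies BookType -->
<shortBook>
  <title>Sample Title</title>
  <publishDate>2020-01-15</publishDate>
  <isbn>000-0000000000</isbn>
  <pages>320</pages>
</shortBook>
```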
Complete Example: Library Schema
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://example.com/library"
xmlns:lib="http://example.com/library"
elementFormDefault="qualified">
<!-- Root element -->
<xs:element name="library" type="lib:LibraryType"/>
<!-- Library type -->
<xs:complexType name="LibraryType">
<xs:sequence>
<xs:element name="name" type="xs:string"/>
<xs:element name="address" type="lib:AddressType"/>
<xs:element name="book" type="lib:BookType"
minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="id" type="xs:ID" use="required"/>
</xs:complexType>
<!-- Address type -->
<xs:complexType name="AddressType">
<xs:sequence>
<xs:element name="street" type="xs:string"/>
<xs:element name="city" type="xs:string"/>
<xs:element name="state" type="lib:StateCodeType"/>
<xs:element name="zipCode" type="lib:ZipCodeType"/>
</xs:sequence>
</xs:complexType>
<!-- Book type -->
<xs:complexType name="BookType">
<xs:sequence>
<xs:element name="title" type="xs:string"/>
<xs:element name="author" type="xs:string"
maxOccurs="unbounded"/>
<xs:element name="isbn" type="lib:ISBNType"/>
<xs:element name="publishDate" type="xs:date"/>
<xs:element name="genre" type="lib:GenreType"/>
<xs:element name="price" type="xs:decimal" minOccurs="0"/>
</xs:sequence>
<xs:attribute name="id" type="xs:ID" use="required"/>
<xs:attribute name="available" type="xs:boolean" default="true"/>
</xs:complexType>
<!-- Custom types -->
<xs:simpleType name="ISBNType">
<xs:restriction base="xs:string">
<xs:pattern value="[0-9]{3}-[0-9]{10}"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="StateCodeType">
<xs:restriction base="xs:string">
<xs:pattern value="[A-Z]{2}"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="ZipCodeType">
<xs:restriction base="xs:string">
<xs:pattern value="[0-9]{5}(-[0-9]{4})?"/>
</xs:restriction>
</xs:simpleType>
<xs:simpleType name="GenreType">
<xs:restriction base="xs:string">
<xs:enumeration value="Fiction"/>
<xs:enumeration value="Non-Fiction"/>
<xs:enumeration value="Science Fiction"/>
<xs:enumeration value="Mystery"/>
<xs:enumeration value="Romance"/>
<xs:enumeration value="Biography"/>
</xs:restriction>
</xs:simpleType>
</xs:schema>
Validation Example
XML document that conforms to the schema:
<?xml version="1.0" encoding="UTF-8"?>
<lib:library xmlns:lib="http://example.com/library"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://example.com/library library.xsd"
id="lib001">
<lib:name>Central Public Library</lib:name>
<lib:address>
<lib:street>123 Main Street</lib:street>
<lib:city>Springfield</lib:city>
<lib:state>IL</lib:state>
<lib:zipCode>62701</lib:zipCode>
</lib:address>
<lib:book id="book001" available="true">
<lib:title>The Great Gatsby</lib:title>
<lib:author>F. Scott Fitzgerald</lib:author>
<lib:isbn>978-0743273565</lib:isbn>
<lib:publishDate>1925-04-10</lib:publishDate>
<lib:genre>Fiction</lib:genre>
<lib:price>12.99</lib:price>
</lib:book>
</lib:library>
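For contrast, here is a sketch of a document that would be expected to fail validation against the library schema. The data is invented, and each comment names the rule being broken.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lib:library xmlns:lib="http://example.com/library" id="lib002">
  <lib:name>Branch Library</lib:name>
  <lib:address>
    <lib:street>45 Oak Avenue</lib:street>
    <lib:city>Springfield</lib:city>
    <lib:state>Illinois</lib:state>      <!-- fails StateCodeType: not two uppercase letters -->
    <lib:zipCode>6270</lib:zipCode>      <!-- fails ZipCodeType: only four digits -->
  </lib:address>
  <lib:book id="book002">
    <lib:title>Sample Title</lib:title>
    <!-- lib:author is missing, but at least one is required -->
    <lib:isbn>9780743273565</lib:isbn>   <!-- fails ISBNType: hyphen missing -->
    <lib:publishDate>2020-01-15</lib:publishDate>
    <lib:genre>Poetry</lib:genre>        <!-- not in the GenreType enumeration -->
  </lib:book>
</lib:library>
```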
Schema Documentation
Add documentation to your schemas:
<xs:element name="book" type="BookType">
<xs:annotation>
<xs:documentation>
Represents a book in the library catalog.
Each book must have a unique ID and title.
</xs:documentation>
</xs:annotation>
</xs:element>
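Annotations can also carry machine-readable hints alongside the human-readable text. The sketch below adds an xml:lang attribute and an xs:appinfo block; the displayLabel element inside it is a hypothetical, application-defined name, since the content of xs:appinfo is not prescribed by XSD.

```xml
<xs:element name="book" type="BookType">
  <xs:annotation>
    <!-- Human-readable text; xml:lang identifies its language -->
    <xs:documentation xml:lang="en">
      Represents a book in the library catalog.
    </xs:documentation>
    <!-- Machine-readable hints for tools; the content is application-defined -->
    <xs:appinfo>
      <displayLabel>Book</displayLabel>
    </xs:appinfo>
  </xs:annotation>
</xs:element>
```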
Best Practices
Design Principles
- Use meaningful names for types and elements
- Create reusable types for common patterns
- Document your schema with annotations
- Use appropriate constraints but don't over-constrain
- Plan for extensibility using extension mechanisms
Performance Considerations
<!-- Specific: xs:ID enforces NCName syntax and document-wide uniqueness -->
<xs:element name="id" type="xs:ID"/>
<!-- Broad: xs:string accepts any text, so malformed IDs pass validation -->
<xs:element name="id" type="xs:string"/>
Versioning Strategy
<!-- Include version in namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://example.com/library/v2.0">
<!-- Schema content -->
</xs:schema>
Common Patterns
Optional Elements Group
<xs:group name="ContactGroup">
<xs:choice>
<xs:element name="phone" type="xs:string"/>
<xs:element name="email" type="xs:string"/>
</xs:choice>
</xs:group>
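A named group is used by reference. The sketch below pulls ContactGroup into a hypothetical CustomerType and makes the whole choice optional with minOccurs="0"; it assumes a schema without a target namespace, so the reference needs no prefix.

```xml
<xs:complexType name="CustomerType">
  <xs:sequence>
    <xs:element name="name" type="xs:string"/>
    <!-- minOccurs="0" makes the whole phone-or-email choice optional -->
    <xs:group ref="ContactGroup" minOccurs="0"/>
  </xs:sequence>
</xs:complexType>
```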
Substitution Groups
<xs:element name="publication" type="PublicationType"/>
<xs:element name="book" type="BookType" substitutionGroup="publication"/>
<xs:element name="magazine" type="MagazineType" substitutionGroup="publication"/>
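To show how the substitution works, the sketch below adds a hypothetical catalog element that refers to the head element publication. It assumes BookType and MagazineType are derived from PublicationType, which XSD requires for members of a substitution group; the "..." stands for each element's content.

```xml
<!-- Schema: the container refers only to the head element -->
<xs:element name="catalog">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="publication" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<!-- Instance: book and magazine may appear wherever publication is allowed -->
<catalog>
  <book>...</book>
  <magazine>...</magazine>
</catalog>
```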
Validation Tools
Command Line (xmllint)
xmllint --schema library.xsd library.xml --noout
Programming Languages
Java
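import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = factory.newSchema(new File("library.xsd"));
Validator validator = schema.newValidator();
validator.validate(new StreamSource(new File("library.xml")));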
Python (lxml)
from lxml import etree

# Parse and compile the schema (etree.parse accepts a file path)
schema = etree.XMLSchema(etree.parse('library.xsd'))
# Parse the instance document and validate it
xml_doc = etree.parse('library.xml')
is_valid = schema.validate(xml_doc)
Conclusion
XML Schema (XSD) provides a powerful, flexible way to define the structure and constraints of XML documents. Its rich type system, namespace integration, and extensibility features make it a common choice for applications that need strict validation of XML data. A working knowledge of XSD helps in building robust, validated XML systems.
Next Steps
- Explore XPath for querying XML documents
- Learn XSLT for transforming XML
- Study XML Namespaces for modular schemas