XML Schema (XSD)

XML Schema Definition (XSD) is a World Wide Web Consortium (W3C) recommendation for formally describing the structure and content of XML documents: which elements and attributes may appear, in what order, and with what data types. It is more expressive than DTD (Document Type Definition) and provides a rich set of data types and constraints for XML validation.

Why Use XML Schema?

XSD offers several advantages over DTD:

  • Rich Data Types: Built-in types (string, integer, date, etc.) and the ability to derive custom types
  • Namespace Support: Full integration with XML namespaces
  • Content Model Control: Precise control over element order, occurrence, and relationships
  • XML Syntax: Written in XML, making it easier to process and manipulate
  • Extensibility: Support for inheritance and type derivation

Basic Schema Structure

Every XML Schema document has this basic structure:

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/book"
           xmlns:book="http://example.com/book"
           elementFormDefault="qualified">
    
    <!-- Schema content goes here -->
    
</xs:schema>

Key Attributes Explained:

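  • xmlns:xs: Binds the xs prefix to the XML Schema namespace, so that names such as xs:element resolve to schema constructs
  • targetNamespace: The namespace to which the schema's globally declared elements and types belong
  • xmlns:book: Binds a prefix to the target namespace so the schema can refer to its own types (for example, type="book:BookType")
  • elementFormDefault: Whether locally declared elements must be namespace-qualified in instance documents (the default is "unqualified")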
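
The effect of elementFormDefault is easiest to see in an instance document. The following is a minimal sketch, assuming the schema above declares a global book element with a locally declared title child (both names are hypothetical, since the skeleton above has no content yet):

```xml
<!-- With elementFormDefault="qualified", the local title element must be
     in the target namespace, just like the global book element. -->
<book:book xmlns:book="http://example.com/book">
    <book:title>Example Title</book:title>
</book:book>

<!-- An equivalent document that uses a default namespace declaration
     instead of a prefix. -->
<book xmlns="http://example.com/book">
    <title>Example Title</title>
</book>
```

With elementFormDefault="unqualified", only the global book element would be namespace-qualified and title would have to appear with no namespace, which is a frequent source of confusing validation errors.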

Simple Elements

Define basic elements with built-in data types:

<!-- Simple string element -->
<xs:element name="title" type="xs:string"/>

<!-- Element with restrictions -->
<xs:element name="age">
    <xs:simpleType>
        <xs:restriction base="xs:integer">
            <xs:minInclusive value="0"/>
            <xs:maxInclusive value="120"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>

<!-- Element with default value -->
<xs:element name="country" type="xs:string" default="USA"/>

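<!-- Optional element (minOccurs is only allowed on declarations inside a content model such as xs:sequence, not on top-level declarations) -->
<xs:element name="subtitle" type="xs:string" minOccurs="0"/>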
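
Occurrence constraints and defaults only take effect inside a content model. As a rough sketch, the declarations above could be placed in a sequence like this (the bookInfo wrapper element is a hypothetical name introduced for illustration):

```xml
<xs:element name="bookInfo">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="title" type="xs:string"/>
            <!-- Optional: may appear zero times or once -->
            <xs:element name="subtitle" type="xs:string" minOccurs="0"/>
            <!-- The default applies when country is present but empty;
                 if the element is absent, no value is supplied -->
            <xs:element name="country" type="xs:string" default="USA" minOccurs="0"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>
```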

Built-in Data Types

XSD provides numerous built-in types:

Primitive Types

<xs:element name="title" type="xs:string"/>
<xs:element name="price" type="xs:decimal"/>
<xs:element name="publishDate" type="xs:date"/>
<xs:element name="available" type="xs:boolean"/>
<xs:element name="rating" type="xs:float"/>

Date and Time Types

<xs:element name="publishDate" type="xs:date"/>         <!-- 2023-07-10 -->
<xs:element name="timestamp" type="xs:dateTime"/>       <!-- 2023-07-10T14:30:00 -->
<xs:element name="time" type="xs:time"/>                <!-- 14:30:00 -->
<xs:element name="year" type="xs:gYear"/>               <!-- 2023 -->
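
The date and time types accept only ISO 8601 style lexical forms. The instance fragments below are an illustrative sketch of values that would and would not validate against the declarations above:

```xml
<publishDate>2023-07-10</publishDate>
<timestamp>2023-07-10T14:30:00Z</timestamp>        <!-- UTC -->
<timestamp>2023-07-10T14:30:00+02:00</timestamp>   <!-- explicit offset -->
<time>14:30:00</time>
<year>2023</year>

<!-- Not valid: <publishDate>07/10/2023</publishDate> -->
<!-- Not valid: <timestamp>2023-07-10 14:30:00</timestamp> (the T separator is required) -->
```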

Complex Types

Define elements that contain other elements or attributes:

<xs:complexType name="BookType">
    <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string" maxOccurs="unbounded"/>
        <xs:element name="isbn" type="xs:string"/>
        <xs:element name="publishDate" type="xs:date"/>
        <xs:element name="price" type="xs:decimal" minOccurs="0"/>
    </xs:sequence>
    <xs:attribute name="id" type="xs:ID" use="required"/>
    <xs:attribute name="genre" type="xs:string"/>
</xs:complexType>

<!-- Using the complex type -->
<xs:element name="book" type="BookType"/>
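
For orientation, here is a sketch of an instance that would satisfy BookType, assuming the schema has no target namespace. The author element could be repeated, and price is left out because minOccurs="0" makes it optional:

```xml
<book id="b1" genre="Fiction">
    <title>The Great Gatsby</title>
    <author>F. Scott Fitzgerald</author>
    <isbn>978-0743273565</isbn>
    <publishDate>1925-04-10</publishDate>
</book>
```

The id value must be a valid XML name (it cannot start with a digit) and must be unique among all ID values in the document.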

Content Models

Sequence (Ordered Elements)

<xs:complexType name="AddressType">
    <xs:sequence>
        <xs:element name="street" type="xs:string"/>
        <xs:element name="city" type="xs:string"/>
        <xs:element name="state" type="xs:string"/>
        <xs:element name="zipCode" type="xs:string"/>
    </xs:sequence>
</xs:complexType>

Choice (Alternative Elements)

<xs:complexType name="ContactType">
    <xs:choice>
        <xs:element name="phone" type="xs:string"/>
        <xs:element name="email" type="xs:string"/>
        <xs:element name="address" type="AddressType"/>
    </xs:choice>
</xs:complexType>
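
A choice allows exactly one of its alternatives unless occurrence attributes say otherwise. The sketch below assumes a hypothetical contact element declared with type ContactType:

```xml
<!-- Valid: exactly one alternative is present -->
<contact>
    <email>reader@example.com</email>
</contact>

<!-- Invalid: two alternatives appear together -->
<contact>
    <phone>555-0100</phone>
    <email>reader@example.com</email>
</contact>
```

If several contact methods should be allowed in any mix, one option is to put maxOccurs="unbounded" on the xs:choice itself.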

All (Unordered Elements)

<xs:complexType name="PersonType">
    <xs:all>
        <xs:element name="firstName" type="xs:string"/>
        <xs:element name="lastName" type="xs:string"/>
        <xs:element name="age" type="xs:integer" minOccurs="0"/>
    </xs:all>
</xs:complexType>
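
With xs:all, the children may appear in any order. A minimal sketch, assuming a hypothetical person element of type PersonType:

```xml
<!-- Valid even though the order differs from the declaration order -->
<person>
    <lastName>Lovelace</lastName>
    <age>36</age>
    <firstName>Ada</firstName>
</person>
```

In XSD 1.0, each child of xs:all may occur at most once, so this model does not suit repeating elements.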

Restrictions and Facets

Apply constraints to simple types:

String Restrictions

<xs:simpleType name="ISBNType">
    <xs:restriction base="xs:string">
        <xs:pattern value="[0-9]{3}-[0-9]{10}"/>  <!-- Pattern constraint -->
        <xs:length value="14"/>                    <!-- Exact length -->
    </xs:restriction>
</xs:simpleType>

<xs:simpleType name="ShortStringType">
    <xs:restriction base="xs:string">
        <xs:minLength value="1"/>
        <xs:maxLength value="50"/>
    </xs:restriction>
</xs:simpleType>

Numeric Restrictions

<xs:simpleType name="GradeType">
    <xs:restriction base="xs:decimal">
        <xs:minInclusive value="0.0"/>
        <xs:maxInclusive value="100.0"/>
        <xs:fractionDigits value="1"/>
    </xs:restriction>
</xs:simpleType>

<xs:simpleType name="QuantityType">
    <xs:restriction base="xs:integer">
        <xs:minExclusive value="0"/>
        <xs:maxInclusive value="1000"/>
    </xs:restriction>
</xs:simpleType>
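
To make the numeric facets concrete, the fragments below sketch a few values, assuming hypothetical grade and quantity elements declared with GradeType and QuantityType:

```xml
<grade>87.5</grade>        <!-- valid: within 0.0 to 100.0, one fraction digit -->
<quantity>1</quantity>     <!-- valid: greater than 0 and at most 1000 -->

<!-- Not valid: <grade>87.55</grade> (two fraction digits) -->
<!-- Not valid: <grade>100.1</grade> (above maxInclusive) -->
<!-- Not valid: <quantity>0</quantity> (minExclusive excludes 0 itself) -->
```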

Enumeration

<xs:simpleType name="GenreType">
    <xs:restriction base="xs:string">
        <xs:enumeration value="Fiction"/>
        <xs:enumeration value="Non-Fiction"/>
        <xs:enumeration value="Science Fiction"/>
        <xs:enumeration value="Mystery"/>
        <xs:enumeration value="Romance"/>
    </xs:restriction>
</xs:simpleType>
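
Named simple types such as these are used by referring to them in the type attribute. The following sketch combines several of the types defined above; OrderLineType and orderLine are hypothetical names, and the schema is assumed to have no target namespace:

```xml
<xs:complexType name="OrderLineType">
    <xs:sequence>
        <xs:element name="isbn" type="ISBNType"/>
        <xs:element name="genre" type="GenreType"/>
        <xs:element name="quantity" type="QuantityType"/>
    </xs:sequence>
</xs:complexType>
<xs:element name="orderLine" type="OrderLineType"/>

<!-- A conforming instance -->
<orderLine>
    <isbn>978-0743273565</isbn>
    <genre>Mystery</genre>
    <quantity>3</quantity>
</orderLine>
```

Enumeration values are matched exactly, so "mystery" in lowercase would be rejected.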

Attributes

Define and constrain attributes:

<xs:complexType name="BookType">
    <xs:sequence>
        <xs:element name="title" type="xs:string"/>
    </xs:sequence>
    
    <!-- Required attribute -->
    <xs:attribute name="id" type="xs:ID" use="required"/>
    
    <!-- Optional attribute with default -->
    <xs:attribute name="format" type="xs:string" default="paperback"/>
    
    <!-- Restricted attribute -->
    <xs:attribute name="status">
        <xs:simpleType>
            <xs:restriction base="xs:string">
                <xs:enumeration value="available"/>
                <xs:enumeration value="out-of-print"/>
                <xs:enumeration value="coming-soon"/>
            </xs:restriction>
        </xs:simpleType>
    </xs:attribute>
</xs:complexType>
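
As an illustration of how these attribute declarations behave, here is a sketch of an instance, assuming a hypothetical book element of this type:

```xml
<!-- Valid: id is present and status is one of the enumerated values.
     format is not written, so a validating parser reports it as "paperback". -->
<book id="b42" status="available">
    <title>Example Book</title>
</book>

<!-- Invalid: id is missing and status is not an enumerated value -->
<book status="lost">
    <title>Example Book</title>
</book>
```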

Type Derivation

Extension (Adding Content)

<xs:complexType name="PublicationBase">
    <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="publishDate" type="xs:date"/>
    </xs:sequence>
</xs:complexType>

<xs:complexType name="BookType">
    <xs:complexContent>
        <xs:extension base="PublicationBase">
            <xs:sequence>
                <xs:element name="isbn" type="xs:string"/>
                <xs:element name="pages" type="xs:integer"/>
            </xs:sequence>
            <xs:attribute name="hardcover" type="xs:boolean"/>
        </xs:extension>
    </xs:complexContent>
</xs:complexType>
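
In an instance, the content inherited from the base type comes first, followed by the content added by the extension. A minimal sketch, assuming a hypothetical book element declared with the extended BookType and made-up values:

```xml
<book hardcover="true">
    <!-- Inherited from PublicationBase -->
    <title>Example Book</title>
    <publishDate>2020-01-15</publishDate>
    <!-- Added by the extension -->
    <isbn>000-0000000000</isbn>
    <pages>412</pages>
</book>
```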

Restriction (Limiting Content)

<xs:complexType name="ShortBookType">
    <xs:complexContent>
        <xs:restriction base="BookType">
            <xs:sequence>
                <xs:element name="title" type="xs:string"/>
                <xs:element name="publishDate" type="xs:date"/>
                <xs:element name="isbn" type="xs:string"/>
                <xs:element name="pages">
                    <xs:simpleType>
                        <xs:restriction base="xs:integer">
                            <xs:maxInclusive value="200"/>
                        </xs:restriction>
                    </xs:simpleType>
                </xs:element>
            </xs:sequence>
        </xs:restriction>
    </xs:complexContent>
</xs:complexType>
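
A restricted type accepts a subset of what its base type accepts. The sketch below assumes a hypothetical shortBook element and made-up values:

```xml
<!-- Hypothetical declaration, not part of the original schema -->
<xs:element name="shortBook" type="ShortBookType"/>

<!-- Valid instance: pages is at most 200 -->
<shortBook>
    <title>Example Novella</title>
    <publishDate>2020-01-15</publishDate>
    <isbn>000-0000000000</isbn>
    <pages>180</pages>
</shortBook>

<!-- Changing pages to 350 would make the instance invalid for ShortBookType,
     although the same content would still be valid for BookType -->
```

Attributes of the base type, such as hardcover, are inherited by the restriction without being repeated.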

Complete Example: Library Schema

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/library"
           xmlns:lib="http://example.com/library"
           elementFormDefault="qualified">

    <!-- Root element -->
    <xs:element name="library" type="lib:LibraryType"/>

    <!-- Library type -->
    <xs:complexType name="LibraryType">
        <xs:sequence>
            <xs:element name="name" type="xs:string"/>
            <xs:element name="address" type="lib:AddressType"/>
            <xs:element name="book" type="lib:BookType" 
                       minOccurs="0" maxOccurs="unbounded"/>
        </xs:sequence>
        <xs:attribute name="id" type="xs:ID" use="required"/>
    </xs:complexType>

    <!-- Address type -->
    <xs:complexType name="AddressType">
        <xs:sequence>
            <xs:element name="street" type="xs:string"/>
            <xs:element name="city" type="xs:string"/>
            <xs:element name="state" type="lib:StateCodeType"/>
            <xs:element name="zipCode" type="lib:ZipCodeType"/>
        </xs:sequence>
    </xs:complexType>

    <!-- Book type -->
    <xs:complexType name="BookType">
        <xs:sequence>
            <xs:element name="title" type="xs:string"/>
            <xs:element name="author" type="xs:string" 
                       maxOccurs="unbounded"/>
            <xs:element name="isbn" type="lib:ISBNType"/>
            <xs:element name="publishDate" type="xs:date"/>
            <xs:element name="genre" type="lib:GenreType"/>
            <xs:element name="price" type="xs:decimal" minOccurs="0"/>
        </xs:sequence>
        <xs:attribute name="id" type="xs:ID" use="required"/>
        <xs:attribute name="available" type="xs:boolean" default="true"/>
    </xs:complexType>

    <!-- Custom types -->
    <xs:simpleType name="ISBNType">
        <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{3}-[0-9]{10}"/>
        </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="StateCodeType">
        <xs:restriction base="xs:string">
            <xs:pattern value="[A-Z]{2}"/>
        </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="ZipCodeType">
        <xs:restriction base="xs:string">
            <xs:pattern value="[0-9]{5}(-[0-9]{4})?"/>
        </xs:restriction>
    </xs:simpleType>

    <xs:simpleType name="GenreType">
        <xs:restriction base="xs:string">
            <xs:enumeration value="Fiction"/>
            <xs:enumeration value="Non-Fiction"/>
            <xs:enumeration value="Science Fiction"/>
            <xs:enumeration value="Mystery"/>
            <xs:enumeration value="Romance"/>
            <xs:enumeration value="Biography"/>
        </xs:restriction>
    </xs:simpleType>

</xs:schema>

Validation Example

XML document that conforms to the schema:

<?xml version="1.0" encoding="UTF-8"?>
<lib:library xmlns:lib="http://example.com/library"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://example.com/library library.xsd"
             id="lib001">
    
    <lib:name>Central Public Library</lib:name>
    <lib:address>
        <lib:street>123 Main Street</lib:street>
        <lib:city>Springfield</lib:city>
        <lib:state>IL</lib:state>
        <lib:zipCode>62701</lib:zipCode>
    </lib:address>
    
    <lib:book id="book001" available="true">
        <lib:title>The Great Gatsby</lib:title>
        <lib:author>F. Scott Fitzgerald</lib:author>
        <lib:isbn>978-0743273565</lib:isbn>
        <lib:publishDate>1925-04-10</lib:publishDate>
        <lib:genre>Fiction</lib:genre>
        <lib:price>12.99</lib:price>
    </lib:book>
    
</lib:library>
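
For contrast, here is a sketch of a document that is well-formed but does not conform to the schema. The values are made up, and each comment notes the rule from the Library Schema that the line would violate:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<lib:library xmlns:lib="http://example.com/library" id="lib002">
    <lib:name>Branch Library</lib:name>
    <lib:address>
        <lib:street>9 Elm Road</lib:street>
        <lib:city>Springfield</lib:city>
        <lib:state>Illinois</lib:state>   <!-- StateCodeType requires [A-Z]{2} -->
        <lib:zipCode>6270</lib:zipCode>   <!-- ZipCodeType requires five digits -->
    </lib:address>
    <lib:book id="book002">
        <lib:title>Untitled</lib:title>
        <!-- lib:author is missing; at least one is required -->
        <lib:isbn>0743273565</lib:isbn>   <!-- ISBNType requires [0-9]{3}-[0-9]{10} -->
        <lib:publishDate>1925-04-10</lib:publishDate>
        <lib:genre>Poetry</lib:genre>     <!-- not an enumerated GenreType value -->
    </lib:book>
</lib:library>
```

Exact error messages differ between validators, and some stop reporting after the first structural error in a parent element.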

Schema Documentation

Add documentation to your schemas:

<xs:element name="book" type="BookType">
    <xs:annotation>
        <xs:documentation>
            Represents a book in the library catalog.
            Each book must have a unique ID and title.
        </xs:documentation>
    </xs:annotation>
</xs:element>
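
Annotations can also carry a language tag and machine-readable hints. The following is a sketch only: the ui:label element and its namespace are hypothetical and stand in for whatever a particular tool expects:

```xml
<xs:element name="book" type="BookType">
    <xs:annotation>
        <!-- xml:lang identifies the language of the human-readable text -->
        <xs:documentation xml:lang="en">
            Represents a book in the library catalog.
        </xs:documentation>
        <!-- xs:appinfo holds information for tools; validators ignore it -->
        <xs:appinfo>
            <ui:label xmlns:ui="http://example.com/ui-hints">Book</ui:label>
        </xs:appinfo>
    </xs:annotation>
</xs:element>
```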

Best Practices

Design Principles

  1. Use meaningful names for types and elements
  2. Create reusable types for common patterns
  3. Document your schema with annotations
  4. Use appropriate constraints but don't over-constrain
  5. Plan for extensibility using extension mechanisms

Performance Considerations

<!-- Specific type: the validator checks name syntax and document-wide uniqueness -->
<xs:element name="id" type="xs:ID"/>

<!-- Broad type: any text is accepted, so errors surface later in application code -->
<xs:element name="id" type="xs:string"/>

Versioning Strategy

<!-- Include version in namespace -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/library/v2.0">
    <!-- Schema content -->
</xs:schema>
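
Changing the namespace makes every existing instance document invalid against the new schema, so it may suit breaking changes best. One possible alternative for backward-compatible revisions, sketched here as an assumption rather than a recommendation from this guide, is to keep the namespace stable and record the revision in the version attribute of xs:schema:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/library"
           xmlns:lib="http://example.com/library"
           elementFormDefault="qualified"
           version="2.1">
    <!-- Schema content -->
</xs:schema>
```

The version attribute is informational only and plays no part in validation.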

Common Patterns

Optional Elements Group

<xs:group name="ContactGroup">
    <xs:choice>
        <xs:element name="phone" type="xs:string"/>
        <xs:element name="email" type="xs:string"/>
    </xs:choice>
</xs:group>
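
A named group is used by reference inside a content model. The sketch below assumes a hypothetical MemberType; setting minOccurs="0" on the reference is what makes the whole phone-or-email choice optional:

```xml
<xs:complexType name="MemberType">
    <xs:sequence>
        <xs:element name="name" type="xs:string"/>
        <!-- Zero or one occurrence of the group, that is, of one of its alternatives -->
        <xs:group ref="ContactGroup" minOccurs="0"/>
    </xs:sequence>
</xs:complexType>
```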

Substitution Groups

<xs:element name="publication" type="PublicationType"/>
<xs:element name="book" type="BookType" substitutionGroup="publication"/>
<xs:element name="magazine" type="MagazineType" substitutionGroup="publication"/>
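
Substitution takes effect wherever the head element is referenced. The following sketch assumes a hypothetical catalog element, and that BookType and MagazineType are derived from PublicationType, which the schema rules require for members of the group:

```xml
<xs:complexType name="CatalogType">
    <xs:sequence>
        <!-- Any member of the publication substitution group may appear here -->
        <xs:element ref="publication" maxOccurs="unbounded"/>
    </xs:sequence>
</xs:complexType>
<xs:element name="catalog" type="CatalogType"/>

<!-- Instance outline: book and magazine stand in for publication -->
<catalog>
    <book><!-- content per BookType --></book>
    <magazine><!-- content per MagazineType --></magazine>
</catalog>
```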

Validation Tools

Command Line (xmllint)

xmllint --schema library.xsd library.xml --noout

Programming Languages

Java

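import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.*;
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = factory.newSchema(new File("library.xsd"));
Validator validator = schema.newValidator();
validator.validate(new StreamSource(new File("library.xml"))); // throws SAXException if invalid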

Python (lxml)

from lxml import etree

# Parse by path so lxml handles file opening and encoding detection
schema = etree.XMLSchema(etree.parse('library.xsd'))
xml_doc = etree.parse('library.xml')

is_valid = schema.validate(xml_doc)
if not is_valid:
    # error_log lists each violation with its line number and message
    print(schema.error_log)

Conclusion

XML Schema (XSD) provides a precise way to define the structure and constraints of XML documents. Its type system, namespace integration, and derivation mechanisms make it a common choice wherever XML documents must be validated against a formal contract. Understanding XSD is a solid foundation for building robust, validated XML systems.
