
Document Type Definition (DTD)

Document Type Definition (DTD) is the original schema language for XML, inherited from SGML and defined in the XML 1.0 specification itself. It describes which elements, attributes, and entities a document may contain and how they may nest. While XML Schema (XSD) has largely superseded DTD for new applications, DTD remains important for understanding legacy systems and the XML specifications that still rely on it.

DTD vs XML Schema

Feature            DTD                                XML Schema (XSD)
Syntax             Non-XML syntax                     XML-based syntax
Data Types         Limited (CDATA, ID, IDREF, etc.)   Rich type system
Namespace Support  Limited                            Full support
Extensibility      Limited                            Excellent
Learning Curve     Simpler                            More complex

DTD Declaration

DTDs can be declared internally within an XML document or externally in separate files:

Internal DTD

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library [
    <!ELEMENT library (book+)>
    <!ELEMENT book (title, author, isbn)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT author (#PCDATA)>
    <!ELEMENT isbn (#PCDATA)>
    <!ATTLIST book id ID #REQUIRED>
]>
<library>
    <book id="book1">
        <title>Learning XML</title>
        <author>John Doe</author>
        <isbn>978-1234567890</isbn>
    </book>
</library>

External DTD

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library SYSTEM "library.dtd">
<library>
    <!-- XML content -->
</library>

library.dtd:

<!ELEMENT library (book+)>
<!ELEMENT book (title, author, isbn)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT isbn (#PCDATA)>
<!ATTLIST book id ID #REQUIRED>

Element Declarations

Basic Element Syntax

<!ELEMENT element-name (content-model)>

Content Models

Empty Elements

<!ELEMENT br EMPTY>

Text Content Only

<!ELEMENT title (#PCDATA)>

Element Content Only

<!ELEMENT book (title, author, price)>

Mixed Content

<!ELEMENT description (#PCDATA | emphasis | strong)*>

Content Operators

Sequence (,)

Elements must appear in order:

<!ELEMENT book (title, author, isbn)>

Choice (|)

Exactly one of the listed elements must appear:

<!ELEMENT contact (email | phone | address)>

Occurrence Indicators

<!-- Exactly one (default) -->
<!ELEMENT book (title)>

<!-- Zero or one (optional) -->
<!ELEMENT book (title, subtitle?)>

<!-- Zero or more -->
<!ELEMENT book (title, author*)>

<!-- One or more -->
<!ELEMENT library (book+)>
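
Because an element-only content model is essentially a regular expression over child element names, the sequence, choice, and occurrence operators can be illustrated with a short Python sketch. The helper names model_to_regex and children_match are hypothetical, and the translation assumes element-only content models (no #PCDATA, EMPTY, or ANY) with simple names. It is an illustration of the idea, not a replacement for a validating parser.

```python
import re

# Matches an element name inside a content model (simplified name syntax)
NAME = re.compile(r"[A-Za-z_:][\w:.-]*")

def model_to_regex(model):
    """Translate an element-only DTD content model into a compiled regex.

    Each child is encoded as its name plus one trailing space, so the
    child list ["title", "author"] becomes the string "title author ".
    """
    compact = re.sub(r"\s+", "", model)            # whitespace is insignificant
    pattern = NAME.sub(lambda m: "(?:" + re.escape(m.group()) + " )", compact)
    pattern = pattern.replace(",", "")             # ',' is plain concatenation
    # '|', '?', '*', '+' and parentheses already mean the same thing in a regex
    return re.compile(pattern)

def children_match(model, children):
    """Return True if the sequence of child names satisfies the model."""
    encoded = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(encoded) is not None

print(children_match("(title, subtitle?, author*)", ["title", "author", "author"]))  # True
print(children_match("(title, subtitle?, author*)", ["subtitle", "title"]))          # False
print(children_match("(email | phone | address)", ["email", "phone"]))               # False
print(children_match("(book+)", []))                                                 # False
```

The second call fails because a sequence fixes the order, the third because a choice admits only one alternative, and the fourth because + requires at least one occurrence.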

Attribute Declarations

Basic Attribute Syntax

<!ATTLIST element-name
    attribute-name attribute-type default-value>
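
To see how a parser splits a declaration into these parts, the following sketch uses the AttlistDeclHandler hook of Python's standard xml.parsers.expat module. The sample document and the list_attribute_declarations helper are made up for illustration; the DTD is placed in the internal subset because expat does not fetch external DTD files by default.

```python
import xml.parsers.expat

# Hypothetical sample document with its DTD in the internal subset
SAMPLE = """<!DOCTYPE book [
  <!ELEMENT book EMPTY>
  <!ATTLIST book
      id       ID    #REQUIRED
      subtitle CDATA #IMPLIED
      language CDATA "en">
]>
<book id="b1"/>"""

def list_attribute_declarations(xml_text):
    """Collect (element, attribute, type, default, required) per declared attribute."""
    found = []
    parser = xml.parsers.expat.ParserCreate()

    def on_attlist(elname, attname, att_type, default, required):
        # 'default' is None for #REQUIRED and #IMPLIED; 'required' is an int flag
        found.append((elname, attname, att_type, default, bool(required)))

    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_text, True)
    return found

for declaration in list_attribute_declarations(SAMPLE):
    print(declaration)
```

Each tuple corresponds to one attribute line of the ATTLIST: the element it belongs to, the attribute name, its type, the default value if one was given, and whether it was declared #REQUIRED.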

Attribute Types

CDATA

Any character data:

<!ATTLIST book title CDATA #IMPLIED>

ID and IDREF

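An ID value must be unique within the document and must be a valid XML name (so it cannot start with a digit); an IDREF value must match the ID of some element in the same document: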

<!ATTLIST book id ID #REQUIRED>
<!ATTLIST reference bookref IDREF #REQUIRED>

NMTOKEN and NMTOKENS

A single name token (NMTOKEN) or a whitespace-separated list of name tokens (NMTOKENS):

<!ATTLIST element class NMTOKEN #IMPLIED>
<!ATTLIST element classes NMTOKENS #IMPLIED>

Enumerated Values

<!ATTLIST book format (hardcover | paperback | ebook) "paperback">
<!ATTLIST book language (en | es | fr | de) #REQUIRED>

Default Value Types

#REQUIRED

Attribute must be present:

<!ATTLIST book id ID #REQUIRED>

#IMPLIED

Attribute is optional:

<!ATTLIST book subtitle CDATA #IMPLIED>

#FIXED

Attribute always has the given value; if the document specifies it, the value must match:

<!ATTLIST book version CDATA #FIXED "1.0">

Default Value

Provides a default if not specified:

<!ATTLIST book language CDATA "en">
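
The effect of default and #FIXED values can be observed without a validating parser. The sketch below is a minimal illustration using Python's standard xml.parsers.expat module, which reads declarations in the internal subset and reports defaulted attributes alongside those written in the document. The sample document and the book_attributes helper are hypothetical.

```python
import xml.parsers.expat

# Hypothetical sample: the DTD lives in the internal subset so that a
# non-validating parser such as expat can read the ATTLIST declarations.
SAMPLE = """<!DOCTYPE library [
  <!ELEMENT library (book+)>
  <!ELEMENT book EMPTY>
  <!ATTLIST book
      id       ID    #REQUIRED
      language CDATA "en"
      version  CDATA #FIXED "1.0">
]>
<library>
  <book id="b1"/>
  <book id="b2" language="fr"/>
</library>"""

def book_attributes(xml_text):
    """Return the attribute dict the parser reports for each <book>."""
    reported = []
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        if name == "book":
            reported.append(dict(attrs))

    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return reported

for attrs in book_attributes(SAMPLE):
    print(attrs)
```

The first book should be reported with language "en" and version "1.0" even though neither appears in the document, while the second keeps its explicit language "fr". Note that expat does not validate, so a book without the #REQUIRED id attribute would not be rejected here.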

Complete DTD Example

books.dtd:

<?xml version="1.0" encoding="UTF-8"?>

<!-- Root element -->
<!ELEMENT library (metadata?, book+, author*)>

<!-- Metadata element -->
<!ELEMENT metadata (name, location, established?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT location (#PCDATA)>
<!ELEMENT established (#PCDATA)>

<!-- Book element -->
<!ELEMENT book (title, author-ref+, isbn, publisher?, price?, description?)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author-ref EMPTY>
<!ELEMENT isbn (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT description (#PCDATA | emphasis | strong)*>

<!-- Text formatting elements -->
<!ELEMENT emphasis (#PCDATA)>
<!ELEMENT strong (#PCDATA)>

<!-- Author element -->
<!ELEMENT author (name, biography?)>
<!ELEMENT biography (#PCDATA)>

<!-- Attributes -->
<!ATTLIST library
    id ID #REQUIRED
    version CDATA #FIXED "2.0">

<!ATTLIST book
    id ID #REQUIRED
    genre (fiction | non-fiction | science | history | biography) #REQUIRED
    format (hardcover | paperback | ebook) "paperback"
    available (yes | no) "yes">

<!ATTLIST author-ref
    ref IDREF #REQUIRED>

<!ATTLIST author
    id ID #REQUIRED
    nationality CDATA #IMPLIED>

<!ATTLIST price
    currency (USD | EUR | GBP | JPY) "USD">

Sample XML Document:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library SYSTEM "books.dtd">
<library id="central-lib" version="2.0">
    <metadata>
        <name>Central Public Library</name>
        <location>Downtown</location>
        <established>1952</established>
    </metadata>
    
    <book id="book1" genre="fiction" format="hardcover">
        <title>To Kill a Mockingbird</title>
        <author-ref ref="author1"/>
        <isbn>978-0-06-112008-4</isbn>
        <publisher>Harper Perennial</publisher>
        <price currency="USD">14.99</price>
        <description>A classic novel exploring themes of 
        <emphasis>racial injustice</emphasis> and 
        <strong>moral growth</strong>.</description>
    </book>
    
    <book id="book2" genre="science">
        <title>A Brief History of Time</title>
        <author-ref ref="author2"/>
        <isbn>978-0-553-10953-5</isbn>
        <price currency="USD">16.99</price>
    </book>
    
    <author id="author1" nationality="American">
        <name>Harper Lee</name>
        <biography>American novelist known for To Kill a Mockingbird.</biography>
    </author>
    
    <author id="author2" nationality="British">
        <name>Stephen Hawking</name>
        <biography>Theoretical physicist and cosmologist.</biography>
    </author>
</library>

Entity Declarations

Internal Entities

<!ENTITY copyright "Copyright 2023 Library Systems Inc.">
<!ENTITY contact-email "[email protected]">

Usage in XML:

<footer>&copyright; Contact: &contact-email;</footer>
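
As a quick check of how entity references are resolved, the following sketch parses a small document with Python's standard xml.etree.ElementTree module, which expands internal entities declared in the internal subset. The footer document is a made-up sample built around the copyright entity shown above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the entity is declared in the internal subset
SAMPLE = """<!DOCTYPE footer [
  <!ELEMENT footer (#PCDATA)>
  <!ENTITY copyright "Copyright 2023 Library Systems Inc.">
]>
<footer>&copyright; All rights reserved.</footer>"""

footer = ET.fromstring(SAMPLE)

# The reference &copyright; has been replaced by the entity's text
print(footer.text)
```

By the time the application sees the element text, the reference is gone and only the replacement text remains.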

External Entities

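<!ENTITY legal SYSTEM "legal-notice.txt">
<!-- An unparsed (NDATA) entity must name a declared notation -->
<!NOTATION gif SYSTEM "image/gif">
<!ENTITY logo SYSTEM "logo.gif" NDATA gif>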

Parameter Entities

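Used within the DTD itself for modularity. A parameter entity reference inside a declaration, as in the second line here, is permitted only in the external subset (a separate .dtd file), not in an internal subset: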

<!ENTITY % text-elements "emphasis | strong | code">
<!ELEMENT description (#PCDATA | %text-elements;)*>

Conditional Sections

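Conditional sections include or ignore a block of declarations depending on a keyword (INCLUDE or IGNORE), usually supplied through a parameter entity. Like the parameter entity usage above, they are allowed only in the external subset: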

<!ENTITY % include-images "INCLUDE">

<![%include-images;[
    <!ELEMENT image EMPTY>
    <!ATTLIST image 
        src CDATA #REQUIRED
        alt CDATA #REQUIRED>
]]>

DTD Validation

Command Line Validation

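# Using xmllint (validates against the DTD named in the document's DOCTYPE)
xmllint --valid --noout document.xml

# Using xmlstarlet (validates against the DTD file given with -d)
xmlstarlet val -d books.dtd library.xml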

Programmatic Validation

Java

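import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class ValidateLibrary {
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // validate against the DTD named in the DOCTYPE
        DocumentBuilder builder = factory.newDocumentBuilder();

        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { System.err.println("Warning: " + e.getMessage()); }
            // Validity errors are recoverable: report them and keep parsing
            public void error(SAXParseException e) { System.err.println("Validation error: " + e.getMessage()); }
            // Well-formedness errors are fatal: rethrow to stop parsing
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });

        Document doc = builder.parse(new File("library.xml"));
    }
}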

Python (lxml)

from lxml import etree

# Parse DTD (binary mode lets lxml handle the encoding itself)
with open('books.dtd', 'rb') as dtd_file:
    dtd = etree.DTD(dtd_file)

# Parse XML
with open('library.xml', 'rb') as xml_file:
    xml_doc = etree.parse(xml_file)

# Validate
if dtd.validate(xml_doc):
    print("Document is valid")
else:
    print("Validation errors:")
    for error in dtd.error_log:
        print(f"Line {error.line}: {error.message}")

Best Practices

DTD Design

  • Keep it simple: DTD syntax is limited, so don't over-complicate the content models
  • Use meaningful names: Element and attribute names should be descriptive
  • Document structure: Add comments to explain complex content models
  • Modularize: Use parameter entities for reusable components

Performance Considerations

  • External DTDs: Cache DTD files to avoid repeated network requests
  • Entity references: Minimize complex entity structures
  • Validation timing: Consider when validation is necessary

Migration Strategy

If moving from DTD to XML Schema:

<!-- DTD version -->
<!ELEMENT book (title, author)>
<!ATTLIST book id ID #REQUIRED>

<!-- XML Schema equivalent -->
<xs:element name="book">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="title" type="xs:string"/>
            <xs:element name="author" type="xs:string"/>
        </xs:sequence>
        <xs:attribute name="id" type="xs:ID" use="required"/>
    </xs:complexType>
</xs:element>

Common Pitfalls

Mixed Content Issues

<!-- Problematic: Hard to control -->
<!ELEMENT description (#PCDATA | emphasis | strong)*>

<!-- Better: More structured -->
<!ELEMENT description (paragraph+)>
<!ELEMENT paragraph (#PCDATA | emphasis | strong)*>

ID/IDREF Constraints

<!-- Remember: ID values must be unique across entire document -->
<!ATTLIST book id ID #REQUIRED>
<!ATTLIST author id ID #REQUIRED>  <!-- Must not conflict with book IDs -->
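
The uniqueness rule can be made concrete with a small Python sketch that mimics what a validating parser checks for ID and IDREF attributes. It is only an approximation: the check_ids helper is hypothetical, and it assumes that every attribute named id is declared as ID and every attribute named ref is declared as IDREF, as in the books.dtd example, rather than reading the types from the DTD.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with a duplicate ID and a dangling reference
SAMPLE = """<library>
  <book id="x1"><author-ref ref="x1"/></book>
  <book id="b2"><author-ref ref="missing"/></book>
  <author id="x1"/>
</library>"""

def check_ids(xml_text, id_attr="id", idref_attr="ref"):
    """Return a list of ID/IDREF problems found in the document."""
    root = ET.fromstring(xml_text)
    problems = []
    seen = set()

    # First pass: every ID value must be unique across all element types
    for element in root.iter():
        value = element.get(id_attr)
        if value is None:
            continue
        if value in seen:
            problems.append(f"duplicate ID {value!r} on <{element.tag}>")
        seen.add(value)

    # Second pass: every IDREF must point at an ID that exists somewhere
    for element in root.iter():
        target = element.get(idref_attr)
        if target is not None and target not in seen:
            problems.append(f"IDREF {target!r} on <{element.tag}> has no matching ID")

    return problems

for problem in check_ids(SAMPLE):
    print(problem)
```

Here the author reuses the ID x1 already taken by a book, which is exactly the conflict the comment above warns about, and the second author-ref points at an ID that does not exist.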

Case Sensitivity

<!-- DTD is case-sensitive -->
<!ELEMENT Book (title)>    <!-- Different from book -->
<!ELEMENT book (title)>

When to Use DTD

Still Appropriate For:

  • Legacy system maintenance
  • Simple document structures
  • SGML compatibility requirements
  • Quick prototyping

Consider XML Schema Instead For:

  • New projects
  • Complex data validation
  • Namespace-heavy documents
  • Rich data type requirements

Conclusion

While DTD is considered legacy technology, understanding it remains valuable for working with existing XML systems and comprehending XML's evolution. For new projects, XML Schema (XSD) typically provides better features and flexibility.
