Schema Wars: XML Schema vs. RELAX NG (1/2) - exploring XML | WebReference

Schema Wars: XML Schema vs. RELAX NG

Lately some controversy has arisen in the XML community about which tool is the most appropriate to supersede the dreaded DTDs: XML Schema or RELAX NG. The IETF is drafting an RFC on the use of XML, and how to specify schemas is one point of contention.

XML Schema was put forward by the W3C in 2001 to fix the most obvious limitations of DTDs:

  1. The syntax of a DTD is different from XML, requiring the document writer to learn yet another notation, and the software to have yet another parser
  2. There is no way to specify datatypes and data formats that could be used to automatically map from and to programming languages
  3. There is no set of well-known basic elements to choose from

XML Schema successfully tackled these problems, but it also went much further: it took on data type definitions, infoset modification, and schema expressiveness far beyond that of DTDs.
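To make the first two limitations concrete, here is a minimal sketch in Python. The `price` element, its `currency` attribute, and the small type-mapping table are hypothetical examples chosen for illustration; the code is not a schema validator or a data-binding tool. It shows the same declaration once as a DTD (its own notation, content is plain character data) and once as an XML Schema (ordinary XML, with a datatype that a program can use to convert the value).

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# DTD: a separate, non-XML notation; the content is untyped character data.
DTD = """
<!ELEMENT price (#PCDATA)>
<!ATTLIST price currency CDATA #REQUIRED>
"""

# XML Schema: written in XML itself, and the content carries a datatype.
XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="price">
    <xs:complexType>
      <xs:simpleContent>
        <xs:extension base="xs:decimal">
          <xs:attribute name="currency" type="xs:string" use="required"/>
        </xs:extension>
      </xs:simpleContent>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical, deliberately tiny mapping from schema datatypes to Python types.
# Simplification: it matches the literal "xs:" prefix instead of resolving QNames.
PY_TYPES = {"xs:decimal": Decimal, "xs:integer": int, "xs:string": str}


def content_type(xsd_text, element_name):
    """Look up the Python type for a simple-content element declared in the schema."""
    schema = ET.fromstring(xsd_text)  # an ordinary XML parser is enough
    for decl in schema.iter(XS + "element"):
        if decl.get("name") == element_name:
            ext = decl.find(f"{XS}complexType/{XS}simpleContent/{XS}extension")
            return PY_TYPES[ext.get("base")]
    raise KeyError(element_name)


def read_price(xsd_text, instance_text):
    """Convert the element content using the datatype named in the schema."""
    elem = ET.fromstring(instance_text)
    convert = content_type(xsd_text, elem.tag)
    return convert(elem.text.strip()), elem.get("currency")
```

Calling `read_price(XSD, '<price currency="EUR">19.99</price>')` yields a `Decimal` amount and the currency string. With only the DTD, the program would have no declared type to go on and would have to treat `19.99` as text.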

James Clark, leader of the technical committee at OASIS for RELAX NG, and author of one of the first XML parsers, recently described the problems of the XML Schema language in a newsgroup posting:

  1. XML Schema definitions require considerable expertise to understand and can contain quite a few surprises.
    As an example, if you derive a complex type by restriction you have to specify the new restricted content model explicitly. Attributes, however, are treated in the opposite way: by default you get all the attributes and you have to explicitly rule out the ones you don't want. A similar inconsistency arises when you merge two attribute definitions: you get the union of the concrete attribute definitions, but the intersection of attribute wildcards (specified by the anyAttribute element). While these might be convenient choices for the specification, they easily create confusion for the human reader of XML schemas.
  2. The XML Schema Recommendation is hard to read and understand.
    To avoid the possible misinterpretations mentioned above you might have to consult the specification in order to fully understand a specific schema definition. I have to agree with Clark that the W3C's XML Schema Recommendation is by far the hardest specification to read and understand, which makes it even more difficult to make sense of a particular schema presented to you.
  3. W3C XML Schema's support for attributes provides no advance over DTDs.
    As with DTDs, W3C XML Schema only allows the specification of whether attributes are required or optional. There is no way to specify more complex constraints between attributes or between attributes and elements, for instance that either attribute X or attribute Y is allowed, or that either attribute X or element Y is allowed. The mechanism that is used to constrain the co-occurrence of child elements should be extended to attributes and to combinations of attributes and child elements.
  4. W3C XML Schema provides very weak support for unordered content.
    When the designer of an XML vocabulary does not wish to force child elements to occur in a particular order, it can be impractical to describe that vocabulary using XML Schema, because its construct for unordered content, the all group, is tightly restricted: for example, each child element in an all group may occur at most once.
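Points 3 and 4 can be illustrated with a small sketch. The `item` vocabulary below (attributes `x` and `y`, child elements `a` and `b`) is hypothetical and does not come from Clark's posting. The RELAX NG pattern says: exactly one of the attributes `x` or `y`, plus one `a` and any number of `b` children in any order. The Python function is only a hand-written check of the same rules, written out to show what a schema language would have to express declaratively; it is not a RELAX NG validator, and it ignores stray text between children.

```python
import xml.etree.ElementTree as ET

# RELAX NG (XML syntax): attributes take part in <choice> just like elements,
# and <interleave> allows the children in any order.
RNG = """
<element name="item" xmlns="http://relaxng.org/ns/structure/1.0">
  <choice>
    <attribute name="x"/>
    <attribute name="y"/>
  </choice>
  <interleave>
    <element name="a"><text/></element>
    <zeroOrMore>
      <element name="b"><text/></element>
    </zeroOrMore>
  </interleave>
</element>
"""

# The same pattern in RELAX NG compact syntax, for comparison.
RNC = "element item { (attribute x { text } | attribute y { text }), (element a { text } & element b { text }*) }"


def matches_item(xml_text):
    """Hand-coded equivalent of the pattern above (illustrative only)."""
    item = ET.fromstring(xml_text)
    if item.tag != "item":
        return False
    # Co-occurrence constraint: either x or y, never both, never neither.
    if set(item.attrib) not in ({"x"}, {"y"}):
        return False
    # Unordered content: exactly one <a>, any number of <b>, in any order.
    tags = [child.tag for child in item]
    return tags.count("a") == 1 and all(tag in ("a", "b") for tag in tags)
```

For instance, `matches_item('<item x="1"><b/><a/><b/></item>')` is true, while an `item` carrying both `x` and `y`, or lacking an `a` child, is rejected. An all group cannot describe the content part of this pattern, because `b` may repeat.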

More problems ahead...

Produced by Michael Claßen

Created: Jul 08, 2002
Revised: Jul 08, 2002