Introduction to RELAX NG (1/2) - exploring XML | WebReference

Introduction to RELAX NG

In the last installment we discussed the different approaches to schema definition put forward by the W3C and OASIS. More specifically, we followed the criticism surrounding XML Schema, and looked at some improvements offered in the alternative, RELAX NG. Today we'll explain the basics of RELAX NG by example.

A simple RELAX NG example

Let's assume we want to specify a person's name in XML, consisting of first name, last name and middle initial, such as:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE name SYSTEM "name.dtd">
<name bidi="fml">

The bidi (bidirectional) attribute indicates the order in which the name parts are usually written in the person's culture, so that "last name first" is also supported. This document is a valid instance of the following DTD:

<!ELEMENT name (first, mi, last)>
<!ATTLIST name bidi CDATA #IMPLIED>
<!ELEMENT first (#PCDATA)>
<!ELEMENT mi (#PCDATA)>
<!ELEMENT last (#PCDATA)>

This DTD declares name to consist of first, mi, and last. An optional attribute bidi can be attached to the name element. Since DTDs do not support data types, the remaining elements need to be declared as character data.
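To make the purpose of the bidi attribute concrete, here is a small Python sketch that renders a name in the order the attribute requests. It assumes that each letter of the value stands for one part (f for first, m for mi, l for last), which the fml value above suggests but the article does not spell out. The sample name and the display_name helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance document; the name parts are made-up sample values.
DOC = """<name bidi="lfm">
  <first>Jane</first>
  <mi>Q</mi>
  <last>Public</last>
</name>"""

# Assumed encoding of the bidi value: one letter per name part.
PART_FOR_LETTER = {"f": "first", "m": "mi", "l": "last"}


def display_name(name_elem, default_order="fml"):
    """Join the name parts in the order given by the optional bidi attribute."""
    order = name_elem.get("bidi", default_order)
    return " ".join(name_elem.findtext(PART_FOR_LETTER[letter]) for letter in order)


print(display_name(ET.fromstring(DOC)))  # Public Jane Q
```

Because the attribute is optional, the helper falls back to a default order when bidi is absent.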

The aforementioned data structure also matches the following RELAX NG schema in XML syntax:

<?xml version="1.0" encoding="UTF-8"?>
<element name="name" xmlns="http://relaxng.org/ns/structure/1.0">
 <optional>
  <attribute name="bidi"/>
 </optional>
 <element name="first"><text/></element>
 <element name="mi"><text/></element>
 <element name="last"><text/></element>
</element>

Elements are defined using the element tag with its mandatory name attribute. The top-level element pattern also defines the root element of every XML document that follows this schema. The attribute tag declares an attribute of the specified name on the enclosing element; with no content of its own, it accepts any text value. In our case, the bidi attribute is made optional by wrapping it in an optional element.
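To see how these patterns constrain a document, here is a minimal Python sketch of a matcher for just the four patterns used so far (element, attribute, optional and text). It is not a real RELAX NG validator: it only handles optional attributes and fixed sequences of child elements, and the matches function and the sample names are our own invention. The schema is restated in full, including the RELAX NG namespace, so that the sketch is self-contained.

```python
import xml.etree.ElementTree as ET

# The schema from the article, written out in full with the RELAX NG namespace.
SCHEMA = """<element name="name" xmlns="http://relaxng.org/ns/structure/1.0">
 <optional><attribute name="bidi"/></optional>
 <element name="first"><text/></element>
 <element name="mi"><text/></element>
 <element name="last"><text/></element>
</element>"""


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.split("}", 1)[-1]


def matches(pattern, node):
    """Check node against an <element> pattern (tiny illustrative subset only)."""
    if node.tag != pattern.get("name"):
        return False
    required, optional_attrs, children, allows_text = set(), set(), [], False
    for item in pattern:
        kind = local(item.tag)
        if kind == "attribute":
            required.add(item.get("name"))
        elif kind == "optional":
            # This sketch only supports attributes inside <optional>.
            optional_attrs.update(
                a.get("name") for a in item if local(a.tag) == "attribute"
            )
        elif kind == "element":
            children.append(item)
        elif kind == "text":
            allows_text = True
    attrs = set(node.attrib)
    # All required attributes present, and no undeclared attributes.
    if not required <= attrs or not attrs <= required | optional_attrs:
        return False
    # Non-whitespace text is only allowed where the pattern contains <text/>.
    if not allows_text and (node.text or "").strip():
        return False
    kids = list(node)
    if len(kids) != len(children):
        return False
    # Child elements must appear in exactly the declared order.
    return all(matches(p, k) for p, k in zip(children, kids))


schema = ET.fromstring(SCHEMA)
# Hypothetical instances; the name parts are made-up sample values.
good = ET.fromstring('<name bidi="fml"><first>Jane</first><mi>Q</mi><last>Public</last></name>')
bad = ET.fromstring('<name><last>Public</last><first>Jane</first><mi>Q</mi></name>')
print(matches(schema, good))  # True
print(matches(schema, bad))   # False
```

The second document fails because the schema fixes the order of the child elements; the bidi attribute only records the preferred display order.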

A non-XML compact syntax has also been proposed, which resembles the EBNF notation used in grammar definitions. The same schema in compact syntax is:

element name { attribute bidi { text }?,
element first { text },
element mi { text }, 
element last { text } }

Here the start and end tags have been replaced with braces, sequence is expressed with commas, and the optional element has become a trailing question mark. This syntax needs far fewer characters and is easier for humans to read, while it can still be processed by machines.
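The correspondence between the two syntaxes is mechanical enough to sketch in a few lines of Python. The to_compact function below is our own illustration, not part of any RELAX NG tool, and it covers only the four patterns used in this article. The XML schema is restated in full so that the sketch stands alone.

```python
import xml.etree.ElementTree as ET

# The schema from the article in XML syntax, with the RELAX NG namespace.
SCHEMA = """<element name="name" xmlns="http://relaxng.org/ns/structure/1.0">
 <optional><attribute name="bidi"/></optional>
 <element name="first"><text/></element>
 <element name="mi"><text/></element>
 <element name="last"><text/></element>
</element>"""


def local(tag):
    """Strip the '{namespace}' prefix ElementTree puts in front of tag names."""
    return tag.split("}", 1)[-1]


def to_compact(pattern):
    """Translate element/attribute/optional/text patterns to compact syntax."""
    kind = local(pattern.tag)
    if kind == "text":
        return "text"
    parts = [to_compact(child) for child in pattern]
    if kind == "optional":
        inner = ", ".join(parts)
        # Parentheses are only needed when several patterns are grouped.
        return (f"({inner})" if len(parts) > 1 else inner) + "?"
    if kind in ("element", "attribute"):
        if not parts:
            if kind != "attribute":
                raise ValueError("an element pattern needs content")
            parts = ["text"]  # an empty <attribute/> defaults to text
        # Tags become braces, sequence becomes commas.
        return f"{kind} {pattern.get('name')} {{ {', '.join(parts)} }}"
    raise ValueError(f"pattern not covered by this sketch: {kind}")


compact = to_compact(ET.fromstring(SCHEMA))
print(compact)
```

Apart from line breaks, the printed result is the compact schema shown above.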

More features of RELAX NG...

Produced by Michael Claßen

Created: Jul 22, 2002
Revised: Jul 22, 2002