dtddoc step 2: Internal data structures (1/2) - exploring XML | WebReference

dtddoc step 2: Internal data structures

In our last installment we parsed XML DTDs with four different software packages. This time we look at the data structures those parsers produce in Java, Perl, and PHP. This is step two of four in our creation of dtddoc:

  1. Parsing a DTD
  2. Creating the internal data structure
  3. Adding element and attribute descriptions
  4. Generating HTML

Creating the internal data structure

We used the following parsers:

As demonstrated before, each of these parsers can be called on an input stream in a straightforward way and returns an internal data structure. Let's examine those data structures in turn.

Bourret's DTD parser

DTDParser's parseXMLDocument() and parseExternalSubset() methods each return a DTD object for further inspection. The DTD object holds hashtables of the entities and element types of the parsed DTD. An ElementType has several members that carry useful information:

  attributes: A Hashtable of attributes for the respective element type.
  children: A Hashtable of child ElementTypes.
  content: A Group representing the content model of the element type.
  contentType: The type of content, one of:
    CONTENT_ANY: Any content type.
    CONTENT_ELEMENT: Element content type.
    CONTENT_EMPTY: Empty content type.
    CONTENT_MIXED: "Mixed" content type.
    CONTENT_PCDATA: PCDATA-only content type.
    CONTENT_UNKNOWN: Unknown content type.
  name: The XMLName of the element type.
  parents: A Hashtable of parent ElementTypes.

With this information, enumerating all element types and outputting the above information in HTML should be a snap. The content model's hierarchy can be easily navigated through the parents and children members of each element type.
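To make the enumeration concrete, here is a minimal sketch in Java. It does not use the parser's own classes: ElementTypeSketch, ContentType, DtdDocSketch, and the sample book/title/chapter model are hypothetical stand-ins that mirror only the members listed above. Names are plain Strings rather than XMLName objects, attributes map a name to a declared type string, and LinkedHashMap replaces Hashtable so that the output order is predictable.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DtdDocSketch {

    // Hypothetical stand-in for the CONTENT_* constants described above.
    enum ContentType { ANY, ELEMENT, EMPTY, MIXED, PCDATA, UNKNOWN }

    // Hypothetical stand-in for the parser's ElementType (assumption: simplified members).
    static class ElementTypeSketch {
        final String name;
        final ContentType contentType;
        final Map<String, String> attributes = new LinkedHashMap<>(); // attribute name -> declared type
        final Map<String, ElementTypeSketch> children = new LinkedHashMap<>();
        final Map<String, ElementTypeSketch> parents = new LinkedHashMap<>();

        ElementTypeSketch(String name, ContentType contentType) {
            this.name = name;
            this.contentType = contentType;
        }

        // Link both directions so the hierarchy can be walked up or down.
        void addChild(ElementTypeSketch child) {
            children.put(child.name, child);
            child.parents.put(name, this);
        }
    }

    // Escape the characters that are significant in HTML text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Join the keys of a map, or report "none" when it is empty.
    static String names(Map<String, ?> map) {
        return map.isEmpty() ? "none" : escape(String.join(", ", map.keySet()));
    }

    // Enumerate all element types and emit one definition-list entry for each.
    static String toHtml(Map<String, ElementTypeSketch> elementTypes) {
        StringBuilder html = new StringBuilder("<dl>\n");
        for (ElementTypeSketch e : elementTypes.values()) {
            html.append("<dt>").append(escape(e.name)).append("</dt>\n");
            html.append("<dd>content: ").append(e.contentType)
                .append("; attributes: ").append(names(e.attributes))
                .append("; children: ").append(names(e.children))
                .append("; parents: ").append(names(e.parents))
                .append("</dd>\n");
        }
        return html.append("</dl>\n").toString();
    }

    // Hand-built model for a made-up DTD: book (title, chapter+), chapter (#PCDATA | title)*.
    static Map<String, ElementTypeSketch> sampleModel() {
        ElementTypeSketch book = new ElementTypeSketch("book", ContentType.ELEMENT);
        ElementTypeSketch title = new ElementTypeSketch("title", ContentType.PCDATA);
        ElementTypeSketch chapter = new ElementTypeSketch("chapter", ContentType.MIXED);
        book.attributes.put("isbn", "CDATA");
        book.addChild(title);
        book.addChild(chapter);
        chapter.addChild(title);

        Map<String, ElementTypeSketch> model = new LinkedHashMap<>();
        model.put(book.name, book);
        model.put(title.name, title);
        model.put(chapter.name, chapter);
        return model;
    }

    public static void main(String[] args) {
        System.out.print(toHtml(sampleModel()));
    }
}
```

With the real parser, the same loop would presumably run over the element types held by the DTD object and read each ElementType's members directly. Because an element type can have several parents, the hierarchy is a graph rather than a tree, so any recursive walk through children should keep track of the types it has already visited.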

On to inspecting the other tools' data structures...

Produced by Michael Claßen

URL: http://www.webreference.com/xml/column66/index.html
Created: Oct 14, 2002
Revised: Oct 14, 2002