The Document Object Model (DOM), Part III: Handling an Illegal HTML | WebReference



Handling Illegal HTML

Internet Explorer automatically corrects illegal HTML pages. One example of an illegal HTML construct is a list tag placed within a paragraph tag. In HTML, a paragraph may contain only inline content, so in a valid page all lists should be positioned outside the page's paragraphs. In terms of the DOM tree, a list placed inside a paragraph would make the list node a grandchild of the body node (a child of the paragraph node), whereas it should be a child of the body node. We chose to demonstrate an illegal HTML page with one specific list tag, the <UL> tag. Of course, the problem occurs for all types of lists: unordered (<UL>), ordered (<OL>), and definition (<DL>) lists.

Let's take a simple example:

<BODY ID="bodyNode">
<P ID = "p1Node">This is paragraph 1.
     <UL ID = "listNode">
          <LI ID = "bullet1Node">This is bullet 1
          <LI ID = "bullet2Node">This is bullet 2
          <LI ID = "bullet3Node">This is bullet 3
     </UL>
</P>
<P ID = "p2Node">This is paragraph 2.
</BODY>

Try to analyze this document example as is and sketch its DOM structure, then compare your sketch with our own suggestion. Now let's see how Internet Explorer models this document in its DOM. A script that prints the top-level children of the bodyNode node reports the following:

bodyNode.firstChild.nodeName = P
bodyNode.childNodes[1].nodeName = UL
bodyNode.childNodes[2].nodeName = P
bodyNode.childNodes[3].nodeName = P
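For readers who want to reproduce a listing like this one, here is a minimal sketch of such a script in JavaScript. It is not the script the authors used, and the function name elementChildNames is our own, hypothetical choice. It walks childNodes and keeps only element nodes (nodeType 1), because modern browsers also expose whitespace-only text nodes in childNodes, which older versions of Internet Explorer omitted; the raw childNodes indices in a modern browser can therefore differ from the ones shown above.

```javascript
// Returns the nodeName of every element child of `parent`, in document order.
// nodeType 1 is ELEMENT_NODE; other children (for example the
// whitespace-only text nodes that modern browsers keep) are skipped.
function elementChildNames(parent) {
  const names = [];
  for (let i = 0; i < parent.childNodes.length; i++) {
    const child = parent.childNodes[i];
    if (child.nodeType === 1) {
      names.push(child.nodeName);
    }
  }
  return names;
}

// In a browser, once the example page has loaded:
//   elementChildNames(document.getElementById("bodyNode"))
//     .forEach((name, i) => console.log("element child " + i + " = " + name));
```

The function only reads childNodes, nodeType, and nodeName, so it can also be exercised outside a browser with plain objects that mimic those three properties.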

As you can see, the <UL> list tag is a child of the top-level bodyNode tag node. This is not how it appears in the original HTML page, where the list is inside a paragraph and would therefore be expected to be a child of that <P> tag node. Moreover, as the listing above shows, the DOM contains three <P> tag nodes, while the document example above includes just two paragraphs. The browser corrects the illegal HTML page by splitting the culprit paragraph into two paragraphs: the first includes the portion of the original paragraph that precedes the list, while the second includes the portion that follows the list. In summary, we get an additional paragraph, and the list moves from being a grandchild of the bodyNode to being a child of it.
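The correction described above can be modeled with a toy tree builder. The sketch below is hypothetical and is not Internet Explorer's actual parser: it assumes just two rules, (1) a list start tag closes any open paragraph, and (2) a </P> end tag with no open paragraph creates an empty paragraph. These mirror rules that the HTML5 parsing algorithm later standardized for these tags. The sketch also assumes the example is written with explicit </UL> and </P> end tags after the last bullet, and it ignores text and everything inside the list. The token format and the name buildBodyChildren are our own.

```javascript
// Hypothetical toy model of the correction, NOT Internet Explorer's parser.
// Rule 1: a list start tag (UL, OL, DL) closes any open paragraph.
// Rule 2: a </P> end tag with no open paragraph creates an empty paragraph.
// Text tokens and everything inside a list are ignored for simplicity.
const LIST_TAGS = new Set(["UL", "OL", "DL"]);

function buildBodyChildren(tokens) {
  const body = [];      // element children of <BODY>, in document order
  let pOpen = false;    // is a body-level paragraph currently open?
  let listOpen = null;  // tag name of the open list, or null

  for (const tok of tokens) {
    if (listOpen !== null) {
      // Inside a list, only the list's own end tag is modeled.
      if (tok.type === "end" && tok.name === listOpen) listOpen = null;
      continue;
    }
    if (tok.type === "start" && tok.name === "P") {
      body.push({ name: "P", id: tok.id }); // a new <P> also ends an open one
      pOpen = true;
    } else if (tok.type === "start" && LIST_TAGS.has(tok.name)) {
      pOpen = false;                        // rule 1: the paragraph ends here
      body.push({ name: tok.name, id: tok.id });
      listOpen = tok.name;
    } else if (tok.type === "end" && tok.name === "P") {
      if (!pOpen) body.push({ name: "P", id: null }); // rule 2: stray </P>
      pOpen = false;
    }
  }
  return body;
}

// The example page as a token stream (assuming explicit </UL> and </P>).
const tokens = [
  { type: "start", name: "P", id: "p1Node" },
  { type: "start", name: "UL", id: "listNode" },
  { type: "start", name: "LI", id: "bullet1Node" },
  { type: "start", name: "LI", id: "bullet2Node" },
  { type: "start", name: "LI", id: "bullet3Node" },
  { type: "end", name: "UL" },
  { type: "end", name: "P" },
  { type: "start", name: "P", id: "p2Node" },
];

console.log(buildBodyChildren(tokens).map((n) => n.name).join(", "));
// prints "P, UL, P, P"
```

In this model the third <P> is the paragraph created when the </P> arrives after the list has already closed the first paragraph; the list itself lands directly under the body, matching the listing above.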

Produced by Yehuda Shiran and Tomer Shiran

Created: June 21, 1999
Revised: June 21, 1999