XML and Perl: Embedding XML in HTML | 5 | WebReference

XML and Perl: Practical XML

The potential benefits of using XML as a generic, self-descriptive data format are so compelling that many developers are already integrating the technology into their products. We will probably not see XML documents replacing HTML in the next year or so, but we will see XML used extensively to automate Web content management processes. These processes include building Web indexes and resource catalogs and enhancing search capabilities. The scope of implementation may involve a single document, a logical collection of multiple documents, or an entire Web site.

This will initially be done by embedding XML metadata tags in HTML documents. Metadata is data about data: a description or classification of information. For instance, examine the following:

<head><title>DMV Survival Guide</title></head>
<document type="faq"/>
<H1><toc section="title">DMV Survival Guide</toc></H1>
<p><abstract> Going to the DMV can be a painful experience, but this handy survival guide can make the experience tolerable.</abstract></p>
<H2><toc section="1.1"> Timing</toc></H2>
<p>When planning a trip to the DMV, multiply the maximum time you think your transaction could take by 3, then add 1 hour to that.</p>
<H2><toc section="1.2"> Entertainment</toc></H2>
<p>The DMV provides regularly scheduled malfunctions for enjoyment such as computer crashes, broken environmental controls, loss of power, and fire drills.</p>
<H2><toc section="1.3"> Identification</toc></H2>
<p>To prove your identity, be sure to bring two forms of valid ID. This could be a combination of the following: driver's license, social security card, birth certificate. If you do not have a driver's license or have lost your license, be prepared to provide a verified and notarized DNA test to prove you are yourself.</p>

Notice the three non-HTML tags in the example above: <document>, <toc>, and <abstract>. With this additional information, we can write a script to automate the process of building a table of contents for our documents. This capability may seem trivial for a few documents, but imagine building a table of contents by hand for hundreds or thousands of documents that are updated on a regular basis. We can also use the <document> and <abstract> tags to refine the granularity of results from our Web search engine, enhance the site index, and automate the organization of a categorized resource catalog.
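
As a rough illustration of such a table-of-contents script, here is a minimal Perl sketch. It assumes the document is already held in a string and that every <toc> element carries a double-quoted section attribute, so a regular expression is enough to pull out the entries. The extract_toc subroutine and the sample data are hypothetical names chosen for this example, not part of any module; a production script would more likely rely on a real parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return a list of [section, title] pairs, one for
# every <toc section="...">...</toc> element found in the HTML string.
sub extract_toc {
    my ($html) = @_;
    my @entries;
    while ($html =~ m{<toc\s+section="([^"]*)"\s*>(.*?)</toc>}gis) {
        my ($section, $title) = ($1, $2);
        $title =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        push @entries, [$section, $title];
    }
    return @entries;
}

# Sample input: the headings from the DMV Survival Guide example.
my $html = <<'END_HTML';
<H1><toc section="title">DMV Survival Guide</toc></H1>
<H2><toc section="1.1"> Timing</toc></H2>
<H2><toc section="1.2"> Entertainment</toc></H2>
<H2><toc section="1.3"> Identification</toc></H2>
END_HTML

# Print one table-of-contents line per entry: section, then title.
for my $entry (extract_toc($html)) {
    my ($section, $title) = @$entry;
    print "$section  $title\n";
}
```

Run against the sample data, the script prints one line per heading, with the section identifier followed by the trimmed title. Returning the entries as a list of pairs keeps the extraction separate from the output, so the same data could feed a site index instead of a printed list.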

One notable XML format, RDF (Resource Description Framework), embodies the metadata approach. In fact, Netscape includes an RDF implementation in its Mozilla source. Mozilla is Netscape's effort to improve its browser by releasing the source code to the public for review and enhancement.

Note that adding tags that are not defined in the HTML specification invalidates the HTML source. While most Web browsers ignore tags they don't recognize, problems might occur in some older browsers. To avoid problems, you may want to surround these tags with HTML comments.
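
One possible reading of the comment approach, sketched in Perl: each metadata tag is wrapped in its own HTML comment, and a script strips the comment delimiters again before processing the metadata. The wrapping convention and the unwrap_metadata subroutine are assumptions made for this example; the sketch only recognizes the three tags used in this article.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove the HTML comment delimiters around any
# <document>, <toc>, or <abstract> tag (opening, closing, or empty),
# leaving ordinary comments untouched.
sub unwrap_metadata {
    my ($html) = @_;
    $html =~ s{<!--\s*(</?(?:document|toc|abstract)\b[^>]*>)\s*-->}{$1}gi;
    return $html;
}

# Sample input: metadata tags hidden inside comments (assumed convention).
my $hidden = <<'END_HTML';
<!-- <document type="faq"/> -->
<H2><!-- <toc section="1.1"> --> Timing<!-- </toc> --></H2>
<!-- an ordinary comment -->
END_HTML

# Print the markup with the metadata tags restored.
print unwrap_metadata($hidden);
```

After unwrapping, the metadata tags appear in the markup exactly as in the earlier example, so a script written for the uncommented form can process the result unchanged.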


Produced by Jonathan Eisenzopf and
Created: Feb. 14, 1999
Revised: Feb. 17, 1999

URL: http://www.webreference.com/perl/tutorial1/