HTML Unleashed. The Emergence of XML: The Premises of XML | WebReference


HTML Unleashed: The Emergence of XML

The Premises of XML


As with every new and promising technology, it is probably more important to explain what XML isn't than what it is.  XML, just like SGML, is not a page layout or graphics language.  By itself, XML provides even fewer presentation tools than HTML does.  Strictly speaking, it's not even a markup language, but rather a system for building such languages to match any conceivable document type.

Chapter 3, "SGML and the HTML DTD," explains the HTML document type definition (DTD), the specification of HTML tags and document structure written in SGML.  Similarly, with XML you can build a DTD that exactly matches the structure of your document and introduces a set of self-explanatory, logically organized tags and attributes fine-tuned for your markup needs.

By attaching the DTD to the document sent over the network, you can ensure that the XML software reading the document parses it correctly, which in turn allows correct formatting, conversion, storage in a database, or whatever else the receiver chooses to do with the document.  In short, with XML you can create your own HTML, or XYZML, or Whatever-You-Like-ML!  (No surprise that such a language was called "extensible" in the first place.)
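As a rough illustration of a DTD traveling with its document, the sketch below embeds a small DTD in the document itself and reads it with Python's standard `xml.etree.ElementTree` module.  The "recipe" vocabulary, the `amount` attribute, and the `summarize` helper are invented for this example and do not come from the XML specification.  Note also that this particular parser only checks that the markup is well formed; it reads past the DTD without validating the document against it.

```python
import xml.etree.ElementTree as ET

# A hypothetical "RecipeML": every tag and attribute name here is an
# assumption made up for illustration.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE recipe [
  <!ELEMENT recipe (title, ingredient+, step+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT ingredient (#PCDATA)>
  <!ATTLIST ingredient amount CDATA #REQUIRED>
  <!ELEMENT step (#PCDATA)>
]>
<recipe>
  <title>Pancakes</title>
  <ingredient amount="2 cups">flour</ingredient>
  <ingredient amount="2">eggs</ingredient>
  <step>Mix everything.</step>
  <step>Fry on a hot pan.</step>
</recipe>
"""


def summarize(xml_text):
    """Parse the document and pull out its logical parts by tag name."""
    root = ET.fromstring(xml_text)  # non-validating parse
    return {
        "title": root.findtext("title"),
        "ingredients": [
            (item.get("amount"), item.text)
            for item in root.findall("ingredient")
        ],
        "steps": [step.text for step in root.findall("step")],
    }


print(summarize(DOCUMENT))
```

Because the tags name what the data is rather than how it looks, the receiving program can pick out ingredients or steps directly, whether it goes on to format them, convert them, or store them.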

It is important to understand that XML's advantage over HTML is not that it makes it easier to change fonts or position images.  In the visual presentation realm, XML is no better than HTML (some might say it's worse because it lacks all those neat Netscape enhancements, unless you've defined them in your DTD).  The creators of the language intended the visual presentation of an XML document to be specified, optionally, by an attached style sheet, which is an external mechanism for XML just as it is for HTML.

XML's visualization power is thus completely determined by the style sheet language you use, for example Cascading Style Sheets (CSS) or Document Style Semantics and Specification Language (DSSSL).  If you don't care about logical markup, you can achieve exactly the same visible results by using the chosen style sheet mechanism with HTML.  (Remember that you can use the neutral SPAN tag in HTML to apply any attributes, style names included, to arbitrary fragments of text.)  It is when the proper internal structure of your data really matters that XML easily outshines HTML.

The XML specification defines the language in terms of the behavior of a parser, a piece of software whose sole purpose is to understand the element structure of your document and break it down into nested elements in accordance with the DTD.  Another program (termed simply "application" in the XML spec) is supposed to obtain the dissected document from the parser and process it further.  Exactly what the application does with the document is outside the scope of XML; for instance, it may be a browser that displays the document using an appropriate style sheet.
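The division of labor between parser and application can be sketched in a few lines of Python.  Here the standard `xml.etree.ElementTree` module plays the parser, and a hypothetical `outline` function stands in for the application.  The memo markup is invented for the example, and this parser does not validate against a DTD, so treat the sketch as an illustration of the two roles rather than of the spec's full parser behavior.

```python
import xml.etree.ElementTree as ET

# Invented sample markup; the tag names are assumptions for illustration.
SAMPLE = (
    "<memo>"
    "<to>Editor</to>"
    "<body><para>First.</para><para>Second.</para></body>"
    "</memo>"
)


def outline(element, depth=0):
    """Application side: return one indented line per nested element."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


# Parser side: turn the flat text into a tree of nested elements.
root = ET.fromstring(SAMPLE)

# Application side: do something with the tree (here, print an outline).
print("\n".join(outline(root)))
```

The application never touches angle brackets; it sees only the element tree that the parser hands over, and could just as well render it, index it, or convert it.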

Because XML is a subset of SGML, an XML document is almost always a valid SGML document; the small discrepancies between the two languages are likely to be eliminated soon with the acceptance of certain amendments to the SGML standard.  The relation between XML and HTML is more complex.  Because XML lets you define new tags, XML documents are not likely to count as valid HTML very often; on the other hand, an HTML file is relatively easy to make XML-conformant at one of the two levels of conformance (described later), depending on whether you provide a DTD for your document.
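To give a feel for what making HTML XML-conformant involves at the level that needs no DTD, the sketch below runs two fragments through Python's standard, non-validating XML parser.  The `is_well_formed` helper and both fragments are assumptions made up for this example; the check covers only basic syntax such as closed and properly nested tags, not validity against any DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Typical HTML habits: an unclosed <br> and a missing </p>.
HTML_STYLE = "<p>First line<br>Second line"

# The same content with every element explicitly closed.
XML_STYLE = "<p>First line<br/>Second line</p>"

print(is_well_formed(HTML_STYLE))
print(is_well_formed(XML_STYLE))
```

The first fragment is rejected and the second accepted, which suggests that the cleanup is mostly a matter of closing every element and nesting elements properly.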

I don't attempt a real tutorial of XML in this chapter for two reasons.  First, one chapter's space is surely insufficient to cover even the basics of the language, and second, the language itself is so young and unstable that it is probably premature to start teaching it in a serious fashion.  (A quote from the language specification: "Please be advised that the draft you are now reading is unusually volatile.")  Instead, this chapter presents a couple of small examples that will help you quickly grasp the "look and feel" of the language.

In a sense, XML is positioned somewhere between SGML and HTML, its creators' intent being to combine the best features of the two languages.  However, XML is much closer to SGML than to HTML, and although knowledge of HTML will help you understand the most obvious XML features, an acquaintance with SGML syntax and ideology will help much more.  So I recommend that you brush up on what you remember from Chapter 3 before proceeding to subsequent sections of this chapter.


Created: Jun. 15, 1997
Revised: Jun. 16, 1997