Sams Teach Yourself XML in 24 Hours | WebReference

Excerpted from Sams Teach Yourself XML in 24 Hours, Complete Starter Kit, 3rd Edition by Michael Morrison. ISBN 067232797X, Copyright © 2005. Used with the permission of Que & Sams Publishing.


Addressing and Linking XML Documents

Writing for the Web without linking is like eating without digesting. It's literary bulimia.

—Doc Searls

So maybe that quote is a little strong in regard to the importance of linking on the Web, but the underlying point is still very much valid. Just as XML has been leveraged to improve other facets of the Web such as the core syntax and structure of HTML, so is it being used to improve upon the very linking mechanism that forms the interconnections between pages on the Web. I'm referring to XLink, which is the XML linking technology that allows you to carry out advanced linking between XML documents. Coupled with another important XML technology called XPointer, XLink builds on the premise of HTML hyperlinks but goes several steps further in supporting advanced linking features such as two-way links. Although XML linking is still a relatively new technology, it is already having an impact on how information is connected on the Web.

Linking XML documents goes hand in hand with addressing XML documents. Yet another XML-related technology called XPath makes it possible to specify exactly where XML content is located. Just as your mailing address helps you to remember where you live, XPath provides a means of remembering where nodes are located in XML documents. Okay, you probably don't rely on your mailing address to remember where you live, but you will rely on XPath if you use technologies such as XSLT, XLink, or XPointer, which must reference parts of XML documents. XPath is the enabling technology that allows you to drill down into XML documents and reference individual pieces of information.

In this hour, you'll learn

  • How to navigate through an XML document using XPath patterns

  • How to build powerful expressions using XPath patterns and functions

  • What technologies come together to support linking in XML

  • How to reference document fragments with XPointer

  • How to link XML documents with XLink

Understanding XPath

XPath is a technology that enables you to address parts of an XML document, such as a specific element or set of elements. XPath is implemented as a non-XML expression language, which makes it suitable for use in situations where XML markup isn't really applicable, such as within attribute values. As you know, attribute values are simple text and therefore can't contain additional XML markup. So, although XPath expressions are used within XML markup, they don't directly use tags and attributes themselves. This makes XPath considerably different from its XSL counterparts (XSLT and XSL-FO) in that it isn't implemented as an XML language. XPath's departure from XML syntax also makes it both flexible and compact, which are important benefits when you consider that XPath is typically used in constrained situations such as attribute values.

XPath is a very important XML technology in that it provides a flexible means of addressing XML document parts. Any time you need to reference a portion of an XML document, such as with XSLT, you ultimately must rely on XPath. The XPath language is not based upon XML, but it is somewhat familiar nonetheless because it relies on a path notation that is commonly used in computer file systems. In fact, the name XPath stems from the fact that the path notation used to address XML documents is similar to path names used in file systems to describe the locations of files. Not surprisingly, the syntax used by XPath is extremely concise because it is designed for use in URIs and XML attribute values.

Similar to other XML technologies, XPath operates under the notion that a document consists of a tree of nodes. XPath defines several types of nodes to describe the content that appears in this tree. There is always a single root node, which serves as the root of the XPath tree and appears as the first node in it. Every element in a document has a corresponding element node that appears in the tree under the root node. Within an element node there are other types of nodes that correspond to the element's content. Element nodes may have a unique identifier associated with them that is used to reference the node with XPath. Figure 22.1 shows the relationship between different kinds of nodes in an XPath tree.

Nodes within an XML document can generally be broken down into element nodes, attribute nodes, and text nodes. Some nodes also have names. XPath treats such a name as an expanded name, which consists of a local name plus an optional namespace URI. In markup, the namespace URI is normally supplied through a namespace prefix that has been bound to it, as in the following element name:

<xsl:value-of select="."/>

In this example, the local name is value-of and the namespace prefix is xsl, which is bound to the XSL namespace URI. If you were to declare the XSL namespace as the default namespace for a document, you could get away with dropping the namespace prefix. The element keeps the same expanded name but is written like this:

<value-of select="."/>

If you declare more than one namespace in a document, you will have to use prefixed names for at least some of the elements and attributes. In this situation it's generally a good idea to use prefixes for all elements and attributes. This makes the code clearer and eliminates the risk of name clashes.
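To see that the prefixed and default-namespace spellings name the same element, here is a minimal sketch using Python's standard xml.etree.ElementTree module. That module reports each element name in a "{namespace-URI}local-name" form. The URI is the standard XSLT namespace. The helper expanded_name is hypothetical and exists only for this illustration.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# The same element written two ways: with an "xsl" prefix, and with the
# XSL namespace declared as the default namespace (no prefix at all).
prefixed = f'<xsl:value-of xmlns:xsl="{XSL_NS}" select="."/>'
defaulted = f'<value-of xmlns="{XSL_NS}" select="."/>'


def expanded_name(markup: str) -> tuple[str, str]:
    """Hypothetical helper: return (namespace URI, local name) of the root element."""
    tag = ET.fromstring(markup).tag  # ElementTree stores names as "{uri}local"
    if tag.startswith("{"):
        uri, _, local = tag[1:].partition("}")
        return uri, local
    return "", tag  # element is in no namespace


print(expanded_name(prefixed))
print(expanded_name(defaulted))
```

Both calls print the same (namespace URI, local name) pair, because the prefix is only a shorthand for the namespace URI it is bound to.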

Getting back to node types in XPath, following are the different types of nodes that can appear in an XPath tree:

  • Root node

  • Element nodes

  • Text nodes

  • Attribute nodes

  • Namespace nodes

  • Processing instruction nodes

  • Comment nodes

You should have a pretty good feel for these node types, considering that you've learned enough about XML and have dealt with each type of node throughout the book thus far. The root node in XPath serves the same role as it does in the structure of a document: it serves as the root of an XPath tree and appears as the first node in the tree. Every element in a document has a corresponding element node that appears in the tree under the root node. Within an element node appear all of the other types of nodes that correspond to the element's content. Element nodes may have a unique identifier associated with them, which is useful when referencing the node with XPath.
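Python's standard library has no complete XPath engine. Its DOM implementation, xml.dom.minidom, does expose a similar tree, though. The sketch below maps DOM node types onto the XPath node types listed above, and the mapping is only an approximation. DOM's Document node stands in for the XPath root node. DOM also reports namespace declarations as attributes, so the sketch relabels them as namespace nodes. (XPath actually gives every element a namespace node for each namespace in scope.) The sample document, XPATH_KIND, and walk are hypothetical.

```python
from xml.dom import Node, minidom

# Hypothetical document containing at least one node of each kind.
SAMPLE = """<?xml version="1.0"?>
<!-- a top-level comment -->
<log xmlns:ex="http://example.com/ns" ex:owner="me">
  <?app-hint refresh?>
  <entry id="e1">Ran 5 miles</entry>
</log>"""

# DOM node types mapped to the closest XPath node type.
XPATH_KIND = {
    Node.DOCUMENT_NODE: "root",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
    Node.COMMENT_NODE: "comment",
}


def walk(node, depth=0):
    """Yield (depth, kind, label) for a node, its attributes, and its descendants."""
    kind = XPATH_KIND.get(node.nodeType, "other")
    if kind == "text":
        text = node.data.strip()
        if not text:
            return  # skip whitespace-only text nodes to keep the output short
        label = text
    else:
        label = node.nodeName
    yield depth, kind, label

    attrs = getattr(node, "attributes", None)  # a NamedNodeMap on elements only
    if attrs:
        for i in range(len(attrs)):
            name = attrs.item(i).nodeName
            # DOM reports xmlns declarations as attributes; XPath models
            # them as namespace nodes instead.
            is_ns = name == "xmlns" or name.startswith("xmlns:")
            yield depth + 1, "namespace" if is_ns else "attribute", name

    for child in node.childNodes:
        yield from walk(child, depth + 1)


doc = minidom.parseString(SAMPLE)
for depth, kind, label in walk(doc):
    print("  " * depth + f"{kind}: {label}")
```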

The point of all this naming and referencing of nodes is to provide a means of traversing an XML document to arrive at a given node. This traversal is accomplished using expressions, which you learned a little about back in Hour 13, "Access Your iTunes Music Library via XML." You use XPath to build expressions, which are typically used in the context of some other operation, such as a document transformation. Upon being processed and evaluated, XPath expressions result in a data object of one of the following types:

  • Node set—A collection of nodes

  • String—A text string

  • Boolean—A true/false value

  • Number—A floating-point number
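As a rough illustration of the four result types, the sketch below uses Python's standard xml.etree.ElementTree. That module evaluates only a limited subset of XPath location paths, and it always returns a list of elements, which is a node set. The string, Boolean, and number results are derived by hand using XPath 1.0's rules for converting a node set. The sample data and the helper names (xpath_string, xpath_boolean, xpath_number) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data with two <distance> elements.
LOG = ET.fromstring(
    "<log>"
    "<session type='running'><distance>5.5</distance></session>"
    "<session type='cycling'><distance>20</distance></session>"
    "</log>"
)

# Node set: ElementTree evaluates this simple location path directly and
# returns the matching elements in document order.
node_set = LOG.findall("./session/distance")


def xpath_string(nodes):
    """Like string(): string-value of the first node, or '' if the set is empty."""
    return "".join(nodes[0].itertext()) if nodes else ""


def xpath_boolean(nodes):
    """Like boolean(): true if and only if the node set is non-empty."""
    return len(nodes) > 0


def xpath_number(nodes):
    """Approximates number(): parse the string-value, or NaN if it isn't numeric.

    Python's float() accepts a few forms (such as '1e3' or 'inf') that
    XPath 1.0 would treat as NaN, so this is only an approximation.
    """
    try:
        return float(xpath_string(nodes))
    except ValueError:
        return float("nan")


print(len(node_set))                                              # 2
print(xpath_string(node_set))                                     # 5.5
print(xpath_boolean(LOG.findall("./session[@type='swimming']")))  # False
print(xpath_number(node_set) * 2)                                 # 11.0
```

Converting a node set to a string considers only its first node, so the second distance (20) plays no part in the string or number results.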

Similar to a database query, the data object resulting from an XPath expression can then be used as the basis for some other process, such as an XSLT transformation. For example, you might create an XPath expression that results in a node set that is transformed by an XSLT template. Likewise, you can use XPath with XLink, where the nodes resulting from an expression could form the basis of a linked document.

Navigating a Document with XPath Patterns

XPath expressions are usually built out of patterns, which describe a branch of an XML tree. A pattern is therefore used to reference one or more hierarchical nodes in a document tree. Patterns can be constructed to perform relatively complex matching tasks, and together they form a kind of mini query language for finding specific nodes in documents. Patterns can be used to isolate specific nodes or groups of nodes and can be specified as absolute or relative. An absolute pattern spells out the exact location of a node or node set starting from the root of the document, whereas a relative pattern identifies a node or node set relative to a certain context.
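Python's standard xml.etree.ElementTree module understands a small subset of XPath paths. That subset is enough for a rough sketch of absolute versus relative patterns. The document below is a hypothetical stand-in for a training log and is not the book's listing. ElementTree does not accept a leading "/" on an element path, so the sketch emulates the absolute path by starting from the document element.

```python
import xml.etree.ElementTree as ET

# Hypothetical training-log-style document (not the book's listing).
root = ET.fromstring(
    "<trainlog>"
    "<session date='11/19/05' type='running'>"
    "<duration units='minutes'>45</duration>"
    "<location>City Park</location>"
    "</session>"
    "<session date='11/21/05' type='cycling'>"
    "<duration units='hours'>2.5</duration>"
    "<location>River Trail</location>"
    "</session>"
    "</trainlog>"
)

# Absolute pattern, XPath form: /trainlog/session/location
# ElementTree paths are evaluated from an element, so the sketch starts at the
# document element (trainlog) and spells out every step below it.
absolute = [e.text for e in root.findall("./session/location")]

# Relative pattern, XPath form: location
# Its result depends on the context node, here the second <session> element.
context = root.findall("./session")[1]
relative = [e.text for e in context.findall("location")]

# A predicate narrows the match, as in /trainlog/session[@type='running']/duration
running = [e.text for e in root.findall("./session[@type='running']/duration")]

print(absolute)  # ['City Park', 'River Trail']
print(relative)  # ['River Trail']
print(running)   # ['45']
```

The relative pattern "location" matches different nodes depending on which session is the context, while the absolute pattern always matches the same set.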

The next few sections examine the ways in which patterns are used to access nodes within XML documents. To better understand how patterns are used, it's worth seeing them in the context of a real XML document. Listing 22.1 contains the code for the familiar training log sample document that you saw earlier in the book, which serves as the sample document in this hour for XPath.

Created: March 27, 2003
Revised: December 12, 2005