XML: A Primer: The XML Linking Specification

Locators and Chunks

Even though XML can use the simple fragment identifier notation of HTML, it can do considerably more. Many of these capabilities are useful even in simple links, because they allow authors and developers to treat elements, rather than documents, as the primary unit involved in linking. XML’s locator syntax is considerably more robust than that of HTML, providing tools that can address parts of documents by structure, ID, HTML anchor, or even text content.

HTML allowed fragment identifiers to follow a URL using the following syntax for the HREF attribute:

  HREF="url#fragmentidentifier"

XML allows a similar syntax but uses its more developed XPointer in place of HTML’s fragment identifier, and it offers two connectors, # and |:

  HREF="url#XPointer"

  HREF="url|XPointer"

In the first case, using #, the processor fetches the location referenced by the URL as a whole document (replacing the current document, if the SHOW attribute is set to “REPLACE”), and the client then locates the part referenced by the XPointer. In the second case, using the | connector, the XML-LINK standard states that “no intent is signaled as to what processing model is to be used to go about accessing the designated resource.” This leaves room for the server to handle the XPointer processing, which cuts down on the bandwidth needed for transmission because the server can return only as much of the document as is needed.
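The two connector forms can be told apart with a few lines of code. The following is a minimal sketch in Python, not part of the XML-LINK specification; the function names `split_href` and `xpointer_handler` are hypothetical, and the sketch assumes that neither connector character appears inside the URL portion itself.

```python
def split_href(href):
    """Split an HREF value into (url, connector, xpointer).

    connector is '#', '|', or None when the HREF carries no XPointer.
    Assumption: the first '#' or '|' in the string is the connector.
    """
    positions = [i for i in (href.find("#"), href.find("|")) if i != -1]
    if not positions:
        return href, None, None
    cut = min(positions)
    return href[:cut], href[cut], href[cut + 1:]


def xpointer_handler(connector):
    """Say who is expected to resolve the XPointer for a given connector."""
    if connector == "#":
        # Whole document is fetched; the client then locates the fragment.
        return "client"
    if connector == "|":
        # No processing model is signaled; a server may do the work.
        return "unspecified"
    return None
```

For example, `split_href("doc.xml|ID(intro)")` yields the URL `doc.xml`, the connector `|`, and the XPointer `ID(intro)`, and `xpointer_handler("|")` reports that the processing model is left open.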

XPointers: An Introduction

The XPointer (an abbreviation for Extended Pointer) is derived from the TEI standards described in Chapter 4. XPointers designate resources using location terms. An XPointer may contain either one or two locators. If it contains two, they are separated by two periods (..), and the XPointer refers to all content between the start of the element identified by the first locator and the end of the element identified by the second locator. We’ll build some locators first and then see how XPointers work.
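The one-or-two-locator rule is easy to illustrate. The sketch below, in Python, splits an XPointer on the two-period separator; the function name `split_locators` is hypothetical, and the sketch assumes that two consecutive periods never occur inside a locator's own arguments.

```python
def split_locators(xpointer):
    """Return (first, second) locators; second is None for a single locator.

    Assumption: '..' appears only as the separator between two locators.
    """
    parts = [part.strip() for part in xpointer.split("..")]
    if len(parts) > 2 or any(not part for part in parts):
        raise ValueError("an XPointer holds one or two non-empty locators")
    second = parts[1] if len(parts) == 2 else None
    return parts[0], second
```

Given `ID(start)..ID(end)`, the function returns the pair of locators that bound the span; given `ID(start)` alone, the second element is `None`.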

Locators may contain absolute, relative, or string-matching location terms. String-matching terms are the most limited, but they provide a degree of precision the other terms can’t match. Relative terms allow links to refer to document content by its position within the element tree of a document or even by its content. Absolute location terms identify elements using the more conventional ID and NAME addressing schemes, as well as some other basic locations.

The default absolute keyword is ROOT( ). Because it is the default, it will rarely if ever need to be written out. It specifies the root element of the document, the outermost element of the document tree, so ROOT( ) effectively refers to the entire content of the document. The HERE( ) keyword refers to the linking element itself and is frequently used to provide an absolute position for subsequent relative terms. This allows an XPointer to specify content “2 paragraphs below the link element,” for example. The DITTO( ) keyword may be used only in the second locator of a pair, where it simply duplicates the location given by the first locator.

The empty parentheses following the ROOT( ), HERE( ), and DITTO( ) keywords are required.
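A small validator makes the parentheses rule and the DITTO( ) restriction concrete. This is a minimal sketch in Python; the names `parse_empty_keyword` and `expand_ditto` are hypothetical, and `expand_ditto` handles only a bare DITTO( ) standing alone as the second locator.

```python
import re

# Matches ROOT(), HERE(), or DITTO(), allowing whitespace inside the parentheses.
_EMPTY_KEYWORD = re.compile(r"^(ROOT|HERE|DITTO)\(\s*\)$")


def parse_empty_keyword(term):
    """Return the keyword name, or raise if the parentheses are missing."""
    match = _EMPTY_KEYWORD.match(term.strip())
    if match is None:
        raise ValueError(f"not a valid argument-less keyword: {term!r}")
    return match.group(1)


def _is_ditto(term):
    match = _EMPTY_KEYWORD.match(term.strip())
    return match is not None and match.group(1) == "DITTO"


def expand_ditto(first, second):
    """Replace a bare DITTO() second locator with a copy of the first."""
    if _is_ditto(first):
        raise ValueError("DITTO() may appear only in the second locator")
    if _is_ditto(second):
        return first, first
    return first, second
```

Here `parse_empty_keyword("ROOT( )")` succeeds, while `parse_empty_keyword("ROOT")` raises an error because the parentheses are absent.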

The next two keywords will produce somewhat more familiar results. The HTML(namevalue) keyword takes a value that matches an HTML A element’s NAME attribute, providing exactly the same service for fragment identifiers as was available in HTML. This keyword will mostly be used in XML documents that refer to HTML documents, because XML documents should use the ID attribute instead. The ID(name) keyword provides similar but improved functionality. Every element in a document may have an ID value, so any element can be quickly referenced this way. In fact, the XML-Linking documents recommend the use of this mode. By default, as noted previously, all fragment identifiers that aren’t otherwise marked are treated this way: the fragment identifier “#fragmentidentifier” will be treated as ID(fragmentidentifier) automatically.
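The default treatment of bare fragment identifiers, and the lookup that ID(name) implies, can be sketched as follows in Python using the standard library's `xml.etree.ElementTree`. The function names are hypothetical. The sketch also assumes the ID value lives in an attribute literally named `ID`, whereas in a real document the DTD declares which attribute has the ID type.

```python
import xml.etree.ElementTree as ET


def normalize_fragment(fragment):
    """Wrap a bare name as ID(name); leave keyword terms such as HTML(x) alone."""
    fragment = fragment.strip()
    if "(" in fragment:
        return fragment
    return f"ID({fragment})"


def find_by_id(root, name, id_attribute="ID"):
    """Return the first element whose ID attribute equals name, or None.

    Assumption: id_attribute names the attribute that carries the ID value.
    """
    for element in root.iter():
        if element.get(id_attribute) == name:
            return element
    return None


# A tiny illustrative document (made up for this sketch).
doc = ET.fromstring('<DOC><SECTION ID="intro"><P>Hello</P></SECTION></DOC>')
```

With this document, `normalize_fragment("intro")` produces `ID(intro)`, and `find_by_id(doc, "intro")` returns the SECTION element.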
