

CHAPTER 10
The XML Linking Specification

HTML’s explosive growth probably had more to do with its linking than with any other single factor. Hundreds of people and organizations were working at the same time on hypertext systems, some even using SGML, but none of them had the simplicity of HTML’s convenient linking system. <A HREF="url"> has strung millions of pages together and built the World Wide Web. Still, there’s definitely room for improvement. Hypertext specialists and other developers complained of the limited abilities of these basic links, and at least some HTML developers looked at SGML’s more complete HyTime specification and wished for some of its power. XML is a chance to do things better, and the XML working group focused on linking from early on. XML linking builds on HTML’s success and provides more powerful, if more complex, tools.

Simple Links

After six years of extensive use, HTML’s linking systems are under mounting criticism for providing only the most basic of links. Even so, many developers are perfectly content with the current linking syntax, and a small army of development tools and Web-mapping tools has grown up around this key standard. HTML’s HREF attribute has done well enough for most developers. Why break it? XML doesn’t break the previous standard; it just adds to it. It adds a lot, in fact, but the basic HTML link structure is still preserved, and certain aspects of it are even grandfathered into the XML standard to make it easier for XML documents to link to HTML. We’ll start by examining the kinds of links available to HTML documents, and then we’ll look at how we can implement them in XML.

Links in HTML

The A element is the key to nearly all HTML linking, although the LINK element plays a limited role. A simple HTML link created with an A element is shown in Figure 10.1.


Figure 10.1  Simple unidirectional in-line link, HTML.

The A element has several attributes, only one of which gets constant use from HTML developers: the all-powerful HREF. HREF always takes a URL for its value, which represents the target of the link. URLs may be absolute or relative. Absolute URLs begin with a scheme, which describes the protocol that should be used to interpret the URL. Commonly used schemes include http:, ftp:, gopher:, mailto:, nntp:, file:, and javascript:. The information applicable to the scheme follows the scheme. In most cases, this will be a reference to a server or a file on a server, prefixed with two slashes. For example, an absolute URL using the HTTP protocol uses the following syntax:

  http://hostcomputer:port/path?query

The port and query are optional. The hostcomputer must be a valid DNS name or IP address, and the port optionally specifies a port on the hostcomputer (80 is the default for http). Path specifies a path to a particular file on the host computer, with slashes separating its segments; within the path, only numbers, letters, and $, -, _, +, !, *, ', (, ), and the period and comma are allowed. Other characters may be escaped with the % sign, followed by their two-digit hexadecimal value. The optional query provides additional information to the server, allowing it to respond appropriately to form data or other input. The content of query is limited to the same characters as path.
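To make the pieces of this syntax concrete, here is a minimal sketch that uses Python's standard urllib.parse module to take an absolute URL apart and to escape a character that is not allowed in a path. The host name, port, and file names are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlsplit, quote

# Hypothetical URL, used only for illustration.
url = "http://www.example.com:8080/docs/chapter10.html?topic=links"

parts = urlsplit(url)
print(parts.scheme)    # http
print(parts.hostname)  # www.example.com
print(parts.port)      # 8080
print(parts.path)      # /docs/chapter10.html
print(parts.query)     # topic=links

# When the port is omitted, urlsplit reports None; we fall back to 80,
# the default for http.
default_port = urlsplit("http://www.example.com/index.html").port or 80

# A space is not in the permitted set, so it is escaped as % followed
# by its hexadecimal value (20). Slashes are left alone.
escaped = quote("/my files/index.html")
print(escaped)         # /my%20files/index.html
```

Note that urlsplit only separates the components; it does not check that the host is a valid DNS name or that the file exists.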

For the javascript: scheme, the value after the colon can be any valid JavaScript code. The javascript: scheme is on the way out: HTML 4.0 provides the onclick attribute to allow elements to activate scripts without placing JavaScript in the HREF attribute.

Relative URLs use the URL of the current page (or, if it exists, the URL set by the BASE element in the HEAD element of the page) as a prefix to their information. Relative URLs do not include a scheme.
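The way a relative URL borrows from the current page can be sketched with urljoin from Python's standard urllib.parse module. All of the URLs below are hypothetical; the second base stands in for a value that a BASE element might supply.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page that contains the links.
page_url = "http://www.example.com/books/xml/chapter10.html"

# A bare file name is resolved against the directory of the current page.
sibling = urljoin(page_url, "chapter11.html")

# ".." climbs one directory before resolving.
parent = urljoin(page_url, "../index.html")

# A leading slash starts again from the root of the same server.
rooted = urljoin(page_url, "/images/fig10-1.gif")

# If the page declared a BASE element, its URL would be used as the
# prefix instead of the page's own URL (hypothetical mirror address).
base_url = "http://mirror.example.org/xml/"
rebased = urljoin(base_url, "chapter11.html")

for resolved in (sibling, parent, rooted, rebased):
    print(resolved)
```

In each case the result is an absolute URL, complete with the scheme that the relative URL itself left out.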

The complete official syntax for URLs is defined in RFC 1738 (http://www.w3.org/Addressing/rfc1738.txt) and RFC 1808 (http://www.w3.org/Addressing/rfc1808.txt).

Both absolute and relative URLs may end with a fragment identifier, which follows the query if one is present. A fragment identifier in HTML consists of a pound sign (#) and a value that should match the NAME attribute of an A element in the target document. For example, in

  <A HREF="#laterlink">Skipping around is fun!</A>
  ...
  ...
  <A NAME="laterlink">Aren't you glad you skipped ahead?</A>

clicking on “Skipping around is fun!” would scroll the document to the location of the A element with the NAME attribute “laterlink”. Of course, fragment identifiers can also be used in combination with URLs.

  <A HREF="zip.html#nothingness">

would take a user who clicked on it to the line in the zip.html file that contained

  <A NAME="nothingness">
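As a rough illustration of how software can treat the fragment identifier separately from the rest of the URL, the sketch below uses urljoin and urldefrag from Python's standard urllib.parse module. The base URL is hypothetical; the link values are the ones from the examples above.

```python
from urllib.parse import urldefrag, urljoin

# Hypothetical URL of the page that contains the links.
page_url = "http://www.example.com/docs/index.html"

# A link to another file plus a fragment: resolve it first, then split
# off the fragment to see which document would be fetched.
target = urljoin(page_url, "zip.html#nothingness")
document, fragment = urldefrag(target)
print(document)  # http://www.example.com/docs/zip.html
print(fragment)  # nothingness

# A link that is only a fragment points back into the current page.
same_page = urljoin(page_url, "#laterlink")
same_document, same_fragment = urldefrag(same_page)
print(same_document == page_url)  # True
print(same_fragment)              # laterlink
```

The part before the pound sign identifies the document to retrieve; the fragment is then matched against the NAME attributes inside that document.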

The A element also allows for REV and REL attributes, which are intended to show the relationship between the anchor and the target URL. REL indicates the relationship of the URL to the anchor (moving forward along the link), whereas REV indicates the relationship of the anchor to the URL (moving backward along the link). Neither of these is widely used in A elements. The LINK element, which appears in the HEAD element of HTML documents, also supports the HREF, REL, and REV attributes. In the case of LINK (as we saw in Chapter 2), REL does get used to indicate that the target URL is a style sheet. LINK also provides a TYPE attribute to indicate the MIME type of the target URL. Unlike an A element, the LINK element doesn’t take the user anywhere; it just connects outside files to the document, in much the same way that an SRC attribute does for an IMG, APPLET, SCRIPT, or OBJECT.
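To show where these attributes sit in practice, here is a small sketch that uses Python's standard html.parser module to collect the HREF, REL, REV, and TYPE attributes from A and LINK elements. The LinkCollector class and the sample page are hypothetical, written only for this illustration; note that the parser reports element and attribute names in lowercase.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper that records A and LINK elements carrying an HREF."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag not in ("a", "link"):
            return
        attributes = dict(attrs)
        if "href" not in attributes:
            return  # e.g. <A NAME="..."> marks a target, not a link
        self.links.append({
            "element": tag,
            "href": attributes["href"],
            "rel": attributes.get("rel"),
            "rev": attributes.get("rev"),
            "type": attributes.get("type"),
        })


# Hypothetical page, used only for illustration.
sample = """<HTML><HEAD>
<LINK REL="stylesheet" TYPE="text/css" HREF="style.css">
</HEAD><BODY>
<A HREF="chapter11.html" REL="next">Next chapter</A>
<A NAME="laterlink">An anchor with no HREF</A>
</BODY></HTML>"""

collector = LinkCollector()
collector.feed(sample)
collector.close()

for link in collector.links:
    print(link)
```

The LINK element shows up with its REL and TYPE values, the first A element with its REL value, and the named anchor is skipped because it has no HREF.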

