Special Edition Using SGML: Should You Upgrade To SGML?

How Complex Are My References and Links?

HTML is strong as a linking system, mostly because URLs can point to data in any format and browsers provide a very convenient way to retrieve that data. URLs (the most commonly available way of identifying information on the Net, though more advanced ways are coming) can retrieve data via these protocols (Web-speak for “methods”), among others:

Protocol	Description
ftp	The file is copied down to your local machine.
http	The data is formatted and shown in the browser itself (or by a helper application for graphics, sound, video, and so on).
mailto	A message is composed and sent by electronic mail.
news	Postings from network newsgroups are retrieved and presented.
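The protocols in this table are URL schemes, and the scheme is simply the part of the URL before the first colon (for e-mail it is written mailto:). As a rough sketch, not the code of any particular browser, the choice of what to do with a URL can be modeled in Python by looking only at that scheme. The ACTIONS table and the dispatch function are hypothetical illustrations; urllib.parse.urlsplit is the standard-library URL parser.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from URL scheme to what a browser might do with it.
# The wording is based on the table above; it is not any real browser's code.
ACTIONS = {
    "ftp": "copy the file to the local machine",
    "http": "format and show in the browser or a helper application",
    "mailto": "compose a message to the address",
    "news": "retrieve and present newsgroup postings",
}


def dispatch(url: str) -> str:
    """Choose an action from the URL's scheme alone."""
    scheme = urlsplit(url).scheme.lower()
    return ACTIONS.get(scheme, f"unsupported scheme: {scheme!r}")


for url in [
    "ftp://ftp.example.org/pub/readme.txt",
    "http://xyz.org/docs/book.sgm",
    "mailto:editor@example.org",
    "news:comp.text.sgml",
]:
    print(url, "->", dispatch(url))
```

The browser needs no knowledge of the document's structure to make this choice. That is why so few tags can reach so many kinds of data.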

HTML does all of this with only a few tags, mainly <A> and <IMG>. This means that the linking itself is not very complex or sophisticated, even though the data the links point to can be. For example, both <A> and <IMG> are one-way links: they live somewhere in document A and point to document B, as shown in figure 19.4. But if you’re in document B, you don’t know that document A exists or that it points to you.

Fig. 19.4 The HTML <A> tag makes one-way links.

If you click a link and travel from document A to document B, most browsers will remember where you were and provide a Back button to return to the same document (though perhaps not to the same place in that document). That’s an important feature, but it is not at all the same as being able to get from document B to document A in the first place. With true two-way links, you know while in document B that document A links to it.
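A minimal Python sketch of this difference, assuming each document is reduced to a hypothetical list of the document names it links to. The Back button is only a stack of places you have already visited. To find out who links to document B, you have to scan every document, because B itself records nothing. The names outgoing, History, and links_into are illustrative, not part of any browser.

```python
# Hypothetical documents, each listing only the documents it links TO.
outgoing = {
    "A": ["B"],
    "B": [],     # B contains no record of who points at it
    "C": ["B"],
}


class History:
    """A Back button modeled as a stack of visited documents."""

    def __init__(self, start: str) -> None:
        self.stack = [start]

    def follow(self, target: str) -> None:
        self.stack.append(target)

    def back(self) -> str:
        if len(self.stack) > 1:
            self.stack.pop()
        return self.stack[-1]


def links_into(target: str) -> list[str]:
    """With one-way links, the only way to find incoming links is to scan everything."""
    return sorted(doc for doc, hrefs in outgoing.items() if target in hrefs)


# Arriving at B by a link lets you go Back to A...
h = History("A")
h.follow("B")
print(h.back())          # A
# ...but from B alone, you only learn about A and C by searching all documents.
print(links_into("B"))   # ['A', 'C']
```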

Note:
It’s also hard with HTML links to go from document A to a specific place inside document B, because URLs normally point to whole files. HTML does give rudimentary support for getting a whole file and then scrolling it to some element with a given “name” (like an SGML ID), which you request by adding #name to the end of the URL. This is useful, but it doesn’t help much with larger SGML documents. With large documents, having to wait for the whole thing to download, even though you need only a small portion of it, becomes a serious problem.
Link precision will probably improve in the future with conventions that let a URL identify not only a file but an ID or other location within it, and that use this information to optimize downloading, not just scrolling. In fact, some servers already let you add a suffix to a URL to pick out a certain portion of a file. For example, a server could let you append an SGML ID to the URL as if it were a query, and then serve up just the element with that ID (including all its subelements, of course):

<a href="http://xyz.org/docs/book.sgm?id=chap4">Chapter 4</a>
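A server that honors such a query might work roughly like the following sketch. It assumes the book is available as XML, because Python's standard library has no SGML parser. It also assumes the query parameter is named id, as in the example URL. The sample BOOK text and the serve_fragment function are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

# Hypothetical book, written as XML because Python's standard library has no SGML parser.
BOOK = """<book>
  <chap id="chap3"><title>Three</title><p>Intro</p></chap>
  <chap id="chap4"><title>Four</title><sect id="s4.1"><p>Detail</p></sect></chap>
</book>"""


def serve_fragment(url: str, source: str = BOOK) -> str:
    """Return the element named by ?id=..., with all its subelements.

    With no id in the query, the whole document is returned.
    """
    wanted = parse_qs(urlsplit(url).query).get("id")
    root = ET.fromstring(source)
    if not wanted:
        return ET.tostring(root, encoding="unicode")
    for elem in root.iter():
        if elem.get("id") == wanted[0]:
            elem.tail = None  # drop trailing whitespace that belongs to the parent
            return ET.tostring(elem, encoding="unicode")
    raise KeyError(f"no element with id {wanted[0]!r}")


print(serve_fragment("http://xyz.org/docs/book.sgm?id=chap4"))
```

Only the requested chapter crosses the network. Here the server still parses the whole file on each request; a real server would more likely keep an index from IDs to file offsets.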

Though you can simulate a bidirectional (or two-way) link in HTML, you have to do it by creating two links, one in document A and one in document B. This poses a couple of problems. The most important is that you have to go in and change both document A and document B, so you can’t do this between just any two documents you choose. Even if you can get at both documents to insert the links in the first place, it’s easy to forget to update one “half” of the link when you update the other, so such links tend to break over time.
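To see how such simulated two-way links drift apart, here is a small Python sketch. It assumes each document's outgoing links can be read as a set of document names, and the sample data is hypothetical. The sketch reports every link whose reverse half is missing.

```python
# Hypothetical documents and the documents each one links to.
# A "two-way" link between X and Y is simulated by X -> Y plus Y -> X.
docs = {
    "A": {"B"},
    "B": {"A", "C"},  # C was never given its half of the B <-> C link
    "C": set(),
}


def one_sided_links(docs: dict[str, set[str]]) -> list[tuple[str, str]]:
    """List (source, target) links whose reverse half is missing."""
    broken = []
    for source, targets in docs.items():
        for target in targets:
            if source not in docs.get(target, set()):
                broken.append((source, target))
    return sorted(broken)


print(one_sided_links(docs))  # [('B', 'C')]
```

A check like this can only report the damage. Fixing it still means editing a document, which is exactly what you may not be allowed to do.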

What do other hypermedia systems do about this? The best ones, SGML-based or not, provide a way to create links that live completely outside of documents, in a special area called a web (that name may change now that it’s popular as a shorthand for the World Wide Web). In that case, the picture looks more like figure 19.5. Many systems provide both methods, not just one or the other.

Fig. 19.5 An external web lets you create two-way links.

This is a much more powerful system, and you can build it with a number of SGML linking methods, such as HyTime and the TEI guidelines, as well as with some recent systems like Hyper-G. The approach appears to have originated with Brown University’s Intermedia system. Doing links this way has these benefits:

• Because links live outside the documents, anyone can create them without needing permission to change the documents themselves. You can even link in and out of documents on CD-ROM or other unchangeable media. This is especially important for big data like video, because it’s still much more effective to keep local copies on CD-ROM or similar media than to download huge files every time they need to be viewed.

• Because documents aren’t touched every time a link is attached, they can’t be accidentally trashed. Most HTML links have this advantage at one end, since the destination document needn’t be touched. But the only way for HTML to point to a particular place inside a destination document is via an ID, so to do that you may have to add one, and in that case HTML loses even this one-ended advantage.

• Because a set of links is a separate thing, you can collect links into useful groups, distribute them, and turn them on or off. For example, Siskel’s links and Ebert’s links to moviemakers’ home pages can live in two separate webs, so you can choose to see either set or both.
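A minimal Python sketch of an external web, assuming links are simple pairs of location strings kept in named collections that can be switched on and off. Web, Link, and links_at are hypothetical names, not the API of HyTime, TEI, Hyper-G, or any other real system. The point is that the links are stored apart from the documents and can be followed from either end.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Link:
    """One link, stored apart from both documents it connects."""
    end1: str
    end2: str


@dataclass
class Web:
    """A named, switchable collection of external links (hypothetical model)."""
    name: str
    links: list[Link] = field(default_factory=list)
    enabled: bool = True


def links_at(place: str, webs: list[Web]) -> list[str]:
    """Everything linked to place, followed from either end, in enabled webs only."""
    found = []
    for web in webs:
        if not web.enabled:
            continue
        for link in web.links:
            if link.end1 == place:
                found.append(link.end2)
            elif link.end2 == place:
                found.append(link.end1)
    return found


# Hypothetical locations; neither review nor the studio page is edited.
film = "http://studio.example/film1"
siskel = Web("Siskel", [Link("siskel/reviews#film1", film)])
ebert = Web("Ebert", [Link("ebert/reviews#film1", film)])

print(links_at(film, [siskel, ebert]))  # both reviews, found from the film's end
ebert.enabled = False
print(links_at(film, [siskel, ebert]))  # only Siskel's web is still on
```

The same studio page is the target of both webs, yet neither the page nor the reviews were changed. Switching a web off hides its links without editing any document.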

If you don’t need this more sophisticated linking, HTML’s links may be just fine. Otherwise, you need to go beyond HTML and beyond what current HTML browsers can do. The good news is that such a web can still use URLs and related methods to do the actual references, so you can keep the power HTML gets from them. You can add URL support (or even the <A> and <IMG> tags themselves) to another DTD that packages them up to provide greater capabilities.

Note:
TEI and HyTime links provide a very good way to express this kind of linking. We talk about them quite a bit more in Chapter 21.
