Special Edition Using SGML: Developing for the World Wide Web


A problem with most search tools is that they don’t index the content in relation to the tagging. So even though you can search for “John” and “Smith,” you can’t insist that they show up in the same paragraph. You can’t search for “Save” only where it’s mentioned in a FORM element, or “Picture” only within an A (link) element. You especially can’t search for “Save” where it’s in the documentation of menu items for a software package (since HTML has no tag for that).

SGML-aware servers usually can do all these types of searches, and enable you to be much more precise in your queries. Even if your client is getting only HTML, it should still be possible for the server to search the original SGML for you. If it can’t, you might want to look for a better server rather than changing your data.
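To make the idea of searching in relation to the tagging concrete, here is a minimal sketch in Python. It assumes a small, well-formed sample document that the standard library’s XML parser can read (real SGML would need an SGML parser), and the element names para and menuitem, the sample text, and the function find_in_element are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are made up for illustration.
SAMPLE = """<manual>
  <para>John Smith wrote the first draft.</para>
  <para>John reviewed it; later, Ms. Smith signed off.</para>
  <para>John left early.</para>
  <menuitem>Save writes the current file to disk.</menuitem>
  <para>Remember to Save often.</para>
</manual>"""


def find_in_element(root, tag, words):
    """Return the text of every <tag> element that contains all of the words.

    Matching is a plain, case-sensitive substring test, which is enough
    to show the idea but far cruder than a real search engine.
    """
    hits = []
    for elem in root.iter(tag):
        text = "".join(elem.itertext())
        if all(word in text for word in words):
            hits.append(text)
    return hits


root = ET.fromstring(SAMPLE)

# "John" and "Smith" must show up in the same paragraph.
same_paragraph = find_in_element(root, "para", ["John", "Smith"])

# "Save" only where it appears in the documentation of a menu item.
menu_docs = find_in_element(root, "menuitem", ["Save"])

print(same_paragraph)
print(menu_docs)
```

The point of the sketch is that the query names an element type as well as the words, which is only possible because the tags survive into the index.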

Note:
Precision is really a technical term among information retrieval nerds; a precise search is one where you don’t get a lot of irrelevant information in addition to what you want. Watch out for the opposite problem too, even though it happens less often with most Web search tools: if you don’t get enough of the information you do want, that’s called a recall problem.
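The two measures in the note can be worked out with a few lines of Python. This is a sketch using the usual information retrieval definitions (precision is the share of retrieved items that are relevant; recall is the share of relevant items that were retrieved); the document names and the function precision_recall are invented for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: the items the search tool returned
    relevant:  the items the user actually wanted
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: four documents came back, five were wanted.
p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d7", "d8", "d9"],
)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four returned documents are wanted, and two of the five wanted documents were returned, so the search has both a precision problem and a recall problem.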

Link Persistence

URLs are notorious for failing when people get a new computer, a new disk, or even a new job. This is because the data moves. SGML’s support for Formal Public Identifiers (FPIs) helps you get away from this. FPIs are expected to be generic, permanent identifiers for data, instead of today’s location of the data.

Public Identifiers can sit around until needed, and only then get converted into a physical location. That way if the data’s location changes, you can fix the problem by updating one table instead of every link everywhere in the world. The Internet standards groups are working on “URNs,” which are a lot like FPIs and help in the same way.
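The “one table” idea can be sketched in Python as a simple lookup. Everything here is an assumption made for illustration: the public identifiers, the URLs, and the names CATALOG and resolve are invented, and a real server would more likely read the mapping from a catalog file than keep it in a dictionary.

```python
# Hypothetical catalog: public identifier -> where the data lives today.
CATALOG = {
    "-//Example Corp//DOCUMENT User Guide//EN":
        "http://docs.example.com/guide/index.sgml",
    "-//Example Corp//DOCUMENT Release Notes//EN":
        "http://docs.example.com/notes/index.sgml",
}


def resolve(public_id, catalog=CATALOG):
    """Convert a public identifier to a location only when it is needed."""
    try:
        return catalog[public_id]
    except KeyError:
        raise LookupError(f"no location registered for {public_id!r}") from None


# Documents store only the identifier, never the location.
link_target = "-//Example Corp//DOCUMENT User Guide//EN"
before = resolve(link_target)

# The data moves: one table entry changes, and no document is edited.
CATALOG[link_target] = "http://archive.example.org/guide/index.sgml"
after = resolve(link_target)

print(before)
print(after)
```

Every link that uses the identifier picks up the new location on its next lookup, which is the whole benefit.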

Note:
FPIs and URNs work a lot like a cellular phone. Behind the scenes a lot has to happen: as you move around and get “out of range” for one cell, you get switched to the next one. When that happens, you don’t have to give everybody a new phone number to reach you—you don’t even notice that anything happened at all. That’s because your cell phone isn’t identified by location (like a URL) but by a special, unique serial number that always stays the same for your individual phone, wherever you carry it (this is a lot like an FPI or URN).
When someone calls you, the phone company looks up the serial number, and then uses it to look up your current location in a table somewhere and send the call to where you really are. When your phone switches to a new “cell,” because the old one is too far away, your phone sends a little message to the new cell saying “I’m here.” The new cell tells the phone company to update the table. Files on the Web will be able to move around just as freely once they’re identified by names instead of locations.

In the meantime, you can make your links a little safer in two ways. First, use HTML’s new BASE feature so that all your URLs are relative to a place you specify up front—then you only have to change that one place when something happens. Second, use SGML linking capabilities and leave it to your server to translate more generic pointers, such as FPIs, into URLs when it sends out your data. The next section goes into a lot more detail about some of the linking capabilities that have been built on top of SGML using the HyTime standard.
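To see why a single base location makes links safer, this short Python sketch resolves the same relative links against two different bases, using the standard library’s urljoin. The host names, paths, and the helper resolve_links are invented for the example; a browser does the equivalent resolution itself when a page declares a base.

```python
from urllib.parse import urljoin


def resolve_links(base, relative_links):
    """Resolve each relative link against a single base URL."""
    return [urljoin(base, link) for link in relative_links]


# The links inside the document never change.
links = ["intro.html", "figures/fig1.gif"]

# Only the base changes when the data moves (both bases are hypothetical).
old = resolve_links("http://old.example.com/docs/", links)
new = resolve_links("http://new.example.com/manuals/", links)

print(old)
print(new)
```

Note the trailing slash on each base: without it, urljoin treats the last path segment as a file name and replaces it.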

HyTime: SGML and Hypermedia

SGML provides a lot of tools for representing different kinds of documents, with the most important one being the ability to make up new tag-sets whenever you need to. But it only provides limited capabilities for creating links—SGML doesn’t provide a standard way to mark up links between separate documents, for example.

ISO noticed this, and recently put out a standard that specifically extends SGML to deal with hypertext and multimedia. It’s called HyTime.

Note:
HyTime is built on top of SGML. After reading this book, you’ll be ready to find out about HyTime. You can learn about it in Steve DeRose and David Durand’s book, Making Hypermedia Work. Your bookstore may have it already, or they can order it from the publisher.

HyTime specifies ways of using SGML to represent the things needed for hypermedia. These include references to documents, graphics, video, sound, and other media, as well as particular places in them; links to connect pairs or groups of such references; and ways to schedule presentations out of referenced pieces. Obviously, this is a whole lot more than <A>. HyTime, therefore, is a pretty big standard. The good news is that, as with SGML, you can do a great deal even if you only learn a little bit of HyTime, and you can always learn more as you need it.

HyTime support can be added on top of any system that supports SGML. Like SGML, HyTime has some very complex features, but you can accomplish a lot by using only a few of the most basic ones. Several SGML products have already added support for those basic features, and more are on the way.

Ultra-Basics of HyTime Links

HyTime links have three important parts:

• An anchor is the data that actually is at some end of the link. This can be anywhere, and you don’t have to mark it in any special way for HyTime. Sometimes an anchor is loosely called an “end,” since it’s where you get to when you follow a link.

• A location address points to an anchor. There are many types of these: some refer to names, some let you count things, some let you do searches. You can even combine them, for example to ask for the third subelement of the element with ID=chap4 in such-and-such a document, or the upper-left quarter of the picture from minutes 2 through 12 of a video.

• A link puts together a bunch of location addresses to say they’re related. Typically this ends up meaning that users are able to get from one to another with a single mouse click, but any other relationship is fine, too.

The A element in HTML is like a link with the location address included (on the HREF attribute). Whatever the HREF points to is one anchor; the other anchor is the contents of the A element.
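The three parts can be modeled in a few lines of Python. This is only a sketch of the concepts, not HyTime syntax: the classes LocationAddress and Link, their fields, and the sample document are invented for illustration, and the sample is parsed with the standard library’s XML parser rather than a real SGML parser. It resolves the example from the list above, the third subelement of the element with ID=chap4.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical document; nothing in it is marked up specially for linking.
DOC = ET.fromstring(
    "<book>"
    '<chapter id="chap3"><title>Three</title></chapter>'
    '<chapter id="chap4"><title>Four</title>'
    "<para>First.</para><para>Second.</para></chapter>"
    "</book>"
)


@dataclass
class LocationAddress:
    """Points to an anchor by combining a name step and a counting step."""
    element_id: str   # name step: the element with this ID
    child_index: int  # counting step: 1-based position among its subelements

    def resolve(self, root):
        if self.child_index < 1:
            raise ValueError("child_index is 1-based")
        for elem in root.iter():
            if elem.get("id") == self.element_id:
                return list(elem)[self.child_index - 1]
        raise LookupError(f"no element with id {self.element_id!r}")


@dataclass
class Link:
    """Groups location addresses to say that their anchors are related."""
    addresses: list

    def anchors(self, root):
        return [address.resolve(root) for address in self.addresses]


link = Link([LocationAddress("chap4", 3), LocationAddress("chap3", 1)])
anchors = link.anchors(DOC)
print([anchor.text for anchor in anchors])
```

The anchors themselves carry no link markup; only the addresses know where they are, which is what lets a link point into data you cannot edit.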
