Special Edition Using SGML:Developing for the World Wide Web


Some users really need a dragster, not a road car. For them, the list of features and options is very short, but every item on it is critical. A “dragster” version of SGML might optimize parsing speed, networkability, and other needs by using no minimization at all. It would also avoid some tricky constructs like #CONREF, #CURRENT, and declared content. Even drag racers, however, want to choose their own color scheme, tread style, and the like. Likewise, even the most stripped-down SGML subset needs the capability to define new tags (essentially, this means keeping a DTD).
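To make the “dragster” idea concrete, here is a minimal Python sketch of a checker that flags DTD constructs such a subset would avoid. It is only an illustration under several assumptions: each declaration sits on one line, regular expressions stand in for a real SGML parser, minimization is checked only as omitted-tag indicators (`- O`, `O O`, `O -`) in element declarations, and only CDATA and RCDATA count as declared content (EMPTY is left alone, on the guess that even a stripped-down subset needs empty elements). The name `lint_dtd` and the sample DTD are hypothetical.

```python
import re

# Simplified patterns for constructs a hypothetical "dragster" subset avoids.
# These are regexes over one-line declarations, not a real SGML parser.
AVOIDED = {
    "#CONREF attribute default": re.compile(r"#CONREF\b", re.IGNORECASE),
    "#CURRENT attribute default": re.compile(r"#CURRENT\b", re.IGNORECASE),
    # Declared content: CDATA or RCDATA right after the element name and
    # the optional omitted-tag indicators.
    "declared content": re.compile(
        r"<!ELEMENT\s+\S+\s+(?:[-O]\s+[-O]\s+)?R?CDATA\b", re.IGNORECASE
    ),
    # Minimization: an "O" in either omitted-tag indicator slot.
    "tag omission (minimization)": re.compile(
        r"<!ELEMENT\s+\S+\s+(?:O\s+[-O]|-\s+O)\s", re.IGNORECASE
    ),
}


def lint_dtd(dtd_text):
    """Return (line number, construct) pairs for every avoided construct found."""
    problems = []
    for lineno, line in enumerate(dtd_text.splitlines(), start=1):
        for label, pattern in AVOIDED.items():
            if pattern.search(line):
                problems.append((lineno, label))
    return problems


# Hypothetical sample DTD, one declaration per line.
SAMPLE_DTD = """\
<!ELEMENT memo  - - (to, from, body)>
<!ELEMENT to    - O (#PCDATA)>
<!ELEMENT code  - - CDATA>
<!ATTLIST memo  status CDATA #CURRENT>
<!ATTLIST body  ref IDREF #CONREF>"""

for lineno, construct in lint_dtd(SAMPLE_DTD):
    print(f"line {lineno}: {construct}")
```

A checker like this could sit in front of a fast, minimization-free parser and reject documents that need the full SGML machinery.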

Batch versus On-Demand Models

Once you have SGML data, delivering it raises issues. Few tools can view SGML directly over the Net; DynaText and Panorama are currently the main ones, although that is changing. Their advantage is that all of the information expressible in SGML can move across the Net, so the client can search, format, navigate, and filter the document in any way the original markup makes possible, rather than only in the ways HTML provides. The DynaText network client is due to be released about the time this book reaches the shelves; figure 22.1 gives a preview of it, viewing a manual natively in SGML across the Internet.

Fig. 22.1 The DynaText Web client can view and navigate SGML locally or across the Net, regardless of what DTD is used (even HTML).

The alternative is to use an HTML-only client, but then something is lost in transmission. For example, you can’t search or edit as flexibly if all you get is the HTML form. So you have a choice to make with your Web data:

• Use Panorama or DynaText

• Wait for other tools to become available

• Batch-convert your data to HTML

• Use a server that can convert data on-demand
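The difference between the last two choices can be sketched in a few lines of Python. This is a hypothetical illustration, not any product's design: `SOURCES`, `TAG_MAP`, and the string-replacement “converter” are stand-ins for a real document store and a real SGML-to-HTML translator. The batch model converts everything ahead of time; the on-demand model keeps only the SGML and derives whatever form a request asks for.

```python
from functools import lru_cache

# Hypothetical document store: document name -> SGML source text.
SOURCES = {
    "intro": "<SECTION><TITLE>Intro</TITLE><PARA>Hello.</PARA></SECTION>",
}

# Hypothetical element-to-HTML mapping; a real converter would use an SGML
# parser and a full style specification rather than string replacement.
TAG_MAP = {"SECTION": "DIV", "TITLE": "H1", "PARA": "P"}


def sgml_to_html(sgml: str) -> str:
    """Toy converter: rename start and end tags according to TAG_MAP."""
    html = sgml
    for src, dst in TAG_MAP.items():
        html = html.replace(f"<{src}>", f"<{dst}>").replace(f"</{src}>", f"</{dst}>")
    return html


def batch_convert(sources: dict) -> dict:
    """Batch model: convert every document up front and serve static HTML."""
    return {name: sgml_to_html(text) for name, text in sources.items()}


@lru_cache(maxsize=256)
def serve_on_demand(name: str, form: str) -> str:
    """On-demand model: keep only SGML and derive the requested form per request.

    Results are cached; this assumes SOURCES does not change while running.
    """
    sgml = SOURCES[name]
    if form == "sgml":
        return sgml  # SGML-capable clients get the original, nothing lost
    if form == "html":
        return sgml_to_html(sgml)
    raise ValueError(f"unsupported form: {form}")


print(batch_convert(SOURCES)["intro"])
print(serve_on_demand("intro", "sgml"))
```

Caching the on-demand results (here with `functools.lru_cache`) recovers much of the speed of batch conversion; because the sketch assumes the sources never change, a real server would also need to invalidate the cache when a document is edited.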

When it’s possible, the last choice gives you the most flexibility, because it lets you provide the data in its original form or in any derived form, whenever you want. Ideally, such a server should support not only SGML-to-HTML conversion but also similar conversions for graphics. It should be able to send hi-res color images to people who have the bandwidth, hardware, and desire for them, and low-res versions of the same images to people who want to save time or have cheaper displays. This flexibility also matters to the visually impaired: such a server can send them text descriptions of graphics, which their terminals can then read aloud or display in Braille.
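The graphics decision can be reduced to a small rule. In this minimal sketch, the `ClientProfile` fields and the three rendition names are assumptions; a real server would fill the profile from request headers or a stored user preference.

```python
from dataclasses import dataclass


@dataclass
class ClientProfile:
    """Hypothetical per-request preferences, e.g. derived from headers or a user setting."""
    wants_images: bool = True    # False for screen readers, Braille terminals, text-only clients
    wants_hi_res: bool = True    # the reader's own preference
    fast_link: bool = False      # enough bandwidth for large files
    color_display: bool = False  # hardware that can show full color


def choose_image_variant(profile: ClientProfile) -> str:
    """Pick one of three hypothetical renditions of the same graphic."""
    if not profile.wants_images:
        return "text-description"
    if profile.wants_hi_res and profile.fast_link and profile.color_display:
        return "hi-res-color"
    return "low-res"


print(choose_image_variant(ClientProfile(fast_link=True, color_display=True)))  # hi-res-color
```

Defaulting to the low-res rendition is a deliberate choice here: a reader who has not said otherwise gets the cheaper download.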

Some SGML Web sites let you choose the form you want: they keep documents in many representations besides HTML, and can convert them and serve each client the form that suits it best. This is a lot like sites that let you choose between English, Spanish, French, and German when you enter the site. It gives a friendly feel, as if the Web publisher were going out of their way to accommodate users.

Note:
Here are some ways a server can customize for you. It’s easy to see how each would be important for some users:

•  Pick out certain elements for certain users, such as <P LANGUAGE=GERMAN> for a German user.
•  Suppress graphics and send their accompanying IMAGE-DESCRIPTION elements instead, when sending to blind users, users with non-graphical browsers, or users who just want to save bandwidth.
•  Shorten the text by leaving out parts with certain PRIORITY attributes. For example, you could tag passages so that a document adapts to the reader’s bandwidth, interest level, or expertise level.
•  Check what client program is calling in, and optimize the formatting for it; use FRAME for Netscape, use Microsoft extensions for Explorer, full SGML for clients that support it, and so on.
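The first three customizations amount to pruning the document tree before it is sent. The Python sketch below assumes the SGML has already been normalized to well-formed XML (no minimization, quoted attribute values) so the standard `xml.etree.ElementTree` module can parse it. The `DOC`, `FIGURE`, and `GRAPHIC` element names, and the convention that a higher PRIORITY number means less important text, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document, assumed already normalized to well-formed XML.
SAMPLE = """<DOC>
  <P LANGUAGE="GERMAN">Willkommen</P>
  <P LANGUAGE="ENGLISH">Welcome</P>
  <FIGURE>
    <GRAPHIC FILE="chart.gif"/>
    <IMAGE-DESCRIPTION>Bar chart of monthly sales</IMAGE-DESCRIPTION>
  </FIGURE>
  <P PRIORITY="3">Background detail for experts</P>
</DOC>"""


def customize(xml_text, language, text_only=False, max_priority=None):
    """Return a pruned tree of the document for one reader.

    - Elements whose LANGUAGE differs from `language` are dropped
      (elements with no LANGUAGE are kept for everyone).
    - If text_only, GRAPHIC elements are dropped so only descriptions remain.
    - If max_priority is set, elements whose PRIORITY exceeds it are dropped
      (assumed convention: higher number = less important).
    """
    root = ET.fromstring(xml_text)
    # Snapshot the element list first so removing children is safe.
    for parent in list(root.iter()):
        for child in list(parent):
            lang = child.get("LANGUAGE")
            if lang is not None and lang != language:
                parent.remove(child)
            elif text_only and child.tag == "GRAPHIC":
                parent.remove(child)
            elif max_priority is not None and int(child.get("PRIORITY", "0")) > max_priority:
                parent.remove(child)
    return root


german_text_only = customize(SAMPLE, "GERMAN", text_only=True, max_priority=2)
print(ET.tostring(german_text_only, encoding="unicode"))
```

The fourth customization, choosing markup per client program, is a different kind of step: it changes how elements are rendered rather than which ones are sent.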

If you’re surfing the Web and you see documents optimized for your connection speed, your particular Web client, and its version, the first thing you’ll ask is how it’s done, and the second is how you can do it, too.

SGML makes it easy. You still have to do some setup, such as noting that Netscape has format-oriented tags like <BLINK> and <FRAME> that other browsers lack, so your converter can insert those tags for Netscape and something else for other clients. But you only have to do that once, and you don’t have to manage 20 slightly different versions of every page you have.
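Here is what “only once” might look like: one rule table per client, applied to every page. This is a hypothetical sketch; the `ALERT` element, the rule table, and the browser sniffing are assumptions. The sniffing relies on the period convention that Netscape identified itself as `Mozilla/...` while Internet Explorer sent `Mozilla/... (compatible; MSIE ...)`, which is far too crude for production use.

```python
# Hypothetical rule table: client -> element -> (open markup, close markup).
# Written once; every page is rendered through it.
RULES = {
    "netscape": {"ALERT": ("<BLINK>", "</BLINK>")},
    "default": {"ALERT": ("<STRONG>", "</STRONG>")},
}


def detect_client(user_agent: str) -> str:
    """Crude browser sniffing based on mid-1990s User-Agent conventions."""
    if "MSIE" in user_agent:
        return "explorer"  # no Explorer rules yet, so it falls back to "default"
    if user_agent.startswith("Mozilla/"):
        return "netscape"
    return "default"


def render(element: str, text: str, user_agent: str) -> str:
    """Wrap text in the markup the calling client should receive."""
    rules = RULES.get(detect_client(user_agent), RULES["default"])
    open_tag, close_tag = rules.get(element, RULES["default"].get(element, ("", "")))
    return f"{open_tag}{text}{close_tag}"


print(render("ALERT", "Sale ends today", "Mozilla/3.0 (Win95; I)"))
```

Supporting a new browser means adding one entry to `RULES`, not editing every page.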

MIME Issues

The Internet standards groups are just now finalizing a MIME type for SGML (agreed-upon types already exist for HTML [TEXT/HTML], GIF [IMAGE/GIF], and so on). Until that is done, programs set up to receive SGML over the Net can’t count on servers labeling SGML documents with a known MIME type. In the meantime, systems tend to use “X-SGML” as their MIME type; the “X-” prefix marks an experimental type that is being tried out before it is standardized.

This is not a big problem. There are only so many kinds of servers out there, so there won’t be 20 different MIME types showing up anyway. But it is worth keeping an eye on, so that as soon as a standard is final, you can look for servers that use it. Information about MIME standards can be found (among other places) at http://ds.internic.net/rfc1521.txt.

When the MIME standard arrives, some servers may need a slight tweak to label SGML correctly when they send it. Servers also need special attention if you want them to serve the same source SGML data in a variety of custom forms, such as HTML version X.Y or various SGML DTDs.
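On the server side, the “slight tweak” usually means changing one line in a table that maps file extensions to MIME types. A minimal sketch follows. Writing “X-SGML” as `text/x-sgml`, and taking `text/sgml` and `application/sgml` as the standard labels, are illustrative assumptions, as are the function names. The client-side check accepts both the experimental and the standard labels, so it keeps working through the transition.

```python
import os

# Hypothetical server table: file extension -> MIME type label.
# "text/x-sgml" stands in for the experimental "X-SGML" label; when the
# standard arrives, the "slight tweak" is editing this one entry.
CONTENT_TYPES = {
    ".html": "text/html",
    ".gif": "image/gif",
    ".sgml": "text/x-sgml",
}

# Labels a receiving client accepts as SGML: the experimental ones plus
# the types assumed here to become standard.
SGML_TYPES = {"text/x-sgml", "application/x-sgml", "text/sgml", "application/sgml"}


def content_type_for(path: str) -> str:
    """Choose the Content-Type label a server would send for a file."""
    ext = os.path.splitext(path)[1].lower()
    return CONTENT_TYPES.get(ext, "application/octet-stream")


def is_sgml(content_type: str) -> bool:
    """Decide whether a received Content-Type header denotes SGML."""
    # Ignore parameters such as "; charset=..." and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in SGML_TYPES


print(content_type_for("manual.sgml"))           # text/x-sgml
print(is_sgml("TEXT/SGML; charset=ISO-8859-1"))  # True
```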

High-End Search/Retrieval

You’ve probably used searching services on the Web. Some only index URLs, so you have to guess at least part of the file name to find something. Some index just the beginning of HTML files, extracting the content from a few special elements like TITLE. Some index just the titles, and a few index the entire text content.
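The gap between title-only and full-text indexing shows up clearly in a toy inverted index. This Python sketch uses the standard `html.parser` module; the sample pages, the `example.com` URLs, and the crude word splitting are hypothetical.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the TITLE text and the full text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_parts = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":  # HTMLParser reports tag names in lowercase
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title_parts.append(data)
        self.text_parts.append(data)


WORD = re.compile(r"[a-z0-9]+")


def build_index(pages: dict, full_text: bool) -> dict:
    """Map each word to the set of URLs containing it (titles only, or all text)."""
    index = defaultdict(set)
    for url, html in pages.items():
        parser = TextExtractor()
        parser.feed(html)
        parser.close()
        parts = parser.text_parts if full_text else parser.title_parts
        for word in WORD.findall(" ".join(parts).lower()):
            index[word].add(url)
    return index


# Hypothetical pages.
PAGES = {
    "http://example.com/tools.html":
        "<html><head><title>SGML Tools</title></head>"
        "<body><p>DynaText and Panorama view SGML.</p></body></html>",
    "http://example.com/mime.html":
        "<html><head><title>MIME Types</title></head>"
        "<body><p>Labeling SGML for servers.</p></body></html>",
}

titles_only = build_index(PAGES, full_text=False)
full = build_index(PAGES, full_text=True)
print(sorted(titles_only["sgml"]))  # only the page with SGML in its title
print(sorted(full["sgml"]))         # both pages
```

An index built from SGML source could go further, keying words by the element they appear in, which HTML's few fixed elements cannot support.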
