Special Edition Using SGML: Should You Upgrade To SGML?

Browser Independence

Because generic SGML software (by definition) handles many DTDs, using a new or modified DTD won’t faze it. If it works for CALS and TEI, it’ll almost certainly work for whatever DTD you choose.

SGML vendors spend a lot of time testing interoperability. A standard trade-show demonstration used to involve passing a tape or disk of SGML files from booth to booth throughout the show. Each product had to read the data, do whatever it did with the data (such as letting you edit or format it), and then write it back out for the next booth without corrupting it.

The “SGML Open” vendor group gets together regularly online, at shows, and at special meetings to work out agreements on details and make sure SGML documents can move around easily. For example, a popular DTD for tables has a “rotate” attribute to let you lay out tables in either portrait or landscape mode, but doesn’t say whether rotation is clockwise or counterclockwise. The vendors sat down and decided, so now they all do it the same way. Simple agreements like this can save a lot of pain for end users.

Note:
The central source of information about SGML Open activities is http://www.sgmlopen.org. Most companies that support SGML are involved in SGML Open, and the SGML Open Web site links to their home pages, along with many other useful SGML resources.

If you use an SGML-aware server, you can gain greater browser independence, even on the Web. Each Web browser has its own strengths and weaknesses; if you can send slightly different HTML to each one, you can capitalize on the strengths and avoid the weaknesses. This is easier if your data uses a more precise DTD than HTML. Browsers identify themselves to the server with each request (in the HTTP User-Agent header), so a server that knows enough about your document's structure can down-translate it appropriately for each browser.
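The idea of per-browser down-translation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular SGML server works: the browser names (`ExampleBrowser/4`, `ExampleBrowser/2`), the capability table, and the functions `capabilities_for` and `render_table` are all hypothetical. The same abstract table is rendered as TABLE markup for a browser assumed to support it, and as a padded PRE block for one assumed not to.

```python
# Minimal sketch: choose HTML markup per browser from the User-Agent header.
# Browser names and the capability table are hypothetical examples.
from html import escape

# Assumed feature support, keyed by a substring of the User-Agent header.
CAPABILITIES = {
    "ExampleBrowser/4": {"tables": True},
    "ExampleBrowser/2": {"tables": False},
}
DEFAULT_CAPS = {"tables": False}  # be conservative with unknown clients


def capabilities_for(user_agent):
    """Look up the assumed feature set for the client that sent this header."""
    for marker, caps in CAPABILITIES.items():
        if marker in user_agent:
            return caps
    return DEFAULT_CAPS


def render_table(rows, user_agent):
    """Down-translate one abstract table (a list of rows of strings) to HTML."""
    if capabilities_for(user_agent)["tables"]:
        body = "".join(
            "<TR>" + "".join(f"<TD>{escape(cell)}</TD>" for cell in row) + "</TR>"
            for row in rows
        )
        return "<TABLE>" + body + "</TABLE>"
    # Fallback: preformatted text with every column padded to the widest cell.
    width = max(len(cell) for row in rows for cell in row)
    lines = ["  ".join(escape(cell.ljust(width)) for cell in row).rstrip() for row in rows]
    return "<PRE>\n" + "\n".join(lines) + "\n</PRE>"


rows = [["Part", "Qty"], ["Bolt", "12"]]
print(render_table(rows, "ExampleBrowser/4.0 (X11)"))
print(render_table(rows, "ExampleBrowser/2.0"))
```

The stored table never changes; only the server's choice of output markup does, which is the point of keeping the richer structure on the server side.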

HTML Revision Independence

Keeping your data in SGML also lets you avoid re-coding it each time a new HTML feature arrives. You learned earlier how tables would have to be completely reworked if you started by assuming browsers couldn't support table markup and then had to change your data when browsers caught up. The same thing happened when Netscape introduced its FRAME element, and a lot of re-authoring followed. The same problem can arise with any kind of markup. If you keep your documents in DTDs designed to fit them, you can leave the documents untouched and merely adjust a conversion filter.
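A conversion filter can be as simple as a table that maps each source element to an HTML tag. The following is a minimal sketch, assuming a hypothetical DTD with `booktitle` and `foreign` elements and a `#text` marker for plain character data; it is not a real SGML processing tool. Switching from one filter to the other changes the HTML output while the stored document stays exactly the same.

```python
# Minimal sketch of a conversion filter: the stored document keeps
# descriptive element names (from a hypothetical DTD), and a mapping
# table decides which HTML tag each element becomes.
from html import escape

# The stored document as (element, text) pairs; "#text" is plain character data.
DOCUMENT = [
    ("#text", "See "),
    ("booktitle", "Moby Dick"),
    ("#text", " for the "),
    ("foreign", "ad hoc"),
    ("#text", " usage."),
]

# Original filter: book titles and foreign words both become I (a "pun").
FILTER_OLD = {"#text": None, "booktitle": "I", "foreign": "I"}
# Revised filter: only the mapping changes; DOCUMENT is not touched.
FILTER_NEW = {"#text": None, "booktitle": "CITE", "foreign": "I"}


def convert(document, html_filter):
    """Down-translate the document using the given element-to-tag mapping."""
    parts = []
    for element, text in document:
        tag = html_filter[element]
        if tag is None:  # plain text passes through, escaped
            parts.append(escape(text))
        else:
            parts.append(f"<{tag}>{escape(text)}</{tag}>")
    return "".join(parts)


print(convert(DOCUMENT, FILTER_OLD))  # See <I>Moby Dick</I> for the <I>ad hoc</I> usage.
print(convert(DOCUMENT, FILTER_NEW))  # See <CITE>Moby Dick</CITE> for the <I>ad hoc</I> usage.
```

Because the source records that "Moby Dick" is a book title rather than just "something italic," the filter can make a different choice later without anyone re-reading the text.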

Appropriate Tag Usage

The biggest fundamental benefit of going to SGML is that your markup can tell the truth about what components are in your document, even if the document doesn't fit any pre-existing scheme. If the tags you need are there (or, at worst, you can add them yourself), you avoid having to "pun" by using a single tag for many purposes it was never meant for.

Note:
The question of having the right tag available for the job is very important, so here are a few examples. We’ve already talked about how sixth-level HTML headings (H6) get used to mean small caps, and how italics (I) get used to mark many things like emphasis, foreign words, book titles, and so on.
Sometimes preformatted text (PRE) gets used for quick-and-dirty tables. Line-break (BR) gets used heavily for forcing particular browsers to lay things out a certain way (and usually that way only works well for certain browsers, certain window widths, and so on).

Another big example is equations. Since there are not yet HTML elements for doing math, journal publishers and others are stuck turning equations into graphics for Web delivery. This works after a fashion, but small details such as subscripts and superscripts tend to become illegible, and zooming in doesn't help because the image's resolution is fixed. This is a case where there's a dire need for a more adequate set of tags, and there are already some very good equation DTDs in wide use outside the Web.

Large Document Management

SGML helps you manage the conflict between big documents and slow modems. You can't very well ship a whole manual or a lengthy paper every time a user wants to see the nth paragraph (even if browsers could handle documents that big, which many can't); no user would wait for the download to finish. Novell certainly couldn't ship tens of thousands of pages of NetWare manuals every time a user wanted a summary of some installation detail.

The only viable option for documents longer than a few dozen to a few hundred pages is to break them up. You can make many smaller documents, say one for each subsection, plus a set of overview documents that give access similar to the table of contents in a paper book. For HTML this is usually done manually, because HTML documents don't usually contain explicit markup for their larger components (some do now that HTML has added the DIV element). This method works, but it has several problems:

• If you are also publishing a paper document, you have to maintain two quite different forms.

• The document ends up in many pieces that aren’t visibly related; only a person can tell whether some link between HTML files A and B means they’re part of the same document, or two somehow-related documents. This makes it hard to maintain consistency between all the parts of your original document.

• If users want to download the whole document for some reason, it’s very hard to do. First, they have to find all the pieces, distinguishing “is-part-of” links from “is-related-to” links; then they have to assemble all the parts in the right order and put the larger containing structures in. It’s not enough to just pack them end-to-end because some of the connections between lower sections appear only in “header” or “table of contents” documents.

• Users can’t scroll smoothly through the complete document; at best, you can carefully provide Next Portion and Previous Portion buttons on every piece.
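When the source document carries explicit markup for its sections, the splitting can be generated rather than done by hand. The sketch below assumes a hypothetical, already-parsed document represented as a list of `(section_id, title, body)` tuples; the section ids, titles, and file names are illustrative only. From that one source it produces a contents page, per-section pages with Previous/Next/Contents links, and a reassembled whole document for users who want a single download.

```python
# Minimal sketch: split one structured document into per-section HTML pages
# plus a contents page, and rebuild the whole document from the same source.
# The section data and file-naming scheme are hypothetical.
from html import escape

# The whole document as an ordered list of (section_id, title, body) tuples.
SECTIONS = [
    ("install", "Installing the Server", "Run the installation program."),
    ("config", "Configuring Volumes", "Each volume needs a name and a size."),
    ("backup", "Backing Up", "Schedule a backup for every volume."),
]


def page_name(section_id):
    return f"{section_id}.html"


def split_document(title, sections):
    """Return a dict mapping file names to HTML text for every piece."""
    toc_items = "".join(
        f'<LI><A HREF="{page_name(sid)}">{escape(t)}</A></LI>' for sid, t, _ in sections
    )
    pages = {"index.html": f"<H1>{escape(title)}</H1><UL>{toc_items}</UL>"}
    for i, (sid, t, body) in enumerate(sections):
        nav = ['<A HREF="index.html">Contents</A>']
        if i > 0:  # every piece except the first links back
            nav.append(f'<A HREF="{page_name(sections[i - 1][0])}">Previous</A>')
        if i < len(sections) - 1:  # every piece except the last links forward
            nav.append(f'<A HREF="{page_name(sections[i + 1][0])}">Next</A>')
        links = " | ".join(nav)
        pages[page_name(sid)] = f"<H2>{escape(t)}</H2><P>{escape(body)}</P><P>{links}</P>"
    return pages


def whole_document(title, sections):
    """Rebuild the complete document in order, e.g. for a single download."""
    body = "".join(f"<H2>{escape(t)}</H2><P>{escape(b)}</P>" for _, t, b in sections)
    return f"<H1>{escape(title)}</H1>{body}"


pages = split_document("Server Manual", SECTIONS)
print(sorted(pages))  # ['backup.html', 'config.html', 'index.html', 'install.html']
```

Because every piece comes from the same ordered source, the "is-part-of" relationships are never lost, and the paper version, the split Web version, and the single-file download can all be regenerated from one document instead of being maintained separately.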
