Special Edition Using SGML: Should You Upgrade To SGML?

Fewer Browsers To Choose from

At this time, only a few networked information browsers can receive and format SGML regardless of the DTD. Most Web browsers have the HTML tag names built right into the program, and require a new release to add new ones. This is true even if the new ones don’t require any new formatting capabilities; adding a BOOK-TITLE element type won’t work, even though you may only want it to mean “show in italics.”

The main exception already on the market is a viewer called Panorama, developed by Synex and marketed by SoftQuad. Panorama is an add-on “helper” to existing browsers, much like the various graphics viewers. This means it does not talk to the network by itself; instead, when a Web browser follows a link and notices that the data coming back claims to be “SGML,” it can forward the data to Panorama for display.

If there are Internet-based links in the SGML, Panorama calls the browser back to retrieve them. If the destination is HTML or GIF, it shows up directly in the Web browser. If it’s SGML, the browser calls Panorama again.

Another SGML-capable Web browser was shown at the last Web conference and is being released right around the time this book appears. It’s a new version of the DynaText SGML delivery system that can view SGML or HTML off a hard disk or CD-ROM, across the Net, out of a database, or from a compiled/indexed form used for big documents. It provides a unified environment for viewing all these data types, as well as graphic and multimedia formats.

Although there aren’t many SGML-capable Web browsers, these two are very flexible and give you a lot of control over formatting, style, and other capabilities. Hopefully, more browsers will start to support generic SGML over time.

In the meantime, there are several server-end options available, too. You can always create and maintain documents using full SGML, and then run a conversion program to create HTML from them and put that on the Web. This is especially useful if you already have an SGML-based authoring system in use for general publishing or other applications.

There are also Web servers available that can store SGML directly and then translate it to HTML on demand (for example, DynaWeb from Electronic Book Technologies—you can try it out at http://www.ebt.com). This method has the advantage that you can adjust the translation rules any time without re-running a big conversion process over all your data. It also means the translation can be customized as needed, for example, to adjust to whichever browser is calling in, or even to modify the document by inserting real-time information during translation.

A DTD To Choose or Design

Even if you have all the software you need, with full SGML you’ll need to answer a question that never arises with HTML: What DTD should I use? Very good DTDs are already available for a wide range of document types, and you can probably put off DTD-building for as long as you want by using them. This makes the task a lot easier. But even so, you have to think about your documents and then learn at least enough about a few DTDs to make a choice. You may also want to tweak an existing DTD—this is easier than starting from scratch, but still takes skills beyond those needed for tagging.
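
As a rough sketch of what that choice looks like in practice, the document type declaration at the top of each document names the DTD, and its internal subset is where small tweaks can go. The public identifier, the REPORT and BKTITLE element types, and the %extra; entity below are all hypothetical; whether a real DTD offers such an extension hook depends entirely on how it was written.

    <!DOCTYPE REPORT PUBLIC "-//Example//DTD Report//EN" [
      <!-- Assumes the hypothetical DTD leaves %extra; as a hook
           inside its inline content model. -->
      <!ENTITY % extra "| BKTITLE">
      <!ELEMENT BKTITLE - - (#PCDATA)>
    ]>

Because the internal subset is read before the external DTD and the first declaration of an entity wins, the local definition of %extra; takes effect without editing the DTD file itself.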

More Syntax To Learn

If you want to make up your own DTDs, you need to deal with all kinds of declarations, parameter entities, content models, and so on; there’s a lot of syntax to learn, although tools like Near & Far help a lot. If you use an existing DTD, there is less syntax to worry about, but there’s still a little more than with HTML.
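
To give a feel for that syntax, here is a minimal sketch of a DTD fragment. The element names are made up for illustration and are not taken from any published DTD. It shows a parameter entity, a few element declarations with their content models, and an attribute list.

    <!-- Parameter entity: a reusable piece of content model -->
    <!ENTITY % inline "#PCDATA | EMPH | BKTITLE">

    <!-- Element declarations: name, tag omission, content model -->
    <!ELEMENT CHAPTER - - (TITLE, P+)>
    <!ELEMENT TITLE   - - (%inline;)*>
    <!ELEMENT P       - O (%inline;)*>
    <!ELEMENT (EMPH | BKTITLE) - - (#PCDATA)>

    <!-- Attribute list: an optional unique identifier -->
    <!ATTLIST CHAPTER ID ID #IMPLIED>

The two characters after each element name say whether the start-tag and end-tag may be omitted: a hyphen means the tag is required and an O means it can be left out. Here, assuming the SGML declaration enables tag omission, a paragraph's </P> is optional.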

SGML provides many ways of saving keystrokes in markup, and many special-purpose constructs you never see used in HTML (as you learned in chapters 3 and 16). Using these constructs in an HTML document will result in errors of one kind or another. For example, if you try to “comment out” a block of HTML with a marked section, its content still shows up, because typical HTML parsers don’t recognize marked sections. In fact, for them the characters <![ IGNORE [ and ]]> all count as text content!

    <P>
    <![ IGNORE [ This text is not part of the document, really.
       In fact, it’s <EMPH>really </EMPH> not there. ]]>
       And the paragraph goes on right here.
    </P>

An HTML application that isn’t quite following the rules might take this as just a paragraph that starts with some funny punctuation marks (a really bad HTML implementation might instead complain that you used a tag named ![). If you got used to this, you might be surprised when you move to a more generic SGML system and discover that the <![ in your document has a very different effect; it is simply something you have to know. In this case, the first two lines within the paragraph are not part of the content at all, and a browser shouldn’t show them to you.
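
As a sketch of why the construct is worth learning rather than just avoiding, the status keyword can come from a parameter entity, so a single declaration switches a block on or off. The entity name draft, the file name report.dtd, and the REPORT element type are hypothetical, chosen only for illustration, and the example assumes that DTD allows P inside REPORT.

    <!DOCTYPE REPORT SYSTEM "report.dtd" [
      <!-- Change IGNORE to INCLUDE to show the draft-only text -->
      <!ENTITY % draft "IGNORE">
    ]>
    <REPORT>
    <P>This sentence always appears.
    <![ %draft; [ This note appears only in draft copies. ]]>
    This sentence always appears too.</P>
    </REPORT>

With the entity set to IGNORE, an SGML parser passes along only the two sentences outside the marked section; changing the one declaration to INCLUDE brings the note back without touching the paragraph itself.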

Using a WYSIWYG SGML editor helps a lot, for the same reasons that using MS Word is a lot easier than typing Microsoft’s RTF interchange format directly. But even with the best tools, you can be surprised if you’re not aware of such restrictions (for example, you might get a “beep” whenever you try to type <![ in a paragraph, and not know why).

Benefits of Upgrading

If there’s less delivery software to choose from and more to learn, why bother? The reasons are mostly the same ones that influenced big publishers to go with SGML, although which reasons are most important varies from project to project.

Platform Independence

SGML is designed to abstract away formatting, and many SGML DTDs do this even better than HTML. SGML can be re-targeted to anything from a top-line photocomposition system down to text-only browsers like Lynx, Braille composers, and anything in between. It is this separation of structure from formatting that makes SGML so flexible. HTML accomplishes this to some extent, but less so, because a small, fixed tag set can force authors to think more about display effects and less about describing structure.
