

Chapter 20
Practicalities of Working with SGML on the Web

This chapter covers some of the tools and techniques for managing SGML data on the World Wide Web. It is an overview of tools rather than a guide to using any particular tool. The following general information is covered first:

  Tactics for using SGML on the Web
  Tools

The chapter then moves on to more specific issues that you will encounter as you plan a Web SGML strategy:

  Planning
  Consistency and flexibility
  Allowances for HTML
  The danger of overusing optional SGML features

Chapter 26, “Tools for the PC: Authoring, Viewing and Utilities,” and Chapter 27, “Tools for the Mac: Authoring, Viewing, and Utilities,” have more detailed information about selected SGML tools available on the Mac and PC.

Tactics for Using SGML on the Web

The Web defines how to use text data in one SGML DTD, namely HTML. Because of this, most Web-specific tools support only HTML. This leads to a problem if you have (or want) generic SGML. You can’t ship SGML files in any other DTD to typical browsers, such as Mosaic and Netscape Navigator. Or rather, you can ship files around, but HTML browsers won’t know what to do with them when they get there.

When data moves around on the Web, it carries a MIME (Multipurpose Internet Mail Extensions) type that says whether it's a GIF file, a sound file, HTML, or something else. Browsers use this information to decide whether they can show the data themselves or must pass it to an external helper application. Suppose you click a link like this one, which points to a source of information about Internet Draft standards-in-progress:

    <a href="http://ds.internic.net/ds/dspg0intdoc.html">Internet Drafts</a>

The server knows this is an HTML file and says so when it returns the file to the client that requested it, labeling it with the MIME type “Text/HTML.”
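The labeling step can be pictured as a lookup from a file's extension to a MIME type, which the server then sends in its Content-Type header. The Python sketch below is illustrative only: the extension table and the function names are hypothetical, and real servers read this mapping from their own configuration files. MIME type names are case-insensitive, so “Text/HTML” and “text/html” name the same type.

```python
import os

# Hypothetical extension-to-type table; a real server reads this from its configuration.
MIME_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".gif": "image/gif",
    ".au": "audio/basic",
}


def mime_type_for(path: str) -> str:
    """Return the MIME type a server would label this file with."""
    _, ext = os.path.splitext(path)
    # Anything the table doesn't know falls back to a generic binary type.
    return MIME_TYPES.get(ext.lower(), "application/octet-stream")


def content_type_header(path: str) -> str:
    """Build the header line sent back to the requesting client."""
    return f"Content-Type: {mime_type_for(path)}"


if __name__ == "__main__":
    print(content_type_header("/ds/dspg0intdoc.html"))
```

With this table, a file named chapter20.sgml falls through to the generic application/octet-stream type, because nothing in the table says it is SGML.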

If a file is in some other SGML DTD, the server would have to say something else, not “Text/HTML.” A working group in the Internet Engineering Task Force (IETF) is finalizing the exact definition of MIME types for SGML. There will probably be two types: “Text/SGML” and “Application/SGML.” The main differences between them are:

  Systems are free to convert line-ends between Mac/DOS/UNIX conventions in “Text/” files, but not in “Application/” files.
  Any “Text/” file should be human-readable—which is very subjective. For HTML and SGML, most files are human-readable, but ones with really dense tagging may not be.
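The line-end rule amounts to a single check on the major type. Here is a minimal Python sketch, assuming a gateway that knows each file's MIME type and wants to deliver UNIX-style line feeds; the function name and its `target` parameter are hypothetical.

```python
def normalize_line_ends(body: bytes, mime_type: str, target: bytes = b"\n") -> bytes:
    """Convert DOS, Mac, or UNIX line ends to `target`, but only for text/* types."""
    major = mime_type.split("/", 1)[0].strip().lower()
    if major != "text":
        # application/* content must pass through byte-for-byte.
        return body
    # Unify DOS (CR LF) first, then old Mac (CR), into a single LF...
    unified = body.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    # ...then emit whichever convention was requested.
    return unified.replace(b"\n", target)
```

Because the MIME type names are case-insensitive, “Application/SGML” and “application/sgml” are both left untouched.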

The IETF is also working on a standard method for accessing DTDs, SGML declarations, stylesheets, and other related data given an SGML document file.

If you have software that can interpret SGML, it shouldn't matter whether the server sends MIME type Application/SGML or Text/SGML. If you don't, “Text/SGML” tells your browser that it's okay to bring the file up in a plain-text editor, such as emacs, MS Write, or SimpleText, while “Application/SGML” says that it isn't. In that case you'll probably get a dialog box saying that the file can't be displayed.
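The client-side decision described above can be sketched as a small function. This is an illustration under stated assumptions, not how any particular browser is written: the `Action` names are hypothetical, and only the two proposed SGML types are handled.

```python
from enum import Enum


class Action(Enum):
    SGML_VIEWER = "open in SGML-aware software"
    PLAIN_TEXT = "show in a plain-text editor"
    REFUSE = "report that the file can't be displayed"


def choose_action(mime_type: str, has_sgml_software: bool) -> Action:
    """Decide what to do with an SGML file, given its MIME label."""
    t = mime_type.strip().lower()  # MIME type names are case-insensitive
    if t not in ("text/sgml", "application/sgml"):
        raise ValueError(f"not an SGML type: {mime_type}")
    if has_sgml_software:
        # Either label is fine when the client can parse SGML itself.
        return Action.SGML_VIEWER
    # Without SGML software, only the text/ label permits plain-text display.
    return Action.PLAIN_TEXT if t == "text/sgml" else Action.REFUSE
```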

There is a third option: the server can lie and say a file is HTML even when it's in some other DTD. If the SGML is very simple and shares its most commonly used tag names with HTML, the result is probably usable. Many DTDs use obvious names, such as P for paragraph, H1...Hn for headings, LI for list items, and I for italics or emphasis. Because most HTML parsers are not picky, a file that sticks to such common tags may have enough elements come out right to be readable. However, all the other tags will be ignored, and that makes many SGML files look very poor. Depending on the particular browser, “ignored” may mean that the non-HTML tags are discarded, that they are discarded along with all their content, that they are displayed as if they were ordinary text, or something else entirely.
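The three behaviors for unknown tags can be compared with a toy renderer that collects only the text a reader would see. This Python sketch assumes a small, hand-picked set of HTML element names and uses the standard library's `html.parser` purely as a tokenizer; the policy names are hypothetical labels for the three cases in the text, not settings of any real browser.

```python
from html.parser import HTMLParser

# Illustrative subset of HTML element names (not the full HTML DTD).
KNOWN_HTML = {
    "html", "head", "title", "body", "p", "a", "i", "b", "em", "strong",
    "ul", "ol", "li", "br", "hr", "h1", "h2", "h3", "h4", "h5", "h6",
}


class LenientRenderer(HTMLParser):
    """Collects visible text under one policy for unknown (non-HTML) tags:

    'drop'          discard the tags, keep their content
    'drop_content'  discard the tags and everything inside them
    'literal'       show the tags as if they were ordinary text
    """

    def __init__(self, policy):
        super().__init__()
        if policy not in ("drop", "drop_content", "literal"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.out = []
        self.hidden_depth = 0  # > 0 while inside a discarded element

    def handle_starttag(self, tag, attrs):
        if tag in KNOWN_HTML:
            return  # known tags only format text; they add none of their own
        if self.policy == "drop_content":
            self.hidden_depth += 1
        elif self.policy == "literal":
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in KNOWN_HTML:
            return
        if self.policy == "drop_content":
            self.hidden_depth = max(0, self.hidden_depth - 1)
        elif self.policy == "literal":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.hidden_depth == 0:
            self.out.append(data)


def visible_text(markup, policy):
    renderer = LenientRenderer(policy)
    renderer.feed(markup)
    renderer.close()
    return "".join(renderer.out)
```

For example, `visible_text("<p>Shipped by <vendor>Acme</vendor> today.</p>", "drop_content")` returns `"Shipped by  today."`: the vendor name disappears along with its tags.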

Another problem is that lots of SGML constructs aren’t noticed at all by most HTML parsers. Marked sections are a good example. Most browsers will get the interpretation wrong if your documents contain anything like this (whether or not they conform to the HTML DTD!):

    <p>By providing a common public vocabulary for text markup, you
    will have taken one major step toward making electronic texts
    as important and useful as they ought to be, <![ IGNORE
    [ and achieved lasting world peace. ]]> <![ INCLUDE [ but only
    one step.]]></p><author>C.M. Sperberg-McQueen</author>

Because most HTML parsers don't support marked sections, the browser would display the marked-section markup literally, and the reader would see confusing and misleading formatted text like this:

By providing a common public vocabulary for text markup, you will have taken one major step toward making electronic texts as important and useful as they ought to be, <![ IGNORE [ and achieved lasting world peace. ]]>

<![ INCLUDE [ but only one step.]]>

(C.M. Sperberg-McQueen)

Of course, what the reader should have seen is this:

By providing a common public vocabulary for text markup, you will have taken one major step toward making electronic texts as important and useful as they ought to be, but only one step.

(C.M. Sperberg-McQueen)
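What an SGML-aware system does with this example can be approximated by a small preprocessor that resolves marked sections before the text ever reaches an HTML-only parser. This Python sketch is deliberately limited and assumes literal IGNORE or INCLUDE keywords, no nesting, and no parameter-entity references in the keyword position. Full SGML also allows CDATA, RCDATA, and TEMP keywords and nested marked sections, which this sketch does not handle.

```python
import re

# Matches <![ KEYWORD [ ... ]]> for a single, non-nested marked section.
MARKED_SECTION = re.compile(
    r"<!\[\s*(IGNORE|INCLUDE)\s*\[(.*?)\]\]>",
    re.IGNORECASE | re.DOTALL,
)


def resolve_marked_sections(text):
    """Drop IGNORE sections and unwrap INCLUDE sections (no nesting)."""
    def replace(match):
        keyword, content = match.group(1).upper(), match.group(2)
        return "" if keyword == "IGNORE" else content
    return MARKED_SECTION.sub(replace, text)
```

Applied to the example, the IGNORE section vanishes and only the INCLUDE section's text (“but only one step.”) remains, apart from extra whitespace that an HTML browser collapses anyway.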

Trying to pass a document in another SGML DTD off as HTML causes problems. If all the circumstances are exactly right (a very big “if”), it might be workable. But it’s like trying to finish the last 100-mile leg of a car trip on a spare tire that says “Good for 50 miles.”

Fortunately, there are two much better, more realistic ways to use SGML on the Web despite the limitations of many browsers. First, you can use client-end software that can read generic SGML. Second, publishers can use special servers (or server add-ons) that accept full SGML, but translate it to HTML before sending it to clients so that even HTML-only clients can see it.
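At its simplest, the server-side approach is a translation step between the stored SGML and the HTML that goes out to clients. The toy Python sketch below assumes that any marked sections have already been resolved and that a hand-written table maps each of the document's element names to an HTML one. The table contents (for example, `author` to HTML's `address`) are hypothetical. The standard library's `html.parser` serves only as a tokenizer here. It is not an SGML parser and does not understand omitted tags or entities declared in a DTD, so production converters work from the DTD and a stylesheet instead.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical rename table for one document type. HTML names map to
# themselves; None (or an unlisted name) means "drop the tag, keep its content".
TAG_MAP = {
    "p": "p", "i": "i", "li": "li", "h1": "h1", "h2": "h2",
    "author": "address",
    "quote": "blockquote",
    "note": None,
}


class SgmlToHtml(HTMLParser):
    """Rewrites element names per a rename table, emitting HTML."""

    def __init__(self, tag_map):
        super().__init__()  # convert_charrefs=True: data arrives unescaped
        self.tag_map = tag_map
        self.out = []

    def handle_starttag(self, tag, attrs):
        target = self.tag_map.get(tag)
        if target is None:
            return
        parts = [target]
        for name, value in attrs:
            # Attributes without a value (value is None) are kept as bare names.
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        target = self.tag_map.get(tag)
        if target is not None:
            self.out.append(f"</{target}>")

    def handle_data(self, data):
        # Re-escape the &, <, and > characters the parser decoded.
        self.out.append(escape(data, quote=False))


def sgml_to_html(markup, tag_map=TAG_MAP):
    converter = SgmlToHtml(tag_map)
    converter.feed(markup)
    converter.close()
    return "".join(converter.out)
```

A flat rename table keeps the example short; its weakness is that it cannot express anything that depends on context, such as formatting a TITLE differently inside a chapter than inside a figure.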

