This chapter covers some of the tools and techniques for managing SGML data on the World Wide Web. It is an overview of tools rather than a guide to using any particular tool. The following general information is covered first:
The chapter then moves on to more specific issues that you will encounter as you plan a Web SGML strategy:
Chapter 26, Tools for the PC: Authoring, Viewing and Utilities, and Chapter 27, Tools for the Mac: Authoring, Viewing, and Utilities, have more detailed information about selected SGML tools available on the Mac and PC.
The Web defines how to use text data in one SGML DTD, namely HTML. Because of this, most Web-specific tools support only HTML. This leads to a problem if you have (or want) generic SGML. You can't ship SGML files in any other DTD to typical browsers, such as Mosaic and Netscape Navigator. Or rather, you can ship files around, but HTML browsers won't know what to do with them when they get there.
When data moves around in the Web, it carries a MIME (Multipurpose Internet Mail Extensions) type that says whether it's a GIF file, a sound file, HTML, or something else. Browsers use this information to decide whether they can show the data by themselves, or through some external helper application. Suppose you click a link like this one, which points to a source for information about Internet Draft standards-in-progress:
<a href="http://ds.internic.net/ds/dspg0intdoc.html">Internet Drafts</a>
The server knows this is an HTML file, and says so when it sends it back to the client that requested it, by labeling it with MIME type Text/HTML.
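The labeling step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular server works: the function name `content_type_for` and the small extension table are hypothetical. MIME type names are case-insensitive, so the lowercase `text/html` here is the same type as Text/HTML.

```python
import os

# Hypothetical extension table; a real server keeps a much longer,
# configurable list.
MIME_BY_EXTENSION = {
    ".html": "text/html",
    ".htm": "text/html",
    ".gif": "image/gif",
    ".au": "audio/basic",
}

def content_type_for(path, default="application/octet-stream"):
    """Return the MIME type a server might attach to the file at `path`."""
    _, ext = os.path.splitext(path)
    # Extensions are compared case-insensitively; unknown ones get the default.
    return MIME_BY_EXTENSION.get(ext.lower(), default)

print(content_type_for("/ds/dspg0intdoc.html"))
```

A file with an extension the table does not list falls back to the generic default, which is roughly the situation a non-HTML SGML file is in today.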
If a file is in some other SGML DTD, the server would have to say something else, not Text/HTML. A working group in the Internet Engineering Task Force (IETF) is finalizing the exact definition of MIME types for SGML. There will probably be two types: Text/SGML and Application/SGML. The main differences between them are:
The IETF is also working on a standard method for accessing DTDs, SGML declarations, stylesheets, and other related data given an SGML document file.
If you have software that can interpret SGML, you should be fine whether the server sends MIME type Application/SGML or Text/SGML. If not, Text/SGML tells your browser that it's okay to bring the file up in a plain-text editor, such as emacs, MS Write, or SimpleText, and Application/SGML says that it isn't, in which case you'll probably get a dialog box that says the file can't be displayed.
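The decision just described can be written out as a small function. This is a minimal sketch of the logic in the paragraph above, not the behavior of any specific browser; the function name `choose_action` and the returned strings are made up for illustration.

```python
def choose_action(mime_type, has_sgml_viewer):
    """Decide what a client might do with an incoming SGML file."""
    kind = mime_type.strip().lower()  # MIME type names are case-insensitive
    if kind not in ("text/sgml", "application/sgml"):
        raise ValueError("not an SGML MIME type: " + mime_type)
    if has_sgml_viewer:
        # Software that interprets SGML can handle either type.
        return "open in SGML viewer"
    if kind == "text/sgml":
        # The text type signals that a plain-text view is acceptable.
        return "open in plain-text editor"
    # The application type signals that a plain-text view is not acceptable.
    return "report that the file cannot be displayed"

print(choose_action("Text/SGML", has_sgml_viewer=False))
```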
There is a third option. The server can lie and say a file is HTML even when it's in some other DTD. If the SGML is very simple and shares its most commonly used tag names with HTML, the result is probably usable. Many DTDs use obvious names, such as P for paragraph, H1...Hn for headings, LI for list items, and I for italics or emphasis. If the SGML file uses these common tags, enough elements may render correctly to make the file readable. Most HTML parsers are not picky. However, the other tags will be ignored, and that will make many SGML files look very poor. Depending on the particular browser, "ignored" may mean that the non-HTML tags get discarded, that they get discarded along with all their content, that they get displayed as if they were just text, or possibly something else.
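To make the three meanings of "ignored" concrete, here is a rough sketch built on Python's standard `html.parser` module. It does not reproduce any real browser: the class `LenientRenderer`, the policy names, the four-tag `KNOWN_TAGS` set, and the `warning` element in the sample are all hypothetical, and known tags simply produce no output instead of real formatting.

```python
from html.parser import HTMLParser

KNOWN_TAGS = {"p", "h1", "li", "i"}  # tiny stand-in for the HTML tag set

class LenientRenderer(HTMLParser):
    """Collect display text, treating unknown tags according to `policy`."""

    def __init__(self, policy):
        super().__init__()
        self.policy = policy   # "drop_tags", "drop_content", or "show_as_text"
        self.out = []
        self.skip_depth = 0    # > 0 while inside an unknown element being dropped

    def handle_starttag(self, tag, attrs):
        if tag in KNOWN_TAGS:
            return  # simplification: known tags add no text
        if self.policy == "drop_content":
            self.skip_depth += 1
        elif self.policy == "show_as_text":
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in KNOWN_TAGS:
            return
        if self.policy == "drop_content":
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.policy == "show_as_text":
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.out.append(data)

def render(text, policy):
    renderer = LenientRenderer(policy)
    renderer.feed(text)
    renderer.close()
    return "".join(renderer.out)

sample = "<p>Ship it <warning>after testing</warning> today.</p>"
for policy in ("drop_tags", "drop_content", "show_as_text"):
    print(policy, "->", repr(render(sample, policy)))
```

The same input yields three different readings, which is why a document that looks acceptable in one browser can lose content or show stray markup in another.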
Another problem is that lots of SGML constructs aren't noticed at all by most HTML parsers. Marked sections are a good example. Most browsers will get the interpretation wrong if your documents contain anything like this (whether or not they conform to the HTML DTD!):
<p>By providing a common public vocabulary for text markup, you will have taken one major step toward making electronic texts as important and useful as they ought to be, <![ IGNORE [ and achieved lasting world peace. ]]> <![ INCLUDE [ but only one step.]]></p><author>C.M. Sperberg-McQueen</author>
Because most HTML parsers don't support marked sections, the display would include them literally and the reader would see confusing and misleading formatted text like this:
By providing a common public vocabulary for text markup, you will have taken one major step toward making electronic texts as important and useful as they ought to be, <![ IGNORE [ and achieved lasting world peace. ]]>
<![ INCLUDE [ but only one step.]]>
(C.M. Sperberg-McQueen)
Of course, what the reader should have seen is this:
By providing a common public vocabulary for text markup, you will have taken one major step toward making electronic texts as important and useful as they ought to be, but only one step.
(C.M. Sperberg-McQueen)
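The correct interpretation of the two marked sections in this example can be sketched with a regular expression. This is a deliberately simplified illustration, not a conforming SGML parser: it assumes the status keyword is written literally as uppercase IGNORE or INCLUDE, and it does not handle nested marked sections, keywords supplied through parameter entities, or the other status keywords. The function name `resolve_marked_sections` is hypothetical.

```python
import re

# Matches <![ KEYWORD [ content ]]> for the two keywords used in the example.
MARKED_SECTION = re.compile(
    r"<!\[\s*(IGNORE|INCLUDE)\s*\[(.*?)\]\]>", re.DOTALL
)

def resolve_marked_sections(text):
    """Keep the content of INCLUDE sections and drop IGNORE sections."""
    def keep_or_drop(match):
        keyword, content = match.group(1), match.group(2)
        return content if keyword == "INCLUDE" else ""
    return MARKED_SECTION.sub(keep_or_drop, text)

sample = (
    "<p>as they ought to be, "
    "<![ IGNORE [ and achieved lasting world peace. ]]> "
    "<![ INCLUDE [ but only one step.]]></p>"
)
# Collapse runs of whitespace left behind where the delimiters were removed.
print(" ".join(resolve_marked_sections(sample).split()))
```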
Trying to pass a document in another SGML DTD off as HTML causes problems. If all the circumstances are exactly right (a very big if), it might be workable. But it's like trying to finish the last 100-mile leg of a car trip on a spare tire that says "Good for 50 miles."
Fortunately, there are two much better, more realistic ways to use SGML on the Web despite the limitations of many browsers. First, you can use client-end software that can read generic SGML. Second, publishers can use special servers (or server add-ons) that accept full SGML, but translate it to HTML before sending it to clients so that even HTML-only clients can see it.