Another question is what to do with the many optional features in the SGML standard. Many of these have to do with minimizing tags, which saves you typing and makes raw SGML easier to read (in case you ever have to!). But SGML minimization provides synonyms for things SGML can already express, so you can always do without it. Unlike synonyms in human languages, which always have subtle differences, a minimized SGML tag expresses exactly the same element structure as if it were unminimized, so using it doesn't add any subtle special meanings to your documents. It may make typing or reading the tag easier, but who wants to literally type tags anyway? Better interfaces are available. For instance, you can pick element types from menus and display them as icons, in the margins, or on request.
There are some good reasons to avoid using much minimization in your SGML data (perhaps this should be called minimizing minimization?). This is especially true on the Web, for a couple of reasons.
First, systems vary in which optional features they support. Although most general SGML tools support the SHORTTAG and OMITTAG features, right now Web clients have much more limited, HTML-specific parsers. Even if they are extended to support DTDs other than HTML, they may not learn to handle marked sections, omitted start tags, and other capabilities (or they may not get the harder features exactly correct). By keeping your SGML as simple as possible, you can choose from a wider variety of tools to work with it.
If you avoid minimization, you can even use completely SGML-ignorant tools effectively. A global change from <LI> to <ITEM> is a fine way to change all instances of one kind of element into another (but don't forget to change </LI> to </ITEM> as well!). That is, it's fine unless you happened to use just <> or to omit some start or end tags entirely. The same snags come up with the Find command in a generic editor if you minimize: searching for <P> doesn't do much good if you left the tag out, to be implied via SGML's minimization features. If you plan to convert your SGML to HTML for Web delivery, this may be important to think about.
Another reason to avoid minimization is that you may want to be able to ship small pieces of an SGML document around. There's no guarantee that a piece of SGML can be interpreted correctly if it's taken out of context (the same thing is true in most languages, even English). An SGML document that doesn't use much minimization has a much better chance of being interpreted correctly than one that does minimize. Think about what an SGML parser would have to do if it got an SGML portion like this:
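To make the point about SGML-ignorant tools concrete, here is a minimal Python sketch of a text-only rename and a text-only search. The helper names (`rename_element`, `count_start_tags`) and the two sample strings are hypothetical, and the minimized sample assumes a DTD and SGML declaration that permit the empty tags `<>` and `</>`; how a real parser resolves them depends on that declaration.

```python
def rename_element(sgml: str, old: str, new: str) -> str:
    """Rename an element by plain text substitution, with no SGML parsing."""
    # Change the start tags first, then the matching end tags.
    renamed = sgml.replace(f"<{old}>", f"<{new}>")
    return renamed.replace(f"</{old}>", f"</{new}>")


def count_start_tags(sgml: str, name: str) -> int:
    """Count literal start tags, as an editor's Find command would."""
    return sgml.count(f"<{name}>")


# Hypothetical sample with every tag written out in full.
full = "<LIST><LI>First</LI><LI>Second</LI><LI>Third</LI></LIST>"

# Hypothetical sample that leans on empty tags (an assumption about the DTD).
minimized = "<LIST><LI>First</LI><>Second</><>Third</></LIST>"

# With full tags, the text tool sees and renames every item.
print(rename_element(full, "LI", "ITEM"))
print(count_start_tags(full, "LI"))

# With minimization, the same search finds only the one tag that was typed.
print(count_start_tags(minimized, "LI"))
```

The text tool only ever sees the characters that were actually typed, so anything a parser would infer is invisible to it.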
<p>This is a sample/short paragraph</p>
You can probably interpret it correctly; it sure looks like a paragraph element with a few words in it. And it is, so long as a few things are assumed (besides that delimiters like < haven't been redefined in the SGML declaration):
Caution:
Here are SGML examples to show the context problems described. In each case, the <p> isn't really a start tag. In the last example, the final </P> would probably be reported as an SGML syntax error (because the earlier slash ended the paragraph already). You should avoid cases like these in your SGML if you anticipate having servers ship out pieces of it on demand.
<revision original="<p>This is a sample/short paragraph</p>">
<![ IGNORE [ <p>This is a sample/short paragraph</p> ]]>
<!ENTITY notags CDATA "<p>This is a sample/short paragraph</p>"> ... <EXAMPLE>&notags;</EXAMPLE>
<!-- deleted 4/2/95: <p>This is a sample/short paragraph</P> -->
<SEC/ ...<p>This is a sample/short paragraph</P>...
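As a rough illustration of why context matters, the following Python sketch runs a context-blind search over the fragments from the caution. The dictionary labels and the helper `naive_paragraph_hits` are hypothetical names chosen for this sketch; the pattern is deliberately naive and is not how an SGML parser works.

```python
import re

# The fragments from the caution, copied as written.
FRAGMENTS = {
    "attribute value": '<revision original="<p>This is a sample/short paragraph</p>">',
    "IGNORE marked section": "<![ IGNORE [ <p>This is a sample/short paragraph</p> ]]>",
    "CDATA entity": '<!ENTITY notags CDATA "<p>This is a sample/short paragraph</p>">',
    "comment": "<!-- deleted 4/2/95: <p>This is a sample/short paragraph</P> -->",
    "after <SEC/": "<SEC/ ...<p>This is a sample/short paragraph</P>...",
}

# Naive rule: every literal "<p>" is taken to be a paragraph start tag.
NAIVE_P_START = re.compile(r"<p>", re.IGNORECASE)


def naive_paragraph_hits(text: str) -> int:
    """Count what a context-blind scanner would report as <p> start tags."""
    return len(NAIVE_P_START.findall(text))


hits = {label: naive_paragraph_hits(text) for label, text in FRAGMENTS.items()}
for label, count in hits.items():
    print(f"{label}: {count} apparent start tag(s)")
```

The scanner reports a paragraph start tag in every fragment, although the caution says none of them really is one. A receiver that gets only the inner piece has no way to know which of these contexts it came from.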
There are not very many possible problems, and you can completely avoid most of them by deciding to ship only pieces that amount to whole elements, and by not using a few SGML constructs that can have long-distance effects. The ones that pose the most problems for shipping pieces of an SGML file around in isolation are these:
All these structures can have long-lasting effects that change how an SGML parser must interpret the incoming characters. If you avoid them, you can just ship any element out of the SGML stream and it is possible to parse it and get the start tags and end tags right (you do still have to include the DTD and declaration subset, or a way to get them, such as via a URL).
Remember that none of these structures are errors. They are all legal, valid SGML capabilities. If you're working in a generic SGML environment, they should all work just fine (unless the software has a bug, or the author creating the SGML misuses something). The precautions mentioned here are merely guidelines to help make the SGML easier to transport in Web-like environments where you simply can't afford to send entire documents in one fell swoop. Since these particular SGML capabilities are not commonly used anyway, you probably won't have to worry about them.