

Getting Back Out to HTML

Once you have SGML, you’re in good shape to convert to almost anything else, and HTML is an easy target. In the simplest case, you just do a bunch of global changes to rename tags from the names in your DTD to the names in HTML. This works just fine for elements that occur in all kinds of documents: lists, paragraphs, headings, and so on.


Caution:  
The first time people do this, they usually miss a few cases. Here are a few to watch out for if you’re using software that doesn’t really know SGML:
  Remember to allow for attributes in start-tags—you can’t count on the “>” being right after the tag name.
  Be very careful if you use minimization. A tag could have either pointy bracket missing or the element type missing, or the tag might be completely missing if you use OMITTAG (in that case, it might be a little hard to write a global change to catch it!).
  If you’re changing between two different DTDs, be careful that the element types you change to are allowed in all the places they’re used. Otherwise, the parser will either report an error (which makes the problem easy to find) or quietly recover by closing elements until it finds one where the new element is allowed (which can make it a lot more difficult to find).
  If you’re converting HTML that you haven’t run through an SGML parser, watch out for URLs that aren’t quoted.
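The first pitfall above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names in TAG_MAP are hypothetical (they come from no particular DTD), the markup is assumed to be fully normalized (no minimization, every tag written out), and attribute values are assumed not to contain a literal ">". The point is that the pattern captures everything between the tag name and the closing ">" so attributes survive the rename, and that it matches the whole name so that a longer name sharing a prefix is left alone.

```python
import re

# Hypothetical mapping from element types in an imagined source DTD to HTML.
TAG_MAP = {"para": "P", "title": "H1", "item": "LI"}

# Matches a start-tag or end-tag: optional "/", a name, then anything up to ">".
# Capturing the remainder separately is what preserves attributes.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)([^<>]*)>")


def rename_tags(text, tag_map=TAG_MAP):
    """Rename element types, keeping attributes and unknown tags untouched."""
    # Compare names case-insensitively (an assumption about the source markup).
    lookup = {name.lower(): new for name, new in tag_map.items()}

    def replace(match):
        slash, name, rest = match.groups()
        new_name = lookup.get(name.lower())
        if new_name is None:
            return match.group(0)  # not in the map: leave the tag alone
        return f"<{slash}{new_name}{rest}>"

    return TAG_RE.sub(replace, text)


sample = '<para id="p1">See the <item>first</item> entry.</para>'
print(rename_tags(sample))
```

A sketch like this does nothing for omitted or shortened tags; for minimized markup you would want to normalize the document with a real SGML parser first.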

Usually when you convert from SGML to HTML, you end up throwing distinctions away, such as tagging three different element types simply as italics (<I>). This makes it very important to think of the SGML form as the “real” document, and keep it around for later when you may want to do a slightly different conversion on it.
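To see why the conversion cannot be undone, it may help to look at the mapping itself. The following Python sketch uses a hypothetical mapping (the element names emph, foreign, and booktitle are invented for illustration) and lists the HTML tags that more than one source element type collapses into. Any tag that shows up here is one where the HTML alone can no longer tell you which element the text started out as.

```python
from collections import defaultdict

# Hypothetical many-to-one mapping: three source element types, one HTML tag.
TO_HTML = {"emph": "I", "foreign": "I", "booktitle": "I", "para": "P"}


def invert(mapping):
    """Group source element types by the HTML tag they are converted to."""
    sources = defaultdict(list)
    for source, target in mapping.items():
        sources[target].append(source)
    return dict(sources)


def ambiguous_targets(mapping):
    """Return the HTML tags that more than one source element type maps to."""
    return {t: s for t, s in invert(mapping).items() if len(s) > 1}


print(ambiguous_targets(TO_HTML))
```

With this mapping, the only ambiguous target is I, which stands in for all three of the emphasis-like source elements.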

See the previous chapter, “Practicalities of Working with SGML on the Web,” for more details on tools for converting SGML to HTML. In particular, some Web servers do it on the fly, which is a big advantage for data management and overall flexibility. There are too many standalone tools to mention (Perl is one of the most popular and portable); many useful tools are discussed at www.undergrad.math.uwaterloo.ca (for example, /~papresco/private/calibre/sgml/tools/sgml2html.html).

As time goes on, the need to translate will shrink (it might eventually go away completely). It’s not that much more difficult to program a browser that can accept any tags at all than one that can accept only HTML. The main addition is the need to access and read some kind of stylesheet that says what to do with each given tag. Panorama and DynaText have already proven that this approach works, and further solutions will continue to appear.


Note:  
C. M. Sperberg-McQueen and Robert F. Goldstein wrote a wonderful paper on the potential for extending Web clients this way, with the imposing title “HTML to the Max: A Manifesto for Adding SGML Intelligence to the World-Wide Web.” You can get it from www.ncsa.uiuc.edu in the file SDG/IT94/Proceedings/Autools/sperberg-mcqueen/serberg.html.

Getting Back Out to Print

In addition to Web delivery, you may want to provide printed output. Web browsers can do some level of draft printing on demand, but most are quite limited. For example, most, if not all, HTML-based Web browsers will not number your printed pages well, much less give you flexible control over page headers and footers, newspaper-style multi-column layouts, complicated tables, footnotes, and so on. Because of this, going through HTML as a way to print SGML isn’t very effective.

SGML authoring systems can do quite nice printing, so if you have your data in one of them, you may be in fine shape. However, right now the most sophisticated print formatting tools don’t directly accept SGML (some, like PageMaker, can read some limited SGML-like tagging). If you need high-end printing and typesetting capabilities, you will need to move the SGML data into a special paper-production system.

Remember that it’s much easier to convert SGML to other forms than to convert other forms to SGML. That puts you in a strong position if you have SGML. You can probably get to any typesetting system you want without too much pain. Once you do that, your data will be in a system that book production specialists already know. They don’t have to adopt or learn something new, and they can focus all their attention on getting you the best-looking result (of course, “they” might be “you” in many cases).

Here are some of the formatting capabilities that (if you need them) could force you to move to a specialized solution:

  Sophisticated footnote management (say, where footnotes might break across two pages, or require unusual numbering)
  Widow control (preventing the first or last lines of paragraphs from getting left on a separate page)
  Hyphenation (especially sticky cases like long, unusual, or foreign words)
  Special layouts such as tables, equations, and text flowing around odd-shaped graphics
  Floating graphics (that automatically shift up or down relative to the text, to make for the best page layout)
  Book-level tools to build tables of contents, indexes, fancy title pages, and so on

For this level of features, you’re best off moving your SGML into typesetting software such as QuarkXPress, PageMaker, TeX, or something similar. These tools are focused on doing one specific thing well, and so will do a better job at it than more general tools (which also have to devote effort to intuitive editing, search and retrieval, and so on).

