

SGML Resources That Translate to HTML

Software exists that translates instances of SGML document types into HTML instances. More often, an HTML version of a page simply exists alongside versions in other document types. For example, the SGML Open home page exists in both an SGML version and an HTML version. The address is:

http://www.sgmlopen.org/

There’s also the case in which an SGML document instance is translated into an HTML page on demand, at the moment you access it, so that common Web browser software can display it. This capability is like a word processor’s ability to save a document as plain ASCII text. If a word processor could create files only in its own proprietary format, it probably would not sell well, because only people with the same word processor could share files. The ability to produce HTML is similarly convenient because the HTML DTD is so popular and widely used. Creating an HTML document on demand is not too different from saving a word processing document as an ASCII text file.
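As a rough illustration of translation on demand, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `DOCUMENTS` store, the `TAG_MAP` element names, and the `/memo` address are hypothetical, and the source is assumed to be fully tagged with no attributes and no omitted end tags, which real SGML does not guarantee. It is not how any particular server does the job; it only shows that the HTML is produced when the page is requested rather than stored ahead of time.

```python
import re

# Hypothetical in-memory document store; a real site would read SGML files from disk.
DOCUMENTS = {
    "/memo": "<memo><title>Status</title><para>All systems normal.</para></memo>",
}

# Hypothetical mapping from source element names to HTML element names.
TAG_MAP = {"memo": "body", "title": "h1", "para": "p"}

# Matches attribute-free start and end tags only (an assumption of this sketch).
TAG_PATTERN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)>")


def to_html(source):
    """Rename every start and end tag according to TAG_MAP (unknown names become div)."""
    def rename(match):
        slash, name = match.group(1), match.group(2).lower()
        return "<" + slash + TAG_MAP.get(name, "div") + ">"
    return "<html>" + TAG_PATTERN.sub(rename, source) + "</html>"


def translate_on_demand(path):
    """Look up the stored source and translate it at request time."""
    source = DOCUMENTS.get(path)
    if source is None:
        return "404 Not Found", "No such document"
    return "200 OK", to_html(source)


def app(environ, start_response):
    """WSGI entry point: nothing is converted until a browser asks for the page."""
    status, text = translate_on_demand(environ.get("PATH_INFO", ""))
    content_type = "text/html" if status.startswith("200") else "text/plain"
    start_response(status, [("Content-Type", content_type + "; charset=utf-8")])
    return [text.encode("utf-8")]
```

To try it in a browser, you could hand `app` to `make_server` from Python's standard `wsgiref.simple_server` module. Only the source markup is ever stored, so a change to the mapping shows up in every page the next time it is requested.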


Note:  
It’s common among SGML people to say that saving a document as HTML is like saving a word processing document in its proprietary format rather than as a text or ASCII document. The idea behind SGML is to enhance document sharing irrespective of software or hardware.

This is a limited analogy. Nearly as many people can view an HTML document now as could view an ASCII text document in years past, and HTML browsers are nearly as commonplace as text editors. If you save a document as HTML, lots of people can view it using their favorite Web clients. That’s why letting SGML document instances for various DTDs translate on the fly into HTML is more like saving a word processing document as an ASCII text file.



  See “Conversion Between SGML Document Types,” p. 279

Chapter 15, “Automatic versus Manual Tagging,” talks about tools that translate document instances from one DTD into instances of another DTD. That’s essentially what happens with these Web pages.

As with all document conversions, the trick is to maintain the original flavor of the document as it gets squeezed into the HTML mold. Some documents have many features that are not supported as well in HTML as they are in their native DTD. Documents with tables and equations, for example, frequently suffer when they are translated.


Note:  
Document conversion can be a challenging task since some document types require structures that cannot be supported in other DTDs. Scientific papers with extensive tables, equations, and special types of footnotes, for example, would be difficult to translate into HTML because these features are not fully supported under the HTML DTD. It is possible to translate some or many structures into comparable structures, but what is often lost is the “flavor” of the original document, or the subtle ways of presenting the data that the author originally created, even though the raw data remains intact.
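The loss described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the element names in `ELEMENT_MAP` and `SOURCE` are hypothetical and come from no real DTD, and the source is written as well-formed XML-style markup so that Python's standard `xml.etree.ElementTree` module can parse it (that module does not handle SGML features such as omitted end tags). Elements with a comparable HTML structure are mapped across; anything else is flattened to plain text, and its name is recorded so you can see what was lost.

```python
import xml.etree.ElementTree as ET

# Hypothetical source-to-HTML element mapping; not taken from any real DTD.
ELEMENT_MAP = {"article": "div", "heading": "h2", "para": "p", "emph": "em"}


def convert(elem, lost):
    """Return an HTML element for elem, appending unsupported element names to lost."""
    target = ELEMENT_MAP.get(elem.tag)
    if target is None:
        # No comparable HTML structure: keep the text, drop the inner markup.
        lost.append(elem.tag)
        out = ET.Element("pre")
        out.text = "".join(elem.itertext())
        out.tail = elem.tail
        return out
    out = ET.Element(target)
    out.text, out.tail = elem.text, elem.tail
    for child in elem:
        out.append(convert(child, lost))
    return out


# Hypothetical source instance with one structure the mapping cannot express.
SOURCE = (
    "<article><heading>Results</heading>"
    "<para>Energy is <emph>conserved</emph>.</para>"
    "<equation><var>E</var> = <var>m</var><var>c</var><sup>2</sup></equation>"
    "</article>"
)

lost = []
html = ET.tostring(convert(ET.fromstring(SOURCE), lost), encoding="unicode")
print(html)
print("Flattened elements:", lost)
```

In this sketch the heading and paragraph come through cleanly, but the equation is reduced to its bare characters: the variable and superscript markup disappear, so the raw data survives while the presentation the author intended does not.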

To see an example of documents that are translated on demand, check out the Electronic Text Center at the University of Virginia. The address is:

http://etext.virginia.edu/modeng.browse.html

This is a wonderful literary resource. To have full access to this electronic library, you must make special arrangements with the university, but you can read the public domain classics that are online. The texts were originally marked up according to the TEI (Text Encoding Initiative) DTD, but they translate to HTML on demand (see fig. 18.2).


Fig. 18.2  The Electronic Text Center at the University of Virginia translates titles from TEI instances into HTML.

The advantage of translating into HTML on demand might be short-lived. In the long run, many of these documents will outlive their original medium, just as the works of Shakespeare have outlived their original published volumes. By then, SGML will be more common than HTML.


Note:  
You can check out the complete works of Shakespeare in SGML at:
http://www.oclc.org:5046/oclc/research/panorama/contrib/Shakespeare/index.html

This link is an HTML index page. When you click a play, it comes up in SGML.


Converting to HTML is popular now, primarily because of the dearth of native SGML Web browsers. Panorama is one exception: it enables your Web browser to read and parse SGML document instances under their native DTDs. This is useful because you are not confined to a single DTD. You can handle tables, equations, or any other challenge with all the flexibility of SGML.

