Special Edition Using SGML: Practicalities of Working with SGML on the Web

Table of Contents

You don’t have to wait for existing HTML viewers to add SGML support, and you don’t have to add network capabilities to SGML viewers. You just get the helper application and tell the HTML viewer that it’s there. On the other hand, you have two viewers to manage that are doing almost the same thing. The formatting capabilities of your HTML and generic SGML viewers may differ; their interfaces certainly will. You might have to click a different mouse button to follow links depending on whether you’re viewing a document in HTML or in a different DTD, and use different menus for all the functions.

This problem would be minor if HTML and generic SGML document viewers “felt” different to the user. However, the differences between HTML and other SGML documents are often not evident just from formatting. They both typically end up as wrapped text with various fonts, colors, indents, and so on (although SGML viewers tend to provide a wider variety of extra navigational tools, such as tables of contents). Given the similarity, users will probably wonder why they have a different interface for what looks like pretty much the same thing.

On the other hand, the differences between a viewer for JPEG graphics and an HTML viewer are obvious to users; they know right away that those two viewers are different, and probably won’t be surprised at a slightly different set of menu items, cursors, and so on. Because the two applications are plainly different, users expect them to operate differently.

A similar problem with the “helper application” approach is that the HTML viewer and the generic SGML viewer have separate history lists of places you’ve been. So when you click the Back button, you won’t get the same results from each application. This difference in operation may be confusing, since when you follow a link, you never know if the result will appear in the same window with the same interface, or will be somewhere completely different.

The helper application approach also raises a more complex, technical problem. With a helper application like a JPEG viewer, all the HTML client has to do is forward the JPEG data to it. But an SGML file may refer to other data that still lives somewhere else on the network. For example, the SGML viewer may need to get the DTD or a bunch of entities that are included by reference from the initial SGML file. Because the SGML viewer in this approach can’t go out to the network to fetch data, it has to send a request back to the HTML client, which gets the data and then passes it to the SGML viewer. This kind of two-way communication between separate programs is a little tricky on current systems, though it can definitely be done.
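The round trip described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `HelperViewer`, the function `fake_client_fetch`, and the file names are hypothetical, and a real helper would talk to the HTML client through whatever inter-program mechanism the platform offers rather than a direct function call.

```python
import re

# Matches external entity declarations such as:
#   <!ENTITY chap1 SYSTEM "chap1.sgm">
ENTITY_DECL = re.compile(r'<!ENTITY\s+([\w.-]+)\s+SYSTEM\s+"([^"]*)"\s*>')

# Hypothetical stand-in for data that lives elsewhere on the network.
CANNED_DATA = {
    "chap1.sgm": "<chapter><title>One</title></chapter>",
    "chap2.sgm": "<chapter><title>Two</title></chapter>",
}
REQUEST_LOG = []


def fake_client_fetch(system_id):
    """Plays the role of the HTML client: it alone may fetch data."""
    REQUEST_LOG.append(system_id)
    return CANNED_DATA[system_id]


class HelperViewer:
    """Hypothetical SGML helper that has no network access of its own."""

    def __init__(self, request_from_client):
        # Callback used to signal the HTML client and receive the data back.
        self.request_from_client = request_from_client
        self.entities = {}

    def load(self, sgml_text):
        # For every external entity, ask the client to fetch it for us.
        for name, system_id in ENTITY_DECL.findall(sgml_text):
            self.entities[name] = self.request_from_client(system_id)
        return self.entities


if __name__ == "__main__":
    document = (
        '<!DOCTYPE book [\n'
        '<!ENTITY chap1 SYSTEM "chap1.sgm">\n'
        '<!ENTITY chap2 SYSTEM "chap2.sgm">\n'
        ']>\n<book>&chap1;&chap2;</book>'
    )
    viewer = HelperViewer(fake_client_fetch)
    print(sorted(viewer.load(document)))
```

The point of the sketch is the direction of the calls: the viewer never fetches anything itself, so every external reference costs one request to the client and one reply.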

There is a product that serves as an SGML helper application in much this way. Panorama is available from SoftQuad for MS Windows. A free version is available from http://www.oclc.org:5046/oclc/research/panorama/panorama.html.

• See “Other SGML Tools,” p. 453

Panorama Pro is a commercial version of the same system. Both can read SGML regardless of the DTD used, and you can set up most HTML-based Web clients to send SGML data to them.

The approaches already described (SGML clients and SGML helper applications) are much better because the advantages of SGML survive at the client end. Still, there are also advantages to just sending HTML across the Net:

• Readers can use any client that supports HTML (although they can get only the client formatting and search functions that HTML can support).

• Publishers can avoid sending their “real” data across the Net, so the copies that circulate are distinct from the originals. Users can download and save the HTML, but the added value stays in the original data, which the publisher doesn’t have to give away.

• Because the data at the client end is HTML, the problem of having a different interface in a helper application disappears.

• Because translation is going on, publishers can choose to do multiple translations and turn their original SGML data into HTML 2.0 for some clients, HTML 3.0 for others, HTML “à la company X” for browsers that have nonstandard features, text-only for Lynx, and so on.

• For the same reason, servers can customize the data (sort lists, hide overviews or excessive detail, and so on). They just need to leave some of the SGML out entirely, or use different HTML tags for it to create very different views.
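The idea of producing different translations for different clients might be sketched as follows. This is a minimal illustration, not a description of any real server: the rule list, the format names, and `ExampleBrowser/3` are hypothetical placeholders, and only the pairing of Lynx with text-only output comes from the list above.

```python
# Hypothetical rules: (substring to look for in the client's User-Agent
# string, name of the translation to apply). Order matters; first match wins.
RULES = [
    ("Lynx", "text-only"),
    ("ExampleBrowser/3", "html-3.0"),  # placeholder client name
]
DEFAULT_FORMAT = "html-2.0"  # assumed fallback for unrecognized clients


def choose_format(user_agent):
    """Pick which translation of the SGML source to send to this client."""
    lowered = user_agent.lower()
    for marker, output_format in RULES:
        if marker.lower() in lowered:
            return output_format
    return DEFAULT_FORMAT


if __name__ == "__main__":
    for agent in ("Lynx/2.4 libwww/2.14", "ExampleBrowser/3.1", "Unknown/1.0"):
        print(agent, "->", choose_format(agent))
```

Each format name would then select a different conversion of the same SGML original, which is what lets a single source serve several kinds of client.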

Integrated SGML Converters

The first group of SGML-to-HTML tools converts data on the fly, as part of the file-serving process. Publishers author their data in any SGML DTD using whatever tools they normally use and simply install the SGML on the server. Then they set up a conversion filter that maps their SGML data from its original DTDs into HTML. They don’t run that filter program ahead of time (except to test it!). Instead, whenever the server gets a request and sees that the data is in SGML, it fires up the converter and ships out the filtered result. Figure 20.3 illustrates the process.

Fig. 20.3 An integrated server (or server add-on) converts SGML to HTML on the fly.
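In code, the on-the-fly step amounts to a check at request time. The sketch below assumes that SGML files can be recognized by a `.sgm` or `.sgml` extension and that documents sit in a simple in-memory dictionary; `serve` and `placeholder_convert` are hypothetical names, and the placeholder stands in for whatever conversion filter the publisher has actually set up.

```python
import os

# Assumed naming convention for recognizing SGML data on the server.
SGML_EXTENSIONS = {".sgm", ".sgml"}


def placeholder_convert(sgml_text):
    """Stand-in for the real SGML-to-HTML conversion filter."""
    return "<!-- converted on request -->" + sgml_text


def serve(path, documents, convert):
    """Return the body to send for `path`, converting SGML at request time."""
    body = documents[path]
    extension = os.path.splitext(path)[1].lower()
    if extension in SGML_EXTENSIONS:
        # Nothing was converted ahead of time; the filter runs now.
        return convert(body)
    # Anything else is shipped out unchanged.
    return body


if __name__ == "__main__":
    documents = {
        "/manual.sgml": "<chapter><title>Setup</title></chapter>",
        "/index.html": "<HTML><BODY>Welcome</BODY></HTML>",
    }
    print(serve("/manual.sgml", documents, placeholder_convert))
    print(serve("/index.html", documents, placeholder_convert))
```

Because the stored data is never altered, changing the filter changes what every later request receives without touching the SGML originals.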

Suppose a software company wants to put its documentation on the Web. A lot of software companies use the “DocBook” standard DTD for their manuals; to use an integrated SGML conversion server, they would specify one conversion table that the server would apply to any of the data. Then, when a request comes in, the server retrieves the document, parses it, and converts the elements it finds according to the table.

Note:
Information about the DocBook DTD is available on the Internet from http://jasper.ora.com/DocBook.

An HTML conversion table for DocBook might look something like this (only one table is needed for each DTD—not one for each document):

    # DocBook tag           HTML tag
    para                    P
    orderedlist             OL
    listitem                LI
    chapter                 [untag]
    section                 [untag]
    chapter/title           H2
    section/title           H3
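To show how a server might apply such a table, here is a minimal Python sketch. It makes several simplifying assumptions that real SGML does not guarantee: every element has explicit start and end tags, tags carry no attributes, and the input is properly nested. The function names are hypothetical, HTML tag names are normalized to uppercase, and treating elements that are missing from the table as `[untag]` is a choice made for this example only.

```python
import re

TABLE_TEXT = """
# DocBook tag           HTML tag
para                    p
orderedlist             OL
listitem                LI
chapter                 [untag]
section                 [untag]
chapter/title           H2
section/title           H3
"""

# Start or end tag without attributes, e.g. <para> or </para>.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)>")


def parse_table(text):
    """Map each source name (or parent/child pair) to an HTML tag or None."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        source, target = line.split()
        # None means "drop the tags but keep the content".
        table[source.lower()] = None if target.lower() == "[untag]" else target.upper()
    return table


def convert(sgml, table):
    """Rewrite tags according to the table; text content passes through."""
    out, stack, pos = [], [], 0
    for match in TAG.finditer(sgml):
        out.append(sgml[pos:match.start()])
        pos = match.end()
        closing, name = match.group(1), match.group(2).lower()
        if closing:
            _, html = stack.pop()  # assumes properly nested input
            if html:
                out.append("</" + html + ">")
        else:
            parent = stack[-1][0] if stack else ""
            key = parent + "/" + name
            # A parent/child rule takes precedence over the plain rule.
            html = table[key] if key in table else table.get(name)
            stack.append((name, html))
            if html:
                out.append("<" + html + ">")
    out.append(sgml[pos:])
    return "".join(out)


if __name__ == "__main__":
    table = parse_table(TABLE_TEXT)
    source = "<chapter><title>Intro</title><para>Hello</para></chapter>"
    print(convert(source, table))
```

The stack is what makes context rules such as `chapter/title` work: the converter always knows the enclosing element, so the same `title` tag can become H2 in one place and H3 in another.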
