

Why Structure, Content, and Format Are Important in SGML

SGML provides a mechanism for delivering documents or structured information in a number of ways, including delivering the same content and structure through different programs, formats, and delivery media. To do this efficiently, SGML must define the structure and keep the content separate from the delivery-specific format.

Through this approach, the actual document—its content and structure—becomes mobile. That is, it can be used with different programs, on different computers, and in different formats across time. With a document repository stored in SGML, your organization can switch to a new word processor with a minimum of pain. The documents that you created years ago can be reused with a minimum of effort even though you might have switched word processors twice since then.

For example, imagine a company that prepares a series of product advisory bulletins. It distributes the bulletins as printed hard copy and in electronic form. To support this, it applies formatting differently, depending on the delivery approach.

Now examine these advisories in the two worlds—hard copy and electronic delivery. You should begin to understand the benefits of SGML.

The printed version of the advisory is shown in figure 2.2. It is a typical document with a standard logical organization designed to communicate specific information.


Fig. 2.2  This printed version of a product advisory bulletin is an example of a typical hard-copy document.

Indicating Structure Through Visual Cues

Printed documents typically indicate structure through visual hints, or cues. By using various formatting techniques, documents can “suggest” their structure.

As you view the product advisory, your mind divides the document into pieces. The title tells you what the document is. Identification data in the upper-left corner identifies specific values associated with the document. The body of the document consists of two separate sections, each of which is identified by a section title.

You can distinguish the sections from their visual cues. For example:

  The larger typeface for the document title
  Captions—Number, Type, Date, Revised, and Subject—and boldface type for the identification data
  Space separating the body of the document from the identification data

Unfortunately, this structure is apparent only when the formatting is available. If you translate this document into simple text without the formatting, the structure becomes more difficult to identify. Figure 2.3 shows how the document looks without formatting.


Fig. 2.3  Here is the same product advisory bulletin viewed as unformatted text.

In the unformatted version, the document structure is not as obvious as it was in the document with formatting. Because word processing documents often only hint at their structure through formatting, readers depend heavily on that formatting to understand the structure.

Losing Structure in Word Processing Documents

Figure 2.4 shows a word processor’s view of the fully formatted document from figure 2.2. In this example, the formatted version is stored in Rich Text Format (RTF), a word processing interchange format.


Fig. 2.4  Here the product advisory bulletin is viewed in Rich Text Format (RTF), a document interchange format. Notice that many formatting instructions are embedded within the textual content of the document.

The RTF-formatted version of the document uses codes to indicate formatting. These codes are often specific to a software package or vendor. Even with RTF, different software packages interpret the same codes differently.

By specifying document structure through formatting hints, you become dependent on format. What happens if you lose this format? That might occur when you translate a highly structured document from one word-processing format to another. The text will likely translate correctly, but the formatting might translate less accurately. If this translation process is particularly troublesome, the formatting might end up in a jumble, making the document extremely difficult to read. Important parts of the structure could be lost.
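To see how structure can disappear along with formatting, consider the following minimal sketch. The RTF fragment is invented for illustration and is not the content of figure 2.4, and `strip_rtf` is a hypothetical helper that crudely removes control words; it is not a real RTF parser.

```python
import re

# Hypothetical RTF fragment, invented for illustration (not taken from figure 2.4).
# Bold (\b) and a larger font size (\fs36) are the only hints that one line is a
# title and that "Number:" and "Subject:" are captions.
rtf = (r"{\rtf1\ansi {\b\fs36 Product Advisory}\par "
       r"{\b Number:} 1234\par {\b Subject:} Power cord\par }")


def strip_rtf(text):
    """Crudely drop RTF control words and braces (an assumption-laden sketch)."""
    text = re.sub(r"\\par\b ?", "\n", text)       # paragraph marks become line breaks
    text = re.sub(r"\\[a-z]+-?\d* ?", "", text)   # remove the remaining control words
    return text.replace("{", "").replace("}", "").strip()


plain = strip_rtf(rtf)
print(plain)
```

The words survive, but nothing in the result says which line was the title or which words were captions. That information lived only in the formatting codes.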

Likewise, suppose that you want to view the document in a different medium. To display electronically—for example, on the World Wide Web—a document that you created in a word processor, you must convert its format. See Chapter 26, “Tools for the PC: Authoring, Viewing, and Utilities,” for a look at some powerful new SGML authoring tools.

An Alternative View

Figure 2.5 shows the product advisory bulletin presented in electronic format. The content is the same, but the formatting has been altered to suit the electronic environment in which it is presented.


Fig. 2.5  Here is the product advisory bulletin presented electronically, with the format altered to suit the electronic environment.
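The idea of one body of content with two presentations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the field names, the sample values, and the functions `render_print` and `render_html` are all hypothetical and do not come from the figures.

```python
import html

# Hypothetical advisory content; field names and values are invented for illustration.
advisory = {
    "number": "PA-001",
    "type": "Safety",
    "date": "1 March 1996",
    "subject": "Replacement of power cord",
}


def render_print(doc):
    """Plain-text layout in the spirit of a printed page: capitalized captions, aligned values."""
    width = max(len(key) for key in doc)
    lines = ["PRODUCT ADVISORY", ""]
    for key, value in doc.items():
        lines.append(f"{key.upper():<{width}}  {value}")
    return "\n".join(lines)


def render_html(doc):
    """Electronic layout: the same fields presented as an HTML table."""
    rows = "\n".join(
        f"<tr><th>{html.escape(key.title())}</th><td>{html.escape(value)}</td></tr>"
        for key, value in doc.items()
    )
    return f"<h1>Product Advisory</h1>\n<table>\n{rows}\n</table>"


print(render_print(advisory))
print(render_html(advisory))
```

Neither function changes `advisory` itself. The content is stored once, and each delivery medium supplies its own formatting.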

Structural Views of Information

Now consider the product advisory bulletin from a structural perspective. You can break it into its parts. One way to do this is to lay out what it contains in the format of an indented, or structured, list. For example:

     Product advisory
       Identification number
       Advisory type
       Date
       Revision date
       Subject
       Body
         Subsection
           Title
           List
         Subsection
           Title
           List

This is how SGML looks at documents. Because SGML requires a structured view of data, that data is easier to use in a wide variety of environments.
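The structured list for the product advisory can also be written down as nested data and turned into tagged text. The sketch below is a rough illustration only: the element names (`advisory`, `idnum`, `subsect`, and so on) are assumptions, because no markup or document type definition has been given for the advisory, and `to_markup` is a hypothetical helper.

```python
# Each node is (element name, list of child nodes). The element names are
# hypothetical; they simply mirror the indented list for the product advisory.
subsection = ("subsect", [("title", []), ("list", [])])

advisory = ("advisory", [
    ("idnum", []),
    ("advtype", []),
    ("date", []),
    ("revdate", []),
    ("subject", []),
    ("body", [subsection, subsection]),
])


def to_markup(node, depth=0):
    """Return indented SGML-style start and end tags for a node and its children."""
    name, children = node
    pad = "  " * depth
    if not children:
        return [f"{pad}<{name}></{name}>"]
    lines = [f"{pad}<{name}>"]
    for child in children:
        lines.extend(to_markup(child, depth + 1))
    lines.append(f"{pad}</{name}>")
    return lines


markup = to_markup(advisory)
print("\n".join(markup))
```

The tags name each part of the document without saying anything about typefaces or spacing, so the structure remains visible even when no formatting is applied.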

