

Chapter 2
SGML View of the World: Structure, Content, and Format

SGML is a language for relating structured information. If you want to create documents and transmit them in hard copy, electronically, or even in Braille, SGML is worth considering. If you are creating or packaging data that you expect to use for 20 years, SGML lets you take your information with you, no matter which software package you are using 10 years from now.

By design, SGML is suited to complex, technical information traditionally packaged in documents. Through its ways of identifying and defining document components, SGML provides a rich environment for transmitting structured information sets (documents) through a variety of media. SGML itself is not especially complex, even though it provides a structure for defining and exchanging complex information. In other words, it can be as complicated or as simple as the documents that use it.

Because most people are familiar with creating and editing documents in word processors, the examples throughout this chapter contrast common word processing concepts with SGML concepts. Understanding the three central concepts in SGML's view of documents (structure, content, and format) is essential. Some people say that once you understand them, the rest is easy.

In this chapter, you learn some of the key components of SGML, including:

  How SGML views documents (structure, content, and format)
  Why this view is important
  How SGML works with this view

Structure, Content, and Format

To understand SGML, you must understand some basic ideas about document structure. Central to SGML is the concept that documents have structure, content, and format (see fig. 2.1). These three ingredients combine to form a document. They interrelate in subtle ways, and you can easily confuse them as you work with documents.


Fig. 2.1  Structure, content, and format play important roles in the construction of an SGML document.

The way you approach a document when you create it in a word processor often adds to the confusion about how these three concepts differ. For example, when you write a technical memorandum, you might use special formatting, such as a bold typeface, to indicate elements of the document structure.

Later in this chapter, you look at a document created with a word processor. As you examine it, you can contrast a word processing program’s view of structure, content, and format with the SGML approach.

What Is Structure?

Structure is the blueprint of a document: its logical organization. The structure defines how the document is organized and in what order its elements are assembled. In the SGML world, it also defines relationships, indicating how individual information objects relate to one another. For example, a bicycle assembly manual might consist of the following sections in this order: an introduction that describes the document and lists the manufacturer's address, assembly instructions, a parts list, instructions for ordering replacement parts, troubleshooting advice, and an index.

Within a word processing program, structure is generally left up to the author. Styles and style sheets help, but the program usually does not support structure explicitly. SGML departs from this philosophy in that it requires the author to define a document structure explicitly ahead of time. The author, however, can decide how strict or lenient that structure will be.

For example, the document structure for an office memorandum might contain a title, an addressee block, a sender block, the date, a salutation, and the body of the memo. When you create this memo with a word processor, the structure is implied. In SGML, the structure must be defined.
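To make "the structure must be defined" concrete, here is a minimal Python sketch (not SGML itself) that writes the memo structure listed above down as an explicit, ordered list of required elements and checks a memo against it. The element names, the `check_structure` helper, and the sample memo are hypothetical illustrations. The sketch also assumes the strictest possible structure, a fixed sequence; a real SGML document type definition can be more lenient and allow optional, repeated, or alternative elements.

```python
# A minimal sketch, not SGML: the memo structure from the text written down
# explicitly as an ordered list of required elements. The element names are
# hypothetical; an SGML DTD would declare them with <!ELEMENT ...> declarations
# and could also allow optional or repeated elements.
MEMO_STRUCTURE = ["title", "addressee", "sender", "date", "salutation", "body"]


def check_structure(document, structure=MEMO_STRUCTURE):
    """Return a list of problems; an empty list means the document conforms.

    `document` is a list of (element_name, content) pairs in reading order.
    """
    names = [name for name, _ in document]
    problems = []
    for name in names:
        if name not in structure:
            problems.append(f"undeclared element: {name}")
    for name in structure:
        if name not in names:
            problems.append(f"missing element: {name}")
    # Only check ordering when no element is missing or undeclared.
    if not problems and names != structure:
        problems.append("elements do not follow the declared order")
    return problems


# Illustrative sample memo (invented content).
memo = [
    ("title", "Revised Parts List"),
    ("addressee", "Assembly staff"),
    ("sender", "Documentation group"),
    ("date", "March 3"),
    ("salutation", "Hello,"),
    ("body", "The revised parts list is attached."),
]

print(check_structure(memo))                 # []
print(check_structure(memo[:3] + memo[4:]))  # ['missing element: date']
```

A word processor would accept the second memo without complaint because its structure is only implied; once the structure is written down, the missing date becomes something a program can detect.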

What Is Content?

Content is the actual data within a document. The words and illustrations that make up a bicycle assembly manual are its content, just as the text and figures in this chapter are its content. Different word processors handle textual content similarly, but they vary in how they support illustrations and graphics.


Note:  
SGML does not directly address the issues involved in supporting graphics. It is a structured text markup language, but it provides the capability to support non-text objects, such as graphics. With the widespread adoption of electronic documents, the importance of supporting multimedia objects, such as video and sound clips, is increasing.

Most word processors store textual data in a specific way, either in a binary format or in a version of ASCII text. In most instances, the text is stored sequentially, word after word, just as the author entered it. Only specialized formatting makes the stored form differ from what the author had in mind when entering the information.

What Is Format?

Format consists of how the words, sentences, and paragraphs are visually presented and distinguished from one another within a document. Boldface for titles, italics for special terms, and blank lines between sections are examples of document formats. Specific formatting serves a variety of uses—emphasis, hints on structure, and the overall aesthetics of the document.

People often confuse format with structure because word processing programs commonly imply structure through specialized, tailored formatting called styles. By applying a defined style, an author associates a style name, together with its special formatting, with a block of content.
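The separation SGML aims for can be sketched in a few lines of Python under simplifying assumptions: a document is a list of (element name, content) pairs that carries no formatting, and a "format map" (a hypothetical stand-in for a style sheet, not an SGML construct) decides how each element looks. Because the formatting lives outside the content, the same text can be presented in two different ways without being edited, whereas a word processor style is attached to the text itself.

```python
# Structured content: element names and text only, no formatting.
doc = [("title", "Parts List"), ("body", "Check every bolt.")]

# Two hypothetical format maps (stand-ins for style sheets). Each maps an
# element name to a function that decides how that element is presented.
PLAIN = {"title": str.upper}
MARKED = {"title": lambda text: f"**{text}**"}  # Markdown-style bold


def render(document, formats):
    """Apply a format map to structured content; unmapped elements pass through."""
    return "\n".join(formats.get(name, lambda text: text)(content)
                     for name, content in document)


print(render(doc, PLAIN))   # title in capitals
print(render(doc, MARKED))  # title wrapped in ** markers
```

Changing the look of every title means changing one entry in a format map, not hunting through the document for bold text.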

People sometimes lose track of what a particular format is meant to signal and then use the same formatting to indicate several different things. Novice users of word processors often use a wide variety of fonts, font sizes, and other formatting attributes in their documents. While visually interesting, their documents can be hard to read and difficult to understand.

Different word processors handle formatting in widely different ways. As a result, translating a large, complex document from one format to another can be an adventure.


Caution:  
Using formatting to indicate structure is both commonplace and dangerous. Because people tend to make assumptions as they read, they can easily misinterpret a document's content. This is particularly true of documents that have been converted from one format to another, such as from one word processor format to another, and it makes translating a document between proprietary formats a source of agony.
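The danger described in this caution can be shown with a short Python sketch. It assumes a hypothetical converter that sees only what a word processor exports (text plus a bold flag) and must guess structure from format. Because the author used bold for two different purposes, the naive rule "bold means heading" turns a safety warning into a section heading. The paragraph data and the `guess_structure` function are invented for illustration.

```python
# Hypothetical export of three paragraphs: text plus one formatting flag.
paragraphs = [
    {"text": "Assembly Instructions", "bold": True},
    {"text": "Attach the front wheel.", "bold": False},
    {"text": "Never ride before tightening the axle nuts.", "bold": True},
]


def guess_structure(paragraphs):
    """Naively infer structure from format: every bold paragraph is a heading."""
    return [("heading" if p["bold"] else "paragraph", p["text"])
            for p in paragraphs]


for element, text in guess_structure(paragraphs):
    print(f"{element:9} {text}")
```

The third paragraph comes back labeled as a heading. Had the author marked it up explicitly as a warning element, no guessing would be needed, and each format could choose for itself how to display warnings.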

