Special Edition Using SGML: Defining the Elements

Structure Diagrams

If you like drawing pictures, you will love structure diagrams. They help you understand the structure of documents because they are concrete and visual, and drawing them engages the creative side of your brain. Simply trying to remember which elements are contained in which other elements quickly becomes abstract. Structure diagrams tell you in an instant where elements fit in terms of hierarchy, order, and occurrence, and they make it much easier to construct the DTD for a document. Figure 7.3 gives you an idea of what a structure diagram can tell you.

Fig. 7.3 This is one representation of a structure diagram.

Figure 7.3 does not indicate how many times a middle initial can occur in each listing or whether the city must appear. You add that information later with occurrence symbols.

Grouping elements is a good habit to get into. In figure 7.3, two larger elements, the address and the telephone number, have been added to group the smaller elements they contain.

• See “Standardizing Basic Components,” p. 194

Tip:
Grouping under large shells often makes the small shells easier to manage. It also helps in developing a common DTD.
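
As a rough sketch of how this kind of grouping might carry over into a DTD, the declarations below wrap the smaller elements in two larger ones. All of the element names (listing, name, address, phone, and their children) are assumptions made for illustration and are not taken from figure 7.3. No occurrence symbols have been added yet, so each element appears exactly once, and the declarations for the smallest elements are omitted.

    <!-- Hypothetical names; each element appears exactly once -->
    <!ELEMENT listing (name, address, phone)>
    <!ELEMENT name    (first, middle, last)>
    <!ELEMENT address (street, city, state, zip)>
    <!ELEMENT phone   (areacode, phonenum)>

With the grouping in place, the listing only has to know about three children, and the address and phone declarations can be reused wherever else they are needed.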

To show how many times each element may appear, and whether it must appear at all, you add occurrence symbols to the structure diagram. Table 7.1 describes the occurrence symbols.

**Table 7.1** Occurrence Symbols

Occurrence	Symbol

Once and only once (required)	(none)
Optional	?
Optional repeatable	*
Required repeatable	+

Whenever subelements live inside one of your elements, draw a structure diagram and use the occurrence symbols, as in figure 7.4.

Fig. 7.4 Adding occurrence symbols to structure diagrams makes them more meaningful.
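
The same symbols appear in the content models of a DTD. The following is a minimal sketch, assuming hypothetical element names (listing, name, address, phone, and their children) that are not taken from figures 7.3 or 7.4; the occurrence choices are likewise only examples.

    <!-- Hypothetical names; occurrence symbols follow Table 7.1 -->
    <!ELEMENT listing (name, address?, phone+)>
    <!ELEMENT name    (first, middle*, last)>
    <!ELEMENT address (street+, city, state, zip?)>

Here middle* allows any number of middle initials, including none; address? makes the address optional; phone+ requires at least one telephone number; and city, which has no symbol, must appear once and only once inside an address.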

You can look at documents in many ways, and SGML beginners often confuse structure with content. For now, you should concentrate on logical structure, not content. Take the following table, for example. You can think of this table in terms of its logical structure or in terms of the type of content it holds. The safest way is to think of its logical structure (see fig. 7.5).


Part Number	Part Description

157923C1	Nut, wing, 0.75
923872C3	Washer, 0.75
923579C2	Stud, SS, 1 1/2×0.75
47623-100	Lockwasher, 0.75

Fig. 7.5 Keep structure and content straight as you define your elements. This is one of the biggest stumbling blocks for new SGML users.
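
One possible reading of the logical-structure view, offered only as a sketch: the table is described as a heading row followed by rows of cells, and the part numbers and descriptions are simply the data that fills the cells. The element names are hypothetical and are not taken from figure 7.5; #PCDATA is explained in the next section.

    <!-- Hypothetical names: the table described by its logical structure -->
    <!ELEMENT table   (headrow, row+)>
    <!ELEMENT headrow (colhead, colhead)>
    <!ELEMENT row     (cell, cell)>
    <!ELEMENT colhead (#PCDATA)>
    <!ELEMENT cell    (#PCDATA)>

Under these assumed declarations, the first data row of the table would be marked up like this:

    <row><cell>157923C1</cell><cell>Nut, wing, 0.75</cell></row>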

Types of Data Content

You have learned that elements can contain other elements. What happens when no more elements are left? What do the final elements contain? The answer is data. Table 7.2 describes the types of data that elements can contain.

**Table 7.2** Types of Data That Go into SGML Elements

Data Type	Meaning

`CDATA`	Character data
`RCDATA`	Replaceable character data
`#PCDATA`	Parsed character data
`SDATA`	Specific character data
`NDATA`	Non-SGML data
`EMPTY`	No data—empty element
`ANY`	Any type of data

Note:
#PCDATA is character data that the parser examines. The parser recognizes any markup in it, such as entity references, and treats the remaining characters as data content. An element declared as (#PCDATA) contains no other elements.

CDATA consists of valid SGML characters that will not be parsed. It should be used for data that is specific to a particular processing system, such as data content intended for a specific external application.

RCDATA is treated just like CDATA, except that entity references and character references are replaced.

Note:
Most straight text is #PCDATA.

In the following element declaration, for example, all the para elements contain parsed character data. The # is known as the SGML reserved name indicator, or RNI.

    <!ELEMENT para (#PCDATA)>
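
For contrast, here is a small sketch of CDATA and RCDATA used as declared content. The element names verbatim and sample are hypothetical, and the lt entity is declared here only so that the example has something to replace.

    <!-- Hypothetical elements and entity, for illustration only -->
    <!ENTITY  lt       CDATA "&#60;">
    <!ELEMENT verbatim CDATA>
    <!ELEMENT sample   RCDATA>

In a document instance, the two elements would behave differently:

    <verbatim>&lt; and <b> both stay exactly as typed</verbatim>
    <sample>&lt; is replaced, but <b> stays exactly as typed</sample>

Inside verbatim, neither the entity reference nor the start tag is recognized. Inside sample, the entity reference is replaced with a less-than sign, while <b> is still treated as data rather than as a tag.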

Empty data is useful for elements that you expect to use at a later time. Suppose, for example, that you want to add revision tracking to numbered list elements, but your boss does not want to do it yet. You know that he will want to add it later on, so you create an element and fill it with empty data. SGML holds a place for it. The data must remain empty, though. You cannot put anything in it. The declaration looks like this:

    <!-- One text or graphic item, followed by the revision tracking placeholder -->
    <!ELEMENT numlist  ((text|graphic), revtrack)>
    <!ELEMENT text     (#PCDATA)>
    <!ELEMENT graphic  EMPTY>
    <!ELEMENT revtrack EMPTY>

Whenever the parser runs into <revtrack>, it treats the element as empty: no content and no end tag follow it. The <graphic> element is also declared EMPTY. That is because photographs are represented by encoded binary streams, which the parser does not know how to handle. Marking the element EMPTY tells the parser not to expect any content inside it.
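
As a brief illustration, a document instance that follows the numlist declaration above might look like the sketch below. The sentence inside the text element is invented for the example.

    <numlist>
    <text>Install the lockwasher.</text>
    <revtrack>
    </numlist>

    <numlist><graphic><revtrack></numlist>

Notice that neither <revtrack> nor <graphic> has an end tag. An element declared EMPTY consists of its start tag alone, so nothing can be placed inside it.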
