Special Edition Using SGML: Following General SGML Declaration Syntax

Elements and the ELEMENT Declaration

Many people consider elements to be the fundamental building blocks of the SGML world. As you consider all of the components of a document (such as the title, section, paragraph, and illustration), remember that they usually have one thing in common: they are normally defined as elements. (In some cases, such as tables of contents, they are derived from other elements.)

If you have been skimming this section up to now, pay special attention to this examination of elements. Because they are the basic building blocks of SGML, you will encounter elements often in your SGML-tagged documents, so it pays to have an in-depth knowledge of what they are and how they are tagged within a document instance.

Does this seem a little vague? If it does, don’t worry. At the end of this chapter (in the section “Some Practical Examples”), you’ll examine a sample DTD and a document instance that corresponds to it. As you examine them, the theoretical should come to life.

The ELEMENT declaration has a few interesting items in it:

As with the other declarations, note that the ELEMENT declaration starts with the familiar Markup Declaration Open (MDO) and ends with the Markup Declaration Close (MDC). You can tell that it’s a declaration for an element because it has ELEMENT as the keyword. The name of the element follows the keyword.
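As a minimal sketch of these parts, consider the declaration below. The element name memo and its content model are assumptions chosen only for illustration; they do not come from any particular DTD.

```sgml
<!-- Hypothetical element "memo"; the name is illustrative only -->
<!ELEMENT  memo   - -   (#PCDATA) >
```

Reading left to right, the declaration consists of the MDO (<!), the keyword ELEMENT, the element name (memo), the two tag minimization indicators (- -), the content model in parentheses, and the MDC (>).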

Tag Minimization. The two dashes that follow the element name specify the rules for tag minimization in the SGML document instance. Minimization lets you omit tags that the parser can infer from the DTD. You’ll learn more about minimization a bit later in this chapter (in the section “Tags and Tag Minimization with Omittag”). For now, know that the first dash applies to the element's start tag and the second to its end tag, and that a dash means the tag is required. Two dashes therefore indicate that both the start tag and the end tag are required.

An element whose end tag is omitted can be defined as follows. Note that omission is indicated by the letter “o” (not the digit zero):

    <!ELEMENT graphic   - o    EMPTY  >

In this case, you have defined the element graphic. The end tag is omitted, and the element has no content. If you’re wondering why you might define an element with no content and no end tag, hang on. You’ll analyze this example further when you examine the use of attributes in “Attributes: Their Use and the ATTRIBUTE Declaration.”
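To see what the omitted end tag looks like in practice, here is a small sketch of a complete document. The graphic declaration is the one from the text; the figure and caption elements, and the caption wording, are assumptions added so the example can stand on its own.

```sgml
<!DOCTYPE figure [
  <!-- figure and caption are hypothetical; graphic is declared as in the text -->
  <!ELEMENT figure   - -    (graphic, caption) >
  <!ELEMENT graphic  - o    EMPTY  >
  <!ELEMENT caption  - -    (#PCDATA) >
]>
<figure>
<graphic>
<caption>Exploded view of the pump assembly</caption>
</figure>
```

The graphic start tag appears alone. Because the element is declared EMPTY, there is no content to enclose and no end tag to write.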

Caution:
You should rarely even consider omitting the start tag of an element. You should never omit both the start and end tags; your parser will have a difficult time recognizing such an element.

You might want to consider tag minimization in the following circumstances:

• You are entering the tags manually (heaven forbid!)

• People need to regularly read the tagged document instance

• A tag is being used as a placeholder
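The first two circumstances can be sketched with a simple list. The element names list and item are hypothetical, and the example assumes that the SGML declaration in use enables the OMITTAG feature.

```sgml
<!DOCTYPE list [
  <!-- Hypothetical elements; item allows its end tag to be omitted -->
  <!ELEMENT list  - -  (item+) >
  <!ELEMENT item  - o  (#PCDATA) >
]>
<list>
<item>Check the oil level
<item>Inspect the drive belt
<item>Replace the air filter
</list>
```

Each item ends implicitly when the next item start tag, or the list end tag, appears. The instance is shorter to type and easier to read than one in which every item carries its own end tag.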

Defining the Element Content Model. To define the content model of an element, you can proceed in a number of directions. In some cases, your element model might be composed of a number of subelements, perhaps in a particular order.
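A content model built from subelements might look like the following sketch. All of the element names are assumptions made for illustration, and each subelement would need its own ELEMENT declaration elsewhere in the DTD.

```sgml
<!-- Hypothetical names; each subelement would need its own declaration -->
<!ELEMENT chapter  - -  (title, para+, (figure | table)*) >
```

The commas require the subelements to appear in the order given. The plus sign means one or more occurrences, the vertical bar means either one or the other, and the asterisk means zero or more occurrences. Read as a whole, a chapter is a title, followed by at least one para, followed by any number of figure or table elements.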

In other instances, the element might contain only the basic text of the document and no subelements. In this case, the content model definition can contain #PCDATA, which signifies parsed character data. The pound sign (#) marks what follows as a reserved name, which in this case is PCDATA. Parsed character data is text that the parser scans for markup rather than passing through untouched. This is useful when you reference other objects, such as entities, within your data, because the parser resolves those references for you. (Entities are discussed in more detail in “Entities: Their Use and the ENTITY Markup Declaration.”)

Sometimes, you might want to have textual content receive special treatment from the parser. In this case, you will use other special data types (indicated by reserved names). Examples of these are shown in Table 10.4.
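The following sketch shows why it helps to have the parser examine character data. The entity name pubname, its replacement text, and the para element are all assumptions made up for this example.

```sgml
<!DOCTYPE para [
  <!-- Hypothetical entity and element; the names are illustrative only -->
  <!ENTITY  pubname  "Example Press" >
  <!ELEMENT para  - -  (#PCDATA) >
]>
<para>This catalog is distributed by &pubname;.</para>
```

Because the content of para is #PCDATA, the parser recognizes the entity reference &pubname; and substitutes the replacement text, so an application sees the sentence with “Example Press” in place of the reference.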

**Table 10.4** Special Declared Content (Reserved Names)

| Reserved Word | Name | Typical Usage |
|---|---|---|
| EMPTY | Empty element (without content) | Placeholder or processing instruction |
| CDATA | Character Data | Holds text that looks like SGML markup but is treated as literal text, not interpreted by the parser or SGML applications |
| #PCDATA | Parsed Character Data | Textual content that is evaluated by SGML parsers |
| RCDATA | Replaceable Character Data | Same as CDATA, except that entity and character references are recognized; useful for special notation such as equations |
| ANY | Any valid content | States that any element in the DTD (or #PCDATA) is allowed (avoid using this type of declared content!) |
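The difference between CDATA and RCDATA is easiest to see side by side. In this sketch, the element names sample, literal, and formula and the entity alpha are hypothetical.

```sgml
<!DOCTYPE sample [
  <!-- Hypothetical names chosen for illustration -->
  <!ENTITY  alpha    "a" >
  <!ELEMENT sample   - -  (literal, formula) >
  <!ELEMENT literal  - -  CDATA >
  <!ELEMENT formula  - -  RCDATA >
]>
<sample>
<literal>Type <para> and &alpha; exactly as shown</literal>
<formula>&alpha; < 1</formula>
</sample>
```

Inside literal, both <para> and &alpha; are kept as plain characters. Inside formula, the reference &alpha; is replaced by its entity text, but the less-than sign is still treated as data and not as the start of a tag.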

Note:
The keyword #PCDATA includes the pound sign to mark it as a reserved name, so that the parser can distinguish it from an element that happens to be named PCDATA.

Caution:
The declared content type ANY essentially switches off content checking for the element: any element declared in the DTD, as well as character data, may appear inside it in any order. Because its use bypasses the structure of SGML, it should be avoided. It is sometimes used in the early stages of DTD development.

Element Exceptions (Include and Exclude). The use of element exceptions within your element definition can be thought of as a type of “Yes, but…” statement. Put simply, they allow you to override your (just completed) definition of the element by specifically permitting or forbidding the occurrence of an exception element.

To illustrate inclusion, the element declaration for catalog (shown below) allows the element note to occur any number of times anywhere within the catalog element, including inside any of its subelements:

    <!ELEMENT  catalog   - -    (section, (section*, index)?) +(note) >
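A short instance makes the effect of the inclusion visible. The sketch below assumes the catalog declaration above together with some hypothetical supporting declarations; the content models for section, title, para, index, and note are invented for the example.

```sgml
<!-- Hypothetical supporting declarations -->
<!ELEMENT section  - -  (title, para+) >
<!ELEMENT (title | para | index | note)  - -  (#PCDATA) >

<!-- A fragment of a document instance -->
<catalog>
<note>Prices are subject to change.</note>
<section>
<title>Hand Tools</title>
<note>All tools carry a one-year warranty.</note>
<para>Hammers, saws, and chisels.</para>
</section>
</catalog>
```

Neither the catalog content model nor the section content model mentions note, yet both notes are valid. The inclusion declared on catalog permits note directly inside catalog and inside any element nested within it.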
