Many people consider elements to be the fundamental building blocks of the SGML world. As you consider all of the components of a document (such as the title, section, paragraph, and illustration), remember that they usually have one thing in common: they are normally defined as elements. (In some cases, such as tables of contents, they are derived from other elements.)
If you have been skimming this section up to now, pay special attention to this examination of elements. Because they are the basic building blocks of SGML, you will often encounter elements in your SGML-tagged documents, so it pays to have an in-depth knowledge of what they are and how they are tagged within a document instance.
Does this seem a little vague? If it does, don't worry. At the end of this chapter (in the section "Some Practical Examples"), you'll examine a sample DTD and a document instance that corresponds to it. As you examine them, the theoretical should come to life.
The ELEMENT declaration has a few interesting items in it:
As with the other declarations, note that the ELEMENT declaration starts with the familiar Markup Declaration Open (MDO) and ends with the Markup Declaration Close (MDC). You can tell that it's a declaration for an element because it has ELEMENT as the keyword. The name of the element follows the keyword.
Tag Minimization. The two dashes that follow the element name specify the rules for tag minimization in the SGML document instance. Minimization allows you to omit tags in SGML markup when the parser can infer them from context. You'll learn more about minimization a bit later in this chapter (in the section "Tags and Tag Minimization with Omittag"). For now, know that the two dashes indicate that an element start tag (indicated by the first dash) and stop tag (indicated by the second dash) are required.
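As a minimal sketch of the parts just named, the declaration below is annotated piece by piece. The element names memo, to, from, and body are hypothetical and chosen only for illustration. In the reference concrete syntax, the MDO is written <! and the MDC is written >.

```sgml
<!-- Hypothetical element names: memo, to, from, body -->
<!-- keyword  name  minimization  content model      -->
<!ELEMENT     memo  - -           (to, from, body)   >
```

Reading left to right: the MDO and the ELEMENT keyword open the declaration, memo is the name being defined, the two dashes are the minimization rules, the parenthesized group is the content model, and the MDC closes the declaration.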
An element in which a stop tag is omitted can be defined as follows. Note that to indicate omit, the lowercase letter o is used (not zero):
<!ELEMENT graphic - o EMPTY >
In this case, you have defined the element graphic. The end tag is omitted, and the element has no content. If you're wondering why you might define an element with no content and no end tag, hang on. You'll analyze this example further when you examine the use of attributes in "Attributes: Their Use and the ATTRIBUTE Declaration."
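To see what the omitted stop tag looks like in practice, here is a hedged sketch of a document instance fragment that uses the graphic element. The surrounding para element is an assumption made for illustration; it is not declared in the text above.

```sgml
<!-- Assumes a hypothetical para element that allows text and graphic -->
<para>The assembled unit is shown here: <graphic> Check each part
against the illustration before continuing.</para>
```

Because graphic is declared EMPTY, the parser treats the element as complete as soon as it reads the start tag, so no stop tag appears in the instance.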
Caution:
You should rarely even consider omitting the start tag in an element. You should never omit both the start and stop tags; your parser will have a difficult time recognizing such an element.
You might want to consider tag minimization in the following circumstances:
Defining the Element Content Model. To define the content model of an element, you can proceed in a number of directions. In some cases, your element model might be composed of a number of subelements, perhaps in a particular order.
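A content model built from subelements in a particular order might look like the sketch below. The names section, title, para, and figure are hypothetical. The comma connector requires the subelements in the order listed, the vertical bar allows a choice, and the occurrence indicators ?, *, and + mean optional, zero or more, and one or more.

```sgml
<!-- Hypothetical names: a section is a title, then one or more
     paragraphs, then an optional figure, in that order -->
<!ELEMENT section - - (title, para+, figure?) >

<!-- A choice instead of a sequence: any mix of para and figure -->
<!ELEMENT body    - - (para | figure)* >
```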
In other instances, the element might contain only the basic text of the document and no subelements. In this case, the content model definition can contain #PCDATA. The pound sign (#) signifies a reserved name, which in this case is PCDATA. PCDATA stands for parsed character data (often called parseable character data): text that the parser scans for markup and resolves. Letting the parser resolve the character data is useful when you reference other objects, such as entities, within your data. (Entities are discussed in more detail in "Entities: Their Use and the ENTITY Markup Declaration.")
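The following sketch shows why it helps to let the parser resolve the text. The element para, the entity product, and its replacement text are all hypothetical names used only for illustration.

```sgml
<!-- Declarations (hypothetical names) -->
<!ELEMENT para - - (#PCDATA) >
<!ENTITY product "Acme Widget Catalog" >

<!-- Document instance: the parser replaces the entity reference
     with the entity's replacement text -->
<para>Thank you for ordering from the &product;.</para>
```

Because the content of para is #PCDATA, the reference &product; is recognized as markup and expanded rather than passed through as literal characters.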
Sometimes, you might want to have textual content receive special treatment from the parser. In this case, you will use other special data types (indicated by reserved names). Examples of these are shown in table 10.4.
Reserved Word | Name | Typical Usage |
---|---|---|
EMPTY | Empty element (without content) | Placeholder or processing instruction |
CDATA | Character Data | Text that may contain SGML markup, which the parser and SGML applications treat as literal characters |
#PCDATA | Parsed (Parseable) Character Data | Textual content data that is evaluated by SGML parsers |
RCDATA | Replaceable Character Data | Same as CDATA, except entity and character references are recognized; useful for special notation like equations |
ANY | Any content valid | States that any element in the DTD (or #PCDATA) is allowed (avoid using this type of declared content!) |
Note:
#PCDATA includes the pound sign because PCDATA is a reserved name; the sign distinguishes it from an ordinary element name in the content model. Markup that occurs within #PCDATA is evaluated by the parser.
Caution:
The declared content type ANY essentially bypasses the parser's structural checking by allowing any declared element or text as content. Because its use sidesteps the structure that SGML is meant to enforce, it should be avoided. It is sometimes used in the early stages of DTD development.
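The differences among the declared content types in table 10.4 are easier to see side by side. In this sketch, the element names listing, formula, and scratch and the entity result are hypothetical.

```sgml
<!-- Declarations (hypothetical names) -->
<!ELEMENT listing - - CDATA >
<!ELEMENT formula - - RCDATA >
<!ELEMENT scratch - - ANY >
<!ENTITY result "stop" >

<!-- CDATA: the start tag and the entity reference stay literal text -->
<listing>Type <para> to begin a paragraph; &result; is not expanded.</listing>

<!-- RCDATA: the entity reference is expanded, but a<b is not a tag -->
<formula>if a<b then &result;</formula>
```

In both CDATA and RCDATA content, the characters that open an end tag still end the element, so this sketch deliberately avoids them inside the text.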
Element Exceptions (Include and Exclude). The use of element exceptions within your element definition can be thought of as a type of "Yes, but..." statement. Put simply, they allow you to override your (just completed) definition of the element by specifically permitting or forbidding the occurrence of an exception element.
To illustrate inclusion, for example, the element declaration for catalog (shown below) allows the element note to occur anywhere within the catalog element, any number of times, and in any of the subelements:
<!ELEMENT catalog - - ((section)|(section+,index)) +(note) >
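A document instance that takes advantage of this inclusion might look like the sketch below. It assumes, purely for illustration, that section contains a title followed by paragraphs and that title, para, and note hold text; none of those declarations appear in the text above.

```sgml
<catalog>
<note>A note directly inside catalog, ahead of the first section.</note>
<section>
<title>Hand Tools</title>
<note>A note between the subelements of section.</note>
<para>Hammers<note>A note inside a subelement.</note> are listed first.</para>
</section>
</catalog>
```

None of the content models for catalog, section, or para needs to mention note; the +(note) inclusion on catalog is what permits each occurrence.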