XML: A Primer: Mortar and Bricks: Document Type Definitions


The CONTENT element can stay mixed up because every article is bound to vary, but the rest of the document receives considerably more structure.

Mixed-content declarations are the final option. Technically, the (#PCDATA) content used in most of the examples in this chapter is a mixed-content declaration all by itself. Mixed-content declarations can allow multiple elements to appear as child elements without requiring them to appear or making any specific demands on the sequence in which they appear. The simplest mixed-content declaration is one of the most frequent: declaring content to be PCDATA so that text and entities (but no other elements) may appear:

  <!ELEMENT ORDINARY (#PCDATA)>

The ORDINARY element can now contain text or entity markup in any combination. XML offers only PCDATA for elements that need to contain text (and not just other elements). Consequently, this basic declaration will be used for nearly any leaf element. (Leaf elements have parent elements but no child elements; figuratively, they’re the branches farthest out on the tree, where the action actually takes place.) In some situations, however, developers may want to allow other elements to appear in a leaf element. Not all leaf data are appropriate to every element. An ingredient, for instance, shouldn’t contain a table of contents, but it might contain a note. The declaration creating an element that could hold an ingredient description and/or a note would look like:

  <!ELEMENT INGREDIENT (#PCDATA | NOTE)*>

This declaration would permit INGREDIENT elements to contain the textual information they need to present ingredients, as well as NOTE elements to explain ingredients that are strange or difficult to find.
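
As a rough illustration, the following document places that declaration in an internal DTD subset and uses it. The NOTE declaration and the ingredient text are assumptions made for this sketch; they do not come from the recipe DTD itself.

```xml
<?xml version="1.0"?>
<!DOCTYPE INGREDIENT [
  <!-- Mixed content: text and NOTE children, in any order, any number of times -->
  <!ELEMENT INGREDIENT (#PCDATA | NOTE)*>
  <!-- Assumed for this sketch: a NOTE holds only text -->
  <!ELEMENT NOTE (#PCDATA)>
]>
<INGREDIENT>2 tablespoons garam masala
  <NOTE>Sold at most Indian groceries; curry powder is a rough substitute.</NOTE>
</INGREDIENT>
```

An INGREDIENT with no NOTE at all, or with several, would be equally valid, because the trailing asterisk makes every part of the group optional and repeatable.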

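DTDs in which the same set of child elements can appear in several parent elements can be simplified with a parameter entity that lists those elements. The parser expands the parameter-entity reference and substitutes its replacement text into the element content declaration. (XML 1.0 permits parameter-entity references inside a declaration only in the external DTD subset, not in the internal subset.)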

  <!ENTITY % parts "PROLOGUE | DETAIL | MORAL | PUNCHLINE | JOKE">
  <!ELEMENT STORY (#PCDATA | %parts;)*>
  <!ELEMENT TALE (#PCDATA | %parts;)*>
  <!ELEMENT FABLE (#PCDATA | %parts;)*>

In this case, STORY, TALE, and FABLE elements can contain text and any of the PROLOGUE, DETAIL, MORAL, PUNCHLINE, or JOKE elements. These subelements may appear in any order and any number may appear. All other elements are prohibited from appearing in a STORY, TALE, or FABLE element.
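
A STORY instance under these declarations might look like the fragment below. It is only a sketch: it assumes the child elements are declared elsewhere under the uppercase names used in the text, and the story content is invented.

```xml
<STORY>
  <PROLOGUE>It was a slow night at the diner.</PROLOGUE>
  A traveler walked in and ordered the soup.
  <DETAIL>The soup of the day was split pea.</DETAIL>
  He asked the waiter to taste it.
  <PUNCHLINE>"Fine," said the waiter, "where is the spoon?"</PUNCHLINE>
</STORY>
```

Text and child elements interleave freely here, and MORAL and JOKE are simply absent, which the declaration allows.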

The parameter entity could also include the parentheses and the #PCDATA declaration. Each approach has its advantages in different situations.
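
A minimal sketch of that alternative follows. The entity name storycontent is hypothetical, and the sketch assumes the declarations live in an external DTD file.

```xml
<!-- Hypothetical entity holding the entire mixed-content model,
     including the parentheses, #PCDATA, and the trailing asterisk -->
<!ENTITY % storycontent "(#PCDATA | PROLOGUE | DETAIL | MORAL | PUNCHLINE | JOKE)*">

<!ELEMENT STORY %storycontent;>
<!ELEMENT TALE  %storycontent;>
<!ELEMENT FABLE %storycontent;>
```

Packing the whole model into the entity keeps each element declaration short. Leaving the parentheses and #PCDATA outside, as in the earlier listing, lets an individual declaration add or drop elements around the shared list.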

Attributes

Attributes have provided much of HTML’s power, but they will probably be used somewhat more sparingly in XML. Attributes are most useful for storing information holding more interest to computers than to humans. Even in HTML, attributes held critical formatting information for the browser, not information about the contents of the element. Attributes remain a key part of XML, however, offering flexibility beyond that of elements, and solidifying underlying structures. Attribute declarations use some syntax similar to element declarations but tend to offer more precise definitions of the content they allow.

Attributes are defined using the following syntax:

  <!ATTLIST ElementName
       AttributeName Type Default
       (AttributeName Type Default...)>

The first value in an attribute declaration is the name of the element to which the attributes apply. Although it makes a DTD more readable to include the attribute declaration right after the element declaration, this is not required. In fact, there can be multiple attribute declarations for the same element; all declarations for that element will be combined into one large set. If the same attribute is declared multiple times in that set, only the first appearance will be used. This makes it easy to extend existing DTDs without having to change them drastically.
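
The merging behavior can be sketched as follows. The attribute names are invented for illustration, and the #IMPLIED keyword (an optional attribute with no default) is standard XML 1.0 syntax rather than something introduced in this section.

```xml
<!ELEMENT INGREDIENT (#PCDATA | NOTE)*>

<!-- First attribute declaration for INGREDIENT -->
<!ATTLIST INGREDIENT
     quantity CDATA #IMPLIED
     unit     CDATA "cup">

<!-- A later declaration for the same element, perhaps added by an extension.
     supplier is merged into the set; the second unit definition is ignored
     because unit was already declared above. -->
<!ATTLIST INGREDIENT
     supplier CDATA #IMPLIED
     unit     CDATA "gram">
```

With both declarations in effect, INGREDIENT accepts quantity, unit, and supplier, and unit still defaults to "cup".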

After the element is named, an attribute definition or a list of attribute definitions may follow. A definition consists of the name of an attribute, its type, and its default value (or a specification for that value). Names of attributes must obey the same rules as names for entities and elements: they must begin with a letter, an underscore, or a colon, and may contain only letters, digits, periods, dashes, underscores, and colons. Attribute types are quite unlike the structures explored so far and define the kinds of data permitted in an attribute when used in an element instance. (An element instance is just a use of the element in the document.) Table 5.4 lists all the acceptable values for attribute types.

**Table 5.4** Attribute Types

| Type | Explanation |
| --- | --- |
| CDATA | The attribute may contain only character data. (Markup will never be interpreted in attribute values.) |
| ID | The value of the attribute must be unique within the document, identifying the element. If two ID attributes in a document have the same value, a validating parser should report an error. |
| IDREF, IDREFS | The value of the attribute must refer to an ID value that appears elsewhere in the document. If the value doesn’t match an ID value within the document, a validating parser should report an error. IDREFS is similar but allows multiple ID values separated by whitespace. |
| ENTITY, ENTITIES | The value of an ENTITY attribute must correspond to the name of an unparsed (external binary) entity declared in the DTD. ENTITIES is similar but allows multiple entity names separated by whitespace. |
| NMTOKEN, NMTOKENS | The value of the attribute must be a name token: much like CDATA, but the only characters allowed are letters, digits, periods, dashes, underscores, and colons. NMTOKENS is similar but allows multiple values separated by whitespace. |
| Enumerated, e.g. (thisone \| thatone) | The value of the attribute must match one of the values listed. The values must appear in parentheses, separated by OR (\|) symbols. |
| NOTATION, e.g. NOTATION (picture \| slide) | The value of the attribute must match one of the notation names listed, and each listed name must refer to a notation declared elsewhere in the DTD. In XML 1.0 the NOTATION keyword is always followed by such a list. For example, an attribute with type NOTATION (picture \| slide) would need to have a value of “picture” or “slide”, and NOTATION declarations would need to exist for both picture and slide. |
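
To make the table more concrete, here is a small sketch that uses several of these types together. Every element, attribute, entity, and notation name in it is invented for illustration, and the #REQUIRED and #IMPLIED keywords are standard XML 1.0 default specifications that this section has not yet introduced.

```xml
<?xml version="1.0"?>
<!DOCTYPE CATALOG [
  <!ELEMENT CATALOG (ITEM+)>
  <!ELEMENT ITEM (#PCDATA)>

  <!-- Notations and an unparsed entity for the NOTATION and ENTITY attributes -->
  <!NOTATION gif  SYSTEM "image/gif">
  <!NOTATION jpeg SYSTEM "image/jpeg">
  <!ENTITY widgetphoto SYSTEM "widget.gif" NDATA gif>

  <!ATTLIST ITEM
       sku     ID                    #REQUIRED
       related IDREF                 #IMPLIED
       label   CDATA                 #IMPLIED
       code    NMTOKEN               #IMPLIED
       photo   ENTITY                #IMPLIED
       format  NOTATION (gif | jpeg) #IMPLIED
       status  (instock | backorder) "instock">
]>
<CATALOG>
  <!-- sku values must be unique; related must match some sku in the document -->
  <ITEM sku="w100" label="Standard widget" code="A-17"
        photo="widgetphoto" format="gif">Widget</ITEM>
  <ITEM sku="w200" related="w100" status="backorder">Widget stand</ITEM>
</CATALOG>
```

A validating parser should reject this document if, for example, the second ITEM reused sku="w100" or pointed related at a value that no ID attribute carries.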
