XML: A Primer:Mortar and Bricks: Document Type Definitions

Document Type Declarations

After the opening prolog has announced that this is an XML document, the document type declarations announce what kind of XML document it is to be. Document type declarations glue the DTDs to the actual document or may even provide their own declarations about the structure of the document. Although internal DTDs (which are declared in the document they apply to) can be used and may be appropriate for certain situations, using an external DTD is usually preferable. Keeping DTDs separate makes them considerably more reusable and assures document managers that developers aren’t taking liberties with the DTD to suit their own purposes, creating incompatible document types.

Whenever the acronym DTD appears in this book or in other XML documentation, always assume that it stands for document type definition, not document type declaration.

A document type declaration always begins with <!DOCTYPE, followed by the name of the DTD, followed by a declaration of the DTD or a link that points to where the DTD can be found, and finally a > to close the declaration. The name of the DTD doesn’t need to correspond to the file name of the DTD file specified, but it should convey some sort of intelligible description of what the DTD is for, and it must match the root tags of the document. After the name, the declaration can either provide a DTD within the declaration itself, enclosed in square brackets ([ ]), which we’ll do in some of the following examples, or provide a link to a file containing the DTD. The reference to the DTD file is an external entity; external entities will receive additional coverage later in the entities section. For now, we’ll just discuss how to apply them in this situation.
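As a minimal sketch of the internal form, the following document carries its DTD inside the square brackets of the declaration. The element names (manual, title, section) and the content are hypothetical, chosen only to show that the name after <!DOCTYPE matches the root element.

```xml
<?xml version="1.0"?>
<!-- Hypothetical example: element names and content are illustrative only. -->
<!DOCTYPE manual [
  <!-- The root element: one title followed by one or more sections. -->
  <!ELEMENT manual (title, section+)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT section (#PCDATA)>
]>
<manual>
  <title>Widget Assembly</title>
  <section>Attach part A to part B.</section>
</manual>
```

Because everything the parser needs is in one file, this form is convenient for small experiments, though it gives up the reuse that an external DTD offers.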

Some DTDs are public standards, available in standardized format to a large number of users. Others are locally developed, useful for a Web site, a business, or perhaps a small industry. For the first type, the PUBLIC keyword is more appropriate; for the second, the SYSTEM keyword should be used. The PUBLIC keyword first provides a public identifier (in quotes) that the parser can use to locate the standard if it is connected to a library of standards. Following that is a URL (also in quotes) that can lead the parser to a copy of the DTD. The SYSTEM keyword is followed only by the URL. Large document management systems may well have libraries of DTDs available to parsers, but developers of other types of projects may not have such resources. The following two document type declarations link the document to the same DTD, but the first also provides a public identifier for the DTD.

  <!DOCTYPE manual PUBLIC "-//loopbackInc//DTD manual//EN"
  "http://127.0.0.1/manual.dtd">
  <!DOCTYPE manual SYSTEM "http://127.0.0.1/manual.dtd">

DTDs may also be nested: one DTD file may call another. Declarations are cumulative, and where an internal DTD and an external DTD both declare the same entity or attribute list, the internal declaration takes precedence.
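One way this combination might look is sketched here. It assumes, purely for illustration, that the external manual.dtd declares an entity named company and allows text inside the manual element; neither detail comes from a real DTD.

```xml
<?xml version="1.0"?>
<!DOCTYPE manual SYSTEM "http://127.0.0.1/manual.dtd" [
  <!-- Assumption: manual.dtd also declares an entity named "company".
       The internal subset is read first, so this declaration is the one used. -->
  <!ENTITY company "Loopback, Inc.">
]>
<!-- Assumption: manual.dtd permits character data inside manual. -->
<manual>Published by &company;.</manual>
```

Element declarations behave differently from entities: a validating parser reports an error if the same element type is declared twice, so an internal DTD can add element declarations but not replace existing ones.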

The public identifier structure uses the same format as SGML public identifiers. If the entity or DTD described is an ISO standard, the identifier starts with “ISO”. Otherwise, the first character is a plus (+) if the standard is officially approved by a standards body, or a minus (-) if it is not, followed by two forward slashes (//), after which an identifier of the owner of the DTD appears. After two more slashes, the type of the document (DTD, for example, or TEXT) appears, followed by whitespace and the name of the document. After yet another two slashes, the language identifier appears, using the codes specified in ISO 639 (EN, for example, is English).
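Applied to the public identifier used in the earlier declaration, the parts break down roughly as follows. This is a prolog fragment rather than a complete document, and the annotations are an informal reading of the format described above.

```xml
<!--
  Public identifier:  -//loopbackInc//DTD manual//EN

  -             minus sign: not approved by a standards body
  loopbackInc   owner of the DTD
  DTD           type of the document
  manual        name of the document
  EN            language code from ISO 639 (English)
-->
<!DOCTYPE manual PUBLIC "-//loopbackInc//DTD manual//EN"
  "http://127.0.0.1/manual.dtd">
```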

Comments

Comments are another critical part of XML, and they appear in both documents and DTDs. Comments in XML behave much like comments in HTML: they begin with <!-- and end with -->. XML comments are not allowed to have two consecutive dashes (--) in their content because parsers may interpret them as the end of an SGML comment. XML comments are not allowed to appear inside of tags or in declarations and will not work in CDATA. (In CDATA sections, the comment symbols are treated as regular characters and will appear as part of the document.) The parser ignores the contents of comments. The following is a sample comment:

  <!--This is a comment.  Please ignore me if you are parsing.-->
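The difference between a real comment and comment symbols inside a CDATA section can be seen in a small fragment. The note element here is hypothetical and stands in for any element that allows text content.

```xml
<note>
  <!-- A real comment: the parser skips this text. -->
  <![CDATA[<!-- Not a comment: these characters are part of the document. -->]]>
</note>
```

The first line inside note is ignored by the parser, while everything between the CDATA markers, including the comment symbols, is passed along as character data.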

Comments in XML are more likely to be used in DTDs than in documents, but they are critically important there. Comments are the signposts future editors will need to understand the structures you have created. Comments can explain otherwise mysterious entity references and are useful for labeling declarations, especially if element names are abbreviated. Comments may seem like wasted space to developers with perfect memories, but a DTD without comments is truly wasted space to the next developer who must work with it.
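A short sketch of what such signposts might look like in a DTD follows. The abbreviated element names (sect, ttl, para) are invented for this illustration and do not come from any real DTD.

```xml
<!-- Hypothetical DTD fragment: comments label each abbreviated name. -->

<!-- sect: one section of the manual, a title followed by paragraphs -->
<!ELEMENT sect (ttl, para+)>

<!-- ttl: section title, plain text only -->
<!ELEMENT ttl (#PCDATA)>

<!-- para: a paragraph of body text -->
<!ELEMENT para (#PCDATA)>
```

Without the comments, a later editor would have to guess what ttl stands for; with them, each declaration explains itself.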
