XML: A Primer:Mortar and Bricks: Document Type Definitions

Our example for external entities will simply combine two lists of general entities for use in a single DTD. (Using parameter entities to nest more complex DTDs will be covered in later chapters.) It is always a good idea to include comments with external entity declarations—the URLs in SYSTEM identifiers and even the more complete information in PUBLIC identifiers are often cryptic. Our first entity file, companies.pen, includes the following:

  <!ENTITY GLW "Corning Incorporated">
  <!ENTITY IBM "International Business Machines">
  <!ENTITY T "American Telephone and Telegraph">

Our second file, states.pen, includes the following:

  <!ENTITY NC "North Carolina">
  <!ENTITY ND "North Dakota">
  <!ENTITY NJ "New Jersey">
  <!ENTITY NM "New Mexico">
  <!ENTITY NY "New York">

Our DTD, although simple, is also stored in an external file:

  <!--The following entity connects to a list of
  companies using stock ticker symbols as entity
  references. -->
  <!ENTITY % companies SYSTEM "http://127.0.0.1/companies.pen">
  <!--The following entity connects to a list of
  states using postal abbreviations as entity
  references. -->
  <!ENTITY % states SYSTEM "http://127.0.0.1/states.pen">
  <!ELEMENT PARAMEXAMPLE (DOCUMENT)>
  <!ELEMENT DOCUMENT (#PCDATA)>
  %companies;
  %states;

The sample XML file that uses these references reads as:

  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE PARAMEXAMPLE SYSTEM "http://127.0.0.1/penex.dtd">
  <PARAMEXAMPLE>
  <DOCUMENT>The company &GLW; is headquartered in
  &NY;, as is &IBM;.  &T; is headquartered in
  &NJ;.</DOCUMENT>
  </PARAMEXAMPLE>

Parsing this should yield the following results:

  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE PARAMEXAMPLE SYSTEM "http://127.0.0.1/penex.dtd">
  <PARAMEXAMPLE>
      <DOCUMENT>
                  The company Corning Incorporated is
  headquartered in New York, as is International
  Business Machines. American Telephone and Telegraph is
  headquartered in New Jersey.
      </DOCUMENT>
  </PARAMEXAMPLE>

As we’ll see in later chapters, parameter entities can be a very useful tool for simplifying complex markup and managing multiple DTDs.
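
As a sketch of the commenting advice given earlier, an external parameter entity can also be declared with a PUBLIC identifier followed by a fallback system URL. The formal public identifier shown here is invented purely for illustration; only the URL comes from the example in this section.

```xml
<!-- Company names keyed by stock ticker symbol.
     The public identifier below is hypothetical. -->
<!ENTITY % companies PUBLIC
    "-//Example//ENTITIES Company Names//EN"
    "http://127.0.0.1/companies.pen">
%companies;
```

A processor that cannot resolve the public identifier is expected to fall back on the system URL that follows it.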

Notation Declarations

A notation declaration announces that data from an outside (non-XML) source is needed in the document and helps pass processing to an application other than the parser. Notation declarations are sometimes used in combination with processing instructions to provide a means of handling nontextual information within a document. The notation declaration tells the processor what kind of information there is; the processing instruction announces what process should be used to handle it. Notation names can also be used as attribute values.

The syntax for notation declarations is similar to that of the document type declaration:

  <!NOTATION Name ExternalID>

A notation declaration might read:

  <!NOTATION eps SYSTEM "epsview.exe">

The parser does nothing to check the information at the location specified; it just passes the address on to the processing application. If the processing application can handle the information, that’s wonderful. If it can’t, it doesn’t matter to the parser. The SYSTEM keyword is normally followed by a reference to an application that can present the data, but the processing application is definitely not required to use that application. (If a Macintosh or UNIX user were reading this file, a Windows executable wouldn’t help them much anyway.) Notations that the processing application cannot understand may be errors, but they aren’t XML errors. The parser will continue its work without announcing an error. The application, of course, may announce its own errors.
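
One way the pieces described above might fit together is sketched below: a notation, an unparsed entity that refers to it, and an attribute whose value must be a declared notation name. The FIGURE element, its attributes, and the logo.eps file are assumptions made up for this illustration.

```xml
<!NOTATION eps SYSTEM "epsview.exe">

<!-- An unparsed entity: the parser never reads logo.eps; it only
     reports the entity and its notation to the application. -->
<!ENTITY logo SYSTEM "logo.eps" NDATA eps>

<!-- FIGURE and its attributes are hypothetical names. -->
<!ELEMENT FIGURE (#PCDATA)>
<!ATTLIST FIGURE
    source ENTITY         #REQUIRED
    format NOTATION (eps) #IMPLIED>

<!-- In a document:
     <FIGURE source="logo" format="eps">Company logo</FIGURE> -->
```

The element is given text content rather than declared EMPTY because a NOTATION attribute may not be declared on an EMPTY element.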

Marked Sections in DTDs: IGNORE and INCLUDE

Developers who need to test different structures while keeping track of alternatives may want to use the IGNORE and INCLUDE marked sections in DTDs. (In SGML, these also work in documents, but XML restricts them to the external DTD subset and to external parameter entities.) IGNORE and INCLUDE let developers turn portions of a DTD on and off. They are particularly useful for developers who are combining several DTDs and need to limit the side effects of multiple files colliding, or for developers who need to create a single core DTD with optional subsets. IGNORE and INCLUDE sections may be nested inside other IGNORE and INCLUDE sections, but, like elements, their beginnings and ends may not overlap.

The syntax for IGNORE and INCLUDE resembles that of CDATA:

  <![IGNORE[ declarations ]]>
  <![INCLUDE[ declarations ]]>
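
Nesting might look like the following sketch, in which an inner IGNORE section removes one declaration from an otherwise included group. The ORDER, ITEM, and DRAFTNOTE element names are invented for the example.

```xml
<![INCLUDE[
  <!ELEMENT ORDER (ITEM+)>
  <!-- The inner section is skipped; the rest of the
       outer section is still processed. -->
  <![IGNORE[
    <!ELEMENT DRAFTNOTE (#PCDATA)>
  ]]>
  <!ELEMENT ITEM (#PCDATA)>
]]>
```

If the nesting were reversed, with an INCLUDE section inside an IGNORE section, everything inside the outer IGNORE would be skipped, the inner section included.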

Neither IGNORE nor INCLUDE may appear in the middle of a declaration; both must address a single declaration or a set of declarations. For example,

  <![IGNORE[<!ELEMENT YUCK (#PCDATA)>]]>
  <![INCLUDE[<!ELEMENT HOORAY (#PCDATA)>]]>

would keep the YUCK element from being parsed and would allow the HOORAY element to be parsed normally. Applied in this way, IGNORE seems like a handy way to edit out useless parts of a DTD, and INCLUDE seems to be just plain useless. Parameter entities give INCLUDE and IGNORE the power they need to be meaningful additions to the XML vocabulary. Instead of using INCLUDE and IGNORE directly to change code throughout a DTD, developers can use parameter entities to make all those changes in one place. This makes INCLUDE and IGNORE far more convenient and occasionally even necessary. The following example provides a simple demonstration:

  <!ENTITY % invoice "IGNORE">
  <!ENTITY % receipt "INCLUDE">
  <![%invoice; [
  <!ENTITY notice "Please remit the following
  payment within thirty days.">]]>
  <![%receipt; [
  <!ENTITY notice "Thank you for your prompt
  payment. The sums below have been collected and recorded.">]]>
  <!ENTITY address "555 Twelvetwelve Lane">

Depending on the values assigned to invoice and receipt, the general entity notice will speak in the voice of either a bill collector or a grateful vendor. With the values shown, the invoice section is ignored and the receipt section is included, so notice carries the thank-you message. To change the output, just switch the values of the two entities. The value of the address entity, on the other hand, will be the same in either case. Similar markup could have continued throughout the DTD, with parts inappropriate for receipts being struck. Switching the DTD over to invoices would require editing only two lines of the file rather than demanding a search-and-replace of the entire document. In the next chapters, we’ll explore more uses of this limited but powerful tool.
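
The switch can also be thrown from an individual document, as in the sketch below. It assumes the declarations above are stored in an external file called billing.dtd that also declares a STATEMENT element; both names are hypothetical. Because the internal subset is read before the external DTD, and the first declaration of an entity is the one that counts, the values given here take precedence over those in the file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE STATEMENT SYSTEM "billing.dtd" [
  <!-- Read before billing.dtd, so these declarations win
       and the invoice wording is selected. -->
  <!ENTITY % invoice "INCLUDE">
  <!ENTITY % receipt "IGNORE">
]>
<STATEMENT>&notice; Send payments to &address;.</STATEMENT>
```

With this approach the shared DTD stays untouched, and each document chooses the variant it needs.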
