XML: A Primer: Mortar and Bricks: Document Type Definitions


The main change, apart from the addition of a few elements, is in the AUTHOR element declaration: <!ELEMENT AUTHOR (FIRSTNAME, LASTNAME, (UNIVERSITY | COMPANY)?)>. This declaration lets developers build somewhat more flexible structures. Here, the AUTHOR element must contain a FIRSTNAME element followed by a LASTNAME element, and these may be followed by either a UNIVERSITY element or a COMPANY element. Because of the ? after the (UNIVERSITY | COMPANY) group, the affiliation is optional, but at most one of the two may appear. Including both makes the document invalid, and a validating parser will report an error. MSXML seems happy enough about this arrangement:

    C:\msxml>jview msxml -d -i http://127.0.0.1/simp4.xml
    <?XML VERSION="1.0" ENCODING="UTF-8"?>
    <!DOCTYPE SIMPLE [
        <!ENTITY Description 'This is a very simple sample document.'>
        <!ELEMENT TITLE PCDATA>
        <!ELEMENT UNIVERSITY PCDATA>
        <!ELEMENT AUTHOR
         (PCDATA|((FIRSTNAME,LASTNAME),(UNIVERSITY|COMPANY)?))>
        <!ELEMENT COMPANY PCDATA>
        <!ELEMENT FIRSTNAME PCDATA>
        <!ELEMENT SUMMARY PCDATA>
        <!ELEMENT LASTNAME PCDATA>
        <!ELEMENT DOCUMENT (((TITLE,(AUTHOR,AUTHOR*)),SUMMARY*),NOTE?)>
        <!ATTLIST DOCUMENT
            TRACKNUM #REQUIRED
            SECLEVEL ( )>
    ]>
    <DOCUMENT TRACKNUM="1234">
        <TITLE>
            Sample Document
        </TITLE>
        <AUTHOR>
            <FIRSTNAME>
                Simon
            </FIRSTNAME>
            <LASTNAME>
                St.Laurent
            </LASTNAME>
            <COMPANY>
                XML Mania
            </COMPANY>
        </AUTHOR>
        <SUMMARY>
            This is an entity inside an element:This is a very simple sample document.
        </SUMMARY>
    </DOCUMENT>
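
Because the (UNIVERSITY | COMPANY) group carries a ?, the AUTHOR declaration accepts three shapes and rejects a fourth. The following self-contained sketch illustrates this. The AUTHORS wrapper element and the Jane Doe / Example University values are hypothetical, added only so the example forms a complete document with its own internal DTD subset; the rejected case is kept inside a comment so the document as a whole stays valid.

  <?xml version="1.0" encoding="UTF-8"?>
  <!-- AUTHORS is a hypothetical wrapper element used only for this illustration -->
  <!DOCTYPE AUTHORS [
    <!ELEMENT AUTHORS (AUTHOR+)>
    <!ELEMENT AUTHOR (FIRSTNAME, LASTNAME, (UNIVERSITY | COMPANY)?)>
    <!ELEMENT FIRSTNAME (#PCDATA)>
    <!ELEMENT LASTNAME (#PCDATA)>
    <!ELEMENT UNIVERSITY (#PCDATA)>
    <!ELEMENT COMPANY (#PCDATA)>
  ]>
  <AUTHORS>
    <!-- Valid: no affiliation, because the (UNIVERSITY | COMPANY) group is optional -->
    <AUTHOR><FIRSTNAME>Simon</FIRSTNAME><LASTNAME>St.Laurent</LASTNAME></AUTHOR>

    <!-- Valid: exactly one COMPANY after LASTNAME -->
    <AUTHOR>
      <FIRSTNAME>Simon</FIRSTNAME>
      <LASTNAME>St.Laurent</LASTNAME>
      <COMPANY>XML Mania</COMPANY>
    </AUTHOR>

    <!-- Valid: exactly one UNIVERSITY (hypothetical name and value) -->
    <AUTHOR>
      <FIRSTNAME>Jane</FIRSTNAME>
      <LASTNAME>Doe</LASTNAME>
      <UNIVERSITY>Example University</UNIVERSITY>
    </AUTHOR>

    <!-- Invalid, so kept in this comment: the group allows at most one
         of UNIVERSITY or COMPANY, never both
         <AUTHOR><FIRSTNAME>Jane</FIRSTNAME><LASTNAME>Doe</LASTNAME>
         <UNIVERSITY>Example University</UNIVERSITY><COMPANY>XML Mania</COMPANY></AUTHOR>
    -->
  </AUTHORS>

Order matters as well: because the declaration is a sequence, an AUTHOR that put COMPANY before LASTNAME would also be rejected by a validating parser.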

At this point, the document is getting very long, and most of it is just the document type definition. For our last example, we’ll move the DTD into a separate file, which makes the document itself considerably shorter:

  <?xml version="1.0" encoding="UTF-8" standalone="no"?>
  <!DOCTYPE SIMPLE SYSTEM "http://127.0.0.1/simple.dtd">
  <SIMPLE><DOCUMENT trackNum="1234">
  <TITLE>Sample Document</TITLE>
  <AUTHOR><FIRSTNAME>Simon</FIRSTNAME> <LASTNAME>St.Laurent</LASTNAME>
  <COMPANY>XML Mania</COMPANY></AUTHOR>
  <SUMMARY>This is an entity inside an
  element:&Description;
  </SUMMARY></DOCUMENT></SIMPLE>

The <!DOCTYPE> declaration now uses the SYSTEM keyword to point to an external DTD at the URL http://127.0.0.1/simple.dtd.
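
A system identifier does not have to be an absolute URL. A relative identifier is resolved against the location of the document that contains the declaration, so for a document served from http://127.0.0.1/simp5.xml the two alternatives below should name the same simple.dtd file. They appear together only for comparison; a real document contains exactly one <!DOCTYPE> declaration.

  <!-- Alternative 1: absolute system identifier, as in the example above -->
  <!DOCTYPE SIMPLE SYSTEM "http://127.0.0.1/simple.dtd">

  <!-- Alternative 2: relative system identifier, resolved against the URL
       of the document containing it; for http://127.0.0.1/simp5.xml this
       names the same simple.dtd file -->
  <!DOCTYPE SIMPLE SYSTEM "simple.dtd">

The relative form keeps the document and its DTD portable as a pair: moving both to another server or directory does not require editing the <!DOCTYPE> declaration.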

The simple.dtd file contains all the declarations that used to be in the document itself:

  <!-- SIMPLE is the root element named in the <!DOCTYPE> declaration -->
  <!ELEMENT SIMPLE (DOCUMENT)>
  <!ELEMENT DOCUMENT (TITLE,AUTHOR+,SUMMARY*,NOTE?)>
  <!ATTLIST DOCUMENT
       trackNum CDATA #REQUIRED
       secLevel (unclassified|classified) "unclassified">
  <!ELEMENT TITLE (#PCDATA)>
  <!ELEMENT AUTHOR (FIRSTNAME,LASTNAME,(UNIVERSITY|COMPANY)?)>
  <!ELEMENT FIRSTNAME (#PCDATA)>
  <!ELEMENT LASTNAME (#PCDATA)>
  <!ELEMENT UNIVERSITY (#PCDATA)>
  <!ELEMENT COMPANY (#PCDATA)>
  <!ELEMENT SUMMARY (#PCDATA)>
  <!ENTITY Description "This is a very simple sample document.">

To see the results, run MSXML as usual. It doesn’t echo the declarations from the external DTD, but the expanded entity in the SUMMARY element shows that it found and read the DTD.

  C:\msxml>jview msxml -d -i http://127.0.0.1/simp5.xml
  <?XML VERSION="1.0" ENCODING="UTF-8"?>
  <!DOCTYPE SIMPLE SYSTEM "simple.dtd">
  <SIMPLE>
      <DOCUMENT TRACKNUM="1234">
          <TITLE>
              Sample Document
          </TITLE>
          <AUTHOR>
              <FIRSTNAME>
                  Simon
              </FIRSTNAME>
              <LASTNAME>
                  St.Laurent
              </LASTNAME>
              <COMPANY>
                  XML Mania
              </COMPANY>
          </AUTHOR>
          <SUMMARY>
              This is an entity inside an element:This is a very simple
              sample document.
          </SUMMARY>
      </DOCUMENT>
  </SIMPLE>

Now that we’ve built a basic working DTD, let’s examine the parts that go into defining an XML document. The following sections begin with the techniques for connecting an XML document to its DTD and then explore XML data and document structures in greater depth.

How Documents Find Their DTDs: The Prolog

Although it is not technically part of a DTD, the prolog at the top of the document, which contains the XML declaration (<?xml?>, which resembles a processing instruction) and the document type declaration that follows it, is the glue that binds DTDs to the documents that use them. These strange-looking declarations perform some of the functions that the HTML and HEAD elements serve in HTML, but they answer somewhat different questions. They hold only a few pieces of information, all of which are key to telling the browser how to interpret the code that follows. The HEAD element can contain interesting information, but that information affects only a few specific parts of the presentation, such as the title and possibly some scripting. Specifying which version of HTML a document uses could be useful for designers or automated HTML editors, but the browser doesn’t really care; it will interpret the code according to its own specifications, not those of a committee far away. In XML, by contrast, the prolog tells the browser in fairly specific terms how to interpret the document.
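
To make the two parts of the prolog concrete, here is the external-DTD example from earlier in this section, trimmed down and annotated. The comments are additions for illustration only. Note that nothing, not even a comment or whitespace, may precede the XML declaration; comments are allowed only after it.

  <?xml version="1.0" encoding="UTF-8" standalone="no"?>
  <!-- XML declaration (above): gives the XML version and the character
       encoding; standalone="no" signals that markup declarations outside
       this file, such as an external DTD, may affect the document -->
  <!DOCTYPE SIMPLE SYSTEM "http://127.0.0.1/simple.dtd">
  <!-- Document type declaration (above): names the root element (SIMPLE)
       and tells the parser where to find the external DTD -->
  <SIMPLE>
    <DOCUMENT trackNum="1234">
      <TITLE>Sample Document</TITLE>
      <AUTHOR><FIRSTNAME>Simon</FIRSTNAME><LASTNAME>St.Laurent</LASTNAME></AUTHOR>
    </DOCUMENT>
  </SIMPLE>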
