

Creating DTDs is a necessary step for building robust applications. A DTD gives an XML processor the information it needs to check a document as it parses it and to make certain that the document contains everything the application needs, in a form the application is prepared to accept. In that sense the DTD is the link between the data files given to the XML processor and the data the processor passes on to the application. DTDs help computers understand structures that may seem obvious to humans.

In this chapter and throughout the book, the focus is on creating DTDs and valid documents. Because XML is so new, there aren’t very many applications for it. Developing real applications that use XML is beyond the scope of this primer, although several road maps will be presented. The XML examples will demonstrate how to use various parsers, that is, XML processors that can interpret the structure of a document. In Chapter 3, we saw the Lark parser, a nonvalidating parser that can check a document for well-formedness but doesn’t interpret DTDs. In this chapter, MSXML, a validating Java parser from Microsoft, will be the primary parser used to demonstrate XML code. Other parsers are available, including SP, an SGML parser that can parse XML documents as well.
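The difference between checking well-formedness and validating can be sketched in a few lines. This is only an illustration, not Lark or MSXML: it assumes Python's standard-library `xml.etree.ElementTree`, which sits on the nonvalidating expat parser. The function name `is_well_formed` and the sample strings are invented for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML.

    The underlying expat parser is nonvalidating: it checks syntax
    (matched tags, proper nesting) but never compares the content
    against the element declarations in a DTD.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, not taken from the chapter.
GOOD = "<NOTE><TO>Reader</TO></NOTE>"
BAD = "<NOTE><TO>Reader</NOTE>"  # <TO> is never closed

# Well-formed, but it breaks its own DTD: NOTE is declared EMPTY.
INVALID_BUT_WELL_FORMED = (
    "<!DOCTYPE NOTE [<!ELEMENT NOTE EMPTY>]>"
    "<NOTE>text that the DTD forbids</NOTE>"
)

if __name__ == "__main__":
    print(is_well_formed(GOOD))
    print(is_well_formed(BAD))
    print(is_well_formed(INVALID_BUT_WELL_FORMED))
```

The third sample is the interesting one: a nonvalidating parser accepts it because the tags nest correctly, while a validating parser would be expected to reject it because the content contradicts the declaration.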

A brief explanation of the MSXML program is in order before we leap into DTD development. The MSXML program is a free software demonstration available from Microsoft at http://www.microsoft.com/standards/xml/. It can be used as part of a more complete Java parsing program or run from the command line using Microsoft’s jview or a similar program. The MSXML program takes several arguments. The -i argument forces the program to validate the XML code against the DTD. If the document doesn’t match the DTD, the parser returns some reasonably cryptic error messages indicating where the error occurred and giving a basic explanation of what went wrong. The -d argument tells the parser to write its interpretation of the document to standard output, displaying it on the screen. The -t argument tells the parser to write its interpretation of the text, minus the markup tags, after the output from the -d argument. (An additional argument, -n num, tells the parser to run a specified number of times, which makes it easier to time performance.)
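For readers who want a feel for what the -d, -t, and -n arguments report, here is a rough analogue. It is a sketch under assumptions: it uses Python's standard-library `xml.etree.ElementTree` rather than MSXML, it does not validate, and the function names and the `SAMPLE` document are hypothetical.

```python
import time
import xml.etree.ElementTree as ET

# Hypothetical sample document, not taken from the chapter.
SAMPLE = b"<MEMO><TO>Reader</TO><BODY>DTDs describe structure.</BODY></MEMO>"


def dump_document(xml_bytes: bytes) -> str:
    """Rough analogue of -d: re-serialize the parsed tree, markup included."""
    root = ET.fromstring(xml_bytes)
    return ET.tostring(root, encoding="unicode")


def dump_text(xml_bytes: bytes) -> str:
    """Rough analogue of -t: the character data with the tags removed."""
    root = ET.fromstring(xml_bytes)
    return "".join(root.itertext())


def time_parses(xml_bytes: bytes, runs: int) -> float:
    """Rough analogue of -n num: parse `runs` times, return total seconds."""
    start = time.perf_counter()
    for _ in range(runs):
        ET.fromstring(xml_bytes)
    return time.perf_counter() - start


if __name__ == "__main__":
    print(dump_document(SAMPLE))
    print(dump_text(SAMPLE))
    print(f"{time_parses(SAMPLE, 1000):.4f} seconds for 1000 parses")
```

Repeating the parse many times, as -n does, smooths out timer noise that would dominate a single run on a document this small.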

MSXML can be connected to another application, which may then act on the object model MSXML creates from its parsing. Java applications should be able to use MSXML quite easily this way. We’ll discuss this potential further in Chapter 11.

MSXML was built on the August 1997 working draft and may not yet reflect the latest updates to the standard. In Chapter 12, we’ll see a whole new interface for it.

Starting Simple

The details involved in building DTDs can be daunting, even to experienced HTML coders, SQL developers, and C++ and Java programmers. XML has toned down SGML’s dire reputation for complexity, but XML still has many strange detours and odd passageways. The various parts of the XML standard refer to each other constantly, requiring page flipping on an enormous scale. To avoid marching forward into quicksand, we’ll start simple, with lightweight documents that demonstrate some of XML’s power. This brief first section uses many parts of XML without explaining them in depth; its purpose is to give a general idea of what an XML document looks like. The detailed explanations appear in later sections of this chapter, but they aren’t likely to make much sense until you’ve seen some of this in action.

Initially, our examples use an internal DTD. Like style sheets, DTDs can appear in the document they describe or in separate files. Most large-scale projects will use external DTDs stored in central file structures, but a document this simple doesn’t need that kind of management. The document begins with the XML declaration, followed by a document type declaration that declares one element and one entity.

  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE simple [
  <!ELEMENT DOCUMENT (#PCDATA)>
  <!ENTITY Description "This is a very simple
    sample document.">
  ]>
  <DOCUMENT>This is an entity inside an element:&Description; </DOCUMENT>

Running this through the MSXML parser yields the following:

    C:\msxml>jview msxml -t -i -d http://127.0.0.1/simp1.xml
    <?XML VERSION="1.0" RMD="INTERNAL" ENCODING="UTF-8"?>
    <!DOCTYPE SIMPLE [
        <!ENTITY Description 'This is a very simple
    sample document.'>
        <!ELEMENT DOCUMENT PCDATA>
    ]>
    <DOCUMENT>
        This is an entity inside an element:This is a very
    simple sample document.
    </DOCUMENT>
    This is an entity inside an element:This is a very simple
    sample document.
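The entity replacement shown in this output can be reproduced with other parsers. The sketch below is an assumption-laden stand-in, not MSXML: it feeds the chapter's sample document to Python's standard-library `xml.etree.ElementTree`, whose expat parser reads the internal DTD subset and expands the entity but does not validate. The helper name `expanded_text` is invented for the example. Note that the DOCTYPE name (`simple`) differs from the root element (`DOCUMENT`); a nonvalidating parser does not check this, whereas the final XML 1.0 recommendation makes a matching name a validity constraint.

```python
import xml.etree.ElementTree as ET

# The chapter's sample document, copied as-is.
SIMPLE_DOC = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE simple [
<!ELEMENT DOCUMENT (#PCDATA)>
<!ENTITY Description "This is a very simple
  sample document.">
]>
<DOCUMENT>This is an entity inside an element:&Description; </DOCUMENT>
"""


def expanded_text(xml_bytes: bytes) -> str:
    """Parse the document and return the root element's text after the
    parser has replaced entity references with their declared values."""
    root = ET.fromstring(xml_bytes)
    return root.text or ""


if __name__ == "__main__":
    text = expanded_text(SIMPLE_DOC)
    # Collapse the line break inside the entity value for display.
    print(" ".join(text.split()))
```

The application never sees `&Description;`; by the time the text reaches it, the parser has already substituted the entity's value, line break included.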

