XML: A Primer: Processing XML: Applications, Servers, Browsers

CHAPTER 11
Processing XML: Applications, Servers, Browsers

Although most of this book has covered how to write XML documents and create XML DTDs, creating documents and DTDs is only part of what developers need to do to take full advantage of XML. All the examples so far have, to some extent, lamented the lack of processing applications. A full-scale treatment of parsing and processing applications would take another book (or, more likely, a set of books), but a basic understanding of how these applications will work is critical to creating usable XML. Many of the teams working on XML development will have separate groups building document types and creating processor applications, because the two kinds of work demand different skill sets. Still, both groups need to share a common vocabulary. In this chapter, we’ll examine the new vocabularies and architectures that XML is creating and some of the implications of XML for data processing.

Programming for XML

Although we’ve taken as much advantage as possible of XML’s ability to keep documents human-readable, what makes XML truly exciting is its machine-readability. Markup is designed to be easy to process programmatically, using a nested structure that works well with both recursive functions and object-oriented programming. Parsing valid XML documents is not a light task, but it does not present the enormous challenges faced by programs that must parse other formats.
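The fit between nested markup and recursive functions can be sketched in a few lines. What follows is only a minimal illustration, assuming Python and its standard xml.etree.ElementTree module; the sample document and the function name outline are invented for this example and do not come from any particular XML application.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration.
SAMPLE = (
    "<book><chapter><title>Processing XML</title>"
    "<para>Text</para></chapter></book>"
)

def outline(element, depth=0):
    """Return one indented line per element, visiting children recursively."""
    lines = ["  " * depth + element.tag]
    for child in element:
        # Each child is itself an element, so the same function applies.
        lines.extend(outline(child, depth + 1))
    return lines

root = ET.fromstring(SAMPLE)
print("\n".join(outline(root)))
```

Because every element contains only text and further elements, the function needs no special cases: the structure of the code mirrors the structure of the document.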

The developers of the XML specification have made XML programming much easier by tightening the syntax rules for document structure while, at the same time, loosening other constraints by adding the well-formedness option. XML’s firm requirement that every element have complete start- and end-tags, or indicate that it is empty by closing the tag with />, makes it far easier to write a parser. Both SGML and HTML allowed elements to omit their end tags, so parsers needed significant extra code to determine exactly where an element was supposed to end. SGML also allowed abbreviated element names, adding an extra level of lookup to the parsing process. XML’s basic structures, expressed in the criteria for well-formed documents, ensure that parsers can be reasonably simple programs that won’t add excessive overhead to document processing.
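The effect of the end-tag requirement is easy to observe with any conforming parser. The sketch below assumes Python’s standard xml.etree.ElementTree module; the helper name is_well_formed and the two sample strings are hypothetical, chosen only to contrast a document with properly closed elements against one that omits end tags in the HTML style.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised for mismatched or missing tags, among other syntax errors.
        return False

# Every element is closed, and the empty element uses the /> form.
print(is_well_formed("<list><item>one</item><item/></list>"))

# The item elements are never closed, as HTML would have tolerated.
print(is_well_formed("<list><item>one<item>two</list>"))
```

The parser never has to guess where an element ends; it simply rejects the second string.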

Validating XML documents, as opposed to checking them for well-formedness, remains something of a challenge, thanks to parameter entities and the need to check element structures against the DTD. DTDs can be incredibly complex documents to interpret, especially DTDs that extend back through several files because of multiple parameter entities and DOCTYPE declarations. Applying large DTDs to small files can waste processor cycles: the parser interprets declarations it will never apply, and the extra declarations add overhead to every element lookup. Still, validating documents is a critical part of XML development. As we’ll see later, validation may occur at several different points in the lifetime of a document, from its initial construction to its final presentation.
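To give a feel for what checking element structures involves, here is a deliberately simplified sketch in Python. It is not DTD validation: it assumes a hypothetical table, ALLOWED_CHILDREN, that stands in for element declarations, and it checks only which child elements may appear inside which parents. A real validating parser must also handle ordering, occurrence counts, attributes, and entities. All names in the example are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model standing in for DTD element declarations:
# each element name maps to the set of child elements it may contain.
ALLOWED_CHILDREN = {
    "memo": {"to", "body"},
    "to": set(),
    "body": {"para"},
    "para": set(),
}

def structure_errors(element, model):
    """Recursively collect messages for undeclared or misplaced elements."""
    if element.tag not in model:
        return [f"undeclared element <{element.tag}>"]
    errors = []
    for child in element:
        if child.tag not in model[element.tag]:
            errors.append(f"<{child.tag}> not allowed inside <{element.tag}>")
        else:
            # The child is permitted here, so check its own contents.
            errors.extend(structure_errors(child, model))
    return errors

good = ET.fromstring("<memo><to>Ann</to><body><para>Hi</para></body></memo>")
bad = ET.fromstring("<memo><para>Hi</para></memo>")
print(structure_errors(good, ALLOWED_CHILDREN))
print(structure_errors(bad, ALLOWED_CHILDREN))
```

Even this toy version performs a lookup for every element in the document, which suggests why a large DTD applied to a small file carries a real cost.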

Many XML applications will probably end up using only parts of XML, creating parsers that straddle the well-formed and the valid. Applications that can handle XML linking will probably need to do some validating, unless programmers want to present the attributes needed to create links in every single element instance. Documents that use entities extensively might not need a DTD that defines their elements and attributes, but they do need a parser that can expand their entity references. How far practice will diverge from the twin standards of valid and well-formed remains to be seen, but more levels are likely to appear. Discussions on XML-DEV, the XML development mailing list, have already raised situations for which full DTDs may not be appropriate. Combining the extreme flexibility of well-formed documents with the more powerful tools available in valid documents will likely cause some problems.
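The middle ground described above, a document that declares entities but no elements or attributes, can be illustrated with a non-validating parser. The sketch assumes Python’s standard xml.etree.ElementTree module, which expands general entities declared in the internal DTD subset without validating anything; the entity name publisher and its replacement text are invented for this example.

```python
import xml.etree.ElementTree as ET

# The internal subset declares one entity and nothing else:
# no element declarations, no attribute lists.
DOC = """<!DOCTYPE note [
  <!ENTITY publisher "Example Press">
]>
<note>Published by &publisher;.</note>"""

root = ET.fromstring(DOC)
# The parser has replaced the entity reference with its declared text.
print(root.text)
```

The document here is well-formed but not valid, since note is never declared, yet the parser still had to read part of a DTD in order to process it correctly.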

The XML-DEV mailing list is a key forum for developers creating parsers and other XML applications. The archives for the list and information on joining are available at http://www.lists.ic.ac.uk/hypermail/xml-dev/. When communicating on this list, keep in mind that it is a mailing list aimed at high-level development, not XML tutorials.
