XML: A Primer:Processing XML: Applications, Servers, Browsers

Architecture for XML Processing Applications

Designing an XML application requires far more than coding a DTD, borrowing a generic parser, and writing some code that interprets documents. XML is one piece in a very complex environment that requires analysis of authoring, validation, storage, transmission, parsing, processing, and rendering. Not every application will involve all those parts, but most large applications will need to include all of them, as shown in Figure 11.1.

Figure 11.1 The life of an XML document.

Validation (or at least well-formedness checks) may take place at nearly any point in the process. Some developers may want to validate documents as they’re being authored, others may wait until they’ve been stored, allowing bulk processing, and others may validate at the receiving processor. A few may even validate at all of these steps. Given the complications that linking can introduce, validating at more than one stage in the processing of an XML document may be a good idea, or even a requirement in some cases.
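As a rough illustration of checking the same document at more than one stage, the sketch below runs a well-formedness check at three named points. It uses Python's standard xml.etree.ElementTree module, which checks well-formedness only and does not validate against a DTD; the stage names, the helper functions, and the sample memo markup are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        return False, str(err)
    return True, None


def check_at_stage(stage, xml_text):
    """Run the same check at a named stage (authoring, storage, receipt)."""
    ok, problem = is_well_formed(xml_text)
    return {"stage": stage, "ok": ok, "problem": problem}


# Hypothetical sample documents, not taken from the book.
good = "<memo><to>Staff</to><body>Meeting at noon.</body></memo>"
bad = "<memo><to>Staff</body></memo>"  # mismatched end tag

results = [
    check_at_stage("authoring", good),
    check_at_stage("storage", good),
    check_at_stage("receipt", bad),
]
```

Because the check is an ordinary function, it can be called wherever a document changes hands. Full validation would need a validating parser in place of this well-formedness test.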

Authoring tools are as much a part of efficient XML development as parsers. As we’ve already discussed in several previous examples, hand-coding XML is not fun for most people. Finding or building useful XML authoring tools is critical to the success of any project that involves more than a few die-hard coders. SGML and HTML tools are available, and XML tools are on the horizon. Even though word processors have demonstrated their flexibility as general-purpose creators of a wide variety of documents over the last few years, XML developers may want to consider turning away from this general-purpose model and focusing on applications that zero in on particular document types. A well-written DTD can provide the basis for an interview-driven application, which walks authors through the required elements and doesn’t let them complete their documents until all required parts are present. Effectively, this kind of authoring tool performs validation even while the document is being written, ensuring that documents are complete.
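The interview-driven idea can be sketched in a few lines. In this minimal example a hard-coded list of required element names stands in for what a real tool would read from the DTD; the memo structure, the REQUIRED_ELEMENTS list, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the elements a DTD would declare as required.
REQUIRED_ELEMENTS = ["to", "from", "subject", "body"]


def missing_elements(answers):
    """List required elements the author has not yet filled in."""
    return [name for name in REQUIRED_ELEMENTS
            if not answers.get(name, "").strip()]


def build_memo(answers):
    """Build the memo, refusing to finish while required parts are missing."""
    missing = missing_elements(answers)
    if missing:
        raise ValueError("incomplete document, still needed: " + ", ".join(missing))
    memo = ET.Element("memo")
    for name in REQUIRED_ELEMENTS:
        ET.SubElement(memo, name).text = answers[name]
    return ET.tostring(memo, encoding="unicode")


# The answers an author might give during the interview.
answers = {
    "to": "Staff",
    "from": "Editor",
    "subject": "Schedule",
    "body": "Meeting at noon.",
}
memo_xml = build_memo(answers)
```

A real interview application would prompt for each entry in turn and call missing_elements after every answer, so the author always sees what remains before the document can be completed.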

Storage and transmission are two more fronts that XML may transform. XML can be stored easily as files, but its structure also lends itself to storage as elements or chunks. Storage facilities that break XML into components smaller than files can perform validation as the document enters and exits storage, providing another layer of assurance that documents are properly constructed and allowing users and programs to request smaller chunks of documents. As we’ll see later, this may change the architecture of the familiar Web server.
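To make chunk-level storage concrete, here is a toy in-memory store that keeps each top-level child element of a document as a separate entry and checks well-formedness on the way in and on the way out. The ChunkStore class, its method names, and the sample book markup are all assumptions for illustration; the sketch also ignores attributes on the root element and any text between child elements.

```python
import xml.etree.ElementTree as ET


class ChunkStore:
    """Toy store holding each top-level child element separately."""

    def __init__(self):
        self._chunks = {}  # (doc_id, position) -> serialized element
        self._roots = {}   # doc_id -> root tag name

    def put(self, doc_id, xml_text):
        """Check the document on entry, then store it chunk by chunk."""
        root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
        self._roots[doc_id] = root.tag
        for position, child in enumerate(root):
            child.tail = None  # drop text between chunks (simplification)
            self._chunks[(doc_id, position)] = ET.tostring(child, encoding="unicode")
        return len(root)  # number of chunks stored

    def get_chunk(self, doc_id, position):
        """Return one chunk, re-checking it on the way out."""
        chunk = self._chunks[(doc_id, position)]
        ET.fromstring(chunk)
        return chunk

    def get_document(self, doc_id):
        """Reassemble the whole document from its stored chunks."""
        root = ET.Element(self._roots[doc_id])
        position = 0
        while (doc_id, position) in self._chunks:
            root.append(ET.fromstring(self._chunks[(doc_id, position)]))
            position += 1
        return ET.tostring(root, encoding="unicode")


store = ChunkStore()
doc = "<book><chapter>One</chapter><chapter>Two</chapter></book>"
count = store.put("b1", doc)
second = store.get_chunk("b1", 1)
whole = store.get_document("b1")
```

A client that needs only the second chapter can request that chunk alone, while a client that needs the file-sized unit can still ask for the reassembled document.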

Parsing, of course, is at the core of any XML application. In Figure 11.1, the parsing being performed is on the client side, interpreting the file and preparing it for processing. Parsing can in fact take place at any level of this structure, although client applications will probably continue to need text file parsers for a long time. (Files will probably remain a key unit of transmission for a number of years to come, even if improvements in standards reduce the size of those files.)

The processing application is the recipient of all this effort. We’ve explored several different operations that could be performed here. The processing application could just be passing information to a rendering engine, the last stage shown in Figure 11.1, which will present the information in some format (screen, paper, CD-ROM, etc.). The processing application could also be a search application used to retrieve document information for storage in a database, a statistical package converting document data into mathematical results, a data mining application searching for trends in stored documents, or even a document checker (which could check for spelling, style, or anything else a computer can interpret) that can examine documents, flag problems, and return them to the server for intervention. The processing application more or less generates the results of all this work, whereas the rendering engine does its best to present those results in an acceptable form.
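The split between processing and rendering might look like the following sketch of a simple document checker. The processing step returns plain data about flagged paragraphs, and a separate rendering step turns that data into text. The para element name, the word limit, and both function names are hypothetical choices for the example.

```python
import xml.etree.ElementTree as ET

MAX_WORDS = 8  # hypothetical style rule: flag paragraphs longer than this


def check_document(xml_text):
    """Processing step: examine each <para> and return flagged problems."""
    root = ET.fromstring(xml_text)
    problems = []
    for number, para in enumerate(root.iter("para"), start=1):
        text = "".join(para.itertext()).strip()
        if not text:
            problems.append((number, "empty paragraph"))
        elif len(text.split()) > MAX_WORDS:
            problems.append((number, "paragraph exceeds %d words" % MAX_WORDS))
    return problems


def render_report(problems):
    """Rendering step: present the results as plain text."""
    if not problems:
        return "No problems found."
    return "\n".join("para %d: %s" % (n, msg) for n, msg in problems)


# Hypothetical sample document with one empty and one over-long paragraph.
sample = (
    "<doc>"
    "<para>Short and fine.</para>"
    "<para></para>"
    "<para>This one runs on for rather more than eight words in total.</para>"
    "</doc>"
)
problems = check_document(sample)
report = render_report(problems)
```

Keeping the two steps apart means the same list of problems could be sent to a screen, printed, or returned to the server for intervention without changing the checker itself.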

Now that we’ve examined all these steps, we’ll look more closely at the implications they hold for the two most ubiquitous Web tools—servers and browsers.
