XML: A Primer: Processing XML: Applications, Servers, Browsers

Extending the Server

Although Web servers have grown considerably more ornate since the Web first appeared, the basic concepts underlying Web page distribution remain quite simple. A Web server receives information over a network from a Web browser, interprets that information, and sends back a reply. In most cases, the browser is just requesting a particular file, which the server locates in its file structure and sends back to the browser with appropriate header information that informs the browser of what it’s getting. Although servers have sprouted extensions that allow these files to be generated by database interfaces, parsed includes, scripts, and other processors, the venerable HyperText Transfer Protocol (HTTP) still carries a high proportion of traffic that is generated only by simple file requests. Other protocols, like those that allow Java applets to communicate with the server directly, are growing, but simple file requests remain a key part of the basic Web infrastructure.

XML threatens to break this fundamental file request structure by making much heavier demands for files in order to fulfill its validation and linking requirements. Although DTDs and style sheets may not hit the ceiling of the current capabilities of file-based Web servers (they can be easily cached and used for multiple documents in most cases), the new linking features may. The new EMBED functionality will encourage developers to create documents that include parts of other documents. XPointers make selecting chunks of documents easy. If all the XPointer processing takes place on the client, servers will spend considerable effort sending clients entire files of which only a part is ever used. Documents that link to multiple documents this way could increase the load dramatically, especially if the processing applications are other programs voraciously seeking out key bits of information. While extended link groups may provide delightful functionality to the client by creating true multidirectional links, they promise an enormous traffic jam at the server, especially if developers keep their links distributed across multiple files rather than consolidating them in centralized link clearinghouse files.

The solution to all these problems is simple, even though it will take considerable work to implement: servers need to be able to distribute chunks as well as files. XPointers provide one syntax for specifying chunks of data that can be passed to the server as part of a URL. If the server interprets the XPointers before returning the data, the transmission of a great deal of unnecessary information can be avoided. Extended link groups will work well in this structure, taking advantage of the ALL value for the instance argument. Using these structures, the client can send the server a very precise description of exactly the document parts it needs.
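The idea of resolving the pointer on the server can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the limited XPath subset in Python's standard xml.etree.ElementTree module as a stand-in for XPointer, and the sample document, the extract_chunk function, and the path expression are all hypothetical names invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document standing in for a file on the server.
DOCUMENT = """<book>
  <chapter id="c1"><title>Intro</title><para>First.</para></chapter>
  <chapter id="c2"><title>Servers</title><para>Second.</para></chapter>
</book>"""


def extract_chunk(xml_text: str, path: str) -> str:
    """Return the first element matching `path`, serialized as text.

    `path` uses ElementTree's XPath subset, an assumed stand-in for a
    real XPointer expression. Returns an empty string when nothing matches.
    """
    root = ET.fromstring(xml_text)
    match = root.find(path)
    if match is None:
        return ""
    # tostring() also emits trailing whitespace that follows the element,
    # so strip it to send only the chunk itself.
    return ET.tostring(match, encoding="unicode").strip()


if __name__ == "__main__":
    chunk = extract_chunk(DOCUMENT, "./chapter[@id='c2']")
    print(chunk)
    print(len(chunk), "characters sent instead of", len(DOCUMENT))
```

The server still has to read and parse the whole document here; what it saves is the transmission, since only the requested chapter goes back to the client.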

Creating servers that can efficiently handle these requests will require a significant change in the way documents are stored. Traditional file systems store XML documents serially: retrieving the content of the second-last element of a document, for example, requires retrieving the entire document, parsing it, and extracting the second-last element. This produces enough overhead that the administrators of many busy servers might prefer to just let the client application handle the processing, at least until they run short of bandwidth because their server is transmitting excessive amounts of lightly used information.
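The cost of serial storage is easy to see in a small sketch. The function below is a hypothetical example written with Python's standard xml.etree.ElementTree module; it assumes the document arrives as a single string, as it would when read from a flat file. To return one small element, it must parse everything.

```python
import xml.etree.ElementTree as ET


def second_last_child_text(xml_text: str) -> str:
    """Return the text of the second-last child of the root element.

    The whole document is parsed before the one wanted element can be
    located, which is the overhead of file-based storage.
    """
    root = ET.fromstring(xml_text)  # parses the entire document
    children = list(root)
    if len(children) < 2:
        raise ValueError("root element has fewer than two children")
    return children[-2].text or ""


if __name__ == "__main__":
    # Build a hypothetical document with 1,000 small elements.
    items = "".join(f"<item>{i}</item>" for i in range(1000))
    document = f"<list>{items}</list>"
    result = second_last_child_text(document)
    print(f"parsed {len(document)} characters to return {len(result)}")
```

The work grows with the size of the document rather than with the size of the chunk requested.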

Fortunately, a more efficient solution has recently reached commercial viability, with Informix, IBM, and others offering products. Although they remain complex and carry a steep learning curve, object-relational databases are capable of handling precisely these kinds of requests efficiently. Object-relational databases provide hierarchical structures, which correspond neatly to XML’s nested elements and can be retrieved, searched, and processed quite easily. If you haven’t worked with object-relational databases before, it’s probably not worth your effort to run out and buy one for your server. It will take a great deal of coding to make the translations between XML documents and the database smooth, and that task is probably better left to vendors. XPointer-enabled servers that can process requests for chunks efficiently are probably not too far off. Several firms in the SGML and HTML worlds already provide object-relational tools. Inso, a participant in the XSL proposal and DOM working group, uses an object-relational database as the foundation for its DynaBase HTML site management tool, for example.
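To give a rough feel for why a structured store helps, the sketch below keeps each element as a row that records its parent and its position among its siblings. It is an assumption-laden illustration, not how any of the products named above work: it uses SQLite (a plain relational engine bundled with Python) as a stand-in for an object-relational database, and the node table, load_document, and second_last_child_text are hypothetical names.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_document(conn: sqlite3.Connection, xml_text: str) -> int:
    """Store every element as a row; return the row id of the root."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node ("
        "id INTEGER PRIMARY KEY, parent INTEGER, position INTEGER, "
        "tag TEXT, text TEXT)"
    )
    # The index lets a lookup by parent and position avoid a full scan.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS node_parent_pos ON node (parent, position)"
    )

    def insert(element: ET.Element, parent_id, position: int) -> int:
        cursor = conn.execute(
            "INSERT INTO node (parent, position, tag, text) VALUES (?, ?, ?, ?)",
            (parent_id, position, element.tag, element.text),
        )
        node_id = cursor.lastrowid
        for index, child in enumerate(element):
            insert(child, node_id, index)
        return node_id

    return insert(ET.fromstring(xml_text), None, 0)


def second_last_child_text(conn: sqlite3.Connection, parent_id: int):
    """Fetch the second-last child's text without re-parsing the document."""
    row = conn.execute(
        "SELECT text FROM node WHERE parent = ? "
        "ORDER BY position DESC LIMIT 1 OFFSET 1",
        (parent_id,),
    ).fetchone()
    return None if row is None else row[0]


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    root_id = load_document(
        connection, "<list><item>a</item><item>b</item><item>c</item></list>"
    )
    print(second_last_child_text(connection, root_id))
```

The document is parsed once, when it is loaded; later requests for a chunk become indexed queries rather than repeated passes over the whole file.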

Pure object databases, like POET and Jasmine, offer developers a more appropriate, though perhaps more difficult, foundation for storing XML.
