

CHAPTER 8
XML for Document Management

XML promises much more than data interchange between readers and writers or buyers and sellers. XML documents carry with them the information needed to build document systems, organized collections of information that had previously been left to wither away in filing cabinets or trash cans. Traditional file structures and even the Web have provided a minimum level of storage and accessibility, but more comprehensive systems are starting to become standard equipment in offices.

Document management systems store documents, keep track of document contents, control access to them, and allow users to locate key information quickly. Many document systems are really enormous electronic filing cabinets, storing documents with only a few keywords and a date provided for quick searching. Full-text search is available (and is indeed an exploding industry), but it consumes enormous quantities of computing resources. By giving document management systems a clearer picture of the contents of documents, XML may make it possible for them to control larger sets of documents more efficiently. Searches can be limited to individual elements, reducing the amount of processing required to get to a document and reducing the number of false matches. Document management systems will need to adapt, although SGML-based systems shouldn’t have too much difficulty moving into the XML market.
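To make the idea of element-limited searching concrete, here is a minimal Python sketch that compares a search of the raw text with a search confined to one element. The memo markup, the element names (memo, to, subject, body), and both function names are assumptions made up for illustration; they are not the DTD developed later in this chapter. A real system would probably index elements ahead of time rather than parse each document on every query.

```python
import xml.etree.ElementTree as ET

# Hypothetical memos; the element names are illustrative assumptions.
MEMOS = [
    "<memo><to>All staff</to><subject>Donut policy</subject>"
    "<body>Free donuts end on Friday.</body></memo>",
    "<memo><to>Design team</to><subject>Brake rotor drawings</subject>"
    "<body>Bring donuts to the review meeting.</body></memo>",
]

def full_text_matches(docs, term):
    """Return indexes of documents whose raw text contains the term anywhere."""
    term = term.lower()
    return [i for i, doc in enumerate(docs) if term in doc.lower()]

def element_matches(docs, tag, term):
    """Return indexes of documents where the term occurs inside the named element."""
    term = term.lower()
    hits = []
    for i, doc in enumerate(docs):
        root = ET.fromstring(doc)
        if any(term in "".join(el.itertext()).lower() for el in root.iter(tag)):
            hits.append(i)
    return hits

print(full_text_matches(MEMOS, "donut"))           # both memos mention donuts somewhere
print(element_matches(MEMOS, "subject", "donut"))  # only the first has it in its subject
```

The second memo is a false match for someone looking for memos about donuts: the word appears only in passing in the body. Restricting the search to the subject element drops it.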

A key piece of this document management dream lies in the tools used to create the documents. If the XML tools are as clunky as the hand-coding we’ve done throughout this book, no one will want to use them. Even though XML tools may require significant interface changes, many WYSIWYG tools are already preparing for the transition. Corel’s WordPerfect 8 already includes SGML tools, including a Visual DTD Builder. Several add-on tools are available for Microsoft Word, and Microsoft has announced future support for XML as a Word file format. Although the examples that follow are hand-coded, most of the people using them will not be entering tags directly.

XML allows document management systems to store documents as parts rather than as large clumps of often indecipherable information. Removing formatting information from the core of a document makes it far easier for search engines and similar tools to parse text without having to ponder formatting codes. A document management tool written for XML from the ground up might even store documents as sets of elements within hierarchically organized databases. The object-relational tools available with database systems from Informix, IBM, and Oracle allow for the creation of a wide variety of data types, some of which are rich enough to store XML documents in a manner reflecting the structure of the document: a set of small pieces that can be manipulated, rather than a chunk of text that requires a full parsing every time it is accessed.
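As a rough illustration of storing a document as a set of small pieces, the following Python sketch flattens a document into one row per element, each keyed by its path from the root. The function name decompose, the row layout, and the sample memo are assumptions for this example only, and the sketch ignores attributes and mixed content; it does not represent how any particular database product stores XML.

```python
import xml.etree.ElementTree as ET

def decompose(xml_text):
    """Flatten a document into (path, text) rows, one per element, in document order."""
    rows = []

    def walk(element, parent_path):
        path = f"{parent_path}/{element.tag}"
        rows.append((path, (element.text or "").strip()))
        for child in element:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return rows

# Hypothetical memo used only to exercise the function.
sample = (
    "<memo><to>All staff</to><subject>Donut policy</subject>"
    "<body>Free donuts end on Friday.</body></memo>"
)
for path, text in decompose(sample):
    print(path, "->", repr(text))
```

Once the rows exist, a tool can fetch or update a single piece, such as /memo/subject, without parsing the whole document again.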

Building document management applications is well beyond the scope of this book. The remainder of this chapter will explore ways to create DTDs that consider real business needs, creating centrally stored documents that can be easily searched and that meet the needs of more than one part of a company. The first example standardizes the memo, perhaps the most commonly used business document type of all. The second creates a custom solution to a problem common in larger companies, that of keeping track of completed projects.

Small Steps Toward the Paperless Office

Our first management DTD will address one of the largest paper-wasters in business environments: the memos that perpetually fill in-boxes. Many companies produce small weekly newsletters in a memo format, and this DTD will disseminate chatty pieces of information as well as the boss’s announcement that the company is cutting off the supply of free donuts. Although many people might question the wisdom of saving and managing memos, memos and other small-scale communications have grown dramatically in importance with the rise of litigation and the need to document processes. The Freedom of Information Act (FOIA), for example, requires the federal government to maintain records of its activities and release them (in some form) to the public. At present, an FOIA request can take weeks or months as agencies contact their warehouses to gather old files. With a document management system built on a DTD like this one, the time needed to locate documents could be greatly reduced. This DTD can be reused easily for a number of other tasks (e.g., e-mail is usually formatted on a similar model).

Virtually no one will want to hand-code their memos in XML. In the case of the memo, with its very simple structure, a program might even be able to read the memo DTD and use it as a template. XML parsers can use the information in a wide variety of ways, not just as document presentation information. An advanced XML processor might create the memo through an interview process rather than the usual clicking in fields in a document.
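The interview idea can be sketched in a few lines of Python. The field list below is a hypothetical stand-in for what a tool might derive from a memo DTD; the element names, the questions, and the interview function are all assumptions for illustration, and the sketch does not read a DTD itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical field list standing in for what a tool might derive from a memo DTD.
MEMO_FIELDS = [
    ("to", "Who should receive this memo?"),
    ("from", "Who is it from?"),
    ("subject", "What is it about?"),
    ("body", "What do you want to say?"),
]

def interview(ask, fields=MEMO_FIELDS):
    """Build a memo element by asking one question per field.

    `ask` is any callable that takes a prompt and returns the answer,
    for example the built-in input().
    """
    memo = ET.Element("memo")
    for tag, question in fields:
        ET.SubElement(memo, tag).text = ask(question + " ")
    return memo

# Canned answers stand in for a user here so the sketch runs unattended.
answers = iter(["All staff", "Jimmy", "Donuts", "No more free donuts."])
memo = interview(lambda prompt: next(answers))
print(ET.tostring(memo, encoding="unicode"))
```

Passing the built-in input as ask turns this into a simple console interview. The author answers questions and never sees a tag.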

The first step in creating the memo DTD is interviewing people and collecting memos—lots of them—to examine how they are assembled. Most companies use a fairly standard format, with a letterhead of some kind at the top, followed by a distribution list, the source of the memo, a brief headline, and then the contents. In some cases, the typist is indicated at the bottom of the memo if the typist was someone other than the original author. For our example, we’ll use a very imaginary company—Jimmy’s Delectable Car Parts Design (JDCPD). JDCPD is a successful firm that sells after-market high-performance parts for all kinds of cars and trucks. A typical memo might look like that shown in Figure 8.1.


Figure 8.1  A typical memo.

Some memos are more complex. Jimmy’s Delectable Car Parts Design has a public relations office, which also puts out an internal weekly newsletter. The newsletter has short items of interest to JDCPD employees, presented in a friendly, informal style.


Figure 8.2  A more complex memo.

The public relations department would like to be able to use the memo format for other presentations as well, although they haven’t planned anything specific quite yet. They know that in future editions of the newsletter, especially the upcoming Intranet newsletter, they would like to include thumbnails of the award-winning drawings and dress up the page a bit with more logos. Press releases are also distributed in a similar format, although they probably won’t be included in this project.

