

Document Management

Most documents any longer than a business letter don’t exist in only one form. They get re-edited, expanded, or even chopped up and included in other documents. As this happens, many problems arise. On any system that supports cross-references or hyperlinks to documents, a big problem comes up right away: What should the reference point to after the original document changes?

Let’s say Mr. Smith writes an article and publishes it. A few months later, Ms. Jones reads it, finds a big flaw, and publishes a review criticizing it. Mr. Smith reads the review, realizes Ms. Jones is right, and rewrites his article to fix the problem. So far, nothing special; this happens all the time.

But what happens when a third person comes along later, and the first thing he runs into is the review? If this is all online, he’ll probably follow a link Ms. Jones made to the worst part of Mr. Smith’s article. Should he see the old version or the new one?

If he sees the old version, he gets the correct impression that Ms. Jones found a real problem, but Mr. Smith may be upset: he really did fix the problem, yet the later reader never finds that out. On the other hand, if the reader sees only the new version, it’s a bit unfair to Ms. Jones, since it will look as if her review made up errors to complain about.

Doug Engelbart’s “Augment” system dealt with this as far back as the mid-60s. (Engelbart also invented the computer mouse, multi-window displays, the outline processor, video-teleconferencing, and a few other things.) In Augment, any time you made a document public, a copy of that version was locked and kept around permanently, so links would always point to the same thing. The really nice touch, though, was that the system carefully kept track of which document and version was which, so when you followed such a link it could tell you immediately whether one or more newer versions of the document were also around.
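
The behavior described for Augment can be illustrated with a small sketch. This is not Augment’s actual design; the class and method names (`VersionedStore`, `publish`, `follow`) and the sample document are made up for illustration. It only shows the idea: every published version is kept, a link is pinned to one version, and following the link also reports any newer versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """A link pinned to one published version of a document."""
    doc_id: str
    version: int


class VersionedStore:
    """Hypothetical store: each published version is kept and never modified."""

    def __init__(self):
        self._versions = {}  # doc_id -> list of published texts, oldest first

    def publish(self, doc_id, text):
        """Freeze a new version and return a link pinned to it."""
        history = self._versions.setdefault(doc_id, [])
        history.append(text)
        return Link(doc_id, len(history))

    def follow(self, link):
        """Return the pinned text plus the numbers of any newer versions."""
        history = self._versions[link.doc_id]
        text = history[link.version - 1]
        newer = list(range(link.version + 1, len(history) + 1))
        return text, newer


# Illustrative data only: Mr. Smith publishes, then publishes a revision.
store = VersionedStore()
reviewed = store.publish("smith-article", "Original article with the flaw.")
store.publish("smith-article", "Revised article with the flaw fixed.")

# Ms. Jones's link still reaches the version she reviewed,
# and the reader is told that version 2 also exists.
text, newer = store.follow(reviewed)
print(text)
print(newer)
```

With this arrangement the later reader sees exactly what Ms. Jones criticized, and can also be pointed to Mr. Smith’s corrected version.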

This seems fair to all concerned, and a system like that would help us a lot today. On the Web, documents change all the time and sometimes just disappear, all with no notice. Several companies are working to solve this problem by introducing Web site management tools. Object-oriented databases are being used very effectively in building this kind of tool.

Such tools keep track of versions of documents, and you can ask them what happened to a given version. The better ones will let you recover a version by date or name, and may even be able to say how it differs from some other version that was published at another time. These tools can also look around whenever you change a document, and warn you about particular links that are affected (either ones that break outright, or ones that might not be meaningful anymore, as in Mr. Smith’s case).
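
As a rough sketch of what such a tool does, the following uses Python’s standard `difflib` module to recover a version by date, compare two versions, and flag links whose target text changed. The history, dates, link names, and helper functions are all hypothetical, and a real tool would track link targets far more carefully than by matching whole lines.

```python
import difflib
from datetime import date

# Hypothetical history: (publication date, text) pairs, oldest first.
history = [
    (date(1995, 1, 10), "Intro.\nThe flawed claim.\nConclusion.\n"),
    (date(1995, 6, 2), "Intro.\nThe corrected claim.\nConclusion.\n"),
]


def version_as_of(history, when):
    """Return the latest text published on or before `when`, or None."""
    found = None
    for published, text in history:
        if published <= when:
            found = text
    return found


def changed_lines(old, new):
    """Return lines of `old` that do not survive unchanged in `new`."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line[2:] for line in diff if line.startswith("- ")]


def affected_links(links, old, new):
    """Return names of links whose target line was changed or removed."""
    gone = set(changed_lines(old, new))
    return [name for name, target in links.items() if target in gone]


# Hypothetical inbound links: link name -> the line of text it points at.
links = {"jones-review": "The flawed claim.", "index-page": "Intro."}

old = version_as_of(history, date(1995, 3, 1))
new = version_as_of(history, date(1995, 12, 31))
print(affected_links(links, old, new))
```

Here the link from the review would be flagged, because the passage it points at was rewritten, while the link to the unchanged introduction would not.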

No high-end Web-specific management tools are available yet. However, a couple have been announced, and a few sites are experimenting with similar tools that have been around for other purposes, such as RCS for managing computer program source code files. This area is expected to become very hot in the next several months. A good solution to the problems that arise when documents change and links break would be a huge step forward.

From Here…

There are many issues to consider as you prepare to supply SGML data via the Internet: what DTD to use; how, when, and whether to convert your documents to HTML; and what servers and data management tools to use. This chapter covered those issues and tradeoffs, and provided some key questions to ask when planning a Web-based SGML delivery system. One big plus for using SGML is the potential to use the same documents for a variety of purposes: Web delivery, CD-ROM or local-area network delivery, print production, information retrieval, and more. Many sites have done this successfully already.

For more information, refer to the following:

  Chapter 20, “Practicalities of Working with SGML on the Web,” covers some of the tools you might need in more detail.
  Chapter 22, “Developing for the World Wide Web,” covers several hot current topics, such as how the MIME standards are being enhanced to support SGML, some important related standards such as HyTime, and some tips to help you get professional-looking results.
  Appendix B, “Finding Sources for SGML Know-How,” provides a variety of useful resources, such as books and Web sites.

