

Tricks, Traps, and Pitfalls

Earlier in this chapter, you explored two approaches to setting up an SGML environment: authoring in SGML from the beginning, and filtering existing documents into SGML. Each approach has its pros and cons.

If you’re undecided on which approach to take (structured authoring versus conversion), you might want to think of it as a “You can pay me now, or you can pay me later” issue.

With structured authoring, you must contend with all the restrictions that are imposed by the document structure defined in the DTD. You might find it slow going at first, as you figure out what you can do at any spot in the document. The learning curve can be steep in the beginning. You might discover you don’t like operating within the confines of a set of rules policed by a machine.

On the plus side, those rules ensure that your documents fit the architecture that you have defined for them. A consistent set of documents is much more manageable across different software packages and delivery environments.

The document conversion approach permits you to put off some of the difficulties in arriving at consistently structured documents until later, at the conversion step. Rest assured that if you have wildly varying documents (by wildly varying authors), the conversion process will be an adventure.

The document conversion process can be tedious and frustrating at times. It requires an eye for detail, particularly on large, complex conversions. Getting good at it may take some time.


Tip:  
If you’re going the conversion route and the people doing the conversion are not the document authors, make sure that you implement “feedback loops.” Through this approach, the conversion people can let the authors know of ongoing problems with input documents (such as writing styles and techniques) that commonly cause conversion errors. If the document converters are constantly fixing the same types of errors, they may become morose, moody, or withdrawn! If they have the chance to improve the process, their outlook may remain bright.

When parsing document instances on an ongoing basis, note the types and frequency of errors that you encounter. Do a little analysis on them. Does one particular author generate a large percentage of the errors? If so, can you figure out why? I once worked in an environment where one author producing 5 percent of the documents also produced 80 percent of the parsing errors. After I noticed this, I did a little checking. I found out that he had been away for a three-week period during which everyone else was instructed on the issues involved with authoring for SGML. After some quick refreshers on the topics he had missed, the error rate on his documents was similar to that of his fellow writers.
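The kind of analysis described here can be sketched in a few lines of Python. This is only an illustration, and it assumes you already have a log of parsing errors as (author, error type) pairs and a count of documents per author; the function name `error_share_by_author` and the record layout are hypothetical, not something your parser produces for you.

```python
from collections import Counter


def error_share_by_author(error_log, doc_counts):
    """Compare each author's share of documents with their share of errors.

    error_log  -- iterable of (author, error_type) pairs (assumed layout)
    doc_counts -- dict mapping author to number of documents produced
    """
    errors = Counter(author for author, _ in error_log)
    total_errors = sum(errors.values())
    total_docs = sum(doc_counts.values())

    report = []
    for author, docs in doc_counts.items():
        report.append({
            "author": author,
            "doc_share": docs / total_docs if total_docs else 0.0,
            "error_share": errors[author] / total_errors if total_errors else 0.0,
        })

    # Authors with the largest share of errors come first.
    report.sort(key=lambda row: row["error_share"], reverse=True)
    return report
```

An author whose share of the errors is far larger than his or her share of the documents is a good candidate for a conversation or a refresher, as in the story above.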

You’re fortunate to have a choice today. Until recently, the options available to turn your words into SGML documents were much more limited. With the advent of ever more tools, you can pick and choose the tool that best fits your needs.

If you choose to create SGML via the route of structured authoring tools that only allow valid SGML components in your document (as defined by your DTD), the need for parsing each document instance can be reduced (or possibly eliminated). But beware! Some structured authoring tools have a feature that allows you to “turn off” the validating feature that ensures your compliance with your DTD. If your authoring package is one of these (and you avail yourself of this “feature”), you must treat your documents as if they were filtered into SGML. That means you need to parse them against your DTD! If, for any other reason, you have doubts about the validity of your document instances, parse them.

Even if your DTD modification is incredibly minor, parse the DTD after the change. After all, sometimes the smallest errors are the hardest to spot, particularly among all of those tags.

Re-examine your goals, content models, and DTDs from time to time. Although you don’t want to be constantly changing your DTDs, working with them is an evolutionary learning process. Sooner or later, you’ll want to make a few enhancements. When you do, make sure that you have a system for tracking versions! As with software, there’s nothing worse than having three or four versions of the same DTD and not being sure which version applies to which documents.
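One modest way to keep track of which DTD version applies to which documents is to put the version in the public identifier of each document’s DOCTYPE declaration and then group your documents by that identifier. The Python sketch below assumes exactly that convention; the identifiers in the usage example are made up, the helper names are hypothetical, and the regular expression only handles a simple `<!DOCTYPE name PUBLIC "...">` declaration, not everything SGML permits.

```python
import re

# Matches a simple DOCTYPE declaration with a quoted public identifier.
# Keywords are matched case-insensitively (an assumption about your
# SGML declaration); both quote styles are accepted.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+(\S+)\s+PUBLIC\s+(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)


def public_identifier(text):
    """Return the public identifier from the DOCTYPE, or None if absent."""
    match = DOCTYPE_RE.search(text)
    return match.group(3) if match else None


def group_by_dtd(documents):
    """Group document names by the public identifier they declare.

    documents -- dict mapping a document name to its text
    Documents without a recognizable DOCTYPE are grouped under None.
    """
    groups = {}
    for name, text in documents.items():
        groups.setdefault(public_identifier(text), []).append(name)
    return {ident: sorted(names) for ident, names in groups.items()}
```

For example, with hypothetical identifiers:

```python
docs = {
    "intro.sgm": '<!DOCTYPE manual PUBLIC "-//Example//DTD Manual V1.0//EN">',
    "setup.sgm": '<!DOCTYPE manual PUBLIC "-//Example//DTD Manual V1.1//EN">',
    "notes.sgm": "<manual>no declaration here</manual>",
}
groups = group_by_dtd(docs)
```

Anything that lands under `None`, or under an identifier you no longer maintain, is worth a second look before you parse it.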

Most of all, involve people in the process! All too often, SGML is introduced into an organization without a reasonable amount of discussion and explanation. More than once, I’ve been in departments where people have felt that it was being introduced in order to “get rid of the people and replace them with computers.” That may occasionally be the motive, but I’ve yet to experience it. More commonly, the intent is to use the information more flexibly while automating the more mundane and tedious parts of the process.

From Here…

This chapter explored the issues involved with implementing an SGML system in two different ways: from the ground up, with all new documents authored directly in SGML, and through the process of converting existing documents into SGML. Some issues are the same in both situations, and others differ.

This examination of SGML parsers looked at how the parsing step ensures the consistency of both Document Type Definitions and individual SGML document instances.

For more information, refer to the following:

  Part II, “Document Analysis,” looks at the issues involved with mapping your documents in SGML in more detail.
  Chapter 10, “Following General SGML Declaration Syntax,” tells you more about the SGML syntax.
  Chapter 23, “Rapid Development and Prototyping,” examines the issues involved with rapidly developing an initial SGML system in your organization.

