

You do this for a number of reasons. In this case, it serves as shorthand for a long string. The same method can also be used to include document components that are external to this DTD (you learn more about this in Chapter 11, “Using DTD Components”). It’s also useful for including special characters, such as scientific symbols, foreign-language characters, and other symbols that aren’t readily available on a standard keyboard.

You don’t have to structure the document this way. The document definition is there to define the document according to what makes sense to you, the document architect. How you define a DTD to meet the needs of your situation may be quite different from the way I do. The true test of the DTD is how well it meets your needs, not how it compares to someone else’s.

Parsing Document Type Definitions (DTDs)

Before you can start using your new DTD, you must run it by a review committee of sorts, a validating parser. As you learned in Chapter 4, the validating parser is used in the SGML environment to confirm that an SGML document conforms to its corresponding DTD.

However, before you can validate SGML documents against a DTD, you must confirm that the DTD itself is legitimate. The DTD must be checked for internal consistency in its document definition scheme, and its syntax must be confirmed to be in accordance with the rules of SGML. The DTD must also be unambiguous: it must be explicitly clear where tagged elements may occur. (SGML does not allow ambiguous content models!)


• See “Tags and Tag Minimization with Omittag,” p. 186


Note:  
A content model is ambiguous when an element or character string in a document can fit into more than one location within the current content model, so that it is unclear which of those locations it belongs to.


Note:  
The SGML prohibition against ambiguous content models is for your own protection. By outlawing ambiguous content models, SGML ensures that your document is organized as you intended, not as some unforeseen variation of what you intended.
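
To make the idea of ambiguity more concrete, here is a minimal Python sketch of one common way to test for it. It is not how any particular validating parser works. It assumes a simplified model syntax (names, parentheses, the `,` and `|` connectors, and the `?`, `*`, `+` occurrence indicators only) and a well-formed model, and it treats a model as ambiguous when two different occurrences of the same element name compete at the same point. The element names `title`, `para`, and `list`, and the function names, are hypothetical.

```python
import re

def analyse(model):
    """Return (names, first, follow) for a simplified SGML model group.

    names[p] is the element name at position p, first is the set of
    positions that can start the content, and follow[p] is the set of
    positions that can come directly after position p.
    Assumption: '&', #PCDATA, and exceptions are not handled.
    """
    tokens = re.findall(r"[A-Za-z][\w.-]*|[(),|?*+]", model)
    names, follow = [], []
    i = 0

    def item():
        nonlocal i
        tok = tokens[i]
        i += 1
        if tok == "(":
            node = group()
            i += 1  # skip the closing ")"
        else:
            names.append(tok)
            follow.append(set())
            p = len(names) - 1
            node = (False, {p}, {p})  # (nullable, first, last)
        if i < len(tokens) and tokens[i] in ("?", "*", "+"):
            nullable, first, last = node
            if tokens[i] in ("*", "+"):  # repetition: last loops back to first
                for p in last:
                    follow[p] |= first
            node = (nullable or tokens[i] in ("?", "*"), first, last)
            i += 1
        return node

    def group():
        nonlocal i
        node = item()
        while i < len(tokens) and tokens[i] in (",", "|"):
            op = tokens[i]
            i += 1
            n1, f1, l1 = node
            n2, f2, l2 = item()
            if op == "|":
                node = (n1 or n2, f1 | f2, l1 | l2)
            else:  # sequence: whatever ends the left part is followed by the right
                for p in l1:
                    follow[p] |= f2
                node = (n1 and n2,
                        f1 | (f2 if n1 else set()),
                        l2 | (l1 if n2 else set()))
        return node

    _, first, _ = group()
    return names, first, follow

def ambiguities(model):
    """List (where, name) pairs where one name could match two positions."""
    names, first, follow = analyse(model)
    contexts = [("start of content", first)]
    contexts += [(f"after {names[p]}", s) for p, s in enumerate(follow)]
    clashes = []
    for where, positions in contexts:
        seen = set()
        for q in sorted(positions):
            if names[q] in seen:
                clashes.append((where, names[q]))
            seen.add(names[q])
    return clashes

print(ambiguities("((title, para) | (title, list))"))
print(ambiguities("(title, (para | list))"))
```

The first model is reported as ambiguous because a `title` at the start of the content could belong to either branch. The second model accepts the same documents but reports no clash, which is the usual way to repair this kind of ambiguity.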

DTDs must be parsed when they are created and each time they are modified to confirm their conformance to the SGML standard.

Document Maintenance Considerations

When you create an SGML environment from scratch, there are a few issues that need special attention. If you are starting out in the SGML world, expect a few bumps in the road. Your first DTDs will likely be a little “off,” especially if you’re modeling complex documents. As a result, you’ll probably have to tinker with them a few times to get them right.

If your organization is composed of a number of writers who will be using your DTDs, you’ll want to spend a fair amount of effort confirming that they’re right. After all, you’ll probably be educating your writers about SGML, structured authoring, DTDs, and a host of other topics simultaneously. Constantly changing the “rules” (as defined in your DTDs) while your writers are getting comfortable in an SGML authoring environment will probably not help the learning process!

Constant changes in your document architecture (as defined in the DTDs) can also play havoc with your existing SGML document collections. After all, these existing documents will also have to fit the modified DTD.

The readers of your SGML document library also play into this scenario. Significant changes to your DTD have to reach your readers, wherever they may be. If the content model changes significantly, it affects readers using any previous version of your documents, and they might be rather unhappy if their existing library of documents suddenly becomes unreadable.


Tip:  
Whenever possible, make any necessary changes to your DTD in ways that make it broader in scope than its predecessor. When you can do this, compatibility problems with legacy data are minimized.

For example, suppose an element named catalog currently has the following content model:

    catalog (id,subject,location)

and the model is modified so that subject is replaced by topic:

    catalog (id,topic,location)

The following definition would ensure that both versions are still legal, thus accommodating old and new documents:

    catalog (id,(topic|subject),location)
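
The effect of the three catalog models can be checked with a short Python sketch. It is only an illustration, not a validating parser: it assumes models that use nothing but names, parentheses, and the `,` and `|` connectors, and it translates each model into a regular expression over the sequence of child element names. The function names and the two sample documents are hypothetical.

```python
import re

def model_to_regex(model):
    """Translate a simple model group into a regex over child element names.

    Assumption: only ',' (sequence), '|' (alternative), and parentheses
    appear; occurrence indicators and '&' are not handled.
    """
    parts = []
    for token in re.findall(r"[A-Za-z][\w.-]*|[(),|]", model):
        if token == ",":
            continue  # in a sequence, names simply follow one another
        if token in ("(", ")", "|"):
            parts.append({"(": "(?:", ")": ")", "|": "|"}[token])
        else:
            parts.append(re.escape(token) + " ")  # each name ends with a space
    return re.compile("".join(parts))

def conforms(children, model):
    """True if the list of child element names satisfies the model."""
    text = "".join(name + " " for name in children)
    return model_to_regex(model).fullmatch(text) is not None

OLD = "(id,subject,location)"
NEW = "(id,topic,location)"
BROAD = "(id,(topic|subject),location)"

legacy_doc = ["id", "subject", "location"]   # written against the old DTD
revised_doc = ["id", "topic", "location"]    # written against the new DTD

for label, model in [("old", OLD), ("new", NEW), ("broad", BROAD)]:
    print(label, conforms(legacy_doc, model), conforms(revised_doc, model))
```

Only the broadened model accepts both the legacy document and the revised one; the old and new models each reject the other's documents.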

Converting Existing Documents into SGML (Filtering)

Creating your SGML environment by converting existing documents has some similarities to creating an SGML environment from scratch. Yet it differs in that you have a few more issues to consider. The nature of your existing documents, including their structure, variability, and complexity, all factor into your conversion strategy.

The native format of your source documents can be a significant issue. Is this format easy to work with? Can your conversion tools easily handle the format? (If not, are there any intermediate formats that you can readily convert that are easier to deal with?)

In some situations, the conversion process may involve a number of steps that require specialized expertise (such as conversion from paper- or microfilm-based source documents). For complex conversions (or if you lack the resources in your own organization), a document conversion company qualified to do SGML conversions may be your best alternative.

The tools that are at your disposal for the conversion process are also important. Are they easy to use? Can they be adjusted and modified to support your changing needs? Do they support the file formats from which you will be converting? Will they support other formats that you might be using later? These and other issues figure into the selection process as you build your toolkit.


Note:  
For programmers, one of the handiest and most flexible conversion tools is available for free! Perl, a pattern-recognition and text-manipulation language, is widely available for a variety of computer platforms. PerlSGML is a publicly available enhancement package that adds specific SGML features to the basic Perl package.


• See “The World of Perl,” p. 491
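
The pattern-driven style of filtering that the note describes can be sketched in a few lines. The sketch below uses Python rather than Perl, purely for illustration, and everything specific in it is an assumption: the plain-text source conventions (a `Title:` prefix, `*` for list items) and the target element names `title`, `item`, and `para` are hypothetical and would come from your own documents and DTD.

```python
import re

# Hypothetical rules: (pattern for a source line, SGML replacement template).
RULES = [
    (re.compile(r"^Title:\s*(.+)$"), r"<title>\1</title>"),
    (re.compile(r"^\*\s+(.+)$"), r"<item>\1</item>"),
]

def convert_line(line):
    """Convert one line of plain text into SGML-tagged text."""
    # Escape the characters that would otherwise be read as markup.
    line = line.replace("&", "&amp;").replace("<", "&lt;")
    for pattern, template in RULES:
        if pattern.match(line):
            return pattern.sub(template, line)
    # Anything unrecognized becomes a paragraph; blank lines are dropped.
    return f"<para>{line}</para>" if line.strip() else ""

def convert(text):
    """Convert a whole plain-text document, one line at a time."""
    converted = (convert_line(line) for line in text.splitlines())
    return "\n".join(line for line in converted if line)

source = "Title: Parsing DTDs\n\n* Check the syntax\nAT&T reviewed the draft."
print(convert(source))
```

A real filter needs many more rules, and its output should still be run through a validating parser against your DTD.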

