

Document Analysis

The issues you face in document analysis when converting existing documents differ somewhat from those that arise when documents are created in a structured SGML authoring environment. Documents authored in SGML have the benefit of having their structure included (and, ideally, verified) during the authoring process.

When converting documents, you must carefully consider the nature of the source data that you’ll be converting into SGML. Its characteristics can greatly influence how you develop your DTDs.

Table 5.2 describes some of the issues you’ll need to examine in your existing documents, and in the people and programs that create them.

Table 5.2 Source Document Considerations When Filtering into SGML

Issue | Considerations
File Format | Do your tools directly support this format? Are there any parts of this format that cause problems (such as table definitions, equation support, or embedded graphics)? How much structure is embedded in the file format?
Authoring Package | How sophisticated is the authoring package/program? Does it support structure in a document? If so, how? How does it handle text formatting? Can it export formatted documents in a non-binary ASCII interchange format, such as Rich Text Format (RTF)?
Document Structure | How consistently are your documents structured? Is the structure “tight” or “loose”? (That is, is a highly rigid and defined structure tightly enforced?) Is the structure simple or complex? Do the authors use a number of approaches to achieve the same look (or visual structure)? Are the authors consistent within a document in presenting the same data structures in different places?
Document Authors | How flexible are they? (Can they modify their authoring approach if necessary?) Do they write in a structure that is logically consistent? Do they understand the concepts of SGML? If not, are they willing to learn? What is their attitude toward SGML? (Do they feel threatened by the use of SGML?)
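
One rough way to approach the consistency questions in Table 5.2 is to tally which paragraph styles follow which across your source documents, and then look at the rare combinations. The sketch below assumes the paragraphs have already been extracted from the authoring package as (style name, text) pairs; the style names, the sample data, and both function names are hypothetical.

```python
from collections import Counter


def style_transitions(paragraphs):
    """Count how often each paragraph style is directly followed by another."""
    styles = [style for style, _text in paragraphs]
    return Counter(zip(styles, styles[1:]))


def rare_transitions(paragraphs, threshold=1):
    """Return style pairs seen at most `threshold` times.

    Rare pairs are only candidates for inconsistent structure; a person
    still has to decide whether each one is a mistake or a valid variation.
    """
    counts = style_transitions(paragraphs)
    return sorted(pair for pair, n in counts.items() if n <= threshold)


# Hypothetical extract of one source document.
sample = [
    ("Heading1", "Overview"),
    ("Body", "This manual describes the product."),
    ("Heading1", "Installation"),
    ("Body", "Run the installer."),
    ("Heading1", "Usage"),
    ("ListItem", "Start the program."),  # heading followed directly by a list
]

for pair in rare_transitions(sample):
    print(pair)
```

A long list of rare transitions suggests a “loose” structure, which usually means either a more permissive DTD or more cleanup before conversion.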

Once you understand the nature of your legacy (or source) data, you can focus on defining your goals. This process is much the same as when you are starting from scratch. The process that you went through earlier for AnyCorp could just as well apply to the conversion approach.

Document Modeling and DTD Design

Developing a document model with a corresponding DTD design can be straightforward in the conversion approach. After all, you have a source document from which to work. Usually, you can visually analyze a document and rather easily define the model.

Once you have defined the document model for your source document, study it and decide whether it meets your needs for your output SGML document. In other words, is the existing model sufficient to meet your goals? If it is, then expressing it in SGML syntax as its own Document Type Definition is a fairly simple process.
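
To make the step from document model to DTD concrete, the following sketch renders a very small model as SGML element declarations. The element names and content models are invented for illustration and are not taken from the AnyCorp example; a real DTD would also need attribute and entity declarations.

```python
def element_declarations(model):
    """Render SGML element declarations from {element name: content model}."""
    lines = []
    for name, content in model.items():
        # "- -" are the tag-omission indicators: neither the start-tag
        # nor the end-tag of the element may be omitted.
        lines.append(f"<!ELEMENT {name} - - {content}>")
    return "\n".join(lines)


# Hypothetical model read off a simple printed report.
model = {
    "report": "(title, section+)",
    "section": "(title, para+)",
    "title": "(#PCDATA)",
    "para": "(#PCDATA)",
}

dtd = element_declarations(model)
print(dtd)
```

Keeping the model in a data structure like this makes it easy to compare the model you observed in the source documents with the one you want for the output.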

If the document model doesn’t meet your needs, you need to figure out why. Common issues to address might include:

  Changes in the presentation medium of the document, such as going from printed documents to electronic delivery
  The desire for document enhancements, such as adding hypertext links to other document sections (and other documents)
  Changes to the document usage or audience
  The desire to support the use of modular document components (both within this document and across a range of documents)

Quite often, you see organizations moving to SGML in order to make their documents more transportable. This can be as simple as moving between word processing systems or as complex as gaining the ability to present data in both printed and electronic environments.

When moving to the electronic environment, it is useful to consider several generations of your (future) SGML documents when developing your DTD. For example, your first version may be a simple electronic equivalent of your original paper-based document. The second version may include hypertext links between sections of your documents. Version three may have links between documents in your collection, as well as multimedia objects (like sound and video clips).

If you can anticipate this progression from the start (to some degree, at least), you can build many of the structural hooks needed to support these enhancements into your first DTD.

Document Conversion

Due to the wide range of possibilities involved with converting documents into SGML, a thorough discussion of this subject is beyond the scope of this book. Software packages available to perform these conversions range in price from free (in the case of Perl) to many thousands of dollars.
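
As a very small illustration of what a script-based filter does, the sketch below maps paragraph styles to SGML elements and reports any style it cannot map. It is written in Python rather than Perl, and the style-to-element mapping, the function names, and the sample input are all hypothetical. It also assumes the target DTD declares the amp and lt entities.

```python
# Hypothetical mapping from source paragraph styles to SGML element names.
STYLE_TO_ELEMENT = {"Heading1": "title", "Body": "para"}


def escape(text):
    """Replace markup characters with entity references.

    Assumes the target DTD declares the amp and lt entities.
    """
    return text.replace("&", "&amp;").replace("<", "&lt;")


def to_sgml(paragraphs, mapping=STYLE_TO_ELEMENT):
    """Convert (style, text) pairs to tagged text.

    Returns the SGML text and a list of styles that had no mapping,
    so that unmapped content is reported rather than silently dropped.
    """
    out = []
    unmapped = []
    for style, text in paragraphs:
        element = mapping.get(style)
        if element is None:
            unmapped.append(style)
            continue
        out.append(f"<{element}>{escape(text)}</{element}>")
    return "\n".join(out), unmapped


sgml, unmapped = to_sgml([
    ("Heading1", "R&D Plan"),
    ("Body", "Costs < budget."),
    ("Sidebar", "Note"),
])
print(sgml)
print("Unmapped styles:", unmapped)
```

Real source files are rarely this clean, which is why the list of unmapped styles matters: it shows how much of the legacy data falls outside the structure the filter expects.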

Quite often, your best bet may be to contract the conversion of a large amount of legacy data to a document conversion company that is experienced in SGML conversions.

