Special Edition Using SGML: The Basic Procedure

Defining Output Specifications

An output spec defines how your document looks and through what medium it’s presented.

For example, suppose you have a database or archive of product information files. You’d like the files to be downloadable from your company’s electronic bulletin board system and viewable by people who visit your Web site. You’d also like to print copies of them on your network printers by using an SGML browser application. And you’d like people on your network to be able to load the files into their word processing documents as embedded objects. How those files look in each context is determined by your output specification.
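
To make the idea concrete, here is a minimal sketch in Python of one document presented through two media. It is not DSSSL or any real output specification language. The element names (title, para), the media names (web, plain), and the formatting templates are all assumptions made up for illustration.

```python
# Hypothetical output specifications: one formatting table per medium.
# Element names, media names, and templates are illustrative assumptions.
OUTPUT_SPECS = {
    "web": {"title": "<h1>{text}</h1>", "para": "<p>{text}</p>"},
    "plain": {"title": "*** {text} ***", "para": "{text}"},
}


def render(elements, medium):
    """Format (element_name, text) pairs according to the spec for one medium."""
    spec = OUTPUT_SPECS[medium]
    return "\n".join(spec[name].format(text=text) for name, text in elements)


# The same structured content, independent of any medium.
doc = [("title", "Widget 3000"), ("para", "A sturdy widget.")]

print(render(doc, "web"))
print(render(doc, "plain"))
```

The structured content never changes. Only the table chosen for the medium does, which is the separation of structure from format that an output specification is meant to capture.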

Unfortunately, output specifications are a difficult and very specialized topic in SGML. The problem is that with output specifications, you move away from dealing with document structure and toward document format. Format and appearance controls vary from machine to machine, so it’s very difficult to provide a single specification that is usable on any type of processing system. A project has been underway for several years to provide a Document Style Semantics and Specification Language (DSSSL), but as of this writing it is still not complete.

• See “The View of a Document from an Output Perspective,” p. 408
• See “Difficulties with Output Specifications,” p. 419
• See “Output Specification Standards,” p. 419

Application programmers must provide drivers (hardware-specific programs for controlling hardware devices) for many popular printers so that character fonts print the same way on all of them. In the same way, output specification designers must make their formatting instructions comprehensible to all types of processing systems. It’s tougher for output spec designers, however. When you buy WordPerfect in English, for example, the odds are that you have one of the popular American printers. But when you design an output spec for a document, that specification may need to be processed anywhere in the world on any type of machine, not just one that’s popular in English-speaking countries. DSSSL needs to be a universal language.

A subset of this universal language for output specifications, called DSSSL Lite, is also being developed. It is more of a draft specification, and it is not intended to meet the needs of all designers or processing systems, but it does provide a starting point for developing usable output specifications. For more information about the latest changes proposed to DSSSL Lite, point your Web browser to:

http://www.sil.org/sgml/related.html#dsssl

You may be able to use an existing output specification. There are plenty in the public domain, and you may be able to modify one for your purposes so that it conforms to your standards.

Incorporating Document Markup

When you mark up a document, you are moving from the design phase to the production phase of the basic SGML procedure. Incorporating document markup involves putting tags into the individual document instances. How you do this depends on your situation. Do you have a lot of documents that have already been authored and now need to be tagged, or do you have a host of new documents that you want to author in SGML?

If your documents are already created, you have to convert them into SGML documents with a conversion utility such as Earl Hood’s SGML extension to Perl. If the documents aren’t created yet, you can choose an SGML authoring/editing tool to create them. This is called structured authoring.

• See “The World of Perl,” p. 491
• See Chapter 5, “Two Scenarios,” p. 87

There are basically four types of tag insertion tools to use with SGML:

• Autotaggers (automated tagging tools)

• Transformers (conversion tools)

• Text editors (authoring and editing tools)

• Hybrid tools (combination of some of the above tools)
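
As a rough illustration of what the simplest kind of autotagger does, the following Python sketch wraps plain text in tags. It is not modeled on any particular product. It assumes that the first block of text is a title, that blank lines separate paragraphs, and that the text contains no < or & characters. The element names doc, title, and para are hypothetical and would come from your own DTD.

```python
def autotag(text):
    """Wrap plain text in hypothetical <doc>, <title>, and <para> tags.

    Assumes the first blank-line-separated block is the title and every
    later block is a paragraph. Assumes the text has no '<' or '&'.
    """
    blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
    if not blocks:
        return "<doc></doc>"
    title, paras = blocks[0], blocks[1:]
    lines = ["<doc>", "<title>" + title + "</title>"]
    for p in paras:
        # Collapse line breaks inside a paragraph into single spaces.
        lines.append("<para>" + " ".join(p.split()) + "</para>")
    lines.append("</doc>")
    return "\n".join(lines)


sample = "Widget 3000\n\nA sturdy\nwidget.\n\nShips today."
print(autotag(sample))
```

Real autotaggers rely on far richer cues, such as word processor styles or typographic patterns, and their output still has to be parsed against the DTD.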

Document Parsing

Parsing document instances is easier than parsing DTDs. The main challenge of parsing document instances is handling documents that have specialized content. Even though parsing documents is usually a routine production task, specialized content can turn your documents into small nightmares.

Two of the more difficult examples of specialized content are tables and equations. Because tables are a delicate mix of format and structure in SGML, it is often best to parse them separately from the rest of your documents. Likewise, you may wish to parse equation-laden documents separately from simpler document instances. Documents with specialized content can cause challenging parsing errors, so it’s probably best not to let them slow down parsing for the rest of your documents.

• See “Handling Tables” and “Handling Math and Equations,” pp. 421, 424
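
One way to act on this advice is to sort document instances into two batches before parsing. The Python sketch below is a minimal illustration under stated assumptions: the element names table and equation are hypothetical and should match your DTD, names are matched without regard to case, and a simple text search for a start tag is taken as good enough to route a file.

```python
import re

# Hypothetical element names that mark specialized content.
SPECIAL_TAGS = ("table", "equation")

# Matches a start tag such as <table> or <EQUATION id="e1">.
_SPECIAL = re.compile(r"<(%s)\b" % "|".join(SPECIAL_TAGS), re.IGNORECASE)


def split_batches(documents):
    """Split {name: sgml_text} into (routine, special) lists of names."""
    routine, special = [], []
    for name, text in documents.items():
        if _SPECIAL.search(text):
            special.append(name)
        else:
            routine.append(name)
    return routine, special


docs = {
    "a.sgm": "<doc><para>Hi</para></doc>",
    "b.sgm": "<doc><TABLE><row>1</row></TABLE></doc>",
    "c.sgm": "<doc><equation>x</equation></doc>",
}
routine, special = split_batches(docs)
print(routine, special)
```

The routine batch can then go through the parser as usual, while the special batch is parsed on its own so that any table or equation errors are dealt with separately.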

Working with Consultants

To make the most of your time with consultants, think of the work in two parts: the preparation you do before they arrive and the actual consulting work while they’re with you.
