

Manual Markup

Manual markup is fine if you don’t have many documents and don’t suffer from time constraints or boredom. Inserting tags by hand into a long document with a complex structure can get tedious. This approach, however, offers the greatest flexibility for markup.

SGML files are simple ASCII text files, usually with an *.SGML or *.SGM file extension. You can view them with any text viewer, but an SGML processing system is required to interpret the tags and build the document those tags define.


Tip:  
Use a simple text editor that saves the file without formatting. If you save the file in a proprietary format, such as a WordPerfect or Word file, you limit yourself. Eventually, you will have to resave the file as an SGML file anyway. In other words, don’t use a tool that embeds special codes.

If flexibility is a high priority for you and you don’t have many complex documents, manual markup is an option. The useful thing about inserting tags manually is that you can put them anywhere. That does not guarantee that your document instance will parse, but you can run it through a parser as a separate step. You also do not have to buy expensive SGML software.

There are trade-offs, of course. Because you can put tags wherever you want, you can make more errors than you would with an SGML structured authoring tool. For example, if your DTD calls for a <FIGURE> element to be used only within a <PARA> element, the structured authoring tool will not let you put it in a <HEAD1> element. A plain text editor doesn’t care where you put any tags. Although a structured authoring tool can be frustrating to write in, it ensures that your document will parse according to your DTD. A plain text editor gives no such assurance.
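
The kind of check a structured authoring tool performs can be sketched in a few lines of Python. This is only an illustration, not a real SGML parser: the rule table `ALLOWED_PARENTS` and the function name `containment_errors` are hypothetical, the rule itself is the <FIGURE>-inside-<PARA> example from above, and the sketch ignores DTD features such as omitted end tags.

```python
import re

# Hypothetical containment rule: FIGURE may appear only directly inside PARA.
ALLOWED_PARENTS = {"FIGURE": {"PARA"}}

# Matches a start or end tag and captures the slash (if any) and the name.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")

def containment_errors(text):
    """Return messages for elements found under a parent the rules forbid."""
    stack, errors = [], []
    for match in TAG.finditer(text):
        closing, name = match.group(1), match.group(2).upper()
        if closing:
            # Pop back to the matching start tag, if one is open.
            if name in stack:
                while stack.pop() != name:
                    pass
            continue
        parent = stack[-1] if stack else None
        allowed = ALLOWED_PARENTS.get(name)
        if allowed is not None and parent not in allowed:
            where = f"<{parent}>" if parent else "the document root"
            errors.append(f"<{name}> not allowed inside {where}")
        stack.append(name)
    return errors

good = "<PARA>See <FIGURE>fig1</FIGURE> here.</PARA>"
bad = "<HEAD1>Title <FIGURE>fig1</FIGURE></HEAD1>"
```

Calling `containment_errors(bad)` reports the misplaced <FIGURE>, while `containment_errors(good)` returns an empty list. A text editor performs no such check, so a step like this (or, better, a real validating parser) has to be run separately.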

The flexibility that comes from manual markup is not always good. Because it’s easy to mark up small HTML documents by hand with non-SGML element structures, you can run into a variety of problems. Certain browsers, for example, support non-standard extensions to the HTML DTD. These extensions cause problems on the World Wide Web, such as:

  They make the document an invalid HTML document
  They make the document somewhat less transportable and accessible from other browsers
  They emphasize appearance and formatting over content and structure
  They short-circuit the process of information sharing


Note:  
The short-circuit referred to above happens because some browser developers want their product to be the most popular, so they make theirs “better” than the standard browser by anticipating improvements to the HTML DTD. But by supporting non-standard extensions, they encourage their customers to develop Web pages that not everyone can read. Their customers are sharing attractive graphics, but those graphics can be viewed by fewer people because they’re non-standard. A better approach is to extend the standard, even though this does take time.

When you use a text editor to insert tags manually, you open the door to this haphazard approach to document creation. You should always remember to validate your documents.

SGML, as the international standard, enables you to load DTDs as needed without violating the HTML 2.0 standard. Therefore, you can use the Netscape extensions, as well as many others, without compromising the HTML DTD. Until Panorama is as popular as Netscape, however, you must be disciplined when you mark up an HTML document manually. Figure 15.1 shows a non-standard <BLINK> element.


Fig. 15.1  You can create this <BLINK> element using a text editor, even though it does not parse.
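
As a rough illustration of how a non-standard element such as <BLINK> can be caught before publishing, the Python sketch below lists every start tag whose name is not in a known set. The set `KNOWN_ELEMENTS` is a small illustrative subset of HTML 2.0 element names, not the full DTD, and the function name `unknown_elements` is made up for this example. It is a quick screen only and no substitute for validating against the DTD with an SGML parser.

```python
import re

# Illustrative subset of HTML 2.0 element names; NOT the complete DTD.
KNOWN_ELEMENTS = {
    "HTML", "HEAD", "TITLE", "BODY", "H1", "H2", "H3", "P", "A",
    "UL", "OL", "LI", "EM", "STRONG", "B", "I", "IMG", "BR", "HR", "PRE",
}

# Matches start tags only; end tags begin with "</" and are skipped.
START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)[^>]*>")

def unknown_elements(text):
    """Return the sorted names of start tags that are not in KNOWN_ELEMENTS."""
    names = {m.group(1).upper() for m in START_TAG.finditer(text)}
    return sorted(names - KNOWN_ELEMENTS)

page = "<HTML><BODY><P>Sale <BLINK>today</BLINK> only!</P></BODY></HTML>"
flagged = unknown_elements(page)
```

Here `flagged` contains only `BLINK`, the one element in `page` that the subset does not recognize.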

Document Conversion

There are many kinds of document conversion tools. Converting existing documents is a useful way to create SGML documents when you want to move HTML documents to other document types. Automatic tagging approaches such as this one enable you to convert documents to a neutral file type first, and then to SGML.


• See “Avalanche/Interleaf: FastTAG,” p. 495

Suppose, for example, that you have many documents in a proprietary word processing format called WordWiz. No SGML conversion tool exists for that file format. You must first convert the documents to a neutral markup scheme, such as RTF, and then convert them to SGML. Figure 15.2 shows an example of this chapter converted into RTF and then into HTML.


Fig. 15.2  The program RTF2HTML converts RTF documents into HTML documents.

Before you seriously consider automated document conversion, you must have a consistent document structure. You’ll need to have a specific translation scheme for each type of document. To create any SGML document type, you need document analysis. Suppose, for example, that some of your memos have a return address paragraph; others do not. Your DTD for the memo document type calls for a return address. To convert your memos into SGML, you must first add a return address to every instance that does not already have one. This is because when the SGML parser processes each instance of a memo, it expects to see the return address as specified in the DTD. If it does not find one, the document fails to parse.
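
Before running a batch conversion, it can help to find the instances that would fail for this reason. The Python sketch below is one hedged way to do that. The element names (`MEMO`, `RETADDR`, `BODY`), the file names, and the function `missing_required` are all hypothetical, since the memo DTD is not shown here, and the check only looks for the presence of a start tag rather than parsing the document.

```python
import re

# Hypothetical required element for the memo document type.
REQUIRED = ["RETADDR"]

def missing_required(instance, required=REQUIRED):
    """List required elements whose start tag never appears in the instance."""
    found = {
        name.upper()
        for name in re.findall(r"<([A-Za-z][A-Za-z0-9]*)", instance)
    }
    return [name for name in required if name not in found]

# Hypothetical converted memos, keyed by file name.
memos = {
    "memo1.sgm": "<MEMO><RETADDR>12 Main St.</RETADDR><BODY>Hi.</BODY></MEMO>",
    "memo2.sgm": "<MEMO><BODY>No address here.</BODY></MEMO>",
}

# Map each file name to the required elements it lacks.
report = {name: missing_required(text) for name, text in memos.items()}
needs_fixing = [name for name, gaps in report.items() if gaps]
```

In this example `needs_fixing` singles out `memo2.sgm`, the memo that needs a return address added before it can parse against the memo DTD.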

In short, document analysis is crucial, even when documents are already created. You must look at your legacy documents and go through the document analysis steps discussed in Part II, “Document Analysis.” These steps are:

  Define the environment
  Define the elements
  Relate the elements to one another
  Extend the document architecture


• See “Document Analysis,” p. 97

You must work through these steps before you convert documents to SGML. You will probably also need to convert documents gathered from elsewhere in your SGML environment.

