

Specific conversion programs have different trade-offs. Some don’t really suffer much from these weaknesses. Keep them in mind when you consider how to convert word processing files to SGML.


Note:  
The key to making this approach work is to create consistently structured documents from within the word processor. That way, when you use a utility, such as FastTAG from Avalanche, to convert them, you will not have as many parsing errors. The danger with structured authoring in a word processing environment is that it enables you to do things that you should not. A true SGML structured authoring tool never lets you get close to mischief—or, at least, makes it very difficult.

Suppose, for example, that according to your DTD the <NOTE> element can occur only within a <PARA> element. A word processor does not know the difference, and it lets you put it anywhere. A good SGML structured authoring tool, on the other hand, won’t let you put the <NOTE> element where it doesn’t belong.
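The containment rule in this example can be checked mechanically. The following is only a minimal sketch in Python, not a real SGML parser: it assumes every element has explicit start and end tags (no tag omission or minimization), and the names `ALLOWED_PARENTS` and `find_misplaced`, the CHAPTER element, and the sample strings are made up for illustration.

```python
import re

# Hypothetical containment rule taken from the example in the text:
# a NOTE element may appear only directly inside a PARA element.
ALLOWED_PARENTS = {"NOTE": {"PARA"}}

# Matches a start or end tag and captures the optional slash and the name.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)[^>]*>")

def find_misplaced(text):
    """Return (element, parent) pairs that break ALLOWED_PARENTS.

    Assumes explicit start and end tags for every element, so a simple
    stack of open elements is enough to know the current parent.
    """
    stack = []
    problems = []
    for match in TAG_RE.finditer(text):
        closing, name = match.group(1), match.group(2).upper()
        if closing:
            # Pop only when the end tag matches the innermost open element.
            if stack and stack[-1] == name:
                stack.pop()
            continue
        parent = stack[-1] if stack else None
        allowed = ALLOWED_PARENTS.get(name)
        if allowed is not None and parent not in allowed:
            problems.append((name, parent))
        stack.append(name)
    return problems

good = "<CHAPTER><PARA>Text <NOTE>Careful.</NOTE></PARA></CHAPTER>"
bad = "<CHAPTER><NOTE>Careful.</NOTE><PARA>Text</PARA></CHAPTER>"
print(find_misplaced(good))  # []
print(find_misplaced(bad))   # [('NOTE', 'CHAPTER')]
```

A structured authoring tool applies this kind of check as you type, using the full content models in the DTD; a word processor applies none of it, which is why the errors surface only at conversion time.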


Conversion Tools for Intermediate File Types

These tools convert an intermediate file type, such as Rich Text Format (RTF), into an SGML document instance for a particular DTD. This approach can succeed where direct word processing file conversion is not an option, because not everyone uses a popular word processing program that has extensive SGML support.

What if your platform does not support WordPerfect, Word, or Interleaf? What if you have to do a project on an old mainframe or minicomputer that supports only a limited range of text editors and file formats? You can probably create a file type for which a conversion utility already exists. Because you can do this, you have a solution.

There are many intermediate file types, and many kinds of tools that work with them. Some transformer utilities convert from one SGML DTD to another, and some tools convert a document and automatically generate a DTD. The main categories include:

  Tools that convert from RTF, TeX, or a generic file type to a document type instance for a single DTD
  Tools that convert a document instance of one DTD into a document instance of another DTD, such as AAP2ISO and DTD2HTML
  Tools that convert highly individualized file formats into a highly individualized document instance for a DTD, such as i2c for ISO/CALS table conversions and SGML Exportfilter for FrameBuilder
  Hybrid tools, such as SGML Hammer and DynaTag

Some tools have been used extensively, and you can rely on them. No matter what platform you’re used to working on, you can find a conversion utility for it. A good place to start looking is Robin Cover’s SGML repository of tools on the sil.org SGML home page:

http://www.sil.org/sgml/sgml.html

You should also check out his public domain tools link at:

http://www.sil.org/sgml/publicSW.html

Steve Pepper’s Whirlwind Guide to SGML tools is another good source of SGML conversion tools:

http://www.falch.no/~pepper/sgmltool/

The following page on the W3C site lists SGML conversion tools as well:

http://www.w3.org/hypertext/WWW/Tools/Word_proc_filters.html

One popular converter is RTF2RB, shown in figure 15.4. The only requirement is that you can convert your file types into RTF. If you can, you can convert the file into the Rainbow DTD in SGML. The Rainbow DTD is highly flexible and is designed to accommodate many file types. Once you translate a file into this DTD, you can convert it into another document instance for another DTD.


Fig. 15.4  RTF2RB is a popular document converter utility that exists in the public domain.

You still have to pay attention to document analysis with this approach, of course. A wide variety of utilities convert first to an intermediate file type, so it is impossible to specify the requirements for each one. In general:

  Have a specific DTD in mind when you convert to SGML.
  Be aware of the differences between variants of intermediate file types; for example, ASCII and RTF each exist in several variants, and the differences can affect the conversion.
  When you convert from HTML to another DTD, make sure that the HTML document parses completely; many non-standard HTML documents take advantage of browsers that support non-HTML elements.
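As a rough first pass on the last point, you can scan an HTML file for element names that the HTML DTD does not define. The sketch below uses Python's standard `html.parser` module. It is not a validating parse, and the `KNOWN_ELEMENTS` set is an illustrative subset chosen for this example; a real check would take the names from the HTML DTD you intend to parse against.

```python
from html.parser import HTMLParser

# Illustrative subset of element names (an assumption for this sketch);
# replace it with the element names declared in your HTML DTD.
KNOWN_ELEMENTS = {
    "html", "head", "title", "body", "h1", "h2", "h3", "p", "a",
    "ul", "ol", "li", "em", "strong", "pre", "img", "br", "hr",
}

class UnknownElementFinder(HTMLParser):
    """Collect start tags whose names are not in KNOWN_ELEMENTS."""

    def __init__(self):
        super().__init__()
        self.unknown = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        if tag not in KNOWN_ELEMENTS:
            line_number = self.getpos()[0]
            self.unknown.append((tag, line_number))

def find_unknown_elements(html_text):
    """Return (tag, line_number) pairs for elements outside KNOWN_ELEMENTS."""
    finder = UnknownElementFinder()
    finder.feed(html_text)
    finder.close()
    return finder.unknown

sample = "<html><body>\n<p>Hello <blink>world</blink></p>\n</body></html>"
for tag, line_number in find_unknown_elements(sample):
    print(f"line {line_number}: non-standard element <{tag}>")
```

A clean report here does not prove that the document is valid. Only a parse against the DTD does that, but the scan quickly catches browser-specific elements before you attempt a conversion.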

Conversion Between SGML Document Types

As you have seen, it’s possible to convert first to HTML and then to another DTD application in SGML. When you can’t find an intermediate file type into which you can translate your source document, you might be able to translate it into an instance of an SGML document type, such as HTML, and then transform that into a document instance of another SGML application.

Although HTML is a popular DTD, others exist that you can try. Before HTML became a favorite, the Text Encoding Initiative (TEI) had some very successful DTDs, and it still does. Some nice publishing DTDs exist from the AAP, and a conversion tool called AAP2ISO converts instances of their document types to other document type instances.

To convert between SGML document types, you can use:

  Tools that convert HTML files to other SGML application instances
  Tools that transform one document type instance into an instance for another DTD

The first category exists only because HTML is so popular. The second category is one to pay attention to. Those tools require that you have the DTDs for both the source and target document types on hand, along with the source document instance. The source file must be valid SGML; it has to parse.
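To make the idea of transforming one document type instance into another concrete, here is a deliberately simple sketch in Python. It only renames elements one-to-one, whereas real transformer tools also restructure content. The `ELEMENT_MAP` names and the `transform` function are hypothetical and do not come from any of the tools mentioned here.

```python
import re

# Hypothetical mapping from source DTD element names to target DTD
# element names; real names come from the two DTDs involved.
ELEMENT_MAP = {"H1": "TITLE", "P": "PARA", "EM": "EMPH"}

# Captures the optional slash, the element name, and any attributes.
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)([^>]*)>")

def transform(instance):
    """Rename every tag according to ELEMENT_MAP.

    Assumes the source instance has already been parsed successfully
    and that elements map one-to-one; attributes are copied unchanged.
    """
    def rename(match):
        slash, name, rest = match.groups()
        key = name.upper()
        if key not in ELEMENT_MAP:
            # Fail loudly rather than pass unknown markup to the target.
            raise ValueError(f"no target element for <{name}>")
        return f"<{slash}{ELEMENT_MAP[key]}{rest}>"
    return TAG_RE.sub(rename, instance)

source = "<H1>Tools</H1><P>Use <EM>valid</EM> input.</P>"
print(transform(source))
# <TITLE>Tools</TITLE><PARA>Use <EMPH>valid</EMPH> input.</PARA>
```

The sketch stops on any element it has no mapping for, which mirrors the requirement that the source parse cleanly: a transformation can be trusted only when every element in the source is one the source DTD declares and the mapping covers.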

Several popular conversion tools exist in the public domain. These include:

  CoST (Copenhagen SGML Tool)
  qwertz
  Rainbow
  SGML2TeX (which converts only to TeX)
  SGMLS.pm

The SGML archive at ftp://ftp.ifi.uio.no/pub/SGML contains the latest versions of all these tools. If you don’t find them there, do a Lycos or WebCrawler search for them.

