

Chapter 28
Other Tools and Environments

Building an SGML document system can involve many separate steps. Depending on your situation, you might find yourself looking for tools to help with creating, managing, and validating SGML documents.

In many cases, you might need to bring non-SGML documents into your SGML system; data conversion tools are particularly useful for doing this. In this chapter, you’ll look at the following:

  Legacy data conversion (into SGML)
  Data validation
  Data transformation
  Other data manipulation

Fortunately, there is a rich set of tools available. These tools range from commercial software products for performing specialized tasks to a complete programming language package that can be used for SGML data conversions, UNIX system administration, and much more.

Though this chapter covers some notably useful software tools, it is by no means a definitive list of SGML tools! The world of SGML software is growing rapidly, and powerful new products are appearing all the time.

Many of these tools are available on a variety of computer platforms. In fact, tools distributed under shareware or similar licensing agreements often include the source code, so they can be adapted for computer systems on which they are not currently available.

The review of each tool includes a reference listing that identifies the tool's type, the computer platforms it runs on, the provider or author, a source for obtaining it, and contact information (such as an Internet address or company telephone number).

Specific Computer Platform Usage

When setting up an SGML document processing system, one consideration is the choice of computer platforms to use. Most organizations will have one or more types of computers in place prior to starting an SGML document system.

In authoring systems, the prospective SGML implementer has a range of choices. Several existing high-end publishing systems that support structured SGML authoring are available on a number of computer platforms, most notably DOS/MS Windows PCs and UNIX computers.

Electronic browsers for SGML documents are also available across a number of platforms, notably DOS/MS Windows PCs, Macintosh computers, and UNIX computers.

However, in the area of document conversion and parsing, UNIX-based systems usually predominate. This is due to the higher processing capabilities of UNIX computers based on reduced instruction set computer (RISC) processors. As a result, if you anticipate a large volume of intensive document conversions, UNIX-based systems will probably be your platform of choice for converting documents into SGML.

SP/NSGMLS Parser

In selecting a parser for use in validating DTDs and document instances, the SGML user has a variety of choices. Fortunately, one of the best, the sp/nsgmls parser by James Clark, is publicly available.

Product: nsgmls
Type: SGML Parser
Platforms: MS-DOS, Windows, UNIX
Provider: James Clark
Contact Locations: James Clark, Indiana University
http://www.jclark.com/sp.html
http://www.cs.indiana.edu/hyplan/asengupt/sgmlsoft.html#sgmls

The nsgmls parser is included in a package of SGML parsing tools and utilities called SP. Also included in the SP package are tools for performing normalization of SGML tags within a document instance.

This parser, a descendant of the earlier sgmls parser, is available in versions for various computer systems. These include MS-DOS, Windows NT, and various flavors of UNIX (including Linux). The source code is available for those wishing to port it to other platforms.


Tip:  
If you are considering installing nsgmls/SP on a different computer, you will probably want to review Nelson Beebe’s notes on his experience with installing the package on various computers at: http://www.math.utah.edu/~beebe/sp-notes.html

While this parser is extremely useful, some people have noted that the documentation that accompanies it (in the form of UNIX-style manual or “man” pages) is rather obscure. Plan on reading the documentation carefully before you begin.


Note:  
The following files are necessary to parse a document:
  An SGML document instance, containing the actual document markup
  An SGML declaration, specifying character sets, features, and so on
  The applicable DTD
  An entity mapping file
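
As a rough illustration of how these files come together, the following Python sketch assembles a validate-only nsgmls command line. It is a minimal example under stated assumptions, not the only way to run the parser: the file names (manual.sgm, manual.dcl, catalog) and the helper function names are hypothetical, and it assumes that the DTD is referenced from the DOCTYPE declaration in the document instance and located through the entity mapping (catalog) file. The -s option (suppress normal output) and the -c option (name a catalog file) are standard nsgmls options.

```python
import shutil
import subprocess
from typing import List, Optional


def build_nsgmls_command(instance: str, declaration: str,
                         catalog: Optional[str] = None) -> List[str]:
    """Build the argument list for a validate-only nsgmls run."""
    cmd = ["nsgmls", "-s"]  # -s: suppress normal output, report errors only
    if catalog is not None:
        cmd += ["-c", catalog]  # -c: entity mapping (catalog) file
    # nsgmls reads the listed files as one concatenated input, so the
    # SGML declaration is placed ahead of the document instance.
    cmd += [declaration, instance]
    return cmd


def validate(instance: str, declaration: str,
             catalog: Optional[str] = None) -> Optional[bool]:
    """Run nsgmls if it is installed; return None when it is not found."""
    if shutil.which("nsgmls") is None:
        return None
    result = subprocess.run(
        build_nsgmls_command(instance, declaration, catalog),
        capture_output=True, text=True)
    if result.stderr:
        print(result.stderr, end="")  # parser messages arrive on stderr
    # Assumption: a zero exit status means no errors were reported.
    return result.returncode == 0
```

Calling validate("manual.sgm", "manual.dcl", "catalog") would run the parser against those hypothetical files and print any error messages it reports. Keeping the command construction in its own function makes it easy to inspect the exact command line before running it.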

