

The World of Perl

Perl (Practical Extraction and Report Language) is a computer language for text processing. Larry Wall developed and maintains it, and its origins are said to date back to a time when he needed utilities to help administer several UNIX computer systems.

As a computer language, Perl shares characteristics with C, the C shell, awk, and sed. Perl really excels in its text recognition and pattern-matching capabilities. To put it in perspective, a task that depends on these capabilities can often be done in, say, 100 lines of Perl code where another language would require 1000 lines.
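As a small illustration of the pattern-matching style described above, the following sketch pulls error messages out of a block of log text. The `ERROR:` line prefix, the sample log lines, and the `find_errors` name are assumptions made up for this example; they do not come from any particular tool.

```perl
use strict;
use warnings;

# Return the error lines found in a block of log text as
# "line_number: message" strings.
# The "ERROR:" prefix is a hypothetical log format chosen for illustration.
sub find_errors {
    my ($text) = @_;
    my @found;
    my $line_no = 0;
    for my $line (split /\n/, $text) {
        $line_no++;
        # Capture everything after a leading "ERROR:" marker, ignoring case.
        if ($line =~ /^\s*ERROR:\s*(.+)$/i) {
            push @found, "$line_no: $1";
        }
    }
    return @found;
}

# Hypothetical sample log text.
my $log = join "\n",
    "Converting chapter1.doc",
    "ERROR: unknown style 'Body2'",
    "Converting chapter2.doc",
    "error: missing art file fig3.tif";

print "$_\n" for find_errors($log);
```

The single regular expression does the recognition, the case folding, and the extraction of the message text, which is the kind of work that tends to take many more lines in a language without built-in pattern matching.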

Product: Perl
Type: Text Recognition and Manipulation Language
Platform: Many
Provider: Larry Wall
Contact Location: University of Florida Perl Archive
http://www1.cis.ufl.edu/perl/

From the SGML perspective, Perl is extremely useful in several respects. By itself, it can be used to write programs for data conversion and other tasks. Its usefulness in this regard cannot be overstated.

For example, in a rather complicated SGML production system that I and others have developed, Perl programs are used in a variety of ways. These tasks include those listed in table 28.1.

Table 28.1 Perl Program Sample Tasks
Task                                Description
Document Conversion                 Converts Interleaf source documents into SGML
File Rename                         Renames large numbers of scanned art files into SGML system format
SGML Post Processing                Adds “floating” content tags into SGML document files
Graphics File Manager               Builds document association files that bind graphics files with associated hotspot layers
Document Conversion Error Scanning  Scans document conversion output files for error messages and reports them to the user
Source File Art Identification      Scans source document files and identifies the names of all referenced art files
Document Comparison                 Compares the contents of a parts list document with the corresponding bill of materials
Library Management                  Periodically processes the intermediate art library collection; when the disk is full, deletes files not accessed for 30 days

As you can see, Perl programs are used to perform a number of tasks in this SGML system. Perl’s flexibility in textual pattern recognition makes it a valuable tool in SGML system conversions and data manipulation.
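To give a feel for one of these tasks, here is a minimal sketch of art file identification: it scans SGML text and lists each referenced art file once. The `graphic` element, its `file` attribute, and the `referenced_art` name are hypothetical, chosen only for illustration; the production system described here may use entirely different markup and logic.

```perl
use strict;
use warnings;

# Collect the distinct art file names referenced in a string of SGML text,
# in order of first appearance.
# Assumes a hypothetical <graphic file="..."> element; adjust the pattern
# to whatever the actual DTD uses.
sub referenced_art {
    my ($sgml) = @_;
    my (%seen, @names);
    # Find each graphic start tag and capture its quoted file attribute value.
    while ($sgml =~ /<graphic\b[^>]*?\bfile\s*=\s*"([^"]+)"/gi) {
        push @names, $1 unless $seen{$1}++;
    }
    return @names;
}

# Hypothetical sample document fragment.
my $doc = <<'END';
<para>See <graphic file="pump01.tif"> and <graphic file="valve02.cgm">.</para>
<para>Again: <GRAPHIC FILE="pump01.tif"></para>
END

print "$_\n" for referenced_art($doc);
```

A regular expression is enough for a quick inventory like this, but it does not understand SGML features such as omitted attribute quotes or entity-declared graphics, so a real parser is the safer choice when accuracy matters.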

There are a number of sources for Perl on the Internet. Among these is the University of Florida Perl Archive (see fig. 28.1).


Fig. 28.1  The University of Florida Perl Archive contains a wide variety of Perl software versions and other related information.

Perl’s flexibility and utility are becoming increasingly recognized in the software development community. As a result, many users have built various tools and utilities with Perl that perform a number of functions, including those related to SGML processing.

SGML Utilities Using Perl

A number of useful Perl-based tools for SGML processing are available, usually through the Internet. As more people come to appreciate the utility of Perl, the number of available tools is likely to keep growing.

perlSGML. A collection of Perl programs and utility libraries, perlSGML supports SGML document processing with a number of functions for manipulating SGML document instances and DTDs.

Product: perlSGML
Type: SGML Utilities
Platform: UNIX
Provider: Earl Hood
Contact Location: Earl Hood
http://www.oac.uci.edu/indiv/ehood/perlSGML.html

The functions supported by perlSGML include:

  DTD parsing support libraries
  HTML document generator for documentation of SGML DTDs
  Tool for listing changes to a DTD
  DTD content hierarchy tree generator
  SGML document instance parser
  SGML document instance markup removal (removes SGML tags from tagged documents)
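The last item in the list, markup removal, can be approximated in a few lines of plain Perl. The sketch below is not perlSGML's implementation and does not use its libraries; the `strip_markup` name is hypothetical, and the approach is a naive regular-expression substitution shown only to make the idea concrete.

```perl
use strict;
use warnings;

# Remove SGML comments and tags from a string, leaving the text content.
# Naive by design: it does not handle ">" inside attribute values or
# marked sections, and it leaves entity references such as &amp; untouched.
sub strip_markup {
    my ($sgml) = @_;
    $sgml =~ s/<!--.*?-->//gs;    # drop comments, including multi-line ones
    $sgml =~ s/<[^>]*>//g;        # drop start tags, end tags, and declarations
    return $sgml;
}

print strip_markup('<para>Check the <emph>oil</emph> level.</para>'), "\n";
```

Comments are removed first so that tag-like text inside a comment is discarded along with it rather than leaving fragments behind.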

Electronic Book Technologies: DynaText

DynaText is a browser for viewing SGML documents electronically. Through a compilation process, DynaText indexes SGML documents into electronic book collections.

Product: DynaText
Type: SGML Book Browser (Reader)
Platform: MS/Windows, Macintosh, UNIX
Provider: Electronic Book Technologies, Inc. (EBT)
Contact Location: Electronic Book Technologies, Inc. (EBT)
Telephone: (401) 421-9550
http://www.ebt.com/

The DynaText browser is particularly noteworthy for its ability to support any DTD (rather than a particular set of standard DTDs). It includes support for hypertext navigation and context-sensitive full-text search. Native graphics support includes TIFF and CALS raster formats. CGM vector graphics support is available as an option. The display of complex tables is also supported.

DynaText is especially well suited to very large documents (see fig. 28.2). Unlike other SGML viewers, it can handle them without a major downturn in performance. Its ability to reformat very large documents on the fly sets it apart from the other SGML viewers currently available.


Fig. 28.2  DynaText electronic book browser is a powerful tool for viewing SGML documents. It is particularly noteworthy for its ability to handle very large documents.

The output formatting capabilities of DynaText are highly flexible. DynaText’s output formatting resembles the FOSI output specification in its use of SGML syntax, and it will shortly support the more powerful DSSSL Lite subset of the Document Style Semantics and Specification Language (ISO/IEC standard 10179).


Note:  
A sample version of the DynaText book browser is included along with sample books.

DynaText is part of a family of products from Electronic Book Technologies that supports SGML document systems. The company’s full range of SGML-related products is shown in table 28.2.

Table 28.2 EBT’s SGML Software Products
Product         Description
DynaText        SGML document viewer
DynaTag         Document conversion tool
DynaBase        SGML data repository and document management system
DynaWeb         SGML-based World Wide Web server
CADLeaf Batch   Batch graphics extraction and conversion

