

Regular Expression Syntax


Note:  
For those readers who have experience in the computer programming world, the use of regular expression syntax to define content models may seem like old hat (or Programming 101). If you fit into this category, you might want to skip ahead to the section “Specific Declarations.”

Regular expressions provide a mechanism to sketch out the ingredients of SGML content models. They provide a shorthand way to indicate order, occurrence, exclusion, and logical grouping.

The specific notation for the use of regular expressions within SGML is illustrated in table 10.2.

Table 10.2 SGML Regular Expression Notation

Item   Type         Usage

,      Sequencing   in order, or followed by
|      Sequencing   logical OR
&      Sequencing   logical AND (in any order)
*      Occurrence   occurs 0 or more times (optional)
?      Occurrence   occurs 0 or 1 time (optional)
+      Occurrence   occurs 1 or more times (required)
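
Readers who know a scripting language may find it helpful to compare this notation with ordinary text regular expressions. The following is a minimal sketch in Python, not part of SGML itself: it assumes each child element is written as a `<name>` token, and the helper `matches` and the element names are hypothetical. The occurrence indicators (*, ?, +) and the | connector carry over directly, the comma becomes simple concatenation, and the & connector has no direct counterpart.

```python
import re

def matches(pattern, children):
    """Return True if the list of child element names fits the pattern.

    Each name is wrapped as "<name>" so that one element name can never be
    mistaken for part of another.
    """
    encoded = "".join(f"<{name}>" for name in children)
    return re.fullmatch(pattern, encoded) is not None

# SGML notation (hypothetical element names) and a Python equivalent
IN_ORDER     = r"<title><date>"      # title,date
EITHER       = r"(<para>|<list>)"    # para|list
ZERO_OR_MORE = r"(<appendix>)*"      # appendix*
ZERO_OR_ONE  = r"(<index>)?"         # index?
ONE_OR_MORE  = r"(<chapter>)+"       # chapter+
```

With these patterns, `matches(ONE_OR_MORE, [])` is false because at least one chapter is required, while `matches(ZERO_OR_MORE, [])` is true because appendixes are optional.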

So how do you use regular expressions? It’s quite simple, really. Combined with named objects and parentheses, these symbols let you define quite complex groupings of objects in specific orders or combinations.

For example, suppose you want to define a product announcement that contains the following: a title, the date, an announcement number, and additional information. This additional information will include a number of paragraphs and perhaps an illustration.

Using regular expressions, you can define the announcement in the following way:

    announcement = (title,date,number,((paragraph+)&(illustration?)))

The preceding definition of announcement specifies that it consists of the following objects in the following order: one title, one date, one number, followed by some additional data. The additional data will include one or more paragraphs and may include one illustration. This additional data may begin with either an illustration or the paragraph(s).


Note:  
Because the ampersand (&) is used to join “one or more paragraphs” and “0 or 1 illustrations,” these two collections can occur in either order. However, this definition does not permit an illustration to be in the middle of multiple paragraphs.
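
The effect of the ampersand can be checked with a small sketch. The Python below is an illustration only, not an SGML parser: it assumes child elements are encoded as `<name>` tokens, and it emulates `(paragraph+)&(illustration?)` by spelling out both permitted orders, because Python regular expressions have no “in any order” connector. The function name `is_valid_announcement` is hypothetical.

```python
import re

PARAS = r"(?:<paragraph>)+"     # paragraph+
ILLUS = r"(?:<illustration>)?"  # illustration?

# (paragraph+)&(illustration?): either group may come first
ADDITIONAL = rf"(?:{PARAS}{ILLUS}|{ILLUS}{PARAS})"

# title,date,number followed by the additional data
ANNOUNCEMENT = re.compile(rf"<title><date><number>{ADDITIONAL}")

def is_valid_announcement(children):
    """Check a list of child element names against the announcement model."""
    encoded = "".join(f"<{name}>" for name in children)
    return ANNOUNCEMENT.fullmatch(encoded) is not None

head = ["title", "date", "number"]
print(is_valid_announcement(head + ["paragraph", "paragraph", "illustration"]))
print(is_valid_announcement(head + ["illustration", "paragraph"]))
print(is_valid_announcement(head + ["paragraph", "illustration", "paragraph"]))
```

The first two calls report a valid announcement; the third does not, because the illustration falls between two paragraphs.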

Suppose that you want to define your product announcement a little differently. You want to avoid the use of the ampersand (&) and allow for any combination of the following items in the additional data area: paragraphs, illustrations, and numbered lists. You can then redefine your announcement in the following way:

Through this approach, you have specified that the standard data that always occurs in order (title,date,number) will be followed by additional data (adddata). Defined separately, you specify that adddata can consist of any combination of paragraphs, illustrations, and numbered lists occurring in any order.
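
To make this redefinition concrete, the sketch below assumes it takes a form such as `announcement = (title,date,number,adddata)` with `adddata = (paragraph|illustration|numlist)*`. The element name `numlist` and the choice of `*` (rather than `+`) are assumptions made for illustration; only `adddata` and the list of permitted items come from the description above. As a plain Python check over `<name>` tokens, it might look like this:

```python
import re

# adddata = (paragraph|illustration|numlist)*   (assumed form; "numlist" is hypothetical)
ADDDATA = r"(?:<paragraph>|<illustration>|<numlist>)*"

# announcement = (title,date,number,adddata)
ANNOUNCEMENT = re.compile(rf"<title><date><number>{ADDDATA}")

def is_valid_announcement(children):
    """Check a list of child element names against the redefined model."""
    encoded = "".join(f"<{name}>" for name in children)
    return ANNOUNCEMENT.fullmatch(encoded) is not None
```

Under these assumptions, a sequence such as paragraph, illustration, paragraph is accepted after the title, date, and number, which the ampersand version did not allow.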


Caution:  
Even though it is allowed, it’s good practice to avoid the ampersand (&) connector whenever possible in SGML. As you have seen, it’s rarely needed. It tends to strain validating parsers (and occasionally causes parsing logic errors) because of the complexity it can introduce into document models.

Additional examples of data structures defined via regular expressions are as follows:

    book = (titlepg,tblcontent,chapter+,bibliogr?,appendix*,index?)
    chapter = (chapt-title,((sect-title?),section-body)+)
    catalog = ((section)|(section+,index))

The previous examples of syntax can be described (defined) as follows:

  A book consists of (in order) a title page, followed by a table of contents, followed by one or more chapters; following the chapters may be a single bibliography (optional), zero or more appendixes, and a single index (optional).
  A chapter consists of (in order) a chapter title and one or more sections; each section may include a title and always includes a section body.
  A catalog consists of either a single section, or one or more sections followed by an index.
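
The three descriptions above can be exercised in the same way. This is a sketch under the same assumptions as before (child elements encoded as `<name>` tokens, content models hand-translated into Python patterns); the `MODELS` table and the `conforms` helper are hypothetical names, not part of any SGML tool.

```python
import re

# Hand-translated content models; the comma becomes plain concatenation.
MODELS = {
    # (titlepg,tblcontent,chapter+,bibliogr?,appendix*,index?)
    "book": r"<titlepg><tblcontent>(?:<chapter>)+(?:<bibliogr>)?"
            r"(?:<appendix>)*(?:<index>)?",
    # (chapt-title,((sect-title?),section-body)+)
    "chapter": r"<chapt-title>(?:(?:<sect-title>)?<section-body>)+",
    # ((section)|(section+,index))
    "catalog": r"(?:<section>|(?:<section>)+<index>)",
}

def conforms(element, children):
    """Return True if the child element names fit the model for element."""
    encoded = "".join(f"<{name}>" for name in children)
    return re.fullmatch(MODELS[element], encoded) is not None
```

For instance, `conforms("catalog", ["section", "section"])` is false: two sections are allowed only when an index follows them.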

As you have seen, the use of regular expression syntax gives you the opportunity to define data structures in a compact way. As you start to examine the details of specific declaration statements, you’ll see how they are used in SGML.

Specific Declarations

The DOCTYPE, COMMENT, ELEMENT, ATTRIBUTE list, and ENTITY declarations all follow the same general syntax. They differ in the particular characteristics that each declaration type defines.

