

When parsing a DTD, entity reference expansion brings in the part of the document model that was defined via an entity, whether internal or external to the DTD. To do this, the parser must be able to locate the source of every entity defined in the DTD. For externally defined entities, this means locating (or resolving) the external reference.

When parsing a document, entity reference expansion locates (and “brings in”) such objects as special characters or notation (such as mathematical symbols), standard text sections (boilerplate), or graphical data (such as company logos).
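As a rough illustration of entity reference expansion, here is a minimal Python sketch. It is not how any particular SGML parser is implemented: the entity names, their replacement text, and the function name expand_entities are all hypothetical, and only internal text entities are handled. A real parser would also have to resolve external entities to their storage location.

```python
import re

# Hypothetical entity table: names and replacement text are made up for illustration.
ENTITIES = {
    "copy": "(c)",
    "company": "Acme Corp.",
    "legal": "&copy; 1996 &company; All rights reserved.",
}

# Matches a general entity reference such as &company;
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9.-]*);")


def expand_entities(text, entities, _active=()):
    """Replace each &name; reference with its declared replacement text."""

    def replace(match):
        name = match.group(1)
        if name not in entities:
            # The entity cannot be located, so expansion has to fail.
            raise KeyError(f"undeclared entity: {name}")
        if name in _active:
            # Guard against an entity that (directly or indirectly) contains itself.
            raise ValueError(f"entity refers to itself: {name}")
        # Replacement text may contain further references, so expand recursively.
        return expand_entities(entities[name], entities, _active + (name,))

    return ENTITY_REF.sub(replace, text)


print(expand_entities("Notice: &legal;", ENTITIES))
```

The boilerplate entity legal pulls in two further entities, which is why the expansion is recursive. A reference to an entity that was never declared raises an error, mirroring the parser's need to locate every entity it encounters.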

Attribute validation ensures that the attributes found on an element conform to the rules defined for them in the attribute list. For example, if an attribute is required, the parser confirms its presence (its absence triggers a parsing error). Similarly, if an attribute must be one of several choices, its value is validated against the legal possibilities.
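The two checks just described (required attributes and enumerated choices) can be sketched in a few lines of Python. This is an illustrative sketch only: the ATTLIST dictionary is a hypothetical stand-in for a parsed attribute list declaration, and the attribute names and values are invented.

```python
# Hypothetical attribute list for one element, loosely modeled on an
# attribute list declaration. Each entry maps an attribute name to
# (allowed values, or None for free text; required flag).
ATTLIST = {
    "id": (None, True),
    "status": ({"draft", "review", "final"}, False),
}


def validate_attributes(found, attlist):
    """Return a list of error messages for the attributes found on one element."""
    errors = []
    for name, (allowed, required) in attlist.items():
        if name not in found:
            if required:
                errors.append(f"required attribute '{name}' is missing")
            continue
        if allowed is not None and found[name] not in allowed:
            errors.append(f"attribute '{name}' has illegal value '{found[name]}'")
    # Attributes that were never declared for this element are also errors.
    for name in found:
        if name not in attlist:
            errors.append(f"attribute '{name}' is not declared")
    return errors


print(validate_attributes({"id": "c1", "status": "final"}, ATTLIST))
print(validate_attributes({"status": "done"}, ATTLIST))
```

The first call reports no errors. The second reports both a missing required attribute and a value outside the legal choices.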

What Parsers Don’t Do

Within the SGML syntax, you can define very complex document structures. As a result, the logic a validating parser needs to check these structures can be equally complex.

However, parsers can’t perform magic. There are certain tasks that a parser doesn’t perform:

  Doesn’t ensure that content is correct
  Doesn’t check that the text inside an element suits that element
  Doesn’t distinguish between valid markup and correct markup

If the content of a document instance is legal according to your DTD, it will parse correctly. However, the content can still be incorrect in other ways. For example, if you have an element defined for troubleshooting and it contains information on spare parts instead, the parser can’t point this out. The parser has no way of knowing what the correct content should be.

In this sense, the parser should be thought of as a tool to verify valid document markup rather than correct markup.

What To Look for in a Parser

Parsers come in many shapes and sizes, and some features are worth looking for when you choose one. At a minimum, ensure that your parser is a validating parser. If it’s not, you can’t be sure that your documents actually conform to your DTD. Other optional features that can be nice to have are listed in table 13.3.

Table 13.3 Optional Features of Parsers

Feature                     Description

Add Omitted Tags            When tag minimization is used, this feature resolves and inserts tags that were omitted in the document markup. This feature can be present as an option.
Suggest Error Corrections   Upon encountering an error condition, the parser may suggest ways to correct the error.
Warn of Potential Problems  In some cases, it is possible to have conditions that are legal but frowned upon in SGML syntax. This parser feature warns you of such conditions.

Evaluating Parser Output Messages

At first, the output of a parsing session might look rather strange and mysterious—something like the utterances of the Oracle of Delphi. The messages might use SGML terminology that you’re not familiar with. In this event, you should check your parser documentation for the terms that are unfamiliar.

When you receive an error message, the parser normally gives you a line number to identify the location of the error. In some cases, you might look at the line and not see an error condition. If this is the case, start looking at lines in your source document (DTD or instance) prior to the identified error location.

If you still can’t locate the source of the error, look at the next higher object in the structure. For example, if you encounter an error dealing with the element title but can’t locate the cause, look at the next higher level element that contains that occurrence of title.
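To see why the reported line is often later than the real mistake, consider the following toy nesting checker in Python. It is only a sketch under strong assumptions: it knows nothing about DTDs, handles only simple start and end tags without attributes, and assumes no tag minimization. The function name first_error and the sample document are hypothetical.

```python
import re

# Matches simple start tags like <title> and end tags like </title>.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>")


def first_error(lines):
    """Return (line_number, message) for the first nesting error, or None."""
    open_elements = []  # stack of (element name, line where it was opened)
    for number, line in enumerate(lines, start=1):
        for closing, name in TAG.findall(line):
            if not closing:
                open_elements.append((name, number))
                continue
            if not open_elements:
                return number, f"end tag </{name}> with no open element"
            expected, opened_at = open_elements.pop()
            if name != expected:
                return number, (
                    f"found </{name}> but <{expected}> "
                    f"(opened on line {opened_at}) is still open"
                )
    if open_elements:
        name, opened_at = open_elements[-1]
        return len(lines), f"<{name}> (opened on line {opened_at}) was never closed"
    return None


DOCUMENT = [
    "<chapter>",
    "<title>Parsers",           # the real mistake: </title> is missing here
    "<para>Some text.</para>",
    "</chapter>",               # the checker only notices the problem here
]

print(first_error(DOCUMENT))
```

The error is reported on line 4, at the end tag of the containing chapter element, even though the cause is the unclosed title on line 2. This is the same pattern as above: look at earlier lines, then at the next higher element in the structure.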

SGML parsers are not “lookahead” in their operation. That is, they stop upon encountering an error. Therefore, your early attempts at parsing might seem somewhat time-consuming: the parser stops at an error, you correct it and rerun the parser, and it stops at the next error.

In some cases, you might encounter errors that seem to contradict the rules of the SGML standard (ISO 8879). When this happens, you might want to double-check the rules. Still can’t find the source of your parsing error? Check your SGML declaration closely for subtle errors. If you still encounter errors, you might want to use a different parser.

From Here…

This concludes your look at the evaluation of DTDs and the issues involved in ensuring that your DTD meets your needs. In doing so, you examined the differences between an enforcing (or “strict”) DTD and a flexible DTD. You also examined what tasks a parser performs (and which ones it doesn’t).

For more information, refer to the following:

  Chapter 14, “Following Good SGML Practice,” looks at the techniques and approaches that ensure your approach to SGML is practical and maintainable.
  Part IV, “Markup Strategies,” examines the issues, challenges, and strategies involved in converting your documents into SGML.
  Part V, “SGML and the World Wide Web,” examines how SGML relates to the Internet and the related issues and implications.

