Special Edition Using SGML: Object-Oriented Development of SGML Applications

Table of Contents

When multiple classes have methods or attributes in common, specifying the same things over and over for each class is redundant work. This is where inheritance comes in; it allows you to define attributes and behavior for one class, and then to define another by saying, “this new one will have the same attributes and behavior as that one, but with these changes.” Typically, this means defining a base class with all the attributes and behavior that the other classes have in common, and then basing the other ones on (or, in object-oriented terminology, deriving them from) that one. Figure 31.1 shows a diagram of the following classes, which would be used at the top of an SGML class hierarchy.

• SGMLObject. It’s often difficult to determine which classes have what in common and the most efficient inheritance structure, but for a system representing documents, it’s not that bad. First, you define an abstract class (a class designed purely to provide a basis for descendant classes without itself ever having instantiations) called SGMLObject, from which you can derive all the other classes. This also makes it easier to add new methods that can be used by all the element classes, because you only need to add these methods to the SGMLObject class.

• SGMLDocElement, SGMLElement. The root node of a document tree can have attributes (for example, the DTD) that its subelements do not, and the document’s component objects can have attributes that the document object does not (for example, a parent attribute), so we derive two more classes from SGMLObject called SGMLDocElement and SGMLElement. Now, you can add new methods and attributes to all component objects without affecting the document object, and vice versa.
SGMLDocElement and SGMLElement are also abstract classes; you’re not going to instantiate any objects of them. They only exist so that you can derive new classes from them. But why? Why not make each element that you read in from a document instance an object of one of these two classes, instead of deriving new classes based on the DTD declarations and making the elements objects of those new classes?
The answer highlights a principal advantage of using an SGML DTD for creating an object-oriented system: by treating SGML elements of different element types as members of different classes, you can assign specialized attributes and behavior to each element type. A DTD may have separate attributes defined for each element type, and you can use these definitions to create attributes for the SGML element objects. This creates a specialized class for each SGML element type, which is part of the point of inheritance: because you specify only an ancestor and the new attributes that set a class apart from it, you can easily re-use the code originally defined for the ancestor.

• PCDATA. We also define one more descendant in this hierarchy: PCDATA, descended from the SGMLElement class. This is a concrete (that is, non-abstract) class; it will have instances. The instances of PCDATA, or parsed character data, make up the majority of the leaf nodes in a document tree. Internal nodes of the tree contain other nodes, but the leaves generally contain the character data that makes up a document. (I say “generally” because they don’t have to; they may contain references to other entities as well, such as picture, sound, or video files, or PCDATA’s cousins, CDATA and RCDATA.)

Fig. 31.1 Class structure of an SGML system (Booch-style diagram).
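
The top of this hierarchy can be sketched in a few lines of code. The following is only an illustration in Python, not the Smalltalk classes discussed in this chapter; the describe method and the constructor parameters are assumptions added for the example, while the class names and the dtd and parent attributes come from the descriptions above.

```python
from abc import ABC, abstractmethod


class SGMLObject(ABC):
    """Abstract root: behavior shared by every class in the hierarchy."""

    @abstractmethod
    def describe(self):
        """Return a short description (hypothetical shared method)."""


class SGMLDocElement(SGMLObject):
    """Abstract base for document (root) classes; it carries the DTD."""

    def __init__(self, dtd):
        self.dtd = dtd
        self.children = []


class SGMLElement(SGMLObject):
    """Abstract base for component classes; each component has a parent."""

    def __init__(self, parent=None):
        self.parent = parent


class PCDATA(SGMLElement):
    """Concrete leaf class holding parsed character data."""

    def __init__(self, text, parent=None):
        super().__init__(parent)
        self.text = text

    def describe(self):
        return f"PCDATA({self.text!r})"
```

Because only PCDATA implements describe, it is the only class here that can be instantiated; calling SGMLElement() or SGMLDocElement("memo.dtd") raises a TypeError, which mirrors the abstract role those classes play in the text.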

Caution:
Don’t confuse inheritance structure with object structure. Object structure defines an object’s components and their ordering; inheritance structure is a convenient feature to reduce the developer’s workload.
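
A small sketch may make the distinction in this caution concrete. It is written in Python for illustration, and the Chapter and Para classes are hypothetical element types, not taken from any particular DTD.

```python
class SGMLElement:
    """Hypothetical minimal base class for document components."""

    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


class Chapter(SGMLElement):
    pass


class Para(SGMLElement):
    pass


chapter = Chapter()
para = Para(parent=chapter)

# Inheritance structure: which classes Para derives from.
print([cls.__name__ for cls in Para.__mro__])  # ['Para', 'SGMLElement', 'object']

# Object structure: which object contains which.
print(type(para.parent).__name__)  # Chapter
```

The paragraph object sits inside the chapter object, yet the Para class does not inherit from Chapter; both simply derive from SGMLElement.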

A Sample Smalltalk SGML System

Let’s look at a Smalltalk program that I wrote to test all these ideas. The STSGML (for “Smalltalk SGML”) application demonstrates how an SGML document instance, set up to be treated like other Smalltalk objects, can be plugged in and used in an object-oriented system. It reads a DTD, defines a new class for each element declaration, and then reads a document instance and instantiates its elements as objects of the classes derived from the document’s DTD.
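
Before turning to the Smalltalk code, the overall flow can be sketched roughly as follows. This is a Python illustration under strong assumptions, not the STSGML implementation: it skips parsing entirely and assumes the DTD has already been reduced to a dictionary of element names and attribute names, and the document instance to nested (name, attributes, children) tuples. The names define_classes, instantiate, MEMO, and PARA are all hypothetical.

```python
class SGMLElement:
    """Hypothetical base class for all generated element classes."""

    attribute_names = ()

    def __init__(self, parent=None, **attributes):
        unknown = set(attributes) - set(self.attribute_names)
        if unknown:
            raise ValueError(f"undeclared attributes: {sorted(unknown)}")
        self.parent = parent
        self.attributes = attributes
        self.children = []


def define_classes(declarations):
    """Create one SGMLElement subclass per element declaration."""
    return {
        name: type(name, (SGMLElement,), {"attribute_names": tuple(attrs)})
        for name, attrs in declarations.items()
    }


def instantiate(node, classes, parent=None):
    """Turn a (name, attributes, children) tuple into a tree of objects."""
    if isinstance(node, str):
        return node  # character data is kept as a plain string here
    name, attributes, children = node
    element = classes[name](parent=parent, **attributes)
    element.children = [instantiate(c, classes, element) for c in children]
    return element


# Stand-ins for an already-parsed DTD and document instance.
declarations = {"MEMO": ["status"], "PARA": []}
instance = ("MEMO", {"status": "draft"}, [("PARA", {}, ["Hello."])])

classes = define_classes(declarations)
memo = instantiate(instance, classes)
print(type(memo).__name__, memo.attributes["status"])  # MEMO draft
```

Each element in the instance becomes an object of a class that did not exist until the declarations were read, which is the step that distinguishes this approach from making every element an instance of one generic class.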
