

Part II examines document analysis in much greater detail. In the meantime, let’s look at some of the issues you face as you perform document analysis in a new environment.

First of all, you are lucky. Because you don’t have to design a DTD that matches an existing collection of documents, you have a bit more leeway as to how to proceed. I have participated in document analysis sessions where our group was working with a large quantity of existing documents and document components. The structure and organization of these documents were based on their visual appearance on the printed page. (In the pre-SGML world, appearance was all that mattered; how you achieved that appearance just wasn’t important.) Tables that looked similar on a page had actually been built six or seven different ways, depending on which author created them.

Running across such situations is all too common in a pre-existing document environment. It’s also quite understandable. As the specialist, you must deal with all the combinations, permutations, and assorted headaches in translating the documents into structured SGML.

With the “new document” approach, you’re creating documents in accordance with an existing DTD. In fact, if you perform structured authoring with an appropriate authoring tool, you must author according to the rules defined by the DTD. If you’re authoring in a looser environment, you still have the document structure defined and in mind when you create your document.

Going back to the AnyCorp example, suppose you want to address the issue of getting repair and service information to your distributors. Because your product line is evolving so quickly, you’re having trouble keeping the distributors informed of all the changes.

You have a great idea on a type of document to cover these changes: a “Product Advisory.” Created in the form of a bulletin, it can cover a variety of subjects, including parts, repairs, maintenance, safety, or general information.

But what should be in this document? At a minimum, it needs two basic parts: the identification information and the body of the advisory. With those parts determined, it’s time to work on the details.

For ID information, you can include advisory numbers to tell individual advisories apart and dates to show when they are created and revised. There should also be a subject line to tell what the bulletin is about.

The body of the advisory proves to be a little trickier. Because you have envisioned the advisory to be a multipurpose document, you just aren’t sure what may be included in individual advisories. Maybe just a few paragraphs describing a safety issue. Perhaps a numbered list of replacement parts for a new model. Or how about an illustration of a repair procedure? What about several of the above? After debating the issues awhile, you finally come up with your DTD.

Document Modeling

Figure 5.1 illustrates the DTD that was developed for AnyCorp’s product advisories.


Fig. 5.1  This Document Type Definition (DTD) defines the document structure of the AnyCorp product advisory.

As you look at the DTD, notice that it begins with a header containing descriptive text:

    <!-- *********************************************   -->
    <!-- AnyCorp, USA                                    -->
    <!--                                                 -->
    <!-- Product Advisory DTD version 1.0 20 Jul 95      -->
    <!-- *********************************************   -->

This header contains easily readable identification information that shows:

  The name of your company (AnyCorp, USA)
  The name of your DTD (Product Advisory)
  The version of your DTD (1.0)
  The date that you created or revised it (20 Jul 95)

Look at the first element in the DTD, advisory. This is the top level element. It defines the major sub-sections (or subelements) of the document: idinfo, subject, and subsec.

    <!ELEMENT advisory - - (idinfo,subject,subsec+)>


• See “Regular Expression Syntax,” p. 174

The plus sign (+) following subsec is an occurrence indicator: it means that subsec must appear at least once and may be repeated.
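
To put the plus sign in context, here is a brief sketch of the three occurrence indicators that SGML provides. The element names used below (report, summary, section, and appendix) are made up for illustration and are not part of the AnyCorp DTD:

    <!-- Hypothetical declaration, for illustration only    -->
    <!-- summary?   optional: zero or one occurrence        -->
    <!-- section+   required and repeatable: one or more    -->
    <!-- appendix*  optional and repeatable: zero or more   -->
    <!ELEMENT report - - (summary?,section+,appendix*)>

The commas in the model group are sequence connectors, so the subelements must appear in the order listed.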

Following the top-level element are the element definitions for the primary subelements contained within the top-level element (advisory): identification information (idinfo), subject, and subsection (subsec):

    <!ELEMENT idinfo   - - (advnbr,type,dateiss,daterev,product)>
    <!ELEMENT subject  - - (#PCDATA)>
    <!ELEMENT subsec   - - (title,(%parael;)?)>


• See “Attributes: Their Use and the ATTRIBUTE Declaration,” p. 182

The ATTLIST declaration is a way of declaring element attributes (or values associated with the element).
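
As a rough sketch of what such a declaration looks like, the following adds a status attribute to the advisory element. The attribute name and its values are hypothetical; the declarations shown in this section do not include them:

    <!-- Hypothetical attribute, not part of the AnyCorp DTD -->
    <!-- status may be draft or final; draft is the default  -->
    <!ATTLIST advisory  status  (draft|final)  "draft">

With a declaration like this in place, an author could write <advisory status="final"> to mark an advisory as finished.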

The elements listed below are those that are included in the idinfo element. In other words, they are subelements of the idinfo element.

    <!ELEMENT advnbr   - - (#PCDATA)>
    <!ELEMENT type     - - (#PCDATA)>
    <!ELEMENT dateiss  - - (#PCDATA)>
    <!ELEMENT daterev  - - (#PCDATA)>
    <!ELEMENT product  - - (#PCDATA)>
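
To see how these declarations translate into a document, here is a sketch of how the identification information might be marked up in an advisory. The values are invented for illustration; only the element names and their order come from the DTD:

    <!-- Sample values are hypothetical -->
    <idinfo>
    <advnbr>PA-0001</advnbr>
    <type>Safety</type>
    <dateiss>20 Jul 95</dateiss>
    <daterev>20 Jul 95</daterev>
    <product>Model 100</product>
    </idinfo>

Because the idinfo content model joins its subelements with commas and gives none of them an occurrence indicator, each of the five must appear exactly once, in the order shown.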

The subsec element is defined as containing a number of objects: the subelement title and the entity parael:

    <!ELEMENT title    - - (#PCDATA)>

    <!ENTITY % parael "para|blist|nlist|illus">

The entity parael, or paragraph element, is a mechanism for grouping several element names under a single name, so that the whole group can be referenced wherever it is needed.

The definition of parael could also be written as an element, in the following manner:

    <!ELEMENT parael - - (para|blist|nlist|illus)>


• See “Entities,” p. 58

For this example, however, you define parael as an entity rather than as an element. Recall from Chapter 3, “SGML Terminology,” that an entity is a mechanism for referring to content by a symbolic name:

    <!ENTITY % parael "para|blist|nlist|illus">


• See “Entities: Their Use and the ENTITY Markup Declaration,” p. 184
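
When the parser reads the DTD, it replaces the reference %parael; with the entity’s replacement text. As a sketch based on the declarations in this section, the subsec content model is read as if it had been written out in full. The two lines below show the same declaration before and after expansion; they are not meant to appear together in one DTD:

    <!-- As written in the DTD -->
    <!ELEMENT subsec - - (title,(%parael;)?)>

    <!-- As the parser reads it after expanding %parael; -->
    <!ELEMENT subsec - - (title,(para|blist|nlist|illus)?)>

Keep in mind that the entity declaration must appear in the DTD before the first reference to it.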

