

Chapter 7
Defining the Elements

You now know how document analysis fits into the overall SGML procedure, and you understand how important defining elements is to document analysis. This chapter discusses some points that you should keep in mind as you define elements. It is not about declaration syntax, which is covered in Chapter 10. There is a big difference between defining elements and declaring them.

In this chapter, you learn:

  How big your elements should be
  What element content models are
  How to make an element dictionary
  How to use structure diagrams
  What types of content elements have
  What the ten steps to defining elements are

How Big Should Your Elements Be?

You determine the optimal size of an element. It is mostly a matter of taste and utility. You are the one who has to use them.

It is sort of like nuclear physics. First came molecules. Then there were atoms inside the molecules, and surely nothing could be smaller than an atom. Then somebody split one, and people are still finding particles that they have to name. Democritus would be pretty frustrated if he came back today and saw how much more complicated atoms are than he thought.

Take an element of a document with which you are familiar: a single listing in a residential telephone book (see fig. 7.1).


Fig. 7.1  This telephone listing looks very simple, but it’s probably not detailed enough.

In general, your elements should be just small enough to be useful, and no smaller. Break your documents down into the smallest meaningful elements, or building blocks. The example in figure 7.1 is not broken down enough: there are still meaningful structural elements nested inside the larger element. That is the key.

In deciding whether you have broken down your elements enough, you should ask these questions:

  Will I need to access the information nested in the element?
  Are there any formatting peculiarities that indicate a structural and logical element that I should identify?
  Would the nested information be useful during a database search?
  Have I gone too far by identifying more structure than I need?

It might be useful to search by subelements within the individual item listing—for example, Mrs and 555-1234. Figure 7.2 shows you how to specify more elements.


Fig. 7.2  This example contains more detail than the previous example, but now there’s a superfluous element—the type of street.

Conceivably, each of the elements suggested in figure 7.2 could be useful, depending on the application. If your database is huge, this level of detail could be helpful. If you are making an SGML telephone book for your home use, it is probably too detailed.
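As a rough sketch of the difference between the two levels of detail, the same listing might be marked up as follows. The element names (listing, last, first, title, street, city, phone) are hypothetical, chosen only for illustration, and the first name and street address are placeholder values, not taken from the figures.

```sgml
<!-- Coarse markup: the whole listing is a single element. -->
<!-- "Jane" and "123 Orange Ave" are placeholder values. -->
<listing>Balcombe Jane Mrs 123 Orange Ave Coronado 555-1234</listing>

<!-- Finer markup: each piece you might search or sort by is its own element. -->
<!-- All element names here are hypothetical. -->
<listing>
  <last>Balcombe</last>
  <first>Jane</first>
  <title>Mrs</title>
  <street>123 Orange Ave</street>
  <city>Coronado</city>
  <phone>555-1234</phone>
</listing>
```

With the second form, a search for every listing whose phone element contains 555-1234 becomes possible; with the first form, the number is just text inside the listing.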

You need to sort by first name, street address, and telephone number. You can see that the formatting of the street address is different from the formatting of the name. Whether you need to search by telephone number prefix probably depends on how big your database is. In this case, it is the San Diego area telephone book, so it is enormous.

You might have gone too far in specifying Mrs as a title, but maybe not. You might want to search for physicians, for example. It would be handy to have a title element into which you could put Dr. You should make the title element optional, so that you don’t have to put Mr, Ms, or Mrs in every listing.

The big issue to consider is whether you will need information about an element in the future. Talk with the people in the trenches—the ones who use your documents—to get an idea.


Note:  
Not every element must be visible. Elements often do not print. For example, if Mrs. Balcombe resided in San Diego rather than Coronado, the telephone book would have printed only her street address and left out the city. You can customize your processing system to hide or print elements as you deem appropriate.

If you are not sure what sort of data you will need in the future, allow for expansion. Maybe Mrs. Balcombe could use an AT&T subscriber element, or a bill overdue element later on.


There is no right answer to how big your elements should be. It’s up to you to figure out whether you have too many elements. What matters is that they are useful to you.

Element Content Models

In the previous example, some elements live inside other elements. When you find subelements living inside an element, you must decide what their relationships are to one another.


• See “Model Groups,” p. 65

Hierarchy and Sequence

When you find elements nested within other elements, you need to establish the relationships among them. The two questions to ask are:

  What is the hierarchy of elements?
  What is the sequence of elements?

Consider Mrs. Balcombe. You might be better off if you added hierarchy to the listing. For example, does the prefix element live inside the last name entry?

Likewise, the sequence of elements might be important. Does it matter whether the first name entry comes before the street address entry? Sequence is important in a telephone book. A listing makes no sense if its elements do not appear in a distinct order.
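One way to picture hierarchy and sequence is as a set of content models. The sketch below is only an assumption about how a listing could be organized; the element names are hypothetical, and the declaration syntax itself is the subject of Chapter 10.

```sgml
<!-- Hypothetical element names, for illustration only. -->

<!-- Sequence: the comma connector means name, then address, then phone, in that order. -->
<!ELEMENT listing - - (name, address, phone)>

<!-- Hierarchy: last and first live inside name; street and city live inside address. -->
<!ELEMENT name    - - (last, first)>
<!ELEMENT address - - (street, city)>

<!-- The lowest-level elements contain only character data. -->
<!ELEMENT (last | first | street | city | phone) - - (#PCDATA)>
```

Whether last and first deserve a parent element such as name is exactly the kind of hierarchy decision described above; a flat listing with no intermediate elements would be an equally valid choice for a small application.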

Occurrence

It’s important to consider how many times an element shows up inside its parent. For example, the middle initial element probably does not apply to every listing in the telephone book. Therefore, its occurrence is different from that of the telephone number, which must appear in every listing. The four possibilities are:

  Required: Occurs once and only once
  Optional: Occurs once or not at all
  Optional repeatable: Occurs zero or more times
  Required repeatable: Occurs one or more times
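In SGML content models, these four possibilities correspond to the occurrence indicators: no indicator, ?, *, and +. The following sketch shows all four in one hypothetical listing; the element names, including the flag element standing in for subscriber information added later, are assumptions made for illustration.

```sgml
<!-- Hypothetical element names, for illustration only. -->
<!--   last, first, address : required (no indicator), exactly once    -->
<!--   initial?, title?     : optional, once or not at all             -->
<!--   phone+               : required repeatable, one or more times   -->
<!--   flag*                : optional repeatable, zero or more times  -->
<!ELEMENT listing - - (last, first, initial?, title?, address, phone+, flag*)>

<!ELEMENT (last | first | initial | title | address | phone | flag) - - (#PCDATA)>
```

Under this model, a listing with no middle initial and two telephone numbers is valid, but a listing with no telephone number at all is not.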

