

Hybrid Content-Format Table Structure

The more recent approach to solving the challenge of random content tables is to relate content to table position. In other words, you have to clearly specify the meaning of each cell by its position in the table so that it can be useful in a database query.

Consider the following DTD fragment:

     <!ELEMENT table         - -   (colspec+, thead?, tbody)>
     <!ELEMENT colspec       - O    EMPTY                   >
     <!ATTLIST colspec       width  NUMBER         #REQUIRED
                             type   CDATA          #REQUIRED>
     <!ELEMENT (thead|tbody) - -   (row+)                   >
     <!ELEMENT row           - O   (cell+)                  >
     <!ELEMENT cell          - O   (#PCDATA)                >

Following this structure enables the search engine to check the value of the type attribute on each <COLSPEC> element and the values in the <THEAD> element. The database query can now locate a piece of information much as you locate a position on a map: first you find the longitude, and then you find the latitude.

Following the DTD fragment above, your table markup looks like this:

     <TABLE>
     <COLSPEC width="2" type="city">
     <COLSPEC width="3" type="rainfall">
     <COLSPEC width="5" type="revenue">
     <THEAD><ROW><CELL>City<CELL>Average Rainfall<CELL>Tourist Revenue</THEAD>
     <TBODY>
     <ROW><CELL>San Diego<CELL>27 cubic inches<CELL>$5.7 million
     <ROW><CELL>San Francisco<CELL>100 cubic inches<CELL>$10 million
     <ROW><CELL>Indianapolis<CELL>175 cubic inches<CELL>$1.9 million
     </TBODY></TABLE>

Suppose the query engine wants to know what the average rainfall was in San Diego. Now the engine has something to go on. It can first locate the <COLSPEC> element whose TYPE attribute has the value "rainfall". In this case, that element, <COLSPEC width="3" type="rainfall">, is second in the sequence of three. That means there will be three columns in this table, and the Average Rainfall column will be the second. So now the processor knows where to look for rainfall numbers, but how does it know which number belongs to San Diego?

It has to go back to the <COLSPEC> elements and look for the type="city" value. It turns out that the first <COLSPEC> element says <COLSPEC width="2" type="city">, so city names are in the first column. Now it interrogates the <TBODY> element until it finds the <CELL> whose value is San Diego. That value turns out to be in the first row. Now the processor goes to the second column of that row and reads the value located there. The processor finds the city name in the first column and the value for inches of annual rainfall in the second column. Because the processor had these two values, like the longitude and latitude coordinates of a map, it could find the value it was looking for in a database search.
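The lookup just described can be sketched in a few lines of Python. This is only an illustration, not a real SGML query engine: it leans on Python's standard html.parser module, which is an HTML parser that happens to tolerate this simple markup and does no validation against the DTD. The names TableReader and lookup are invented for the sketch, and the sample data assumes each body row holds one city's values in <COLSPEC> order.

```python
from html.parser import HTMLParser

# Sample markup; assumes each body row lists its cells in COLSPEC order.
MARKUP = """
<TABLE>
<COLSPEC width="2" type="city">
<COLSPEC width="3" type="rainfall">
<COLSPEC width="5" type="revenue">
<THEAD><ROW><CELL>City<CELL>Average Rainfall<CELL>Tourist Revenue</THEAD>
<TBODY>
<ROW><CELL>San Diego<CELL>27 cubic inches<CELL>$5.7 million
<ROW><CELL>San Francisco<CELL>100 cubic inches<CELL>$10 million
<ROW><CELL>Indianapolis<CELL>175 cubic inches<CELL>$1.9 million
</TBODY>
</TABLE>
"""


class TableReader(HTMLParser):
    """Hypothetical helper: collects COLSPEC types and the TBODY rows.

    HTMLParser reports tag names in lowercase, so the comparisons below
    use lowercase names even though the markup is uppercase.
    """

    def __init__(self):
        super().__init__()
        self.col_types = []    # one entry per <COLSPEC>, in document order
        self.body = []         # list of rows, each a list of cell strings
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        if tag == "colspec":
            self.col_types.append(dict(attrs).get("type"))
        elif tag == "tbody":
            self._in_body = True
        elif tag == "row" and self._in_body:
            self.body.append([])            # start a new row
        elif tag == "cell" and self._in_body and self.body:
            self.body[-1].append("")        # start a new cell in that row

    def handle_endtag(self, tag):
        if tag == "tbody":
            self._in_body = False

    def handle_data(self, data):
        # End tags are omitted, so text belongs to the last opened cell.
        if self._in_body and self.body and self.body[-1]:
            self.body[-1][-1] += data


def lookup(reader, key_type, key_value, wanted_type):
    """Find the row whose key column equals key_value; return the wanted cell."""
    # list.index raises ValueError if no COLSPEC has the requested type.
    key_col = reader.col_types.index(key_type)        # "longitude"
    wanted_col = reader.col_types.index(wanted_type)  # "latitude"
    for row in reader.body:
        if row[key_col].strip() == key_value:
            return row[wanted_col].strip()
    return None


reader = TableReader()
reader.feed(MARKUP)
reader.close()
print(lookup(reader, "city", "San Diego", "rainfall"))  # 27 cubic inches
```

The two calls to index play the role of reading the <COLSPEC> sequence, and the loop over the body rows plays the role of interrogating <TBODY>. A production system would use a validating SGML parser instead.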


Note:  
It’s probably a good idea to consider external table processing. If you will be dealing with many tables in your documents, you might find it useful to process them outside of your SGML documents so that parsing problems with them will not necessarily mire the production process for the whole document.

You can bring the tables into the document via an entity reference, like &table1;, for instance. This entity can be defined in the document prologue as:

     <!ENTITY table1 SYSTEM "c:\SGML\ENTITY\table1.txt">

Whatever table processing system you decide upon should be able to both read and write tables in SGML, or in whatever other format you might be working with.

Processing tables externally often makes sense because it simplifies document parsing.
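As a rough idea of what the "write" half of such an external process might look like, here is a minimal Python sketch that turns row data into table markup of the kind shown earlier. The function name to_sgml_table and its arguments are invented for this illustration. It does not escape markup characters in cell text, so it assumes the cells contain no < or & characters.

```python
def to_sgml_table(colspecs, header, rows):
    """Build table markup as a string (hypothetical helper).

    colspecs: list of (width, type) pairs, one per column
    header:   list of column headings
    rows:     list of rows, each a list of cell strings in column order
    """
    lines = ["<TABLE>"]
    for width, col_type in colspecs:
        lines.append(f'<COLSPEC width="{width}" type="{col_type}">')
    lines.append(
        "<THEAD><ROW>" + "".join(f"<CELL>{h}" for h in header) + "</THEAD>"
    )
    lines.append("<TBODY>")
    for row in rows:
        # Position carries the meaning, so every row needs one cell per column.
        if len(row) != len(colspecs):
            raise ValueError("row length does not match the number of columns")
        lines.append("<ROW>" + "".join(f"<CELL>{c}" for c in row))
    lines.append("</TBODY>")
    lines.append("</TABLE>")
    return "\n".join(lines)


markup = to_sgml_table(
    [(2, "city"), (3, "rainfall"), (5, "revenue")],
    ["City", "Average Rainfall", "Tourist Revenue"],
    [["San Diego", "27 cubic inches", "$5.7 million"]],
)
print(markup)
```

The resulting string could then be saved to whatever file the entity declaration names, so that the main document picks it up through the entity reference.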


Handling Math and Equations

Like tables, math equations are format-intensive. Some people get so frustrated dealing with these that they just make the equation into an image and load the equation into the document as a graphic. While this may work, it should only be used when all other alternatives have failed (for reasons explained in the next section).

There are really only three general ways to handle equations:

  Make the equation an image and load it as a graphic (even though it's discouraged, it's still an option).
  Call on an external processor, such as a typesetting language or format processor (TeX or PostScript, for example).
  Build up the SGML structures in your DTD.

Equations as Graphics

This approach does work. However, it should be a last resort because graphic formats are not fully standardized and they take up more resources than text files with SGML or similar coding.

Web users seem to be resigned to waiting long periods of time for graphic files to load onto their Web pages. Perhaps this is just a phase; graphic-oriented pages may one day fall out of favor because of their bandwidth-hogging ways. Perhaps soon they’ll pave the Information Superhighway with a fiber-optic cable for everyone and bandwidth will not be such a big issue. In any event, GIFs and JPEGs seem to be pretty common image formats at this point, so if you can screen capture your equation and make a graphic file out of it, you can make it accessible to an SGML or HTML processor to load into a document.

If your document deals with a lot of equations, however, as scientific papers often do, this approach could cause problems. Because graphic files are so big compared to text files, you'd do better to code documents laden with equations using another approach.

