Special Edition Using SGML: Handling Specialized Content and Delivery

Chapter 25
Handling Specialized Content and Delivery

As an international standard, SGML has an enormous job to do. You might say it carries the weight of the world on its electronic shoulders. If you measure weight in terms of bandwidth taken up by graphics and multimedia on the Internet, this is truer than ever nowadays, with HTML being an application of the standard. But as you’ve noticed if you’ve searched the Web or designed applications for local use, some content is just more challenging to manage than other content. Challenging content is the topic of this chapter.

In this chapter, you learn how to handle:

• Tables

• Math and equations

• Footnotes and endnotes

• Citations and bibliographies

Handling Tables

Tables are among the more challenging structures to handle in SGML. They are heavily laden with formatting, and a table can appear only in the layout its author has predetermined. As any SGML purist will probably tell you, format is not something you should set your hopes on, since structured content is what SGML is really all about. Tables are the exception.

When you think about it, the only difference between tables and text is the appearance of the data they present. The point of having a table is to make the data visually appealing and intuitively understandable. Consider the following paragraph:

The average annual rainfall and tourist revenue, in cubic inches and in dollars, respectively, for the following cities are San Diego: 27 cubic inches and $5.7 million; San Francisco: 110 cubic inches and $10 million; and Indianapolis: 175 cubic inches and $1.9 million.

This information is understandable once you read it through a few times. With more cities or more years in the study, however, a reader would get lost much sooner. Even with such a small sample of data, consider how much easier it is to follow when presented as a table.


City             Average Rainfall     Tourist Revenue

San Diego        27 cubic inches      $5.7 million
San Francisco    110 cubic inches     $10 million
Indianapolis     175 cubic inches     $1.9 million

The table holds exactly the same data as the paragraph of text, but it is so much easier to take in that tabular presentation becomes nearly mandatory. Imagine trying to present federal income tax figures in running text, in increments of $5,000 up to $250,000 and beyond. That would have to be presented in a table. Naturally, the IRS uses SGML to preserve the integrity of its data.

The Format versus Content Challenge

The problem with format-intensive SGML structures is that you lose the ability to track the meaning of the content or data. It’s not that hard for the SGML processor to read tagged data and build a table. The processor can reach into its electronic bag, pull out the right sequence of data blocks, and lay them down one after the other, column after column, for as many rows as necessary. It can start with the table heading information (City, Average Rainfall, and Tourist Revenue) and lay it down the way a mason builds a brick wall.

When the mason finishes the first row of bricks, he grabs the next brick and starts a new row on top of the first. The SGML processor does the same thing, cell after cell, row after row, until the </TABLE> tag turns off the table processing. The physical processing of the individual cell elements is relatively easy.

The problem is that the system now has no way of knowing what the content of those cells means. It cannot tell whether <CELL>110 cubic inches</CELL> refers to a City, Tourist Revenue, or Average Rainfall. This causes difficulty when you want to run a database query against SGML tables.
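
To make the problem concrete, here is a minimal sketch of the rainfall table in generic, format-driven markup. The TABLE and CELL names come from the tags mentioned above; the ROW element and the DTD declarations are assumptions added for illustration, not the markup of any particular table standard.

```sgml
<!DOCTYPE TABLE [
<!-- Hypothetical format-driven table model: every cell is just a CELL. -->
<!ELEMENT TABLE - - (ROW+)>
<!ELEMENT ROW   - - (CELL+)>
<!ELEMENT CELL  - - (#PCDATA)>
]>
<TABLE>
<ROW><CELL>City</CELL><CELL>Average Rainfall</CELL><CELL>Tourist Revenue</CELL></ROW>
<ROW><CELL>San Diego</CELL><CELL>27 cubic inches</CELL><CELL>$5.7 million</CELL></ROW>
<ROW><CELL>San Francisco</CELL><CELL>110 cubic inches</CELL><CELL>$10 million</CELL></ROW>
<ROW><CELL>Indianapolis</CELL><CELL>175 cubic inches</CELL><CELL>$1.9 million</CELL></ROW>
</TABLE>
```

Nothing in this markup ties 110 cubic inches to rainfall. Only its position, the second cell of the third row, hints at what it means.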

So another approach to handling tables is to make them content-driven. Instead of using <CELL>110 cubic inches</CELL> to mark the Average Rainfall for San Francisco, you could use a structure like:

     <CITY><NAME>San Francisco</NAME><RAINFALL>110 cubic inches</RAINFALL>
     <REVENUES>$10 million</REVENUES></CITY>

This approach lets you track the data. Now the system knows what it means. This approach works extremely well for database programmers who want to query the SGML table.

However, it presents a challenge to the SGML developer, who then has to create a DTD fragment for every table, based on that table’s content. This can be extremely labor-intensive, so the content-driven approach isn’t very pragmatic for documents whose tables hold many types of data.
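
As a rough idea of what such a DTD fragment might look like for the rainfall table, consider the following sketch. The CITY, NAME, RAINFALL, and REVENUES elements are taken from the example above; the RAINDATA wrapper element and the content models are assumptions made for illustration.

```sgml
<!DOCTYPE RAINDATA [
<!-- RAINDATA is a hypothetical wrapper; one CITY element per table row. -->
<!ELEMENT RAINDATA - - (CITY+)>
<!ELEMENT CITY     - - (NAME, RAINFALL, REVENUES)>
<!ELEMENT (NAME | RAINFALL | REVENUES) - - (#PCDATA)>
]>
<RAINDATA>
<CITY><NAME>San Diego</NAME><RAINFALL>27 cubic inches</RAINFALL>
<REVENUES>$5.7 million</REVENUES></CITY>
<CITY><NAME>San Francisco</NAME><RAINFALL>110 cubic inches</RAINFALL>
<REVENUES>$10 million</REVENUES></CITY>
<CITY><NAME>Indianapolis</NAME><RAINFALL>175 cubic inches</RAINFALL>
<REVENUES>$1.9 million</REVENUES></CITY>
</RAINDATA>
```

These declarations fit only this one kind of table. A table of aircraft parts would need an entirely different set of elements, which is where the labor comes in.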

You need to know whether your application must provide for random or uniform content in your tables. If your tables hold aircraft parts one day and shoe prices the next, you need something other than the content-driven approach. If your tables deal with only one type of data, over and over again, like aircraft parts, the content-driven approach should work extremely well for you. But many applications must account for diverse types of content within their tables. They need a way to handle random content in tables.

Random content tables pose a challenge. The challenge is to present the information in a formatted table while retaining enough content-oriented information in the markup so that intelligent queries can be run without too much difficulty.
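
One way to picture such a compromise, offered only as an illustration and not necessarily the technique you would settle on, is to keep generic table markup and attach a label to each cell. The ROW element and the MEANING attribute below are hypothetical names chosen for this sketch.

```sgml
<!DOCTYPE TABLE [
<!ELEMENT TABLE - - (ROW+)>
<!ELEMENT ROW   - - (CELL+)>
<!ELEMENT CELL  - - (#PCDATA)>
<!-- MEANING is a hypothetical attribute that names what the cell holds. -->
<!ATTLIST CELL MEANING CDATA #IMPLIED>
]>
<TABLE>
<ROW><CELL MEANING="city">San Francisco</CELL>
<CELL MEANING="rainfall">110 cubic inches</CELL>
<CELL MEANING="revenue">$10 million</CELL></ROW>
</TABLE>
```

The same three declarations serve any table, whatever its subject, while a query can still select cells by their MEANING value. The trade-off is that the parser cannot check those values the way it checks element names.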
