Because of all these users, there is a lot of SGML data out there. Why did all these companies choose SGML instead of HTML? Mostly because it's a generic solution; it lets them use tags appropriate to the kinds of documents each one cares about. This means describing the document parts themselves rather than how they should appear on today's output device. This generic approach is why SGML data outlasts the programs that process it, and that can mean huge long-term savings. HTML can do this for a limited number of cases, but not in general. There are other reasons for using SGML:
All these advantages apply to paper production, online delivery, and information retrieval. But once you lay out pages for print, most of these advantages disappear; once all the lines and page breaks are set, the page representation takes over and getting back to the structure is very difficult.
Given all the advantages of generic SGML for big projects, yet all the simplicity of HTML for simple ones, how do you decide which way to go? There are five questions you can ask that will help you choose.
If your documents fit the HTML model and consist mostly of the kinds of elements HTML provides, HTML is probably a good choice. This is especially true if the documents are also small (tens of pages, not thousands). But if you have big documents or documents with special structures or elements, SGML will take you a lot farther.
If you need to do information retrieval, SGML is also better. You can search HTML, but you can't easily pin down just where hits are. This is because the HTML tags don't divide data up as finely as you can with full SGML, and HTML doesn't typically tag large units such as sections (the tags have only been added in the latest revision, and they're still optional).
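To make the retrieval point concrete, here is a minimal sketch in Python. It assumes a hypothetical document whose sections and paragraphs are tagged explicitly; the tag names (BOOK, SECTION, TITLE, P) and the sample text are invented for illustration, and the markup is kept well-formed so that Python's standard XML parser can stand in for a real SGML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; tag names and text are illustrative only.
DOC = """<BOOK>
  <SECTION><TITLE>Whaling</TITLE><P>The harpoon was thrown.</P></SECTION>
  <SECTION><TITLE>Sailing</TITLE><P>The sails were set.</P><P>A harpoon lay on deck.</P></SECTION>
</BOOK>"""

def find_hits(xml_text, word):
    """Return (section title, paragraph number) for each paragraph containing word."""
    hits = []
    root = ET.fromstring(xml_text)
    for section in root.iter("SECTION"):
        title = section.findtext("TITLE", default="")
        # Paragraphs are numbered within their own section, starting at 1.
        for number, para in enumerate(section.findall("P"), start=1):
            if word.lower() in "".join(para.itertext()).lower():
                hits.append((title, number))
    return hits

hits = find_hits(DOC, "harpoon")
```

Because the sections are tagged, each hit can be reported by section title and paragraph number instead of as a bare position somewhere in the file.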
Finally, if you need to deliver in more forms than just the Web, you should consider SGML. Tools are available to turn SGML not only into Web pages, but into paper pages, most kinds of word processor files, CD-ROM publications, Braille, and many other forms. This can all be done with HTML in theory, but it's harder in practice.
SGML eases data interchange in several ways. Because it helps you avoid using tags for things they don't quite fit, your data is easier to move to other systems, especially if those systems can take advantage of finer distinctions. For example, if you tag book titles, emphasized words, and foreign words all as <I> in HTML, you have a problem when you move to something that can distinguish book titles from emphasis, such as a program to extract and index bibliographies. If you make the finer distinctions up front, you can choose later whether to treat the items the same or differently.
Computers are pretty bad at sorting things into meaningful categories when they look the same. You almost need artificial intelligence to decide which italic text is a book title and which is something else. The good news is that computers are really good at the opposite task; if you've already marked up book titles and emphasized words as different things (say, <TI> and <EMPH>), it's no problem at all for a computer to show them both as italic.
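The asymmetry can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mapping table, the function names, and the sample sentence are all hypothetical, and simple regular expressions stand in for a real SGML parser.

```python
import re

# Hypothetical mapping from descriptive tags to the HTML tag used for display.
DISPLAY_TAG = {"TI": "I", "EMPH": "I"}

def to_display(text):
    """Rewrite descriptive tags as display tags (many-to-one, so it is easy)."""
    def swap(match):
        slash, name = match.group(1), match.group(2)
        # Tags not listed in the mapping are passed through unchanged.
        return "<" + slash + DISPLAY_TAG.get(name.upper(), name) + ">"
    return re.sub(r"<(/?)([A-Za-z][A-Za-z0-9]*)>", swap, text)

def book_titles(text):
    """Collect the content of every <TI> element, as a bibliography tool might."""
    return re.findall(r"<TI>(.*?)</TI>", text, flags=re.DOTALL)

sample = "I <EMPH>really</EMPH> enjoyed <TI>Moby-Dick</TI>."
rendered = to_display(sample)        # both kinds of element become <I>
titles = book_titles(sample)         # the descriptive markup still finds the title
titles_after = book_titles(rendered) # after flattening, the distinction is gone
```

Going from <TI> and <EMPH> to <I> is a simple table lookup. Once both have become <I>, nothing in the markup says which one was the book title, so the title extraction comes back empty.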
Because of this, interchange down the road is much easier if you break things up early and make as many distinctions as practical. On the other hand, each distinction may be a little extra work, so you need to balance long-term flexibility against how much time and effort you can put in up front. To figure out this balance, be sure to consider just how long you think your data will last (you're safest to at least double your first guess) and how important your data is.
Importance and lifespan don't always go together. Stock quotes are pretty important when they're current, but after a year, only a few specialists ever look at them. At the other extreme, some literature that started out on stone tablets thousands of years ago is still important. Where does your data fit?