

Part VI
Learning from the Pros

22  Developing for the World Wide Web
23  Rapid Development and Prototyping
24  Understanding and Using Output Specifications
25  Handling Specialized Content and Delivery

Chapter 22
Developing for the World Wide Web

To develop SGML data for the Web, you should be familiar with the last few chapters. They discuss some of the available tools and techniques, and many of the costs, benefits, and design trade-offs you will face. This chapter covers what really distinguishes effective use of SGML from boring use; in other words, how to get the most out of your markup effort.

In this chapter, you learn about:

  How SGML and the Web are converging
  The HyTime standard, which builds on SGML to provide many hypermedia capabilities, and so relates closely to the Web
  Some practical ways to make your SGML data on the Web provide real value for your readers

SGML’s Future on the Web

People often run into a few key problems on the Web that, as you’ve seen, even basic use of generic SGML can help solve. These problems include:

  Having to convert data into one particular DTD (HTML)
  Sometimes having to use HTML tags in unconventional ways to get just the right formatting (that is, tags are too closely tied to particular formatting)
  Managing several flavors of HTML; a document that looks good in one browser may look horrible in a browser that supports a slightly different version or that has different proprietary extensions
  Breaking up big documents

Another problem is that links using URLs break easily (for example, when the destination file is merely moved or renamed) and are hard to fix, but that’s not strictly an “HTML” problem: URLs are defined by a separate standard to which the HTML standard refers. HTML is gradually being extended to address many other problems, but no matter what happens with those, the problems above will still be with you; they’re pretty much built in.

The only direction HTML can go without “trading up” to generic SGML is to keep adding more specific tags. (Well, it could discard structure entirely and require everyone to send PostScript or bitmap pictures of pages, but that wouldn’t really be HTML anymore, and it would be a big step backwards!)

Adding more tags to HTML means HTML has to go through a repeated standardization process, create a new (maybe incompatible) version, create incompatibilities in browsers, and so on. No matter how much you add, you’ll never finish (it’s like trying to finalize the English dictionary; people always spoil it by coming up with brand-new ideas and wanting words for them).

The fundamental direction of SGML for the Web is different. SGML just says, “Why fight? Let everybody create the tags they need.” The DTD lets people say what tags they created. A good DTD will come with documentation and at least a sample stylesheet to explain what the tags mean. A client just reads the DTD, the document, and the stylesheet, and it works.
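To make this concrete, here is a small, hypothetical example of “creating the tags you need.” The element names (recipe, title, ingredient, step) and the qty attribute are invented for illustration and do not come from any standard DTD. Because the declarations travel with the document, a validating parser can check the instance without knowing anything about recipes in advance. The sketch assumes a typical SGML declaration with OMITTAG YES, which the `- -` tag-minimization fields require.

```sgml
<!DOCTYPE recipe [
<!-- Hypothetical element types, declared by the document's author -->
<!ELEMENT recipe     - - (title, ingredient+, step+)>
<!ELEMENT title      - - (#PCDATA)>
<!ELEMENT ingredient - - (#PCDATA)>
<!-- Optional quantity, stored as plain character data -->
<!ATTLIST ingredient qty CDATA #IMPLIED>
<!ELEMENT step       - - (#PCDATA)>
]>
<recipe>
<title>Pancakes</title>
<ingredient qty="2 cups">Flour</ingredient>
<ingredient qty="1">Egg</ingredient>
<step>Mix the ingredients.</step>
<step>Cook on a hot griddle.</step>
</recipe>
```

The DTD only says which tags exist and how they may nest; a stylesheet would then say how to present each one, for example showing title as a heading and each step as a numbered list item.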

This is probably inevitable on the Web. In the long run, people won’t accept a limited set of tags any more than they’d accept a fixed set of style names in a word processor or a fixed vocabulary for English.

To be completely fair, let’s look at the other side of the coin too: how might HTML influence SGML? People traditionally run into some snags with SGML, and most of them have to do with its many optional features. When you first approach SGML, it can look a little like a new car with a 20-page list of options; many of them look tempting and are useful for some people, but if you take too many, you can have problems. What if you order both the CD player and the DAT player for your car, and they don’t both fit? Or four-wheel drive and front-wheel drive: do you get six wheels?

Every feature costs something, so think about what you need, want, and can afford, and pick just those features (and hope someone checks whether they go together, too!). With a car, the cost of each feature is hard cash. With computer systems, the cost may not be so obvious, but it’s still there. Here are some of the costs of over-using features:

  Different programs may support different SGML options, so a document that uses them might not work everywhere. For example, only a minority of parsers and SGML systems understand SGML features like CONCUR, DATATAG, EXPLICIT, and so on—that’s one reason we haven’t focused on them here.
  Even with programs that support them, less-used options get less testing and less attention, and might, therefore, be less reliable.
  There’s that much more to learn, and that many more controls to keep straight.
  Different options can bump into each other in surprising ways (if you order air conditioning for your car, you may get a slightly larger engine, too; if you use SHORTREFs in SGML, it might suddenly matter a lot more just where you put line breaks).
  Even without using the fancier SGML options, you can run into snags if you make up a DTD with very specialized semantics. For example, tables and equations are very complicated to format, so SGML systems often build in special formatting support for the tag sets most commonly used for them (most likely the CALS table model for tables, and ISO 12083 for equations). If you make up your own DTD for tables or equations, you may have to set up all that complex formatting yourself. Usually, it’s better to just go with what’s widely supported.
  You might look a little silly if you load up on many options you don’t really need. For example, if you use DATATAG and send your file to another SGML user, he’ll probably think it’s a bit odd, since DATATAG is so uncommonly used and so rarely supported.

HTML’s huge success, despite ignoring almost every SGML option, says something: there’s a lot you can do with “no-frills” SGML. You’ve seen there are important things you can’t do, too, so maybe the real lesson is to make sure you get the options you need, but pick just what you need; don’t waste much money or effort on other capabilities. With SGML, this usually means not bothering about fancy minimization controls like DATATAG and RANK, and not using regular features in really subtle ways, such as marked sections that cut sideways across entity or element boundaries.
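Here is a hedged sketch of what “cutting sideways” means, using a hypothetical memo DTD and a parameter entity named draft, both invented for this illustration. The first marked section wraps a whole note element, so switching draft from INCLUDE to IGNORE removes the note cleanly. The second opens a note inside the marked section but closes it outside, so with IGNORE the parser drops the start tag, finds an end tag with no matching start tag, and reports an error.

```sgml
<!DOCTYPE memo [
<!-- Change INCLUDE to IGNORE to strip draft-only material -->
<!ENTITY % draft "INCLUDE">
<!ELEMENT memo - - (para+)>
<!ELEMENT para - - (#PCDATA | note)*>
<!ELEMENT note - - (#PCDATA)>
]>
<memo>
<!-- Safe: the marked section contains a complete element -->
<para>Sales rose.<![ %draft; [<note>Check the figures.</note>]]></para>

<!-- Risky: the marked section opens a note, but its end tag lies
     outside the marked section, so IGNORE leaves it unmatched -->
<para>Costs fell.<![ %draft; [<note>Confirm with finance]]> before release.</note></para>
</memo>
```

Both paragraphs parse while draft is INCLUDE, which is exactly why the second pattern is easy to miss until someone flips the switch.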

