The World Wide Web has brought more attention to SGML than anything else. Most WWW documents (other than bit-mapped graphics) are SGML documents that use the HTML DTD. If you're using HTML, you're using SGML, although there's much more to SGML. On the other hand, most Web browsers don't support any DTDs besides HTML. This means that all the other SGML data in the world can't be browsed easily on the Web. (But take heart! You'll learn about several solutions in this chapter.)
This chapter begins by telling you how SGML relates to HTML and what's happening with SGML on the Web already. Then you learn about the practical issues: how to decide whether to go with HTML or SGML for your Web data, and how you can take advantage of each one's strengths while avoiding its weaknesses.
In this chapter, you will learn how to:
People often say that HTML is a subset of SGML. This is nearly right, but it's a bit more complicated. Technically, HTML is an application of SGML. This means that it's really a DTD: a set of tags and rules for where the tags can go. SGML is a language for composing DTDs that fit various kinds of documents. There are many applications, and therefore many DTDs (HTML, the DTD for the World Wide Web, is probably the best-known one).
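To make "a set of tags and rules for where the tags can go" concrete, here is a minimal sketch in Python. It is not SGML and it is not the HTML DTD: the element names and the `ALLOWED_CHILDREN` table are invented for illustration, and a real DTD also constrains the order and number of elements, which this toy ignores.

```python
# Toy content model: each element type maps to the set of element types
# allowed directly inside it. These rules are hypothetical and far simpler
# than any real DTD (no ordering, no occurrence counts, no attributes).
ALLOWED_CHILDREN = {
    "letter": {"greeting", "para", "closing"},
    "greeting": set(),
    "para": {"emph"},
    "emph": set(),
    "closing": set(),
}


def find_violations(node):
    """Return (parent, child) pairs that break the toy rules.

    A node is a tuple: (element_name, list_of_child_nodes).
    """
    name, children = node
    violations = []
    for child in children:
        child_name = child[0]
        if child_name not in ALLOWED_CHILDREN.get(name, set()):
            violations.append((name, child_name))
        # Check the child's own contents as well.
        violations.extend(find_violations(child))
    return violations


good = ("letter", [("greeting", []), ("para", [("emph", [])]), ("closing", [])])
bad = ("letter", [("para", [("closing", [])])])

print(find_violations(good))  # []
print(find_violations(bad))   # [('para', 'closing')]
```

The second document fails because the toy rules do not allow a closing inside a paragraph. Checking a document against a DTD is the same kind of test, applied with a much richer rule set.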
You already know that a DTD is always designed for some particular type of document: business letters, aircraft manuals, poetry, and so on. An important question when deciding whether to put some data in HTML or another SGML DTD is, "What kind of documents is the HTML DTD meant for?"
Here is a sample of the kinds of tags that exist in HTML (the new version 2.1 of HTML is being finalized even as I write, and further improvements are still coming, so this list will improve a bit very soon). First, HTML has a lot of tags for marking up common kinds of structures (not all of which are listed here):
HTML also includes several element types that express formatting rather than structure. These pose some portability problems, but they can be useful in cases where you simply must have a certain layout:
From the selection of element types, you can easily see the kinds of documents HTML is best for: fairly simple documents with sections, paragraphs, lists, and the like. In fact, most of the HTML element types are pretty generic; nearly every DTD has paragraphs and lists in it. One place where HTML excels, however, is in linking. Although it has only a couple of element types for links, they can use URLs to point to any data anywhere in the world. For more details on HTML, you may want to read Que's Special Edition Using HTML.
So, why use other SGML DTDs? The main reason is that not all documents consist of only these basic kinds of elements. Whenever you run across some other kind of element, you have to cheat to express it in HTML. A very common example is the level-6 heading element in HTML (H6). Because the first browsers formatted H6 headings in small caps and there was no text emphasis tag that would give the same effect, people got in the habit of using H6 to mean small caps. Of course, some people also use H6 as a heading, and many people use it both ways.
This works fine until something changes. Suppose that a browser comes along that lets users adjust the text styles for different tags, for example. Someone changes H6 to look like something besides small caps, and everyone who was counting on small caps is surprised. Sometimes this won't matter, but it might; what if the user wants all the headings big and all the text emphasis small? Or what if the user is blind? When his browser runs across an H6 element, it wouldn't do any good for his browser to put it in large type, so instead maybe its computer-generated voice says "section" and reads the heading loudly; in the same way, maybe such a browser is not supposed to do anything special for small caps.
The most important problem, though, is that you might later want to use the tags for something completely different from formatting. What if a browser is really friendly and makes automatic outlines by grabbing all the headings? Or what if you want to do a search, but only for text in headings (you might want to do that because if a word occurs in a heading, it's probably more important than if it just occurs in the main text)?
Using a tag because it gets the right formatting effect is always a problem, usually a delayed one; it works fine when you do it, but the gotcha comes later. People working with the distant ancestors of SGML made up a name for this: tag abuse syndrome.
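The automatic-outline scenario can be sketched in a few lines of Python using the standard library's `html.parser` module. This is only an illustration under stated assumptions: the `HeadingCollector` class, the `outline` function, and the sample document are all invented here, and the parser is a lenient HTML tokenizer, not a validating SGML parser.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingCollector(HTMLParser):
    """Collect (level, text) for every H1-H6 element in a document."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None  # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def outline(html):
    """Return the document's headings as a list of (level, text) pairs."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    return collector.headings


# Hypothetical document in which H6 is abused to get small caps.
SAMPLE = """
<h1>Annual Report</h1>
<p>Sales rose this year.</p>
<h2>Results</h2>
<p>See the <h6>fine print</h6> for details.</p>
"""

for level, text in outline(SAMPLE):
    print("  " * (level - 1) + text)
```

The outline built from this sample lists "fine print" as a sixth-level heading, even though the author only wanted small caps. A heading-only search over the same document would be misled in the same way.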