Special Edition Using SGML: Developing for the World Wide Web

Support for Big Documents

The second area where SGML can help you make more of an impact on the Web is big documents. HTML tends to assume that your documents will be small (or that you’ll make them small somehow before putting them on the Web). This means a lot of work as you try to manage lots of parts that are conceptually related but physically separate. In contrast, because SGML came out of the commercial and in-house publishing industries, it was designed with big documents in mind: anywhere from hundreds to tens of thousands of pages. SGML systems on the Web deal with this specifically, and take away the burden of creating and managing many little pieces of information when what you really had in mind was providing access to a few large ones.

We talked earlier about how Web servers can be set up either to convert generic SGML to HTML on demand or to send the SGML directly to clients that understand it. For really big documents, you can still do it either way, but time becomes a big factor in both: converting or downloading an entire large document takes a while. Because of this, a good SGML server can also break documents down and send overviews, outlines, or successive pieces. That way, readers get the part they want without waiting to download the entire document, and the author isn’t forced to break it up into many little documents.
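
To make the idea of sending an outline first and pieces later more concrete, here is a minimal sketch in Python. It rests on some loud assumptions: the sample is well-formed XML standing in for SGML (Python’s standard library has no full SGML parser), the element names (book, chapter, title, para) are made up rather than taken from any real DTD, and the functions outline and fetch_piece are hypothetical names, not part of any actual SGML server.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. Well-formed XML stands in for SGML here,
# and the element names are illustrative, not from any particular DTD.
SAMPLE = """\
<book>
  <title>Figure Skating</title>
  <chapter id="c1"><title>Origins</title><para>Text of chapter one.</para></chapter>
  <chapter id="c2"><title>Olympic History</title><para>Text of chapter two.</para></chapter>
</book>"""


def outline(root):
    """Return (id, title) pairs for each chapter: the overview sent first."""
    return [(ch.get("id"), ch.findtext("title")) for ch in root.findall("chapter")]


def fetch_piece(root, chapter_id):
    """Return one chapter serialized on its own, or None if the id is unknown."""
    for ch in root.findall("chapter"):
        if ch.get("id") == chapter_id:
            # tostring() also emits whitespace that follows the element, so strip it.
            return ET.tostring(ch, encoding="unicode").strip()
    return None


root = ET.fromstring(SAMPLE)
print(outline(root))            # small overview of the whole document
print(fetch_piece(root, "c2"))  # just the piece the reader asked for
```

The point of the sketch is only that the document stays in one piece on the server: the outline and the individual chapter are both computed from the same source on request, so nobody has to split the file by hand.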

Focus on Your Data

The issue of large versus small documents brings you right back to the first focus: your own data. It’s very important to develop a vision for what your Web site is all about, and make your data communicate that vision. Suppose you’re doing a Web site about figure skating. You’ll make a stronger impact if you do a few things:

• Don’t limit yourself to your own information on the subject; actively go out and find more. Go to the library and get some classic, out-of-copyright books on the subject, some relevant pictures, and so on. For skating, you might find interesting materials on its origins, early Olympic skaters, and so on. If you can get the rights to newer materials, that’s even better; a lot of the best material is still under copyright, so you’d need to get clearance from the author or publisher to use it.

• Find out what else is on the Web, and make sure you link to it. But don’t just link to it; add some commentary that will help people know what the information is about, and why they might want to follow one or another of your links. Help them avoid wasted trips, and help them be sure to make the trips they should. For the skating example, it would look bad if you forgot to mention www.cs.yale.edu/homes/sjl/skate.html.

• Get others involved. If you’re interested in some subject, you probably have friends who are, too. Get them to help out by adding their own perspectives. They don’t even have to be on the Net! Maybe one of those friends is someone well-known. If they’ll contribute, it can be a real draw for your site.

• Think about different ways people may want to get at your data. What kinds of searches should your data show up in? Should parts be organized chronologically, alphabetically, or in other ways? Giving people many ways to get in makes for a strong site. A skating page might want chronological lists of upcoming competitions and telecasts, but should also give a way to get at the same information by skaters’ names, locations, and so on.

• Think about subjects that are related, even indirectly. For our example, you might want to hook up to information about Eastern Europe and the Soviet Union, since political changes there heavily impacted world skating competitions.

• Reference material is something that is often forgotten. It doesn’t seem flashy at all, and it can be a pain to deal with sometimes. But reference material is what people refer to again and again: information they need. One skating page has a really useful map of the stadium where this year’s Nationals was held and information on rinks and schedules for many places. Travelers really appreciate it when they find that page.
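
The point in the list above about giving people several ways into the same data can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the event records, skater names, places, and dates are all invented, and chronological and index_by are hypothetical helper names. The idea is that one set of records feeds every view, so the chronological list and the by-name and by-location lists never drift apart.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records; every name, place, and date is made up for illustration.
EVENTS = [
    {"name": "Spring Invitational", "date": date(1996, 3, 9),
     "location": "Boston", "skaters": ["A. Lee", "B. Cruz"]},
    {"name": "Winter Open", "date": date(1996, 1, 20),
     "location": "Denver", "skaters": ["B. Cruz"]},
    {"name": "Autumn Classic", "date": date(1996, 10, 5),
     "location": "Boston", "skaters": ["A. Lee"]},
]


def chronological(events):
    """Return the events sorted by date, earliest first."""
    return sorted(events, key=lambda e: e["date"])


def index_by(events, field):
    """Map each value of `field` to event names, kept in date order.

    `field` may hold a single value (location) or a list (skaters).
    """
    index = defaultdict(list)
    for event in chronological(events):
        value = event[field]
        values = value if isinstance(value, list) else [value]
        for v in values:
            index[v].append(event["name"])
    return dict(index)


by_date = [e["name"] for e in chronological(EVENTS)]
by_location = index_by(EVENTS, "location")
by_skater = index_by(EVENTS, "skaters")
print(by_date)
print(by_location)
print(by_skater)
```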

Link to Important Related Data

Although this was mentioned earlier, it’s worth saying again. There’s so much data out there that someone, somewhere, has data that relates to yours. Someone may even have the very same data on the Web already. It can look pretty silly if you put out a new site and a lot of it is redundant, unless you manage to develop a brand-new spin on it that really makes a difference. Short of hiring spin doctors, a good way to find that spin is to study what’s there already: do a bunch of Web searches and jot down some notes on what you find. Go look at the pages and think of ways you can do it better. Make friends with the other people working on related data, and join forces if you can. Of course, be sure to avoid online comments like “This is the first Web site to do X,” unless you’re really, really, really sure it is.
