

Can I Make Do with HTML?

Given all these tradeoffs, here are the main things to think about when making the HTML versus full SGML decision for Web delivery:

The form the data is already in. If your data is already in SGML (or in something conceptually similar, like LaTeX), it’s much easier to stick with full SGML and have tags that fit your data naturally. This way you don’t have to design a complicated set of correspondences, and whatever data conversion you do will be simpler.

The document size and number of authors. If your documents are small, don’t have a lot of internal structure, and don’t need to be shared among multiple authors or editors, HTML may be all you need. But a little Web browsing quickly shows what goes wrong when people break big documents into little pieces: the forest gets lost when it is divided into separate trees.

The structures needed for searching. If you need to do searches that target specific data in your documents, you’ll probably need SGML to label that data. Doing without it is like running a personnel database without names for the fields: if you searched for people with salaries less than $30,000, you’d get not only them, but also everyone who is less than 30,000 years old!

The frequency of changes. If your data is going to change frequently, you’re better off in SGML, where you can modularize your documents using marked sections (see Chapter 16), entities (Chapters 3, 10, and 16), and other features.
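
To make the searching point above concrete, here is a minimal Python sketch. The records, the field names, and both helper functions are hypothetical and exist only for illustration; they are not taken from any SGML tool. The sketch contrasts a search over untagged text, where a number is just a number, with a search over labeled fields.

```python
import re

# Hypothetical personnel data: once as untagged text, once with labeled fields.
plain_records = [
    "Ann Lee, 52, 28000",
    "Bob Ray, 31, 45000",
]
tagged_records = [
    {"name": "Ann Lee", "age": 52, "salary": 28000},
    {"name": "Bob Ray", "age": 31, "salary": 45000},
]


def untagged_search(records, limit):
    """Return every record containing any number below limit.

    Without labels there is no way to tell a salary from an age.
    """
    hits = []
    for text in records:
        numbers = [int(n) for n in re.findall(r"\d+", text)]
        if any(n < limit for n in numbers):
            hits.append(text)
    return hits


def field_search(records, field, limit):
    """Return the names of records whose labeled field is below limit."""
    return [r["name"] for r in records if r[field] < limit]


# The untagged search matches both people, because both ages are below 30000.
print(untagged_search(plain_records, 30000))
# The labeled search matches only the person whose salary is below 30000.
print(field_search(tagged_records, "salary", 30000))
```

Labeling data with SGML tags plays the same role as the dictionary keys here: it tells the search which numbers are salaries.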

All these things relate to each other, so you often can’t answer one question without thinking about the others. One example is that frequent changes to a document matter a lot less if the document is really small and you have complete unshared control over it. But if a document is big and several authors have to cooperate to maintain it, frequency of changes matters a lot.

How to Use HTML Safely

If you choose to put your data in HTML rather than another SGML DTD, there are several things you can do to make a later transition easier. These things are also helpful in the short term because they make your HTML more consistent, portable, and reliable.

  Make sure your HTML is really valid. Run it through an SGML parser—such as sgmls, yasp, or sp—or use one of the HTML “lint” programs (they’re called that because they go looking around for unwanted dirt that accumulates in dark pockets of HTML documents). Weblint is one such program (you can find it at http://www.unipress.com/weblint).
  Be very careful about quoting attributes. Any attribute value that contains any characters other than letters, digits, periods, and hyphens needs to be quoted (either single or double quotes are fine, but not distinct open/close curly quotes).


Tip:  
There are a couple of very common HTML errors that you can get away with in some browsers, but that will break others and will prevent you from using generic SGML tools. The biggest one is failing to quote attributes, as just described. Probably the next biggest is getting comments wrong. These are right:
    <!-- some text of a comment -->
    <!-- another comment, with two text parts --
    -- of which this is the second -->

But these are wrong (that is, they’re not comments):

    <!-- this comment never ends --!>
    <! This is an SGML syntax error !>
    <-- This is just data to SGML -->
    <!-- This one -- really -- is not a comment -->

  Avoid any part of HTML that is labeled “deprecated” in the HTML DTD or its documentation. Deprecated is a polite term standards use to say, “Don’t use this, it’s dangerous, not recommended for the future, and not even universally supported at present.”
  Be sure to use the HTML “DIV” containers, not just free-standing headings—especially in larger documents. This makes the structure of your document easier for programs to find and process, and it can also help you find tagging errors.
  Avoid colliding with SGML constructs, even if some HTML parsers ignore them. For example, don’t depend on an HTML parser failing to know that the string <![ starts a marked section, that <? starts a processing instruction, or that <!-- starts a comment; always escape such strings, for example, by changing the < to &lt;.
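
The quoting, comment, and escaping rules above can be approximated with a few small checks. The following Python sketch is only a rough illustration under stated assumptions: the function names are hypothetical, the regular expressions are simplifications of the rules as described in this section, and none of it replaces running a real SGML parser such as sp.

```python
import re

# Characters the text says may appear in an unquoted attribute value:
# letters, digits, periods, and hyphens.
UNQUOTED_OK = re.compile(r"[A-Za-z0-9.-]+")

# Simplified comment declaration: "<!", then one or more "-- text --"
# comments (each optionally followed by whitespace), then ">".
# The text of a comment may not itself contain "--".
COMMENT_DECL = re.compile(r"<!(?:--(?:(?!--).)*--\s*)+>", re.DOTALL)

# Strings the text lists as starting an SGML construct.
MARKUP_STARTS = ("<![", "<?", "<!--")


def needs_quotes(value):
    """True if an attribute value contains anything beyond the safe characters."""
    return UNQUOTED_OK.fullmatch(value) is None


def is_comment(text):
    """True if text is a well-formed comment under the simplified pattern."""
    return COMMENT_DECL.fullmatch(text) is not None


def escape_markup_starts(data):
    """Replace the leading < of each listed construct with &lt;.

    Intended for character data only, not for text that contains real markup.
    """
    for start in MARKUP_STARTS:
        data = data.replace(start, "&lt;" + start[1:])
    return data


print(needs_quotes("http://www.unipress.com/weblint"))
print(is_comment("<!-- some text of a comment -->"))
print(is_comment("<!-- this comment never ends --!>"))
print(escape_markup_starts("Type <!-- to begin a comment."))
```

Checks like these are useful as a quick screen before publishing, but a validating parser remains the authority on whether a document is really valid.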

Challenges of Upgrading

If you decide to put your data in an SGML DTD other than HTML, there are a few “gotchas” to watch out for. None are fatal, but you’ll want to start out knowing the rules of the game. The issues are briefly summarized here. The next chapter, “Practicalities of Working with SGML on the Web,” discusses a lot more specifics.

