

Internationalization

A final benefit of other SGML DTDs over HTML is that they make more provision for international and multilingual documents. HTML prescribes the “Latin 1” character set. Latin 1 includes the characters for most Western European languages, but not those for Eastern European, Asian, or many other languages. Future revisions will probably support “Unicode,” a new standard that includes characters for nearly all modern languages. SGML itself lets each document specify its character set, and doesn’t particularly care whether characters are one, two, or more bytes wide.

Many DTDs also provide a way to mark individual elements as being in different languages. This can have a big effect on display and searching. For example, if you’re searching for the English word “die,” it helps a lot not to get every occurrence of the German word “die,” which means roughly “the” and so is very common.
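
To make the search benefit concrete, here is a minimal sketch in Python. It assumes the language marking has already been extracted into (language code, text) pairs; the `ELEMENTS` data, the language codes, and the `search` function are all hypothetical illustrations, not part of any SGML tool.

```python
import re

# Hypothetical sample: each element's text paired with its marked language.
ELEMENTS = [
    ("en", "The battery will die without a charge."),
    ("de", "Die Katze sitzt auf der Matte."),
    ("de", "Wo ist die Bibliothek?"),
    ("en", "Roll the die to start the game."),
]

def search(elements, word, language):
    """Return texts that contain `word` as a whole word and are marked as `language`."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [text for lang, text in elements
            if lang == language and pattern.search(text)]

# Restricting the search to English skips the very common German article.
for hit in search(ELEMENTS, "die", "en"):
    print(hit)
```

Without the language filter, all four sample texts would match; with it, only the two English ones do.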

DTDs that specifically mark language are also very helpful when you want to create multilingual documents, or documents that adapt to the reader’s language. You can create documents where every paragraph has a subelement for each language, and then set up your software to show only the subelement for the language the user wants; this automatically customizes the document for each reader:

    <P>
       <ENGLISH>...</>
       <FRENCH>...</>
       <ITALIAN>...</>
       <GERMAN>...</>
       <SPANISH>...</>
       ...
    </P>
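
The selection step that the software performs might be sketched as follows in Python. This is only an illustration under stated assumptions: because Python's standard library has no SGML parser that understands the short end tag `</>`, the sample uses XML-style explicit end tags, and the wrapping `DOC` element, the sample text, and the `select_language` function are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, written with explicit end tags (an assumption).
SAMPLE = """
<DOC>
  <P>
    <ENGLISH>Good morning.</ENGLISH>
    <FRENCH>Bonjour.</FRENCH>
    <GERMAN>Guten Morgen.</GERMAN>
  </P>
  <P>
    <ENGLISH>Thank you.</ENGLISH>
    <FRENCH>Merci.</FRENCH>
    <GERMAN>Danke.</GERMAN>
  </P>
</DOC>
"""

def select_language(doc_text, language, fallback="ENGLISH"):
    """Return one string per paragraph, taken from the subelement for `language`.

    If a paragraph has no subelement for that language, the fallback
    language is used; if that is missing too, an empty string is returned.
    """
    root = ET.fromstring(doc_text.strip())
    paragraphs = []
    for p in root.iter("P"):
        chosen = p.find(language)
        if chosen is None:
            chosen = p.find(fallback)
        text = (chosen.text or "").strip() if chosen is not None else ""
        paragraphs.append(text)
    return paragraphs

for line in select_language(SAMPLE, "FRENCH"):
    print(line)
```

The fallback to English is a design choice made for this sketch; a real system might instead consult a list of the reader's preferred languages.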

Better Support for Large Documents

SGML is especially strong for large or structured documents, documents where several authors share writing and editing, and documents that have components HTML doesn’t provide. A single DTD such as HTML may not provide the element types your documents need, in which case you end up using some other element type simply because it produces the desired appearance in the authoring software. This leads to problems down the line, because the markup no longer says what the content actually is. HTML also has only limited support for expressing larger units such as sections, and that makes document management a bit harder.

From Here...

This chapter has discussed the pros and cons of upgrading from HTML to generic SGML, and given some examples of who is already using SGML, how, and why. It has also raised some questions you should consider in deciding whether to upgrade, and described the costs and benefits you’re likely to encounter if you do.

For more information, refer to the following chapters:

  Chapter 20, “Practicalities of Working with SGML on the Web,” talks more specifically about how to manage SGML data on the Web and what tools are available to help.
  Chapter 3, “SGML Terminology,” covers various SGML methods and structures in general terms.
  Chapter 10, “Following General SGML Declaration Syntax,” covers how to declare and use those structures.
  Chapter 16, “Markup Challenges and Specialized Content,” covers some special SGML methods, such as marked sections, that are useful even though they are less common than others.

