

XML allows developers to create document structures many times more complex than those available in HTML. Document structures are useful because they guarantee that documents have all their required elements, and that document authors haven’t run completely amok, creating their own wild formats and putting information into the “wrong” places. These structures allow document management systems to check documents for completeness and help formatting engines produce visually appealing representations of XML documents. Combined with style sheets, document structure elements make it possible to use XML to create readable, highly formatted Web pages, complete with headlines, subheads, paragraphs, citations, indented blocks, and all the structures previously available in HTML.
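The completeness check described above can be sketched in a few lines of Python. This is a minimal illustration rather than a validating parser: the memo element names and the `missing_elements` helper are hypothetical, invented for this example, and a real document management system would more likely validate against a DTD than against a hand-written list.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo structure: these element names are assumptions
# made for illustration, not part of any standard DTD.
REQUIRED = ["to", "from", "date", "subject", "body"]

def missing_elements(xml_text, required=REQUIRED):
    """Return the required child elements that the document lacks."""
    root = ET.fromstring(xml_text)
    present = {child.tag for child in root}
    return [name for name in required if name not in present]

complete = """<memo>
  <to>All staff</to>
  <from>Facilities</from>
  <date>1998-03-02</date>
  <subject>Parking</subject>
  <body>The north lot is closed on Friday.</body>
</memo>"""

incomplete = """<memo>
  <to>All staff</to>
  <subject>Parking</subject>
  <body>The north lot is closed on Friday.</body>
</memo>"""

print(missing_elements(complete))    # nothing is missing
print(missing_elements(incomplete))  # reports "from" and "date"
```

The check only looks at the direct children of the root element, which is enough to show the idea. A DTD can go further and also constrain the order and nesting of elements.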

Document structures are easy to identify by looking at a document. Their primary mission is to provide a roadmap to the information, allowing readers to find the information they need at a glance. Not surprisingly, most SGML implementations have been targeted at documentation projects, which tend to produce enormous amounts of information that need structure to help readers find their way through. Documentation usually follows strict conventions, often resorting to paragraph numbering for easy reference (e.g., see paragraph 1.3.2 for detailed information about widgets). Projects that already have strong structures are easy targets for markup languages; the real challenges are less structured documents that contain multiple types of information.

When developing DTDs based on document structures, developers should check to see what has already been done. Large companies and other organizations may already have their own SGML standards. Organizations of every size may need to adhere to standards to allow them to circulate information easily. Document structures are much less likely than data structures to demand their own unique DTDs (a memo is pretty much a memo, whatever organization or individual produced it).

Developers in need of some inspiration may want to examine the work of the Text Encoding Initiative (TEI) (http://www.uic.edu/orgs/tei). This academic organization has produced an enormous set of standards for scholarly document encoding. Written to provide standards for the conversion from printed books to electronic formats that scholars could use more readily, the TEI DTDs provide extensive frameworks for all kinds of materials from prose to poetry to plays to commentaries. TEI standards regularly cross the boundaries between document structure encoding and data encoding. In fact, they demonstrate how blurred the distinctions can be. Nevertheless, their adaptations of markup structures to a variety of document types are generally well thought-out and informed by implementation as well as planning.

Good document structure systems are usually more obvious than good data structure systems. Most organizations have already given some thought to these issues, and many have even considered their impact on document presentation, storage, and management. Desktop publishers have had to develop a sense of the structures of their materials, Web developers have had to build homes for a variety of different types of information, and technical writers and other documentation writers have generally had to work with preset document structure expectations. XML offers developers the chance to codify these structures and possibly make them more interoperable. XML’s freedom from formatting information gives it the flexibility to deal with all these situations, making it possible for newsletter information to reappear on a CD-ROM, the Web, or even in a printed company history without too much mangling and rebuilding along the way. Building abstract document structures makes producing and managing documents much easier in the long term.

Data Structure

If the document structures provide a table of contents for documents, the data structures provide the index. Document structures organize your document to help readers understand the structure of an argument or follow an extended discussion without getting lost. Data structures reflect the content directly, with little concern for where they appear in an overall document structure. A DTD may require that they appear in a particular document structure, but they often have more freedom to “float” within a document.
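The way data elements “float” inside a document structure can be illustrated with a short Python sketch. The element names here are hypothetical: `report`, `title`, `section`, and `para` stand in for document structure, while `partnumber` stands in for a data element that may turn up at any depth. The `collect` helper is likewise an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical report: <section>, <para>, and <title> are document structure;
# <partnumber> is a data element that may appear at any depth.
report = """<report>
  <title>Widget Maintenance</title>
  <section>
    <para>Replace <partnumber>W-1032</partnumber> every six months.</para>
    <section>
      <para>If the housing cracks, order <partnumber>H-2210</partnumber>.</para>
    </section>
  </section>
</report>"""

def collect(xml_text, tag):
    """Gather the text of every element named `tag`, wherever it appears."""
    root = ET.fromstring(xml_text)
    # iter() walks the whole tree in document order, ignoring nesting depth.
    return [element.text for element in root.iter(tag)]

part_numbers = collect(report, "partnumber")
print(part_numbers)
```

The lookup never mentions sections or paragraphs. Much like an index entry, the data element can be found without knowing where in the document structure it sits.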

The ability to create structures based on data content gives XML most of its practical advantage over HTML. Even though document structures are useful, they have little direct effect on the ways that the information in documents can be reused by computers. They may allow management systems to identify the locations of information more precisely, but they do very little to help them actually retrieve the information. Document structures help humans read documents, but they do very little to help computers find the critical pieces of data they need without human assistance. A table in the middle of a document may be easily identified as a table, but without additional information, a computer cannot extract data from that table for reuse in analysis. Table headers may be useful to a certain extent, but as soon as multiple tables with similar headers appear in a document, which is a common situation, the computer is stuck once again when it tries to determine what information is relevant.
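The table problem described above can be made concrete with a small Python sketch. The markup is hypothetical: both tables carry the same visible headers (“Region” and “Amount”), but the cells are wrapped in invented data elements, `sales` and `returns`, that say what each figure means. The `total` helper is also an assumption made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two tables share identical headers, so the headers
# alone cannot tell a program which amounts are sales and which are returns.
document = """<report>
  <table>
    <row><header>Region</header><header>Amount</header></row>
    <row><region>East</region><sales>1200</sales></row>
    <row><region>West</region><sales>800</sales></row>
  </table>
  <table>
    <row><header>Region</header><header>Amount</header></row>
    <row><region>East</region><returns>50</returns></row>
    <row><region>West</region><returns>30</returns></row>
  </table>
</report>"""

def total(xml_text, tag):
    """Sum the numeric content of every element named `tag`."""
    root = ET.fromstring(xml_text)
    return sum(int(element.text) for element in root.iter(tag))

sales_total = total(document, "sales")
returns_total = total(document, "returns")
print(sales_total, returns_total)
```

Because the element names carry the meaning, the program never has to guess which table is which. With purely structural markup, both amount columns would look identical to it.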

