

Focus on Structure

The most difficult demand that XML makes of developers is that they standardize their document structures. This doesn’t mean that every single document must look the same; it means that developers must examine the components that go into their pages and create standards for them. HTML provided some tools for creating structures, like paragraphs, lists, and headings, but never demanded that developers apply structure rather than formatting. HTML tags had to work in concert at times, but a headline tag never required that a paragraph follow it, for example. XML can make such demands and (with a validating parser) enforce them. Taking full advantage of XML requires developers to examine their document and data structures closely and to restate them more explicitly.
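
As a minimal sketch of that kind of demand, the document below carries a small internal DTD in which a headline must be followed by at least one paragraph. The element names (section, headline, paragraph) are invented for illustration and do not come from any standard vocabulary.

```xml
<?xml version="1.0"?>
<!-- Hypothetical vocabulary: section, headline, and paragraph are illustrative names -->
<!DOCTYPE section [
  <!-- A section is exactly one headline followed by one or more paragraphs -->
  <!ELEMENT section (headline, paragraph+)>
  <!ELEMENT headline (#PCDATA)>
  <!ELEMENT paragraph (#PCDATA)>
]>
<section>
  <headline>Focus on Structure</headline>
  <paragraph>XML can require that a paragraph follow a headline.</paragraph>
</section>
```

A validating parser should reject this document if the paragraph is removed or moved ahead of the headline. A non-validating parser would check only that the document is well formed.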

Document Structure

This chapter is somewhat structured. It opened with a title, followed by some paragraphs. A heading followed, with some more paragraphs, and another heading appeared, followed by an introductory paragraph and a subhead. The text you are reading now is the paragraph below that subhead, so we are several layers down in the document hierarchy, as shown in Figure 4.1.


Figure 4.1  A map of this chapter.
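
The same hierarchy could also be written down as markup. The sketch below is one possible rendering of the description above, not a reproduction of Figure 4.1. The element names are hypothetical, titles that this passage does not give are left empty, and two paragraphs stand in wherever the text says only "some paragraphs."

```xml
<?xml version="1.0"?>
<!-- Hypothetical element names; titles not given in this passage are left empty -->
<chapter>
  <title/>
  <paragraph/>
  <paragraph/>
  <section>
    <heading/>
    <paragraph/>
    <paragraph/>
  </section>
  <section>
    <heading>Focus on Structure</heading>
    <paragraph/>  <!-- introductory paragraph -->
    <subsection>
      <heading>Document Structure</heading>
      <paragraph/>  <!-- the paragraph that describes this hierarchy -->
    </subsection>
  </section>
</chapter>
```

Nesting the subsection inside its section makes the "several layers down" relationship explicit instead of leaving it implied by heading sizes.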

Not every document has a structure as complex as this chapter, but most documents have some kind of structure. Memos, for example, begin with the names of the recipient(s) and the sender(s), as well as the date and other company information. The text that follows is less structured. Letters usually provide more information, at least in a business environment, often including the sender’s and recipient’s addresses as well as a salutation (Dear John), a closing (Sincerely), a signature, and a legible copy of the sender’s name. Attachments may be noted, as may typists and others involved in the preparation of the document. Documents may be more complex than these basic examples, but they are very rarely any simpler.
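
To make the memo example concrete, here is a rough sketch of how its structure might be declared. Every element name is an assumption made for illustration: the header fields are required in a fixed order, the company information is optional, and the body is left loosely structured.

```xml
<?xml version="1.0"?>
<!-- Hypothetical memo vocabulary; none of these names come from a standard -->
<!DOCTYPE memo [
  <!-- Header fields come first, in order; company-info is optional -->
  <!ELEMENT memo (to+, from+, date, company-info?, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT date (#PCDATA)>
  <!ELEMENT company-info (#PCDATA)>
  <!-- The body is loosely structured: any number of paragraphs -->
  <!ELEMENT body (paragraph*)>
  <!ELEMENT paragraph (#PCDATA)>
]>
<memo>
  <to>Recipient name</to>
  <from>Sender name</from>
  <date>Date of the memo</date>
  <body>
    <paragraph>Text of the memo.</paragraph>
  </body>
</memo>
```

The `+` on `to` and `from` allows several recipients and senders while still requiring at least one of each.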

Don’t take this discussion to mean that your Web development team should be enforcing standards for company memos and other documents, even though that might eventually be a reasonable goal. Many companies’ attempts to standardize the formats of commonly used documents have met with resistance. For a wide variety of reasons, people do not like to format their memos the same way as everyone else. When developing standards for company documents, try to allow for some flexibility, at least on stylistic matters like font and size, if not on header information. Pushing standardization too hard is likely to keep the standard from ever being applied, especially in these early stages when friendly tools for applying it have yet to appear.

HTML took a relatively simple approach to documents, identifying distinct components and creating tools for reproducing them. However, it did not link the parts in any particular way (with the significant exceptions of lists, forms, and tables). Most elements in the BODY section of an HTML document can appear anywhere, in any order. There are no rules declaring that H2 elements must appear only after H1 elements; H2 elements can appear anywhere in the document, with or without other headlines. The only limitations in HTML come from elements that create blocks. For example, H2 elements don’t work well inside H1 elements. <H1>This is the top<H2>This is the middle</H2>This is the end</H1> doesn’t produce a single line with two sizes of header. Figure 4.2 shows the results.


Figure 4.2  Block element nesting misbehavior.

Apart from this kind of misbehavior, HTML puts very few constraints on the way its document parts are used. List elements are expected to appear in a list, but the browser copes if they don’t; the same is true of form elements. Table elements (rows and columns) don’t make sense outside the context of a table and are ignored there. HTML’s lack of structural constraints makes it much easier for beginners to create pages that look the way they expect. Even if HTML had such constraints, they wouldn’t have been enforced, because HTML has no requirement for document validation.

