XML: A Primer: Plan in the Present, Save in the Future


These systems don’t exist yet; even the few places that use SGML extensively probably haven’t connected their data from the central documents to distributed databases via memos. Office automation on this level promises to build many new data-driven workflow applications, as well as finally reduce the amount of repetitive data entry that remains a constant task even in today’s ubiquitous computing environments. According to the September 1997 Byte, 90% of business data currently lives outside of databases, as memos, spreadsheets, letters, proposals, documentation, and assorted other forms of information. Connecting those documents with a document management system (as opposed to a database) is the real promise of XML. Making the document management system meaningful will require considerable effort building infrastructure, a significant part of which is creating DTDs that provide information about the information in the document.

Developing DTDs that reflect data structures is frequently more difficult than developing DTDs that reflect document structures. As in relational databases, data structures are very clear in highly structured environments, but they can be extremely murky in ordinary documents. Deciding what counts as data and finding ways to mark it meaningfully are both difficult. Different sectors of an organization may use the same data very differently. For example, the individual parts listed on an order are critical information to a shipping department, but they are only of marginal interest to accounts receivable and are interesting only in the aggregate to corporate management. Different priorities can lead to different proposals for data structures and data management, much as they have in other applications of information technology.

Despite the potential for chaos, some basic rules for data design remain useful in deciding which pieces of data rate their own elements and how they should be broken down. Data should always be broken down to the smallest parts that will ever be needed, much as is done in creating a normalized relational database. The preceding example could have provided the first name and last name as one element, NAME, instead of two elements, FIRSTNAME and LASTNAME. This would certainly be easier for the document creators, who must mark up each element separately, but would cause problems for anyone else who needed to sort the class lists by last name. More complex structures can be built by nesting these smaller pieces in container elements (the STUDENT element, in this case). If the information were actually coming from a database, this structure would be easy to automate, both for exporting the data to XML and for importing it from XML. Hand-coders and document authors forced to deal with the complexity of nested tags may disagree, requiring compromise in many cases. To accommodate the varied needs of different users, compromise on the nesting of subelements within larger container elements may also be necessary.
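To see why the split into FIRSTNAME and LASTNAME pays off, consider a minimal sketch of the sorting task mentioned above, written in Python with the standard library's ElementTree module. The CLASSLIST container, the sample students, and the function name students_by_last_name are assumptions made up for illustration; only the STUDENT, FIRSTNAME, and LASTNAME element names come from the chapter's example.

```python
import xml.etree.ElementTree as ET

# Hypothetical class list. CLASSLIST and the sample names are illustrative;
# STUDENT, FIRSTNAME, and LASTNAME follow the chapter's example.
CLASS_LIST = """
<CLASSLIST>
  <STUDENT><FIRSTNAME>John</FIRSTNAME><LASTNAME>Nickelson</LASTNAME></STUDENT>
  <STUDENT><FIRSTNAME>Maria</FIRSTNAME><LASTNAME>Alvarez</LASTNAME></STUDENT>
  <STUDENT><FIRSTNAME>Wei</FIRSTNAME><LASTNAME>Chen</LASTNAME></STUDENT>
</CLASSLIST>
""".strip()


def students_by_last_name(xml_text):
    """Return (last name, first name) pairs sorted by last name."""
    root = ET.fromstring(xml_text)
    students = [
        (student.findtext("LASTNAME"), student.findtext("FIRSTNAME"))
        for student in root.iter("STUDENT")
    ]
    # Tuples sort on their first item, so this orders by last name.
    return sorted(students)


sorted_students = students_by_last_name(CLASS_LIST)
for last, first in sorted_students:
    print(f"{last}, {first}")
```

Had each student been marked up with a single NAME element, the same program would have needed to guess where the first name ends and the last name begins, which is exactly the kind of problem that fine-grained elements avoid.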

Making these data structures work requires more than just creating a DTD; it requires continuous negotiation between developers and their user communities. Developers who are lucky enough to build standards only for themselves will be a distinct minority in the XML community. Making XML’s promise of content-based documents come to pass will require considerable political as well as technical skill and will often require a team that can handle both sides of the equation.

Elements and Attributes: Which to Use When

A constant problem in developing DTDs concerns elements and attributes. HTML required many attributes to make its formatting precise, and HTML developers are comfortable manipulating attributes. Despite that comfort level, it is probably better to refrain from using attributes in XML except where they contribute to a specific goal. As we discussed in Chapter 3, this decision is often unclear. Developers creating well-formed XML aren’t likely to use many attributes except perhaps the STYLE attribute from HTML, but developers creating DTDs will need to address the issue constantly.

Attributes are an excellent tool for passing along extra information about your element to an automated processor: a parser, a browser, or a conversion tool. They are not a good place to actually store data. Using attributes when you need to store data (for example, <STUDENT FIRSTNAME="John" LASTNAME="Nickelson"> </STUDENT>) could work only in a situation where XML was transferring data between two computers; it would produce only blank space on the screen if a user were to open it in a browser. In addition, you would lose the opportunity to nest additional information inside the data. Elements can contain other elements, but an attribute can contain only a single value. On the other hand, the IDNUM attribute used previously is an appropriate use of an attribute. IDNUM is a hexadecimal identifier for the student that has relevance only to a central database someplace. It shouldn’t be part of the memo’s visible content, but it may prove to be useful to a database, allowing it to connect to the original data source and collect more information. Generally, you should use attributes to store information that may not be useful to humans directly but may help computers process the element properly. If you don’t, you’ll be walking into a maintenance nightmare.
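The division of labor described here can be sketched in a few lines of Python, again using the standard library's ElementTree module. The MEMO wrapper and the IDNUM value 1A2B3C are assumptions invented for this illustration, not values from the earlier example; the point is only that element content carries what a reader sees, while the attribute carries what a processor needs.

```python
import xml.etree.ElementTree as ET

# Hypothetical memo fragment. The MEMO wrapper and the IDNUM value are
# made up for illustration; the element names follow the chapter.
MEMO = """
<MEMO>
  <STUDENT IDNUM="1A2B3C">
    <FIRSTNAME>John</FIRSTNAME>
    <LASTNAME>Nickelson</LASTNAME>
  </STUDENT>
</MEMO>
""".strip()

student = ET.fromstring(MEMO).find("STUDENT")

# What a reader sees: only the text content of the elements.
visible_text = " ".join(
    text.strip() for text in student.itertext() if text.strip()
)

# What a processor uses: the attribute, parsed here as a hexadecimal key
# that a database lookup could rely on.
record_key = int(student.get("IDNUM"), 16)

print(visible_text)
print(hex(record_key))
```

If the names had been stored in attributes instead, visible_text would be empty, which mirrors the blank space a browser would show for the attribute-only STUDENT element.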

Remember that you don’t need to use attributes to hide information from the user. The CSS display property can be set to “NONE” to keep elements from appearing in a browser window. XSL has similar mechanisms.

Planning for Processing

HTML required developers to understand not only the basics of how a browser read and processed the code but also how to entice people to read the results. XML requires a bit more: developers must understand a larger variety of tools. XML is useful for the same kinds of browsers and search engines that have processed HTML for the past few years, but it is also useful for document management systems, workflow systems, a variety of databases, and even script processing within a browser. XML creation is still a pioneering effort; developers have little to work with because programs that support XML have only now started appearing. Still, XML developers have a rich heritage of examples to draw on from the SGML world and the promise of ever-increasing XML support from major vendors.
