

CHAPTER 7
XML for Commerce

Perhaps the most important advantage of XML is that it allows people and companies to exchange information more clearly and completely than previous formats have allowed. Although this can improve the average home page as well as the efficiency of paper-document creation, its greatest return on investment comes in business applications. XML promises to capitalize on two key trends in the electronic world: the growing use of Web sites as stores and the increasing use of electronic ordering and invoicing. XML can make Web sites more effective by making them more easily searchable, while easing the difficult transition to business-to-business communication by providing intelligible standards for data interchange.

The examples provided in this chapter explore some of the possible ways to use XML to create commerce-enabled DTDs. While the prospect of creating your own DTDs for these functions may seem exciting, you should always check to make sure that an industry-standard DTD isn't already available. Creating your own DTD can give you a solution well tailored to your systems and your particular needs, but it may also cut you off from the rest of your industry. XML markup is most useful when the same DTD is used by a number of people and tools. Commerce applications have the most at stake in standardization because search engines and other applications must be able to count on the same elements having the same meaning no matter what the source. In most cases involving multiple organizations, compatibility will matter more than a perfect solution.

Who (and What) Will Be Reading My XML?

The documents in the previous chapter were meant, in the end, for humans to read. XML makes those documents easier for machines to manipulate, providing a good way to apply formatting and possibly store the data, but ultimately all of them will be read by people, whether on paper or on a screen.

The applications in this chapter draw on several additional strengths of XML. First, XML allows developers to create documents that both humans and machines can read. Markup tags may look like information in English (or another language) to the developer, but to the computer they're simply labels that help it reach the data it needs for processing, stored neatly in nested structures. Second, XML offers considerably more flexibility than the other options currently available for data interchange between systems. XML can represent the contents of a relational database, an old hierarchical database, or even the latest object-oriented database. At the same time, XML can easily represent document information, grouping information, or simple lists. Finally, XML's nested structure is easy for programmers to manipulate using the recursive techniques available in most programming tools. Writing an XML parser isn't that difficult, especially if the parser forgoes validation and checks only for well-formedness.
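
As a rough illustration of that flexibility, a single row from a hypothetical customer-order table might travel as a nested XML structure like the one below. The element names (ORDER, CUSTOMER, ITEM, and so on) are invented here purely for illustration and are not drawn from any standard DTD:

   <ORDER number="1138">
     <CUSTOMER>
       <NAME>Jane Smith</NAME>
       <ACCOUNT>A-2047</ACCOUNT>
     </CUSTOMER>
     <ITEM quantity="2">
       <PARTNUMBER>QX-77</PARTNUMBER>
       <PRICE currency="USD">14.95</PRICE>
     </ITEM>
   </ORDER>

The same nesting that lets a human reader see the order at a glance also gives a processing application an unambiguous path to each field, which is what makes the recursive handling described above so straightforward.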

These strengths give XML a range of capabilities far broader than those of other interchange formats. Although thousands of systems are already available for trading data between computers, none of them offers this much flexibility in a structure that is so easy to program. XML obviously won't solve all the problems of data interchange because not every program can handle every data structure XML can represent, but it still represents a major step forward. XML is definitely a generalist's tool. Given enough time and money, there will always be a more efficient or more beautiful way to perform the tasks described later with customized database connections, exquisitely hand-crafted Web sites, or specially coded distributed components. Few projects, however, have that kind of time and money.

XML's combination of flexibility and structure suits it well to a group of applications that seek out information. Search engines and agents can both consider the information available in the DTDs and the tags of these documents when they try to categorize or index them. Making full use of these capabilities will require some standardization of tags; programs will have a hard time making sense of elements like <TODAYSSPECIALPRICE> and <SALETODAYONLY>, whereas <PRICE> probably makes considerably more sense.
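
To see why this matters, compare two ways a store might mark up the same offer. The element names below are hypothetical, but the contrast is the point: a search engine has no reliable way to interpret the first fragment, while the second uses a single, predictable element for the price:

   <WIDGET>
     <TODAYSSPECIALPRICE>9.99</TODAYSSPECIALPRICE>
     <SALETODAYONLY>yes</SALETODAYONLY>
   </WIDGET>

   <PRODUCT>
     <NAME>Widget</NAME>
     <PRICE currency="USD">9.99</PRICE>
     <SALE>true</SALE>
   </PRODUCT>

If every merchant invents its own names for prices and sale flags, each search engine or agent needs a separate translation for every store; if an industry agrees on a shared DTD, one piece of code can read them all.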

Search engines have an especially difficult task gathering and sorting the Web's information in its current amorphous form; adding meaningful, standardized tags should make search engines better at finding relevant information. Agents usually operate on a smaller scale, typically seeking out choice bits of information for particular users, but they stand to gain in the same way, possibly achieving the status computer scientists have claimed for them for so long.

Automated search tools bring up an additional issue—the dangers of letting programs surf the Web. If your site generates XML documents from databases and could strain (or collapse) under the load created by these automated tools, you should definitely consider creating a robots.txt file for your site. When an agent or search engine visits a site, it should examine the robots.txt file to find out where on the site it is welcome and avoid all proscribed areas. Details are available at http://info.Webcrawler.com/mak/projects/robots/norobots.html. Although robots.txt is not an “official” standard, it is widely accepted by search engine developers and should keep programs from crawling all over your site, slowing it down and possibly (in the worst case) crashing it.
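
A minimal robots.txt file might look something like the following (the directory paths here are invented for illustration). It asks all robots to stay out of the areas that generate documents on the fly while leaving the rest of the site open:

   User-agent: *
   Disallow: /cgi-bin/
   Disallow: /catalog/generated/

The file simply lists, for each kind of robot (the * means all of them), the paths that robot is asked not to visit; well-behaved search engines and agents check it before crawling a site.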

Developing documents for computers to read really isn't that much more difficult than developing documents for people. Computers are at least predictable, and the strong structures of XML should make it easier to create information that can be used by multiple processing applications, even applications of extremely different kinds.

