

CHAPTER 12
XML and the Future: Site Architectures

Even though it’s clear that XML’s adoption will require significant changes to the basic infrastructure of the Web, its impact on the structure of sites remains less clear. The W3C’s positioning of XML in the Architecture Domain leaves open many questions about what XML is really for. So far, XML has mostly been used as a standard for defining other standards (MCF, CDF, RDF, and WIDL, among others), which in turn define other files. To conclude our exploration of XML’s potential, we’ll survey its implications for the development of Web sites.

Current Web Site Architectures

When the Web first appeared, sites had extremely simple structures, modeled on the hierarchies of the Web’s predecessors, FTP and Gopher. All requests referred to actual files stored in the file system of the server. URLs corresponded to a subset of the file structure on the server, and answering a request was a matter of finding the right file, adding an appropriate header, and sending it back to the browser that requested it. Hyperlinks contained the URL information, allowing developers to create crazy quilts of HTML without having to create crazy-quilt file structures. Directory structures were the most commonly used organizational tool in the early days, allowing developers to create somewhat structured sites.
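
To make that model concrete, the following sketch (in modern Python, which obviously postdates the servers being described, with a hypothetical document root) shows the essence of the approach: the path in the URL is mapped directly onto the file system, an appropriate header is attached, and the file is sent back unchanged.

  # Minimal sketch of the classic static-file model: URL paths map
  # straight onto files under a document root.
  import http.server
  import socketserver

  DOC_ROOT = "/var/www/htdocs"   # hypothetical document root

  class StaticHandler(http.server.SimpleHTTPRequestHandler):
      def __init__(self, *args, **kwargs):
          # SimpleHTTPRequestHandler performs the URL-to-file lookup and adds
          # the appropriate Content-Type header before sending the file back.
          super().__init__(*args, directory=DOC_ROOT, **kwargs)

  with socketserver.TCPServer(("", 8080), StaticHandler) as httpd:
      httpd.serve_forever()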

As the demand for up-to-date information has exploded across the Web, many developers have turned to database-driven sites. Tools like CGI, Cold Fusion, Active Server Pages (ASP), and LiveWire put attractive front ends on information stored in relational database systems and even legacy mainframes. Database-generated pages make sites like the FedEx tracking page possible, but they are also used for many pages that look like ordinary HTML. Visitors to the Microsoft site, for example, will encounter many pages created with ASP. Microsoft uses a database system behind the scenes to manage the data used on many of its pages, allowing it to make changes quickly.
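
The pattern itself is easy to sketch. The hypothetical CGI script below (written in Python rather than any of the tools named above, with invented file, table, and column names) rebuilds its HTML from database rows on every request instead of reading a static file, so updating the database updates the site.

  #!/usr/bin/env python3
  # Hypothetical CGI script: the page is generated from database rows
  # on each request rather than stored as a static HTML file.
  import sqlite3

  conn = sqlite3.connect("/var/data/site.db")      # hypothetical database
  rows = conn.execute("SELECT id, status FROM packages").fetchall()
  conn.close()

  # CGI output: an HTTP header, a blank line, then ordinary HTML.
  print("Content-Type: text/html")
  print()
  print("<html><body><h1>Package Status</h1><ul>")
  for package_id, status in rows:
      print("<li>%s: %s</li>" % (package_id, status))
  print("</ul></body></html>")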

Database-driven sites power many intranets as well, allowing employees to tap into data sources once locked in cold rooms guarded by protective MIS staff. Groupware and communications software have metamorphosed into Web applications. Lotus Domino’s transition from Notes server to Web server was a notable change, providing instant translation of Notes-formatted documents into Web pages. The complex data structures behind Notes have applications on the public as well as the private Web, and Domino has moved out from behind the corporate firewall to power a few Internet servers.

There are several problems with database-driven sites, however. First, they tend to require more horsepower to overcome the overhead of connecting to a database, or the database server may need more horsepower to handle the increased demand placed on it. Second, database-driven sites are rarely search-engine friendly; most of them, in fact, put up “Do not enter” signs with the robots.txt file discussed in Chapter 7. Although the information contained in the database is probably well structured and easily searchable, there’s no easy way for a search engine to connect to a database and collect structured data. (Allowing it to do so would probably open an enormous security hole as well.) Finally, complex database-driven sites usually require a fairly dedicated team of developers to build applications that manage the database, in addition to the usual team of HTML developers, adding considerable expense to a Web project.
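
A typical robots.txt for such a site simply fences off the areas where pages are generated dynamically. Something like the following (the paths shown are illustrative) tells all well-behaved robots to stay out of the script directories:

  User-agent: *
  Disallow: /cgi-bin/
  Disallow: /scripts/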

While database-driven sites have taken advantage of the processing power of the server, the client side has received a boost with the release of version 4.0 of both the Netscape and Microsoft browsers. Both browsers offer significant improvements in interface control and contain enough tools for some data processing (usually in Java, but also in JavaScript or VBScript) to take place on the client. Microsoft’s Internet Explorer 4.0 even includes controls that permit the browser to connect directly to back-end databases, although clearing security for this requires getting past many roadblocks.

The slow spread of Cascading Style Sheets and dynamic HTML has also had an impact on site architecture. It’s becoming more common for certain aspects of Web design to be centralized. Style sheets in this model can be controlled at one location, allowing a company to provide a basic look for its sites that can then be modified. Dynamic HTML interfaces can be stored as JavaScript code files or as Microsoft’s new scriptlets, which combine scripting with HTML to create reusable interface controls.
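
In practice, this centralization is just a matter of reference: each page points at shared files rather than carrying its own styles and scripts. A hypothetical page might contain little more than the following (the file names are invented for illustration):

  <html>
  <head>
  <title>Product Catalog</title>
  <!-- The corporate look is controlled in one shared style sheet -->
  <link rel="stylesheet" type="text/css" href="/styles/corporate.css">
  <!-- Reusable dynamic HTML interface code lives in one shared script file -->
  <script language="JavaScript" src="/scripts/menus.js"></script>
  </head>
  <body>
  ...page content...
  </body>
  </html>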

HTML documents today are far more than text with markup. Currently, the roster of items that can appear in Web pages includes

  HTML
  Images (GIF, JPEG, PNG, XBM, etc.)
  Sounds (AIFF, AU, WAV, etc.)
  Video (QuickTime, MPEG, AVI, etc.)
  Specialized Plug-in Content (Splash, Shockwave, Acrobat, etc.)
  JavaScript
  VBScript (Internet Explorer only)
  Java applets
  ActiveX controls (Internet Explorer only)

The Web is already a rich programming environment, with constantly improving tools for programming and presentation. Many people, including Web developers, would argue that the Web is complex enough as it is without adding another layer of complication. Adding XML (and all its associated standards) to the Web may be, from this perspective, unnecessary.

Transitional Architectures

XML is definitely creeping up on the Web. Although Microsoft’s Internet Explorer 4.0 now includes two XML parsers, XML documents can be addressed only through scripts and programming, and aren’t presented as part of the regular browser interface. (Figure 12.1 shows the latest Microsoft XML demo, a JUMBO-like tree interface to XML documents that is accessible as an applet in Internet Explorer 4.0.) This leaves XML with data-handling duties, not document-handling duties.


Figure 12.1  Tree interface for MSXML in Internet Explorer 4.0.

