

Extending the Browser

XML has so far seen only limited application in browsers. CDF files are valid XML documents, and Microsoft has included two XML parsers (one written in Java, one in C++) with Internet Explorer, but so far the browser vendors' focus has been strictly on XML for metadata applications. The SGML community has embraced XML for documents more thoroughly, although with some reservations about the simplifications it has made. Whether or not XML becomes a replacement for HTML, it seems likely that browsers will soon be able to present XML documents using at least the CSS toolkit already available for HTML. The W3C's continuing efforts to create a DOM for both HTML and XML will eventually lead to highly scriptable pages that combine content and function, giving XML another boost into the browser. In this chapter we'll examine the place browsers may hold in the XML world, and the possibilities XML holds for making the browser model entirely obsolete.

This chapter is fairly abstract and not critical to XML implementation. Developers who need to make sites work but aren’t especially interested in how the underlying architecture works are welcome to skip to the next chapter.

Anatomy of a Browser

Throughout this book, I’ve had to use XML parsers to demonstrate the structure of XML documents. Although they may seem primitive, these parsers are in fact at the foundation of browsers. Grossly simplified, a browser consists of four key parts: a communications engine that sends requests and receives information using HTTP and other network protocols, a parser that interprets that information, a presentation engine that displays the elements found by the parser, and an interface that controls user interaction with the information provided. A simple model of how a browser processes HTML documents appears in Figure 11.2.


Figure 11.2  Basic browser structure.

The communications engine gets HTML files from Web servers and passes them to the parser, which breaks them down into a tree of discrete elements. The presentation engine examines the contents of those elements and formats them properly for the screen, downloading additional materials as necessary. The interface provides the browser window in which the document is displayed, with its menus, navigation aids, scroll bars, and other features. It handles user actions and opens new pages when necessary; those pages go through the same cycle of communication, parsing, and presentation.
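To make the parser's contribution concrete, the following JavaScript sketch reduces that stage to a toy: it splits a string of markup into tags and text, then uses a stack of open elements to build a tree of discrete element nodes. It assumes well-formed markup with no attributes; real browser parsers are vastly more forgiving and more complicated, and the names used here are illustrative inventions, not any browser's actual API.

```javascript
// Toy illustration of the parsing stage: turn a string of markup
// into a tree of element nodes. Assumes well-formed markup with
// no attributes -- real parsers handle far messier input.
function parse(markup) {
  const root = { name: "#document", children: [] };
  const stack = [root];
  // Split the input into tags (<p>, </p>) and the text between them.
  const tokens = markup.match(/<[^>]+>|[^<]+/g) || [];
  for (const token of tokens) {
    if (token.startsWith("</")) {
      stack.pop();                         // closing tag: finish current element
    } else if (token.startsWith("<")) {
      const node = { name: token.slice(1, -1), children: [] };
      stack[stack.length - 1].children.push(node);
      stack.push(node);                    // opening tag: descend into new element
    } else {
      stack[stack.length - 1].children.push({ name: "#text", text: token });
    }
  }
  return root;
}

const tree = parse("<html><body><p>Hello</p></body></html>");
console.log(tree.children[0].name); // "html"
```

The tree this produces is exactly what the presentation engine walks when it formats the page, and, as we will see, what scripts are now allowed to modify.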

Browsers right now are far more complicated than this simple model. They include scripting engines, style sheet interpreters, Java Virtual Machines, plug-in interfaces, and all kinds of graphics engines, along with an ever-growing number of attachments to provide mail, news, groupware, HTML editing, and other integrated features. The channel features described in Chapter 9 have added some extra overhead, as do the two separate parsers for XML that are available in Internet Explorer 4.0. Handling all those parts has exploded the browser out from its origins as a very small, simple program. Browsers now are growing as large as full-scale office applications, eating up more hard drive space and consuming more download time with every new version release.

Competition between the browser vendors has also changed the rules for parser, presentation, and interface: documents are becoming dynamic. We touched on this in Chapter 7, but its impact on the browser deserves more attention. Documents are no longer static entities incapable of changing after they’ve reached the browser. Scripts can add and remove elements, change their appearance, modify their contents, and move them around the screen. The parsing engine still reads in code as it arrives, but the resulting tree it produces is now open to manipulation and modification. Effectively, scripts have been given read and write access to the document tree, making possible a whole new category of browser-based interfaces.
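The kind of read and write access scripts now enjoy can be sketched in a few lines. In a browser the script would call the real DOM interfaces (document.createElement, appendChild, and so on); the plain-object stand-ins below are hypothetical helpers that keep the example self-contained and runnable anywhere.

```javascript
// Stand-ins for DOM-style tree manipulation. In a browser these
// operations would be methods on real document nodes.
function createElement(name) {
  return { name, children: [], parent: null };
}
function appendChild(parent, child) {
  child.parent = parent;
  parent.children.push(child);
  return child;
}
function removeChild(parent, child) {
  parent.children = parent.children.filter(c => c !== child);
  child.parent = null;
  return child;
}

// The parser's output is just an initial state...
const body = createElement("body");
const para = appendChild(body, createElement("p"));

// ...which a script may later reshape: add a list, drop the paragraph.
appendChild(body, createElement("ul"));
removeChild(body, para);
console.log(body.children.map(c => c.name)); // [ 'ul' ]
```

Once every element is open to this kind of surgery, the document on screen need bear little resemblance to the markup that arrived over the wire.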

For more information on early developments in dynamic documents, see my book Dynamic HTML: A Primer (MIS:Press, 1997).

The W3C is currently standardizing the competing approaches to this technique. The Document Object Model Working Group presented its first Level 1 Working Draft on October 9, 1997. Its abstract neatly presents the impact the Document Object Model will have on the simplified model presented in Figure 11.2:


The Document Object Model (DOM) level one provides a mechanism for software developers and Web script authors to access and manipulate parsed HTML and XML content. All markup as well as any document type declarations are made available. Level one also allows creation “from scratch” of entire Web documents in memory; saving those documents persistently is left to the programmer. DOM Level one is intentionally limited in scope to content representation and manipulation; rendering, validation, externalization, etc. are deferred to higher levels of the DOM.

Our all-powerful parser serves to create only an initial state for the browser, after which the element tree it creates may be modified, reorganized, or even rebuilt. Our XML documents, and even their document type declarations, may change shape (which may cause problems, at least until higher levels of the DOM appear to clarify validation).
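The "from scratch" creation the abstract mentions can be sketched as well: build an entire document tree in memory without ever parsing anything, then serialize it ourselves, since the draft leaves persistence to the programmer. The element and serialize helpers below are illustrative inventions, not part of the DOM draft's interfaces.

```javascript
// Build a document tree entirely in memory; strings act as text nodes.
function element(name, ...children) {
  return { name, children };
}

// The DOM draft leaves saving documents to the programmer,
// so we write our own trivial serializer.
function serialize(node) {
  if (typeof node === "string") return node;            // text node
  const inner = node.children.map(serialize).join("");
  return `<${node.name}>${inner}</${node.name}>`;
}

const doc = element("html",
  element("body",
    element("p", "Built in memory, never parsed")));

console.log(serialize(doc));
// <html><body><p>Built in memory, never parsed</p></body></html>
```

In other words, the parser is no longer the only source of document trees; scripts can be producers of documents as well as consumers.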

These developments are the latest stimulus for the continuing expansion of the browser. Netscape’s long-held dream of a browser that provides a complete interface is on the verge of being realized, although Microsoft seems to have stolen the lead with Internet Explorer 4.0. The implications of this new flexibility are enormous. The browser environment is reaching the point where it is rich enough to handle a variety of data presentation and processing jobs, most of which used to be the province of applications built with specialized client-server tools. Although it remains to be seen whether the DOM will provide enough flexibility for developers to write a word processor in a browser, it certainly promises enough to make possible far more powerful client interfaces than the forms we have at present.

