Table of Contents


Introduction

XML, to a certain extent, is HTML (Hypertext Markup Language) done right. XML (eXtensible Markup Language) offers a unique combination of flexibility, simplicity, and readability by both humans and machines. HTML developers who have spent years cursing the strange formatting quirks of HTML and the extreme difficulty of converting anything from HTML are in for a treat. XML gives developers the ability to create and manipulate their own tags and works smoothly with Cascading Style Sheets to allow developers to create pages that are as elegantly presented as they are structured. Programmers can build simple parsers to read XML data (or, better, reuse parsers built by others), making it an excellent format for interchanging data.
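
As a rough illustration of the last two points (inventing your own tags and reusing a parser someone else built), here is a minimal sketch in Python. The tag names (`catalog`, `book`, `title`, `author`), the sample data, and the `list_titles` helper are all hypothetical, made up for this example. The parser is Python's standard `xml.etree.ElementTree` module, chosen only because it is readily available; it is not one of the tools this book is written around.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: every tag and attribute name here is invented
# for illustration; XML lets the author pick names that fit the data.
DOCUMENT = """<?xml version="1.0"?>
<catalog>
  <book isbn="0-000-00000-0">
    <title>An Example Title</title>
    <author>A. Writer</author>
  </book>
  <book isbn="0-000-00000-1">
    <title>Another Example</title>
    <author>B. Writer</author>
  </book>
</catalog>"""


def list_titles(xml_text):
    """Return (isbn, title) pairs for every <book> element in the document."""
    # Reuse an existing parser rather than writing one by hand.
    root = ET.fromstring(xml_text)
    return [(book.get("isbn"), book.findtext("title"))
            for book in root.iter("book")]


if __name__ == "__main__":
    for isbn, title in list_titles(DOCUMENT):
        print(isbn, title)
```

Because the structure is explicit in the markup, the program can ask for "every book" and "its title" by name instead of guessing at layout, which is what makes this kind of data easy to interchange.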

If you’re an HTML developer who’s interested in XML, you’re in the right place. This book attempts to explain XML in terms that any reasonably experienced HTML developer can understand. Although some of the concepts may be difficult, XML itself is really quite approachable. Unlike the Standard Generalized Markup Language (SGML), its behemoth predecessor, XML uses a reasonably concise syntax that gives developers an enormous amount of power without the learning curve associated with SGML. Although XML is, in a sense, SGML-lite, I’ve done my best to avoid describing XML from an SGML perspective.

Because much of the best literature (and experience) available on creating document type definitions and using marked-up documents has come from the SGML community, I’ve pointed out some of the many differences between SGML and XML. If you don’t know or care about SGML, you can safely ignore all such information. Still, it won’t hurt to learn a bit about SGML, if you have the time and interest.

I have great hopes for XML. XML seems to me the best tool for accomplishing great things with markup, a significant improvement on both HTML and SGML. It has more flexibility than HTML, without the mind-numbing complexity of SGML. XML holds out the promise of markup that both humans and machines can interpret, making it easy for developers to debug their documents and for programmers to build systems around them. Although the “paperless office” has been just over the horizon for the last 20 years, XML, in combination with ubiquitous networking, may finally provide the tools needed to make it a reality. (Don’t hold your breath, though; old habits die very slowly.) The simplicity of XML makes it useful for small projects, whereas its clear structures make it useful for larger projects. XML can be massaged, manipulated, processed, fragmented, and rebuilt far more easily than previous formats.

Unfortunately, at press time, there aren’t many tools available that work with XML. This book has been written around some of the few that exist: Tim Bray’s Lark, Microsoft’s MSXML, Norbert Mikula’s NXP, and Peter Murray-Rust’s Jumbo, all of which deserve praise as bold pioneers. James Clark’s NSGMLS (part of his SP package) deserves honorable mention as a powerful parser, albeit one from the more staid world of SGML. Of the two leading browsers, Microsoft’s Internet Explorer offers feeble support for XML, and Netscape Communicator offers none. Nonetheless, both companies have made public commitments to providing support, and I hope they will make good on those commitments in reasonably short order.

This book focuses on hand-coding XML. Although I certainly hope that hand-coding will be quickly replaced by rapidly evolving tools, hand-coded XML will be around for a short while at least. It took a while for the HTML toolset to grow, and undoubtedly XML will have its growing pains as well. Even though many SGML tools are available and can be applied to XML development, their prices and target markets seem to stay well out of reach of the broader audience for XML. With time, prices will fall, and tools will become more powerful, just as they have in every other area of computing.

This book is a primer and not a complete guide to all things XML. The document type definitions need applications built around them for them to be useful, and most of the tools presented can give only a basic idea of XML’s potential. I fully expect that “graduates” of this book will be eager to move on to the next great thing. With any luck, those graduates (and people who have read other books as well) will spread the word about XML, building an XML community as rich and varied as the HTML community is now.

