XML and HTML

The W3C places XML in its Architecture domain, but places HTML, Style Sheets, and Document Object Models in its User Interface domain. The reality is more complicated than that: XML promises to have a significant impact on user interfaces as well as on HTML itself. XML is not an official replacement for HTML, but it includes and extends HTML in a way that will probably bring HTML development to a halt in the fairly near future. The Architecture designation is appropriate to a degree, though, because XML is more a tool for creating structures than for applying those structures to a particular interface. As we’ll see throughout the book, these roles can blur, making it difficult to tell which parts of XML are architecture and which are more directly interface-related.

XML is broader than HTML. HTML is an application of SGML: a particular set of tags defined by a DTD written for SGML parsers. XML is a subset (technically, “an application profile or restricted form”) of SGML, retaining a subset of SGML’s tools for defining document types. Because XML is a way of defining markup rather than a fixed set of tags, most of HTML can be defined in XML. Consequently, it’s reasonably easy to integrate XML with existing HTML sites. XML allows you to extend HTML while maintaining compatibility for the most part (remember the note above about closing tags), and it lets you move well beyond the limited set of tags available in the older markup language.

Most HTML documents can be moved into XML very easily. Our first XML document is an ordinary HTML document preceded by a processing instruction that declares it to be XML as well as HTML:

  <?xml version="1.0"?>
  <HTML><HEAD><TITLE>Our first XML Document</TITLE></HEAD>
  <BODY BGCOLOR="#FFFFFF">
  <H1>Welcome to XML</H1>
  <P>Welcome to your first well-formed XML document.
  There isn't too much exciting going on here, but
  there will be soon.</P>
  </BODY></HTML>

The first line looks like a processing instruction, a creature from SGML not normally seen in HTML, though the XML specification calls this particular one the XML declaration. In this case, it merely declares that this is an XML document. Well-formed documents should begin with an XML declaration announcing that they are in fact XML, but XML 1.0 does not require one. The version value identifies this as a document that uses version 1.0 of XML, which makes it easy for documents written to later versions to identify themselves to browsers and parsers. If a document has no XML declaration at all, parsers treat it as XML 1.0; if the declaration is present, the version must be included. Specifying the version may not seem important, but it will keep your documents working when the rules change, which is likely to happen at some point. The XML declaration will get additional coverage in Chapter 5.

The only other change we made to the HTML was making sure that all the tags are properly matched (every start tag has an end tag). For new documents, this is generally easy to enforce, but legacy HTML, especially hand-coded HTML, will present many problems. (Most of the WYSIWYG HTML tools available will apply closing tags by default.) HTML with unmatched tags will still work in a browser; it just won’t parse as XML. If you have questionable HTML, either fix it or don’t mark it as XML.
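To see both rules from this discussion in action, the following minimal sketch uses Python’s standard-library expat bindings (`xml.parsers.expat`) to parse the sample document. The helper name `check_xml` and the `legacy_html` fragment are hypothetical, written only for illustration. The helper reports the version given in the XML declaration (or `None` when there is no declaration) and raises `ExpatError` when the tags don’t match, as with the unclosed `<P>` typical of hand-coded HTML.

```python
import xml.parsers.expat


def check_xml(text):
    """Hypothetical helper: parse text with expat and return the version
    from its XML declaration, or None if it has no declaration.
    Raises ExpatError if the document is not well-formed."""
    declared = {"version": None}

    def on_xml_decl(version, encoding, standalone):
        # Called only when the document starts with an <?xml ...?> declaration.
        declared["version"] = version

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_xml_decl
    parser.Parse(text, True)  # True marks this as the final (and only) chunk
    return declared["version"]


# The sample document; the declaration must be the very first characters.
first_doc = """<?xml version="1.0"?>
<HTML><HEAD><TITLE>Our first XML Document</TITLE></HEAD>
<BODY BGCOLOR="#FFFFFF">
<H1>Welcome to XML</H1>
<P>Welcome to your first well-formed XML document.
There isn't too much exciting going on here, but
there will be soon.</P>
</BODY></HTML>"""

# Hypothetical legacy HTML: the <P> tags are never closed.
legacy_html = "<HTML><BODY><P>First paragraph<P>Second paragraph</BODY></HTML>"

print(check_xml(first_doc))  # prints 1.0
try:
    check_xml(legacy_html)
except xml.parsers.expat.ExpatError as err:
    print("not well-formed:", err)
```

A new parser is created on each call because an expat parser is meant to handle a single document. A browser would happily render `legacy_html`, but an XML parser rejects it at the first mismatched end tag.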

Creating Your Own Markup: A Well-Formed Document

You can start your own XML documents with the same XML declaration. You’ll be working without a safety net, because there are no structures (such as a DTD) to protect you from your own mistakes, but it does give you a reasonable place to start designing your elements. Some designs are best constructed by examining the top levels of the problem and analyzing them carefully, whereas others are best built from the bottom up. Choosing sample documents and marking them up experimentally may help you choose your elements and attributes more carefully.
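As a minimal sketch of marking up a sample document with invented tags, the Python code below uses the standard-library `xml.etree.ElementTree` module to build a small document. The `<recipe>`, `<title>`, `<ingredient>`, and `<step>` elements and their attributes are hypothetical, chosen only for illustration. No DTD defines them, so the only requirement is that the result be well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup invented for illustration; no DTD defines these names.
recipe = ET.Element("recipe", {"serves": "4"})
ET.SubElement(recipe, "title").text = "Plain Toast"
ET.SubElement(recipe, "ingredient", {"quantity": "4", "unit": "slices"}).text = "bread"
ET.SubElement(recipe, "step").text = "Toast the bread until golden."

# ElementTree always writes matching end tags, so its output is well-formed.
# The XML declaration is prepended by hand.
document = '<?xml version="1.0"?>\n' + ET.tostring(recipe, encoding="unicode")
print(document)

# Round trip: parsing succeeds because the markup is well-formed.
parsed = ET.fromstring(document)
print(parsed.find("ingredient").get("unit"))  # prints slices
```

Building the document through a library rather than by hand is one way to avoid unmatched tags while you experiment with element and attribute names.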

The documents in this section are well formed: they use syntactically correct markup to produce XML that a computer can interpret, but they don’t include a DTD that specifies requirements for these tags and would make the documents valid. Well-formed documents are more of a convenience, and an agreeable means of maintaining backward compatibility with most HTML documents, than a recommended way of working. Creating your own tags can be downright liberating, but it’s only part of what XML intends to accomplish. The discipline that a DTD imposes can be irritating, but it makes interpreting and reusing document content much easier. Well-formed XML documents are more organized than HTML; however, you’ll only realize the full potential of XML when you take advantage of its more powerful tools for creating structures that apply to a set of documents rather than to a single document.
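You can see the gap between well-formed and valid with Python’s standard-library `xml.etree.ElementTree`. It checks well-formedness but does not validate against a DTD. In this minimal sketch, the recipe vocabulary, the `ALLOWED` set, and the `unknown_elements` helper are all hypothetical. The set-of-names check is not DTD validation. It stands in for just one thing a DTD enforces (which element names exist) and ignores content models, nesting order, and attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary. A real DTD would also constrain order, nesting,
# and attributes; this set only lists the permitted element names.
ALLOWED = {"recipe", "title", "ingredient", "step"}

good = "<recipe><title>Toast</title><ingredient>bread</ingredient></recipe>"
typo = "<recipe><title>Toast</title><ingrediant>bread</ingrediant></recipe>"


def unknown_elements(text, allowed):
    """Hypothetical helper: parse text (raises ParseError if it is not
    well-formed) and return the sorted element names not in allowed."""
    root = ET.fromstring(text)
    # root.iter() visits the root element and all of its descendants.
    return sorted({el.tag for el in root.iter()} - allowed)


# Both documents are well-formed, so both parse without error...
print(unknown_elements(good, ALLOWED))  # prints []
# ...but only the vocabulary check notices the misspelled element.
print(unknown_elements(typo, ALLOWED))  # prints ['ingrediant']
```

The misspelled `<ingrediant>` passes a well-formedness check untouched. Catching that kind of mistake is part of the discipline a DTD and a validating parser provide.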

