

CHAPTER 6
Re-creating Web and Paper Documents with XML

Now that we’ve covered all the parts involved, it’s time to create some valid XML documents with reasonably complex DTDs. Despite the many parts involved, creating a DTD doesn’t need to be painful. Developers converting documents from HTML or a word processor style sheet often find that their document structures are actually simplified. This chapter will examine the production and implementation of two example DTDs for document production, including DTD development, document coding, and style sheet creation for browser viewing of the documents.

None of the examples in this chapter include links to other documents. Links in XML are considerably more complex than those in HTML and will receive separate coverage in Chapter 10. For now, familiarize yourself with the syntax of XML DTDs and learn the structures that exist within a document rather than throughout a site.

To XML from HTML

Many businesses already store huge quantities of information in HTML. After spending hours converting it to HTML from some other format, many of these information keepers probably aren’t thrilled to hear about a great new development that promises to sweep away HTML. However, most of the HTML information already created will probably never be formally converted to XML, and if it is, that conversion is likely to be automated. Because HTML files have often been developed to look a certain way, the appearance of the finished HTML on the screen has taken precedence over the structure underlying the code. Developers on a deadline must have something to show the client; if broken code doesn’t bother the browsers in which it’s viewed, it isn’t likely to bother the client. Add to that the fact that much HTML is coded by hand (or by tools that litter the page with extra markup), and the odds of an HTML page being close to well-formed XML drop precipitously.
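The gap between HTML that a browser tolerates and well-formed XML can be checked mechanically. What follows is only a minimal sketch in Python, assuming the standard library's `xml.etree.ElementTree` parser. The helper name `is_well_formed` and both sample strings are illustrative, not taken from any real page. The check covers well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup without error."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Typical hand-coded HTML: unquoted attribute value, unclosed <BR> and <P>.
sloppy = "<BODY>Price:<B>$14.95</B>, <FONT SIZE=1>plus shipping.</FONT><BR><P></BODY>"

# The same content with the attribute quoted and every element closed.
tidy = '<BODY>Price:<B>$14.95</B>, <FONT SIZE="1">plus shipping.</FONT><BR/><P/></BODY>'

print(is_well_formed(sloppy))  # False
print(is_well_formed(tidy))    # True
```

A browser would display both strings identically, which is why such errors tend to go unnoticed until an XML parser sees the page.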

If all that is needed are technically valid XML documents, it might be appropriate to use one of the standard HTML DTDs as the document DTD. Those DTDs are currently written in SGML syntax that isn’t compatible with XML, but several projects are working on XML versions of the HTML DTDs. Remember, though, that a valid XML document is not necessarily an easily managed one. Meaningful markup that reflects content rather than formatting is the main advantage of XML, and using the HTML DTDs will not provide that.

For many pages, the conversion process will probably take place by hand, as it would with paper documents. Tools may appear that can “learn” the format of a set of pages and extract the needed content, but sets of HTML pages that aren’t generated by machines rarely follow a format consistently. The text on one page may include five paragraphs, whereas another page may have no text at all. For example, consider the following catalog page:

  <HTML>
  <HEAD><TITLE>Joe's Catalog - Money Counters</TITLE></HEAD>
  <BODY BGCOLOR="#FFFFFF">
  <H1>Money Counting Equipment</H1>
  <H2>Basic Money Counter</H2>
  <H4>Count your cash without spending all of it!</H4>
  Joe's is pleased to announce this NEW addition to our line. People with
  piles of change can sort their money easily, and wrap it for the bank.
  Makes the change box a lot more useful!<BR>
  Price:<B>$14.95</B>, <FONT SIZE=1>plus $4.95 shipping and handling.
  </FONT><P>
  Also available: Paper Coin Wrappers, bag of 100: <B>$2.95</B><P>
  <H2>Standard Money Counter</H2>
  <H3>Count and collect your cash automatically!</H3>
  This money counter wraps your change automatically - just feed it the
  plastic change rolls. You'll feel just like the bank when you read its
  LED display announcing how much change you've gathered.<P>
  Special guarantee: 100% accuracy on wrapping or your money back!<BR>
  Price:<B>$64.95</B>, <FONT SIZE=1>plus $7.95 shipping and handling.
  </FONT><P>
  Also available: Plastic Coin Wrappers, box of 400: <B>$10.95</B><P>
  <H2>Super-Duper Money Counter</H2>
  Tired of change?  This machine counts bills as well.  Feed it the take
  from a cash register and watch it count away.  Spits out old bills in a
  separate tray for easy counting. Saves hours of effort spent
  counting pennies - and twenties!<BR>
  Price:<B>$649.95</B>, <FONT SIZE=1>plus $29.95 shipping and
  handling.</FONT><P>
  Uses plastic coin wrappers above, and Paper Bill Wrappers, box of 1000:
  <B>$10.95</B><P>
  </BODY></HTML>
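To see why formatting markup makes automated extraction difficult, consider a rough sketch of a scanner for a page like the one above. It assumes Python's standard `html.parser` module; the class name `CatalogScanner` and the shortened sample are hypothetical. The scanner records each `<H2>` heading and every `<B>` value that follows it.

```python
from html.parser import HTMLParser


class CatalogScanner(HTMLParser):
    """Collect each <H2> heading and every <B> value that follows it."""

    def __init__(self):
        super().__init__()
        self.products = []    # entries of the form [heading, [bold values]]
        self._capture = None  # tag whose text is currently being collected

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag == "h2":
            self.products.append(["", []])
            self._capture = "h2"
        elif tag == "b" and self.products:
            self.products[-1][1].append("")
            self._capture = "b"

    def handle_endtag(self, tag):
        if tag == self._capture:
            self._capture = None

    def handle_data(self, data):
        if self._capture == "h2":
            self.products[-1][0] += data
        elif self._capture == "b":
            self.products[-1][1][-1] += data


# Shortened, hypothetical excerpt in the style of the catalog page.
sample = """
<H2>Basic Money Counter</H2>
Price:<B>$14.95</B>, <FONT SIZE=1>plus $4.95 shipping and handling.</FONT><P>
Also available: Paper Coin Wrappers, bag of 100: <B>$2.95</B><P>
<H2>Standard Money Counter</H2>
Price:<B>$64.95</B><P>
"""

scanner = CatalogScanner()
scanner.feed(sample)
scanner.close()
for name, bold_values in scanner.products:
    print(name, bold_values)
```

For the first product the scanner collects both $14.95 and $2.95, because `<B>` says only that the text is bold, not whether it is the price of the product or of an accessory. The shipping charge is missed entirely because it sits inside a `<FONT>` element. Telling these values apart requires either guesswork about the surrounding words or markup that names the content.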

