XML: A Primer: HTML and CSS: WYSIWYG Pages

CHAPTER 2
HTML and CSS: WYSIWYG Pages

Before we move on to XML, let’s examine HTML. Although I pointed out HTML’s inadequacies in the previous chapter, HTML popularized markup languages, and its existing structures deserve a full examination. By looking at HTML from the perspective of XML, you should be able to understand what’s to come without being too put off by the brave new world of XML. Cascading Style Sheets (CSS) are also important to both HTML and XML. Delivering much of the holy grail of WYSIWYG HTML development, CSS frees markup from the formatting structures that have complicated it for so long.

HTML is an application of SGML. An SGML Document Type Definition (DTD) provides a formal definition of the rules for markup. A DTD is available for each version of HTML that has appeared, although some HTML elements (in particular BR, which indicates a line break) are hard to express in the more tightly structured world of SGML. HTML’s syntax has always been looser and more forgiving; for example, closing tags such as </P> and </LI> have traditionally been optional. Only recently, with the development of HTML creation tools, has closing every tag become common. Browsers could usually tell where one element ended and another began, although they often displayed the results differently depending on the particular syntax used. Other tags could also be used flexibly, but with more varied results: some browsers collapsed when fed unclosed tags, while others displayed documents in ways that looked completely different from how the same documents looked in other browsers. Combined with the loose definitions of how browsers should format particular tags, this flexibility kept many designers up nights as they struggled to recreate formatting in multiple browsers, never finding the half-broken tag that was causing them grief.
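To make the hunt for a "half-broken tag" a little more concrete, here is a minimal sketch using Python’s standard html.parser module that reports start tags which are never explicitly closed, along with the line where each one opened. It assumes a small, hand-picked set of empty elements (BR, IMG, and so on) that never take a closing tag; the names VOID_ELEMENTS, UnclosedTagFinder, and find_unclosed are hypothetical. Because HTML has traditionally allowed some closing tags to be omitted, a report from this sketch flags loose markup, not necessarily invalid markup.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (an assumed, partial list).
VOID_ELEMENTS = {"br", "img", "hr", "meta", "base", "link", "input"}


class UnclosedTagFinder(HTMLParser):
    """Report start tags that are never explicitly closed."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, line) pairs still waiting for a close tag
        self.unclosed = []   # (tag, line) pairs that were never closed

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <P> and <p> both arrive as "p".
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching start tag; anything skipped was left open.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                self.unclosed.extend(self.stack[i + 1:])
                del self.stack[i:]
                return
        # A close tag with no matching open tag is simply ignored in this sketch.

    def close(self):
        super().close()
        # Anything still open at the end of the input was never closed.
        self.unclosed.extend(self.stack)
        self.stack = []


def find_unclosed(markup):
    """Return (tag, line) pairs for every start tag left unclosed."""
    finder = UnclosedTagFinder()
    finder.feed(markup)
    finder.close()
    return finder.unclosed
```

For example, find_unclosed("<P>ok</P>\n<EM>never closed\n<P>fine</P>") points at the EM element opened on line 2, the kind of stray tag that could quietly italicize the rest of a page in one browser and be ignored in another.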

HTML Roots: Old, Original Specifications

Before there were tables, frames, font tags, client-side image maps, and all the other magical tools of HTML as we now know it, there was a small set of tags that provided formatting suited to the needs of the average academic paper. (The Web, after all, was created at CERN as a place for physicists to exchange their findings.) HTML, unlike SGML, had definite formatting intentions for its tags, but they weren’t as specific as the formatting controls available in the typical WYSIWYG word processor or desktop publishing package.

Nested inside the HTML element are the two major pieces of an HTML document, the HEAD element and the BODY element, each of which carries a different kind of information. The HEAD element contains data about the document: the TITLE, the BASE element that can set the base URL for all hyperlinks in the document, and the META elements. META elements hold metadata (data about data), such as the author of the document, the organization that created it, keywords for search engines, and information that page creation and document management software can use to keep track of a page’s place in the broader organization of a site. Later, we’ll explore another tag used in the HEAD, the LINK tag.
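Because everything in the HEAD is data about the document rather than content for the reader, software can pull it out without caring what the BODY says. The following is a minimal sketch, again using Python’s standard html.parser module, that collects the TITLE, the BASE URL, and any named META elements. The names HeadReader and read_head are hypothetical, and the sketch deliberately ignores META elements that use HTTP-EQUIV instead of NAME.

```python
from html.parser import HTMLParser


class HeadReader(HTMLParser):
    """Collect TITLE, BASE, and named META information from a document's HEAD."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.in_title = False
        self.title = ""
        self.base = None
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Attribute names arrive lowercased; values keep their original case.
        attrs = dict(attrs)
        if tag == "head":
            self.in_head = True
        elif self.in_head and tag == "title":
            self.in_title = True
        elif self.in_head and tag == "base":
            self.base = attrs.get("href")
        elif self.in_head and tag == "meta" and attrs.get("name"):
            # Only NAME/CONTENT pairs are kept; HTTP-EQUIV is skipped here.
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def read_head(markup):
    """Return (title, base_url, meta_dict) for an HTML document."""
    reader = HeadReader()
    reader.feed(markup)
    reader.close()
    return reader.title.strip(), reader.base, reader.meta
```

Run against a page whose HEAD holds a TITLE, a BASE HREF, and META elements named "author" and "keywords", read_head returns the title text, the base URL, and a dictionary keyed by "author" and "keywords". Nothing in the BODY affects the result, which mirrors the split between the two elements.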

The BODY element is where nearly all the action takes place. Apart from the title, which appears at the top of the browser window, information in the HEAD normally remains invisible to the user. Information in the BODY produces the actual look of the Web page and gets most of the attention. Most users of Microsoft Word leave the file properties box (which acts like the HEAD element) turned off; entering search keywords for every single file isn’t worth the bother. What matters to the average user is the text in the document, with all its formatting. The BODY element in HTML is similar to the main body of a word processing document. Within the BODY element, all text and tags are sequential, following the usual left-to-right, top-to-bottom reading order (or possibly another order if you use a non-European writing system). Most of the elements within the BODY define formatting or create things like images, Java applets, form fields, buttons, and checkboxes. These elements provide markup for the appearance and placement of text and other objects on a page; nothing in the body of the document specifies meta-information.

The original HTML tags defined document formatting structures in a general way. The structures were logical rather than appearance-based. H1 indicated a top-level heading, not 24-point Helvetica bold underlined. EM meant emphasis, not bold, italic, or underlined. Because the Web was originally designed to run on a wide variety of equipment, from NeXT cubes to VT-100 terminals to PCs and Macs, its originators stayed away from such format-specific tags, leaving it to browser implementers to decide how to format each tag. Some early browsers even allowed users to specify styles for tags as part of their browser preferences.
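The division of labor described above (the markup names what the text is, and the browser decides how it looks) can be sketched as a simple lookup. In this minimal Python illustration, BROWSER_DEFAULTS is a hypothetical style table that one browser might use, and user preferences override those defaults property by property, much as the browser preference settings mentioned above did. The property names and values are invented for illustration and do not come from any real browser.

```python
# Hypothetical defaults one browser might choose for a few logical tags.
BROWSER_DEFAULTS = {
    "H1": {"size": "large", "weight": "bold"},
    "EM": {"style": "italic"},
    "CITE": {"style": "italic"},
    "VAR": {"style": "italic"},
}


def resolve_style(tag, user_prefs=None):
    """Return the presentation a browser would use for a logical tag.

    The markup only says what the text is (emphasis, a citation, a variable);
    the browser supplies how it looks, and user preferences, when present,
    override the browser's defaults one property at a time.
    """
    style = dict(BROWSER_DEFAULTS.get(tag, {}))  # copy so defaults stay intact
    if user_prefs:
        style.update(user_prefs.get(tag, {}))
    return style
```

With no preferences, resolve_style("EM") yields {"style": "italic"}; a user who prefers bold, upright emphasis gets {"style": "normal", "weight": "bold"} instead, and the document’s markup never changes. A terminal-based browser could supply an entirely different table in the same way.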

A simple document created with an early version of HTML might look like this:

  <HTML>
  <HEAD><TITLE>Simple Document, early HTML</TITLE></HEAD>
  <BODY>
  <H1>Introduction to HTML</H1>
  <P>This page has been created purely with logical
  tags. No additional formatting has been specified
  by the designers.</P>
  <P>While it might be nice to specify text like we
  could in QuarkXPress, we'll settle for applying
  <EM>emphasis</EM> where appropriate,
  <CITE>citations</CITE> when necessary, and maybe
  highlight a <VAR>variable</VAR> along the way. We
  can also indicate code listings:</P>
  <CODE>
  10 PRINT "HELLO WORLD"<BR>
  20 END<BR>
  </CODE>
  <P>Bulleted lists are easy too:</P>
  <UL>
  <LI>HTML Structures</LI>
  <LI>CSS Structures</LI>
  <LI>XML Structures</LI>
  </UL>
  <P>Numbered and lettered lists are also fun:</P>
  <OL>
  <LI>Item #1</LI>
  <LI>Item #2</LI>
  </OL>
  </BODY></HTML>

Even in the latest browsers, this simple example produces varied results. In Figures 2.1 and 2.2, you can see that Netscape Navigator 3.0 rendered EM, CITE, and VAR in italics, while Internet Explorer 3.0 rendered VAR in a monospace typeface and used a different background color as well.

Figure 2.1 Logical tags in Netscape Navigator 3.0.

Figure 2.2 Logical tags in Microsoft Internet Explorer 3.0.
