- 53 -

HTML Programming Basics

IN THIS CHAPTER

• A Few Other Tags

Hypertext Markup Language (HTML) is the language used to write World Wide Web pages. It is quite an easy language, and as several versions have been introduced over the past few years, it has become quite powerful too. We can't hope to teach you HTML in a single chapter in this book, but we can give you an overview of the language and of how to use the basics to produce a simple Web page or two. A lot of good books on HTML are out there, so if you want to become very proficient in writing Web pages, we suggest you pick up one of them.

A lot of automated Web page production tools are available on the market, mostly for Windows and Windows NT machines. These use a WYSIWYG editor to lay out a Web page, then generate HTML code for you. With this type of tool, you don't need to know much (if any) HTML. Not very many HTML generators are available for Linux, however. On top of that, HTML is quite easy to learn, and anyone who is interested in setting up a Web site for the Internet or an intranet should learn at least the basics. Several tools are available for Linux that scan HTML code to make sure that it is syntactically correct, but we won't bother using any in this chapter. If you want to find a syntax checker, check out one of the Linux support sites, such as http://www.xnet.com, which is a good starting place to find Linux software. Also, the Linux home site of http://www.linux.org usually has information about available software.

What Is HTML?

We'll assume you already know what the World Wide Web (WWW) is. If you've seen a Web page before, you have seen the results of HTML. HTML is the language used to describe how the Web page will look when you access the site. The server transfers the HTML instructions to your browser, which converts those HTML lines of code into the text, images, and layouts you see on the page.

A Web browser is usually used to access HTML code, but other tools can carry out the same function. Many kinds of browsers are out there, starting with the granddaddy of them all, NCSA's Mosaic. Netscape's Navigator is the most widely used browser right now, although Microsoft is slowly making inroads with its Internet Explorer. Which browser you use doesn't matter, because all browsers do mostly the same job: display the HTML code they receive from the server. A browser is almost always acting as a client, requesting information from the server.

The HTML language is based on another language called SGML (Standard Generalized Markup Language). SGML is used to describe the structure of a document and allow for better migration from one documenting tool to another. HTML does not describe how a page will look; it's not a page description language like PostScript. Instead, HTML describes the structure of a document. It indicates which text is a heading, which is the body of the document, and where pictures should go. But it does not give explicit instructions on how the page will look; that's up to the browser.

Why use HTML? Primarily because it is a small language and therefore can transfer instructions over a network quickly. HTML does have limitations because of its size, but newer versions of the language are expanding the capabilities a little. The other major advantage to HTML is one most people don't think about: it is device independent. It doesn't matter which machine you run; a Web browser takes the same HTML code and translates it for the platform. The browser is the part that is device dependent. That means you can use HTML to write a Web page and not care which machine is used to read it.

What Does HTML Look Like?

HTML code is pretty straightforward, as you will see. For the most part, it consists of a bunch of "tags" that describe the beginning and ending of a structure element (such as a heading, paragraph, picture, or table). For each element, there should be a beginning and an ending tag. A sample HTML page is shown in Figure 53.1. Don't worry about understanding it all now; you will see this code built up in this chapter. For now, you need to see only that there are beginning and ending tags around each element in the structure. (All the screen shots used in this chapter are taken from either a Windows 95 or a Windows 3.11 machine accessing the Linux server on which we are writing the HTML code through an Ethernet network. The browser is NCSA's Mosaic.)

FIGURE 53.1. A simple example of HTML code.

A couple of important things to know about tags as we get started: they are case insensitive (so you don't have to be careful about matching case), and they are almost always paired into beginning and ending tags. The most common errors on Web pages are mismatched or unterminated tags. In many cases, the Web page will appear OK, but there might be severe formatting problems in some cases. A quick scan of your HTML code will help solve these types of problems.
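As a rough sketch of the kind of error described above, compare an unterminated heading with its corrected form. The heading and body text are made up for illustration, and the lines between <!-- and --> are HTML comments that the browser does not display:

```html
<!-- Problem: the H1 is never closed, so a browser may treat
     the text that follows as part of the heading -->
<H1> Welcome to my page
This sentence was meant to be ordinary body text.

<!-- Fixed: the closing tag ends the heading where intended -->
<H1> Welcome to my page </H1>
This sentence is now ordinary body text.
```

How badly the first version displays depends on the browser, which is why a page can look fine on one machine and wrong on another.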


NOTE: Not all HTML tags have a beginning and ending tag. A few are single ended, meaning they usually have just a beginning. Some others are called containers because they hold extra information. These are not always tagged at both ends.
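A minimal sketch of single-ended tags follows. It assumes three standard ones, <BR> (line break), <HR> (horizontal rule), and <IMG> (inline image); the address text and the file name logo.gif are hypothetical:

```html
<P> 123 Main Street <BR>
Anytown </P>

<HR>

<!-- logo.gif is a placeholder file name -->
<IMG SRC="logo.gif" ALT="Company logo">
```

None of these three takes an ending tag, because each marks a single point on the page instead of enclosing text.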

Tags are written in angle brackets. These brackets signal to the browser that an HTML instruction is enclosed. A sample HTML code element looks like

<tag_name> text text text </tag_name>



where <tag_name> and </tag_name> are the starting and ending tags for the text in the middle. The ending tag has the same name as the starting tag, but is preceded by a slash to indicate the tag's conclusion. The type of tag describes how the text will look. For example, if the tags are heading tags, the text will appear larger than normal body text and might be in bold or highlighted in some way.

How do you write HTML code? There are several ways to do it, the easiest being to use any ASCII editor. Be sure not to save HTML documents in a proprietary format like Word documents, because a Web browser can't understand anything but ASCII. Some specialized HTML editors are available that feature pull-down lists of tags and preview screens. These can be handy when you are working with very large Web pages, but for most people a simple editor is more than enough to get started with.

Starting an HTML Document

An HTML document usually begins with an instruction that identifies the document as HTML. This is a tag called <HTML>, which tells the browser where the HTML instructions start. Here's a sample chunk of code from a Web page:

<HTML>
<HEAD>
<TITLE> This is my Web Page! Welcome! </TITLE></HEAD>
<BODY>
<H1> This is the first heading on my page. </H1>
This is a bunch of text that is written on my home page.  I hope you like it.
</BODY>
</HTML>



You can see that the first and last tags, <HTML> and </HTML>, mark the start and end of the HTML code. The slash in the second tag indicates the end of the structure element. These tags should be at the start and end of each HTML document you write. The <HEAD> and </HEAD> tags mark a prologue to the file and are often used for just the title and key words. Only a few tags are allowed inside <HEAD> tags. One of them is the <TITLE> and </TITLE> pair, which gives the title of the document. The <BODY> and </BODY> tags mark the start and end of the document's main body. The <H1> and </H1> tags are for a heading on the page.

This code can be read by any browser. The result is shown in Figure 53.2. As you can see, the title material is not displayed on the page itself; only the material between the body tags is shown. The title appears at the top of the browser window to identify the page you are viewing.

FIGURE 53.2. The sample HTML code displayed under Mosaic.

The format of the code shown previously is line-by-line, but it is handled this way just for readability. You can write everything on one long line, if you want, because HTML ignores whitespace unless told otherwise. For debugging and rereading purposes, however, it is helpful to keep the code cleanly organized.
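To illustrate, here is the same sample page written as one long line. A browser should display it just as it displays the line-by-line version, although it is clearly harder to read and debug:

```html
<HTML><HEAD><TITLE> This is my Web Page! Welcome! </TITLE></HEAD><BODY><H1> This is the first heading on my page. </H1> This is a bunch of text that is written on my home page.  I hope you like it. </BODY></HTML>
```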

A few other comments about the tags we've used. The <TITLE> tag always goes inside the header tags (<HEAD> and </HEAD>) to describe the contents of the page. You should have only a single title for your page. Only a few other tags are allowed inside the head tags. It is useful to pick a short, descriptive title for your documents so that others who see them will know what they are accessing.

The <BODY> and </BODY> tags are used to enclose the main contents of your Web page, and you will probably have only one pair of them. All text and contents (links, graphics, tables, and so on) are enclosed between body tags.

There are several levels of heading tags, each of which is like a subheading of the one higher up. The heading we used in the code shown previously is <H1>, which is the highest heading level. You can structure your document with many heading levels, if you want. For example, you could write this bit of code:

<HTML>
<HEAD>
<TITLE> This is my Web Page! Welcome! </TITLE></HEAD>
<BODY>
<H1> This is an H1. </H1>
This is a bunch of text.
<H2> This is an H2 </H2>
This is more text.
<H3> This is an H3 </H3>
This is text about the H3 heading.
<H3> This is another H3 </H3>
Here's more text about the H3 heading.
<H2> This is yet another H2 </H2>
Text to do with H2 goes here.
</BODY>
</HTML>



This code is shown in a browser in Figure 53.3. As you can see, the levels of heading are slightly different, with the higher headings (lower numbers) more distinctive and bolder. This difference lets you separate your pages into logical categories, with a heading or subheading for each category. You can use these headings just as we do when writing a book: H1s can contain H2s, H3s go below H2s, and so on. There are no rules about mixing headings (you could use only H3s, for example), but common sense usually dictates how to structure your page.

FIGURE 53.3. Headings with different tags have different appearances.

What about paragraphs? You can handle paragraphs in several ways, and the rules have changed with each version of HTML. The easiest approach, though, is to use the <P> and </P> tags to mark each individual paragraph. For example, this code uses three paragraph tag pairs:

<HTML>
<HEAD>
<TITLE> This is my Web Page! Welcome! </TITLE></HEAD>
<BODY>
<H1> This is an H1. </H1>
<P> This is the first paragraph.  It is a really interesting paragraph and
should be read several times because of its content. </P>
<P> Another paragraph.  It's not quite as exciting as the first, but then
it's hard to write really exciting paragraphs this late at night. </P>
<P> The closing paragraph has to be strong to make you feel good.  Oh well,
we can't always meet your expectations, can we? </P>
</BODY>
</HTML>



The appearance of this code in the browser is shown in Figure 53.4. Note how each paragraph is distinct and has some whitespace between it and the next paragraph. What happens if you leave out the <P> and </P> tags? Because browsers ignore whitespace, including carriage returns, the text is run together as shown in Figure 53.5. So you should use <P> and </P> tags to separate paragraphs on your page. Remember that putting lots of blank lines between paragraphs in your HTML code doesn't matter. Browsers will ignore them and run everything together.

FIGURE 53.4. The use of paragraph tags separates text into discrete chunks with whitespace between them.


NOTE: Strictly speaking, you don't need </P> tags to indicate the end of a paragraph because another <P> would indicate the start of a new one. The <P> tag is one example of an open-ended tag, one that doesn't need a closure. It is good programming practice, however, to close the pairs.
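As a short sketch of the open-ended form described in this note (the paragraph text is made up), the following two paragraphs should display as separate paragraphs even though neither has a closing tag:

```html
<P> This is the first paragraph.
<P> This is the second paragraph; the new P tag implicitly ends the first.
```

Closing each pair explicitly, as in the examples in this chapter, makes it easier to spot where one paragraph stops and the next begins.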

FIGURE 53.5. Without paragraph tags, all the text is run together.

What about comments in HTML code? You might want to embed some comments to yourself about who wrote the code, what it does, when you did it, and so on. The way to write a comment into HTML code is like this:

<!-- This is a comment -->



The comment has angle brackets around it, an exclamation mark as the first character, and two dashes before and after the comment text. Here's an example of some HTML code with comments in it:

<HTML>
<!-- Written 12/12/95 by TJP, v 1.23 -->
<HEAD>
<TITLE> This is my Web Page! Welcome! </TITLE></HEAD>
<BODY>
<H1> This is an H1. </H1>
<!-- This section is about the important first para tag -->
<P> This is the first paragraph. </P>
</BODY>
</HTML>



Links

Links to other places and documents are an important part of the World Wide Web. Links are quite easy to write in HTML. They begin with the link tag <A> and end with </A>. This is an example of an anchor tag, so named because it creates an anchor for links in your document.

The <A> tag is different from the tags we've seen so far in that it has some more text inside the angle brackets. Here's a sample link in a document:

<A HREF="page_2.html">Go to Page 2</A>



In this example, the text between the two tags is what is displayed on-screen, so the user would see the text "Go to Page 2" underlined and usually in another color to indicate that it is a link. If the user clicks on the link, the browser reads the HREF reference in the <A> tag and loads the document page_2.html. HREF, meaning hypertext reference, gives the name of a file or a URL that the link points to.

You can use links either in the body of text or as a separate item on a menu, for example. The following code shows a link in a paragraph and one on a line by itself:

<HTML>
<HEAD>
<TITLE> This is my Web Page! Welcome! </TITLE></HEAD>
<BODY>
<H1> This is the first heading on my page. </H1>
<P>This is a bunch of text that is written on my home page.  I hope you like it.
If you would like to know more about me, choose <A HREF="about_me.html">Tell
me more about You</A> and I'll tout my virtues for you. </P>
<P><A HREF="biblio.html">See Bibliography</A></P>
</BODY>
</HTML>



When displayed in a browser, this code looks as shown in Figure 53.6. Each link is underlined in the text to show that it is a link. (Some browsers change the color of the link text, and others do different things as well.)

FIGURE 53.6. A document with two links in it.

When you are specifying a link to a filename, you must be sure to specify the filename properly. You can give either relative or absolute paths. Absolute simply means you give the full pathname, whereas relative means you specify from the current document's location. For example, these are absolute pathnames (the first in DOS format, the second in Linux format) in a link:

<A HREF="c:\html\home\home.htm">



<A HREF="/usr/tparker/html_source/home.html">



Relative path references are from the current location and can use valid directory movement commands. These are valid examples of relative paths in a link:

<A HREF="..\home.htm">



<A HREF="../../html_source/home.html">
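To make the relative form concrete, here is a small sketch. The directory layout and the file names intro.html and chapter2.html are hypothetical, invented only to show how the browser works out the target from the current document's location:

```html
<!-- Assumed layout (hypothetical):
       /usr/tparker/html_source/home.html
       /usr/tparker/html_source/docs/guide/intro.html    (this file)
       /usr/tparker/html_source/docs/guide/chapter2.html -->

<!-- Two levels up from docs/guide is html_source -->
<A HREF="../../home.html">Back to the home page</A>

<!-- No directory part means the same directory as this file -->
<A HREF="chapter2.html">Next chapter</A>
```

Relative links like these keep working if the whole html_source tree is moved to another directory or machine, whereas absolute pathnames would have to be edited.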



A link to another URL is much the same as a link to a document, except that you give the URL after HREF. For example, this is a link to the Yahoo! home page:

<A HREF="http://www.yahoo.com">Go to Yahoo!</A>



You can have as many links in your documents as you want. It helps to make the link description as useful as possible so that users don't end up at pages or sites they didn't want to access. If you are linking to other sites, you should occasionally check to make sure that the link is still valid. A lot of home pages change location or drop off the Web as time goes by, so verify links to avoid annoyed users.

Lists

HTML lets you use a few different formats of lists, such as ordered (numbered), unordered (bulleted), and labeled. The lists are surrounded by tags such as <OL> and </OL> (for ordered lists), <UL> and </UL> (for unordered lists), or <MENU> and </MENU> (for menus). Each item in the list has its own tag, <LI> or something similar, to separate it from other items. A few special types of list tags are for handling glossaries and similar purposes, but we'll ignore them in this HTML overview.

Here's an example of a simple list using the <UL> tags for unordered lists:

<HTML>
<HEAD>
<TITLE> This is my Web Page! Welcome! </TITLE></HEAD>
<BODY>
<H1> This is a list of some books I have written. </H1>
Here are the books I wrote on last summer's vacation.
<UL>
<LI> Mosquitos Bug me </LI>
<LI> Fun with Bears </LI>
<LI> What to eat when you have no food </LI>
<LI> Why is it raining on my vacation? </LI>
<LI> Getting lost in three easy lessons </LI>
</UL>
</BODY>
</HTML>



An unordered list is like a normal list, except that it has bullets and is not marked by any special numbering scheme. This code is shown in a browser in Figure 53.7, in which you can see the way the bullets line up and the list is presented.

FIGURE 53.7. An unordered list in HTML.

The same code could be written with <OL> and </OL> tags for an ordered list. An ordered list has numbers in front of the items, as shown in Figure 53.8. This is the same code as shown previously, except that we changed the <UL> tags to <OL> tags.

FIGURE 53.8. An ordered list uses numbers rather than bullets.
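For reference, the list portion of the ordered version described above would look roughly like this; only the outer tags differ from the unordered example:

```html
<OL>
<LI> Mosquitos Bug me </LI>
<LI> Fun with Bears </LI>
<LI> What to eat when you have no food </LI>
<LI> Why is it raining on my vacation? </LI>
<LI> Getting lost in three easy lessons </LI>
</OL>
```

The browser supplies the numbers itself, so inserting or removing an item never requires renumbering by hand.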

Changing Character Appearances

Character tags can be used to change the appearance of text on the screen. There are two kinds of character tags in HTML: style tags (such as italics and boldface) and logical tags (which indicate emphasis, code, or other types of text). Forcing character type changes with style tags is not usually a good idea because different browsers might not present the text the way you want. You can use them, however, if you know that your server will be used only with a particular type of browser and if you know how the text will look on that browser.

Logical tags are a much better choice because browsers can implement them across platforms. They let the individual browser decide how italics, for example, will look. For that reason, we'll concentrate on logical tags; you should use them when you can. Eight logical tags are in general use:
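The set assumed here is the seven phrase elements of HTML 2.0 (EM, STRONG, CODE, SAMP, KBD, VAR, and CITE) plus DFN. As a rough sketch, with made-up sample text, each might be used like this:

```html
<P> Please read this <EM>carefully</EM>; it is <STRONG>very important</STRONG>. </P>
<P> Type <KBD>ls -l</KBD> and the shell prints something like
<SAMP>total 12</SAMP>. </P>
<P> In the line <CODE>count = count + 1</CODE>, <VAR>count</VAR> is a variable. </P>
<P> A <DFN>browser</DFN> is a program that displays HTML. </P>
<P> For details, see <CITE>The HTML 2.0 Specification</CITE>. </P>
```

How each of these is displayed is left to the browser, which is the point of using logical tags.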

The following code shows an example of the use of some of these styles, and the resultant Web page is shown in Figure 53.9.

<HTML>
<HEAD>
<TITLE> This is my Web Page! Welcome! </TITLE></HEAD>
<BODY>
<H1> This is an H1. </H1>
<P> This is a sample entry that should be <EM> emphasized using EM</EM> and with
the <STRONG> use of Strong </STRONG> emphasis.
</P>
</BODY>
</HTML>



As you can see, this browser (Mosaic) interprets the <EM> tag to be italics and the <STRONG> tag to be bold. Most browsers perform this conversion, but other tags might look different with other browsers.

If you want to force character tags, you can do so with <B> and </B> for boldface, <I> and </I> for italics, and <TT> and </TT> for typewriter monospaced font (code).

FIGURE 53.9. The use of logical character tags changes the way text appears.
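The style tags <B>, <I>, and <TT> mentioned above can be sketched in the same way. The sample sentence is made up for illustration:

```html
<P> This word is <B>bold</B>, this one is <I>italic</I>, and the command
<TT>ls -l</TT> is shown in a monospaced font. </P>
```

Unlike <EM> and <STRONG>, these tags ask for a specific typeface effect, so a browser that cannot produce it has no sensible alternative to fall back on.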

A Few Other Tags

To wrap up, a few other tags are useful in general Web page production. The first is the <PRE> tag, which means the contents between the tags are preformatted and should be left alone. Between the <PRE> and the </PRE>, whitespace is important. Use of the <PRE> tag lets you preformat tables or other content exactly as you want it (subject to wrapping rules in the browser). For example, the following code has a PRE section in it:

<HTML>
<HEAD>
<TITLE> This is my Web Page! Welcome! </TITLE></HEAD>
<BODY>
<H1> This is an H1. </H1>
<P> This is a sample entry that should be <EM> emphasized using EM</EM> and with
the <STRONG> use of Strong </STRONG> emphasis. </P>
<PRE>
This is preformatted
     text that should appear
                     exactly like this in the Browser
</PRE>
</BODY>
</HTML>



As you can see in Figure 53.10, the spacing of the PRE material is retained, and even the text font is the same as the source (Courier).

FIGURE 53.10. The PRE tags let you preformat text.
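Because spacing is preserved, a PRE section is also a simple way to lay out a small table, as mentioned earlier. The following is only a sketch with invented items and prices; the columns line up because the browser uses a monospaced font and keeps every space:

```html
<PRE>
Item            Qty    Price
Floppy disks     10    $7.50
Mouse pad         1    $3.00
</PRE>
```

Use spaces, not tabs, to align the columns, because browsers may not all expand tabs to the same width.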

Another handy tag is a simple one: the <HR> tag creates a horizontal rule across the page. For example, the preceding code can be enhanced with a couple of <HR> tags like this:

<HTML>
<HEAD>
<TITLE> This is my Web Page! Welcome! </TITLE></HEAD>
<BODY>
<H1> This is an H1. </H1>
<P> This is a sample entry that should be <EM> emphasized using EM</EM> and with
the <STRONG> use of Strong </STRONG> emphasis. </P>
<HR>
<PRE>
This is preformatted
    text that should appear
            exactly like this in the Browser
</PRE> <HR>
</BODY>
</HTML>



As you can see in Figure 53.11, two horizontal rules now appear on the page. The exact appearance of the rule might change with browsers, but the overall effect is to put a divider on the page.

FIGURE 53.11. Use <HR> to draw horizontal rules across the page.

Summary

Many more HTML tags are available to you, but they are used for special items such as tables, graphics, and other add-ins. As we mentioned at the start, this chapter is designed to just give you a quick introduction to HTML, not to teach you everything there is to know. As you have seen, though, HTML is a fairly simple language to work with, and you should have a lot of fun designing your own Web pages.