

A

CGI Reference

This appendix provides a reference for the CGI protocol and related variables, including MIME types, environment variables, and hexadecimal encoding for nonalphanumeric characters.

Output

To output something from a CGI application, print to stdout. You format output as follows:

headers
body/data

Headers

Headers consist of the HTTP header's name followed by a colon, a space, and the value. Each header should end with a carriage return and a line feed (\r\n), including the blank line following the headers:

Header name: header value
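As a rough illustration of this layout, the following Python sketch joins the header lines with \r\n, adds the blank line, and appends the body. The choice of Python and the helper name build_response are assumptions made for this example; they are not part of the CGI protocol.

```python
import sys

CRLF = "\r\n"


def build_response(headers, body):
    """Return a CGI response: header lines, a blank line, then the body.

    `headers` is a list of (name, value) pairs. This helper is
    hypothetical and exists only to show the layout.
    """
    header_lines = [f"{name}: {value}" for name, value in headers]
    # Each header ends with CRLF, and so does the blank line after them.
    return CRLF.join(header_lines) + CRLF + CRLF + body


if __name__ == "__main__":
    response = build_response(
        [("Content-Type", "text/html")],
        "<html><body>Hello</body></html>\n",
    )
    # A CGI program sends its finished response to stdout.
    sys.stdout.write(response)
```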

A CGI response must contain at least one of the following headers:

Location: URI
Content-Type: MIME type/subtype
Status: code message

You can include additional headers including any HTTP-specific headers (such as Expires or Server) and any custom headers. See Chapter 4 for a discussion of the Location header. Table A.1 lists the status codes, which tell the client whether the transaction was successful or not and what to do next. See Chapter 8, "Client/Server," for more about status codes.

Status code   Definition
200   The request was successful and a proper response has been sent.
201   If a resource or file has been created by the server, it sends a 201 status code and the location of the new resource. Of the methods GET, HEAD, and POST, only POST is capable of creating new resources (for example, file uploading).
202   The request has been accepted although it might not have been processed yet. For example, if the user requested a long database search, you could start the search, respond with a 202 message, and inform the user that the results will be e-mailed later.
204   The request was successful but there is no content to return.
301   The requested document has a new, permanent URL. The new location should be specified in the Location header.
302   The requested document is temporarily located at a different location, specified in the Location header.
304   If the client requests a conditional GET (that is, it only wants to get the file if it has been modified after a certain date) and the file has not been modified, the server responds with a 304 status code and doesn't bother resending the file.
400   The request was bad and incomprehensible. You should never receive this error if your browser was written properly.
401   The client has requested a file that requires user authentication.
403   The server understands the request but refuses to fulfill it, most likely because either the server or the client does not have permission to access that file.
404   The requested file is not found.
500   The server experienced some internal error and cannot fulfill the request. You often will see this error if your CGI program has some error or sends a bad header that the server cannot parse.
501   The command requested has not been implemented by the server.
502   While the server was acting as a proxy server or gateway, it received an invalid response from the other server.
503   The server is too busy to handle any further requests.
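To show how a CGI program might use the Status and Location headers together, here is a minimal Python sketch. The function names redirect and not_found are hypothetical, and the message text after each code is the HTTP/1.0 reason phrase, which the table above does not list.

```python
def redirect(uri, permanent=False):
    """Build a body-less CGI response that sends the client to `uri`."""
    # 301 marks a permanent move, 302 a temporary one.
    status = "301 Moved Permanently" if permanent else "302 Moved Temporarily"
    return f"Status: {status}\r\nLocation: {uri}\r\n\r\n"


def not_found(path):
    """Build a 404 response with a short plain-text explanation."""
    body = f"The requested file {path} was not found.\n"
    return (
        "Status: 404 Not Found\r\n"
        "Content-Type: text/plain\r\n"
        "\r\n" + body
    )
```

Either function returns a complete response that the program would then print to stdout.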

MIME

MIME types take the following form:

type/subtype

where type names a general category of data, such as text, image, audio, video, application, or multipart.

The subtype provides specific information about the data format in use. A subtype preceded by an x- indicates an experimental subtype that has not yet been registered. Table A.2 contains several MIME type/subtypes. A complete list of registered MIME types is available at URL: ftp://ftp.isi.edu/in-notes/iana/assignments/media-types.

Type/Subtype   Function
text/plain   Plain text. By default, if the server doesn't recognize the file extension, it assumes that the file is plain text.
text/html   HTML files.
text/richtext   The simple rich-text markup originally defined for MIME mail, later superseded by text/enriched. It is not the word processors' Rich Text Format; see application/rtf.
text/enriched   The text enriched format is a method of formatting similar to HTML, meant for e-mail and news messages. It has a minimal markup set and uses multiple carriage returns and line feeds as separators.
text/tab-separated-values   Tab-delimited text is the simplest common format for databases and spreadsheets.
text/sgml   Standard Generalized Markup Language.
image/gif   GIF images, a common, compressed graphics format specifically designed for exchanging images across different platforms. Almost all graphical browsers display GIF images inline (using the <img> tag).
image/jpeg   JPEG is another popular image compression format. Although a fairly common format, JPEG is not supported internally by as many browsers as GIF is.
image/x-xbitmap   X bitmap is a very simple pixel-by-pixel description of images. Because it is simple and because most graphical browsers support it, it can be useful for creating small, dynamic images such as counters. Generally, X bitmap files have the extension .xbm.
image/x-pict   Macintosh PICT format.
image/tiff   TIFF format.
audio/basic   Basic 8-bit, μ-law compressed audio files. Filenames usually end with the extension .au.
audio/x-wav   Microsoft Windows audio format.
video/mpeg   MPEG compressed video.
video/quicktime   QuickTime video.
video/x-msvideo   Microsoft Video. Filenames usually end with the extension .avi.
application/octet-stream   Any general, binary format that the server doesn't recognize usually uses this MIME type. Upon receiving this type, most browsers give you the option of saving the data to a file. You can use this MIME type to force a user's browser to download and save a file rather than display it.
application/postscript   PostScript files.
application/atomicmail
application/andrew-inset
application/rtf   Rich Text Format. Most word processors understand Rich Text Format, so it can be a good portable format to use if you want people to read a document from their word processors.
application/applefile
application/mac-binhex40
application/news-message-id
application/news-transmission
application/wordperfect5.1   WordPerfect 5.1 word processor files.
application/pdf   Adobe's Portable Document Format for the Acrobat reader.
application/zip   The Zip compression format.
application/macwriteii   Macintosh MacWrite II word processor files.
application/msword   Microsoft Word word processor files.
application/mathematica
application/cybercash
application/sgml   Standard Generalized Markup Language.
application/x-www-form-urlencoded   Default encoding for HTML forms.
multipart/mixed   Contains several parts, each of which can be a different type.
multipart/x-mixed-replace   Similar to multipart/mixed except that each part replaces the preceding part. Used by Netscape for server-side push CGI applications.
multipart/form-data   Contains form name/value pairs. Encoding scheme used for HTTP File Upload.

As an example, the header you'd use to denote HTML content to follow would be

Content-Type: text/html
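A CGI program that returns files of several kinds needs some way to pick the right Content-Type. The Python sketch below uses a small lookup table keyed on file extension. The table, the function name content_type_header, and most of the extensions are assumptions chosen for illustration (only .xbm, .au, and .avi are named in Table A.2). The text/plain fallback mirrors the server default described under text/plain.

```python
import os

# Hypothetical extension-to-type table built from a few Table A.2 entries.
MIME_TYPES = {
    ".html": "text/html",
    ".txt": "text/plain",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
    ".xbm": "image/x-xbitmap",
    ".au": "audio/basic",
    ".avi": "video/x-msvideo",
    ".pdf": "application/pdf",
}


def content_type_header(filename, default="text/plain"):
    """Return a Content-Type header line for `filename`."""
    # splitext returns ("name", ".ext"); the extension is "" if absent.
    extension = os.path.splitext(filename)[1].lower()
    return f"Content-Type: {MIME_TYPES.get(extension, default)}"
```

Passing default="application/octet-stream" instead would prompt most browsers to save unrecognized files rather than display them.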

No-Parse Header

No-Parse Header (NPH) CGI programs communicate directly with the Web browser. The CGI headers are not parsed by the server (hence the name No-Parse Header), and buffering is usually turned off. Because the CGI program communicates directly with the browser, it must contain a valid HTTP response header. The first header must be

HTTP/1.0 nnn message

where nnn is the three-digit status code and message is the status message. Headers that follow are any standard HTTP headers such as Content-Type.

You generally specify NPH programs by preceding the name of the program with nph-.
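The difference from an ordinary CGI response is the status line that comes first. A minimal Python sketch, assuming a hypothetical helper named nph_response:

```python
def nph_response(code, message, headers, body):
    """Build a complete HTTP/1.0 response for an NPH program.

    `headers` is a list of (name, value) pairs. The server passes the
    output through untouched, so the status line must be present.
    """
    lines = [f"HTTP/1.0 {code} {message}"]
    lines += [f"{name}: {value}" for name, value in headers]
    # A blank line separates the headers from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body
```

For example, nph_response(200, "OK", [("Content-Type", "text/html")], page) yields a response the browser can read without any help from the server.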

Note that HTTP is at version 1.0 currently, but 1.1 is being worked on as this book is being written, and some features and headers from 1.1 have already been implemented in some browsers and servers.

Input

CGI applications obtain input using one or a combination of three methods: environment variables, standard input, and the command line.

ISINDEX

ISINDEX enables you to enter keywords. The keywords are appended to the end of the URL following a question mark (?) and separated by plus signs (+). CGI programs can access ISINDEX values either by checking the environment variable QUERY_STRING or by reading the command-line arguments, one keyword per argument.
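Recovering the keywords from QUERY_STRING amounts to splitting on the plus signs. The Python sketch below does that and also decodes any %XX escapes inside each keyword. The function name isindex_keywords is an assumption for this example.

```python
from urllib.parse import unquote


def isindex_keywords(query_string):
    """Split an ISINDEX query string into a list of keywords."""
    if not query_string:
        return []
    # Keywords are separated by "+"; a literal plus sign inside a
    # keyword arrives encoded as %2B and is decoded afterwards.
    return [unquote(word) for word in query_string.split("+")]
```

A request for a URL ending in ?cgi+reference would give the keywords cgi and reference.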

Environment Variables

CGI environment variables provide information about the server, the client, the CGI program itself, and sometimes the data sent to the server. Tables A.3 and A.4 list some common environment variables.

Environment variable   Description
GATEWAY_INTERFACE   Describes the version of the CGI protocol. Set to CGI/1.1.
SERVER_PROTOCOL   Describes the version of the HTTP protocol. Usually set to HTTP/1.0.
REQUEST_METHOD   Either GET or POST, depending on the method used to send data to the CGI program.
PATH_INFO   Data appended to a URL after a slash. Typically used to describe some path relative to the document root.
PATH_TRANSLATED   The complete path of PATH_INFO, translated by the server into a path on the local file system.
QUERY_STRING   Contains input data if using the GET method. Always contains the data appended to the URL after the question mark (?).
CONTENT_TYPE   Describes how the data is being encoded. Typically application/x-www-form-urlencoded. For HTTP File Upload, it is set to multipart/form-data.
CONTENT_LENGTH   Stores the length of the input if you are using the POST method.
SERVER_SOFTWARE   Name and version of the server software.
SERVER_NAME   Host name of the machine running the server.
SERVER_ADMIN   E-mail address of the Web server administrator.
SERVER_PORT   Port on which the server is running, usually 80.
SCRIPT_NAME   The name of the CGI program.
DOCUMENT_ROOT   The value of the document root on the server.
REMOTE_HOST   Name of the client machine requesting or sending information.
REMOTE_ADDR   IP address of the client machine connected to the server.
REMOTE_USER   The username if the user has authenticated himself or herself.
REMOTE_GROUP   The group name if the user belonging to that group has authenticated himself or herself.
AUTH_TYPE   Defines the authorization scheme being used, if any. Usually Basic.
REMOTE_IDENT   Displays the username of the person running the client connected to the server. Works only if the client machine is running IDENTD as specified by RFC 931.

Environment variable   Description
HTTP_ACCEPT   Contains a comma-delimited list of MIME types the browser is capable of interpreting.
HTTP_USER_AGENT   The browser name, version, and usually its platform.
HTTP_REFERER   Stores the URL of the page that referred you to the current URL.
HTTP_ACCEPT_LANGUAGE   Languages supported by the Web browser; en is English.
HTTP_COOKIE   Contains cookie values if the browser supports HTTP cookies and currently has stored cookie values. A cookie value is a variable that the server tells the browser to remember and send back to the server later.
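In most languages these variables are read through the process environment. The Python sketch below gathers a few of them and converts the HTTP_ variables back into header-style names. Both function names are hypothetical, and an empty string for a missing variable is a choice made for this example, since not every server sets every variable.

```python
import os


def request_summary(environ=os.environ):
    """Collect a few CGI variables, using "" for any that are unset."""
    return {
        "method": environ.get("REQUEST_METHOD", ""),
        "script": environ.get("SCRIPT_NAME", ""),
        "path_info": environ.get("PATH_INFO", ""),
        "query": environ.get("QUERY_STRING", ""),
        # Fall back to the IP address when no host name is available.
        "client": environ.get("REMOTE_HOST") or environ.get("REMOTE_ADDR", ""),
    }


def http_headers(environ=os.environ):
    """Map HTTP_* variables to header names, e.g. HTTP_USER_AGENT -> User-Agent."""
    headers = {}
    for name, value in environ.items():
        if name.startswith("HTTP_"):
            words = name[len("HTTP_"):].split("_")
            headers["-".join(word.capitalize() for word in words)] = value
    return headers
```

Both functions accept any dictionary in place of os.environ, which makes them easy to try outside a Web server.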

A full list of HTTP 1.0 headers can be found at the following location:

http://www.w3.org/hypertext/WWW/protocols/HTTP/1.0/spec.html

Getting Input from Forms

Input from forms is sent to the CGI application using one of two methods: GET or POST. Both methods by default encode the data using URL encoding. Names and their associated values are separated by equal signs (=); name/value pairs are separated by ampersands (&); and spaces are replaced with plus signs (+), as follows:

name1=value1&name2=value2a+value2b&name3=value3
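Decoding such a string is the reverse process. In Python, the standard library function urllib.parse.parse_qsl splits on the ampersands and equal signs, turns plus signs back into spaces, and decodes any percent escapes. The wrapper name parse_form is an assumption for this sketch.

```python
from urllib.parse import parse_qsl


def parse_form(encoded):
    """Decode a URL-encoded form string into a dict of name -> value.

    If a name appears more than once, only its last value is kept;
    use parse_qsl directly when repeated names matter.
    """
    # keep_blank_values=True keeps fields the user left empty.
    return dict(parse_qsl(encoded, keep_blank_values=True))
```

Applied to the sample string above, parse_form returns name1, name2, and name3 as keys, with name2 mapped to "value2a value2b".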

Every other nonalphanumeric character is URL encoded. This means that the character is replaced by a percent sign (%) followed by its two-digit hexadecimal equivalent. Table A.5 contains a list of nonalphanumeric characters and their hexadecimal values.

Character   Hexadecimal
Tab         09
Space       20
"           22
(           28
)           29
,           2C
.           2E
;           3B
:           3A
<           3C
>           3E
@           40
[           5B
\           5C
]           5D
^           5E
`           60
{           7B
|           7C
}           7D
~           7E
?           3F
&           26
/           2F
=           3D
#           23
%           25
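The hexadecimal values in the table are the characters' ASCII codes, so an encoder does not need the table itself. The Python sketch below follows the rule as stated in this section: alphanumeric characters pass through, spaces become plus signs, and everything else becomes %XX. The function name url_encode and the use of UTF-8 for non-ASCII characters are assumptions. Real clients may leave a few punctuation characters unencoded.

```python
def url_encode(text):
    """URL-encode `text` using the rule described in this section."""
    out = []
    for byte in text.encode("utf-8"):
        char = chr(byte)
        if char.isascii() and char.isalnum():
            out.append(char)           # letters and digits are unchanged
        elif char == " ":
            out.append("+")            # spaces become plus signs
        else:
            out.append(f"%{byte:02X}")  # two-digit hexadecimal code
    return "".join(out)
```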

The GET method passes the encoded input string to the environment variable QUERY_STRING. The POST method passes the length of the input string to the environment variable CONTENT_LENGTH, and the input string is passed to the standard input.
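A program that accepts both methods can check REQUEST_METHOD to decide where to look. The Python sketch below assumes a hypothetical function named read_form_input. It reads from a text stream, which is adequate for URL-encoded data because that data is plain ASCII.

```python
import os
import sys


def read_form_input(environ=os.environ, stdin=None):
    """Return the raw URL-encoded input for a GET or POST request."""
    if environ.get("REQUEST_METHOD", "") == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        stream = stdin if stdin is not None else sys.stdin
        # Read only CONTENT_LENGTH characters; the server is not
        # obliged to signal end-of-file after the data.
        return stream.read(length)
    # For GET (or anything else), the data is in QUERY_STRING.
    return environ.get("QUERY_STRING", "")
```

The returned string is still encoded and needs to be split into name/value pairs before use.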
