Chapter 16

Subroutine Definition

by Mícheál Ó Foghlú



One important factor in developing Perl programs is understanding the different ways of sectioning Perl code into functional units. This sectioning is important, both as a method of segmenting your own code so that such segments can be reused in various ways and as a method of using code developed by other people rather than having to reinvent the wheel.

There are three basic levels of segmentation. Within a package, Perl provides for the creation and use of subroutines. As in many structured programming languages, segmentation allows frequently used code, or code designed for a particular subtask, to be grouped logically. This concept also enables the use of recursive subroutines, which is a powerful mechanism for solving certain problems.

When you want to use subroutines that were originally developed in one program in another program, you can do so in two ways in Perl. The first way is to create a library, which you can subsequently include in other programs, giving them access to the suite of subroutines.

In Perl 5.0, this mechanism was expanded and generalized with the introduction of the concept of modules. Although they are more complex to create, modules are a more flexible method of developing and distributing suites of subroutines relating to specific tasks.

Subroutines

The basic subunit of code in Perl is a subroutine. A subroutine is similar to a function in C and to a procedure or a function in Pascal. A subroutine can be called with various parameters and returns a value. Effectively, the subroutine groups a sequence of statements so that they can be reused.

The Simplest Form of Subroutine

Subroutines can be declared anywhere in a program. If more than one subroutine with the same name is declared, each new version replaces the older ones, so that only the last one is effective. You can declare subroutines within an eval() expression. These subroutines will not actually be declared until the run-time execution reaches the eval() statement.

Subroutines are declared in the following syntax:


sub <subroutine-name> {

<statements>

}
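The point about eval() can be checked with a short sketch. It assumes the string form of eval and uses a hypothetical subroutine name, late_sub; the subroutine does not exist until the eval statement has actually run.

```perl
#!/usr/bin/perl -w
# Sketch only: late_sub is a hypothetical name invented for this illustration.

$before = defined(&late_sub) ? "yes" : "no";   # checked before the eval runs

eval 'sub late_sub { return "declared at run time"; }';

$after = defined(&late_sub) ? "yes" : "no";    # checked after the eval runs

print "Defined before eval: $before\n";
print "Defined after eval:  $after\n";
print "Result: ", &late_sub(), "\n";
```

The defined() test inspects the subroutine without calling it, which makes the before-and-after difference visible.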

The simplest form of subroutine is one that does not return any value and does not access any external values. The subroutine is called by prefixing the name with the & character. (Other ways of calling subroutines are explained in more detail later in this chapter, in the section "How to Pass Values to Subroutines.") Following is an example of a program that uses the simplest form of subroutine:


#!/usr/bin/perl -w

# Example of subroutine which does not use external values and does not return a value

&egsub1; # Call the subroutine once

&egsub1; # Call the subroutine a second time

sub egsub1 {

     print "This subroutine simply prints this line.\n";

}

TIP
Although you can refer from a subroutine to any global variable directly, this method normally is considered to be bad programming practice. Referring to global variables from subroutines makes reusing the subroutine code more difficult. (Will the same global variables always exist and have relevant values?) It is best to make any such references to external values explicit by passing explicit parameters to the subroutine, as described later in this chapter, in "How to Pass Values to Subroutines."
Similarly, it is best to avoid programming subroutines that directly change the values of global variables. This practice could lead to unpredictable side effects if the subroutine is reused in a different program. Use explicit return values or explicit parameters passed by reference, as described in "How to Pass Values to Subroutines" later in this chapter.

How to Return Values from Subroutines

Subroutines can also return values, thus acting as functions. The return value is the value of the last statement executed; it can be a scalar or an array value. You can test whether the calling context requires an array or a scalar value by using the wantarray construct, thus returning different values depending on the required context. The following example, as the last line of a subroutine, would return the array (a,b,c) in an array context and the scalar value 0 in a scalar context:


wantarray ? ('a', 'b', 'c') : 0;

The following example subroutine returns a value but is not passed any values:


#!/usr/bin/perl -w

# Example of subroutine which does not use external values but does return a value

$scalar_return = &egsub2; # Call the subroutine once, returning a scalar value

print "Scalar return value: $scalar_return.\n";

@array_return = &egsub2; # Call the subroutine a second time, returning an array value

print "Array return value:", @array_return, ".\n";

sub egsub2 {

     print "This subroutine prints this line and returns a value.\n";

     wantarray ? ('a', 'b', 'c') : 0;

}

You can return from a subroutine before the last statement by using the return() function. The argument to the return function is the returned value, in this case. The use of return() is illustrated in the following example (which is not a very efficient way to do the test but illustrates the point):


#!/usr/bin/perl -w

# Example of subroutine which does not use external values but does return a value using "return"

$returnval = &egsub3; # Call the subroutine once

print "The current time is $returnval.\n";

sub egsub3 {

     print "This subroutine prints this line and returns a value.\n";

     local($sec, $min, $hour, @rest) =

          gmtime(time);

     ($min == 0) && ($hour == 12) && (return "noon");

     if ($hour >= 12) {

          return "after noon";

     }

     else {

          return "before noon";

     }

}

Notice that any variables used within a subroutine usually are made local() to the enclosing block, so that they do not interfere with any variables in the calling program that have the same name. In Perl 5.0, you can make these variables lexically local rather than dynamically local by using my() instead of local(). (This procedure is discussed in more detail later in this chapter, in "Issues of Scope with my() and local().")

When multiple arrays are returned, the result is flattened into one list so that effectively, only one array is returned. In the following example, all the return values are in @return_a1, and the second array, @return_a2, is empty.


#!/usr/bin/perl -w

# Example of subroutine which does not use external values returning an array

(@return_a1, @return_a2) = &egsub4; # Call the subroutine once

print "Return array a1",@return_a1," Return array a2 ",@return_a2, ".\n";

sub egsub4 {

     print "This subroutine returns a1 and a2.\n";

     local(@a1) = ('a', 'b', 'c');

     local(@a2) = ('d', 'e', 'f');

     return(@a1,@a2);

}

In Perl 4.0, you can avert this problem by passing the arrays by reference, using a typeglob (see the following section). In Perl 5.0, you can do the same thing and also manipulate any variable by direct reference (see the following section).
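As a rough illustration of the Perl 5.0 approach, the following sketch returns a reference to each array instead of the arrays themselves, so the two lists stay separate. The subroutine name egsub4ref and the variable names are hypothetical.

```perl
#!/usr/bin/perl -w
# Sketch only: egsub4ref, $ref1, and $ref2 are hypothetical names.

sub egsub4ref {
    my @a1 = ('a', 'b', 'c');
    my @a2 = ('d', 'e', 'f');
    return (\@a1, \@a2);          # two scalar references, not one flat list
}

($ref1, $ref2) = &egsub4ref();    # each reference lands in its own scalar

print "First array: @$ref1. Second array: @$ref2.\n";
```

Because the returned list contains only two scalars, nothing is flattened; the caller dereferences each one with the @$ prefix.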

How to Pass Values to Subroutines

The next important aspect of subroutines is the fact that the call can pass values to the subroutine. The call simply lists the variables to be passed, which are passed in the list @_ to the subroutine. These variables are known as the parameters or the arguments. It is customary to assign a name to each value at the start of the subroutine, so that it is clear what is going on. Manipulating these copies of the arguments is equivalent to passing arguments by value (that is, their values may be altered within the subroutine, but this alteration does not alter the value of the variable in the calling program).

The following example illustrates how to pass parameters to a subroutine by value:


#!/usr/bin/perl -w

# Example of subroutine is passed external values by value

$returnval = &egsub5(45,3); # Call the subroutine once

print "The (45+1) * (3+1) is $returnval.\n";

$x = 45;

$y = 3;

$returnval = &egsub5($x,$y);

print "The ($x+1) * ($y+1) is $returnval.\n";

print "Note that \$x still is $x, and \$y still is $y.\n";

sub egsub5 { # Access $x and $y by value

     local($x, $y) = @_;

     return (++$x * ++$y);

}

To pass scalar values by reference rather than by value, you can access the elements in @_ directly, which will change their values in the calling program. In such a case, the argument must be a variable rather than a literal value, because literal values cannot be altered.

The following example illustrates passing parameters by reference to a subroutine:


#!/usr/bin/perl -w

# Example of subroutine is passed external values by reference

$x = 45;

$y = 3;

print "The ($x+1) * ($y+1) ";

$returnval = &egsub6($x,$y);

print "is $returnval.\n";

print "Note that \$x now is $x, and \$y now is $y.\n";

sub egsub6 { # Access $x and $y by reference

     return (++$_[0] * ++$_[1]);

}

You can pass array values by reference in the same way; however, several restrictions apply. First, as is true of returned array values, the @_ list is one single flat array, so passing multiple arrays in this way is tricky. Also, although you can use this method to alter individual elements of the array, you cannot alter the size of the array within the subroutine, so you cannot use push() and pop().
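The restriction can be seen in a small sketch (the names egsub_alias and @list are hypothetical). Assigning to an element of @_ changes the caller's array, but push() lengthens only @_ itself.

```perl
#!/usr/bin/perl -w
# Sketch only: egsub_alias, @list, and $count are hypothetical names.

sub egsub_alias {
    $_[0] = 'changed';     # alters the caller's first element
    push(@_, 'extra');     # lengthens @_ only; the caller's array is untouched
    return scalar(@_);     # number of elements now in @_
}

@list = ('one', 'two');
$count = &egsub_alias(@list);

print "Inside the subroutine \@_ grew to $count elements.\n";
print "The caller's array holds: @list\n";
```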

Therefore, another method has been provided to facilitate the passing of arrays by reference. This method, known as typeglobbing, works with Perl 4.0 or Perl 5.0. The principle is that the subroutine declares that one or more of its parameters are typeglobbed, which means that all the references to that identifier in the scope of the subroutine are taken to refer to the equivalent identifier in the namespace of the calling program.

The syntax for this declaration prefixes the identifier with an asterisk (*) rather than an at sign (@); for example, *array1 typeglobs @array1. In fact, typeglobbing links all forms of the identifier, so *array1 typeglobs @array1, %array1, and $array1. (Any reference to any of these variables in the local subroutine actually refers to the equivalent variable in the calling program's namespace.) Using this construct within a local() list makes sense, because it effectively creates a local alias for a set of global variables. The following example illustrates the use of typeglobbing:


#!/usr/bin/perl -w

# Example of subroutine using arrays passed by reference (type globbing)

&egsub7(*a1,*a2); # Call the subroutine once, passing typeglobs

print "Modified array a1",@a1," Modified array a2 ",@a2, ".\n";

sub egsub7 {

     local(*arr1,*arr2) = @_;

     print "This subroutine modifies arr1 and arr2";

     print " and thus a1 and a2 via typeglobbing.\n";

     @arr1 = ('a', 'b', 'c');

     @arr2 = ('d', 'e', 'f');

}

In Perl 4.0, this method is the only way to use references to variables rather than variables themselves. Perl 5.0 also has a generalized method for dealing with references. Although this method looks more awkward in its syntax (because of the extra backslashes and $ prefixes), it actually is more precise in its meaning. Typeglobbing automatically aliases the scalar, the array, and the associative array (hash) forms of an identifier, even if only the array form is required. With Perl 5.0 references, you can make this distinction explicit; only the array form of the identifier is referenced.

The following example illustrates how to pass arrays by reference in Perl 5.0:


#!/usr/bin/perl -w

# Example of subroutine using arrays passed by reference (Perl 5 references)

&egsub7(\@a1,\@a2); # Call the subroutine once

print "Modified array a1",@a1," Modified array a2 ",@a2, ".\n";

sub egsub7 {

     local($a1ref,$a2ref) = @_;

     print "This subroutine modifies a1 and a2.\n";

     @$a1ref = ('a', 'b', 'c');

     @$a2ref = ('d', 'e', 'f');

}

Subroutine Recursion

One of the most powerful features of subroutines is their capability to call themselves. Many problems can be solved by repeated application of the same procedure. You must take care to set up a termination condition wherein the recursion stops and the execution can unravel itself. Typical examples of this approach occur in list processing: Process the head item and then process the tail; if the tail is empty, do not recurse. Another neat example is the calculation of a factorial value, as follows:


#!/usr/bin/perl -w

#

# Example factorial using recursion



for ($x=1; $x<100; $x++) {

        print "Factorial $x is ",&factorial($x), "\n";

}



sub factorial {

        local($x) = @_;

        if ($x == 1) {

                return 1;

        }

        else {

                return ($x * &factorial($x-1));

        }

}
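The list-processing pattern mentioned earlier (process the head, then the tail, and stop when the tail is empty) might be sketched as follows. The subroutine name sum_list is hypothetical, and my() is used, so the sketch assumes Perl 5.0.

```perl
#!/usr/bin/perl -w
# Sketch only: sum_list is a hypothetical name.

sub sum_list {
    my($head, @tail) = @_;
    return 0 unless defined($head);      # empty list: nothing to add
    return $head unless @tail;           # empty tail: do not recurse
    return $head + &sum_list(@tail);     # process the head, then the tail
}

print "Sum of 1 to 5 is ", &sum_list(1, 2, 3, 4, 5), "\n";
```

Each call handles one element and hands a shorter list to the next call, so the recursion is guaranteed to terminate.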

Subroutine Prototypes

Perl 5.002 introduces the capability to declare limited forms of subroutine prototypes. This capability allows early detection of errors in the number and type of parameters and generation of suitable warnings. This is primarily to allow the declaration of replacement subroutines for built-in commands. To use the stricter parameter checking, however, you must make the subroutine call by using only the subroutine name (without the & prefix). The prototype declaration syntax is concise and not as strict as the named formal parameters mechanism is in languages such as Pascal.

The main use for these prototypes at present is in writing modules for wider use, allowing the modules to specify their parameter types so as to trap errors and print diagnostic messages. Therefore, this chapter does not discuss this mechanism in detail.
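For orientation only, here is a minimal sketch of one prototype form, assuming Perl 5.002 or later. The subroutine name swap_first is hypothetical. The (\@\@) prototype asks Perl to pass each of two array arguments as a reference, so the arrays are not flattened into @_.

```perl
#!/usr/bin/perl -w
# Sketch only: swap_first, @a1, and @a2 are hypothetical names.

# The declaration must be seen before the call for the prototype to apply.
sub swap_first (\@\@) {
    my($x, $y) = @_;                           # two array references
    ($x->[0], $y->[0]) = ($y->[0], $x->[0]);   # exchange the first elements
}

@a1 = (1, 2, 3);
@a2 = (9, 8, 7);

swap_first(@a1, @a2);     # no & prefix, so the prototype is honored

print "a1: @a1\n";
print "a2: @a2\n";
```

Calling the subroutine as &swap_first(@a1, @a2) would bypass the prototype and pass one flat list instead.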

Issues of Scope with my() and local()

Chapter 1, "Perl Overview," alluded to some issues related to scope. These issues are very important with relation to subroutines. In particular, all variables inside subroutines should be made lexical local variables (via my()) or dynamic local variables (via local()). In Perl 4.0, the only choice is local(), because my() was introduced in Perl 5.0.

Variables declared with the my() construct are considered to be lexical local variables. These variables are not entered in the symbol table for the current package; therefore, they are totally hidden from all contexts other than the local block within which they are declared. Even subroutines called from the current block cannot access lexical local variables in that block. The names of lexical local variables must begin with an alphabetic character (or an underscore).

Variables declared by means of the local() construct are considered to be dynamic local variables. The value is local to the current block and any calls from that block. You can localize special variables as dynamic local variables, but you cannot make them into lexical local variables. These two differences from lexical local variables show the two cases in Perl 5.0 in which it is still advisable to use local() rather than my():

In general, you should be using my instead of local, because it's faster and safer. Exceptions to this rule include the global punctuation variables, file handles and formats, and direct manipulation of the Perl symbol table itself. Format variables often use local, though, as do other variables whose current value must be visible to called subroutines.
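The difference in visibility can be sketched with two nearly identical subroutines. All the names here are hypothetical. Each one sets a variable called $setting and then calls a helper that reads the package variable of the same name.

```perl
#!/usr/bin/perl -w
# Sketch only: all names here are hypothetical.

$setting = "global";

sub show_setting { return $setting; }     # reads the package variable

sub with_local {
    local($setting) = "dynamic";          # visible to called subroutines
    return &show_setting();
}

sub with_my {
    my($setting) = "lexical";             # hidden from called subroutines
    return &show_setting();
}

print "Via local(): ", &with_local(), "\n";
print "Via my():    ", &with_my(), "\n";
```

The helper sees the temporary value set by local() but still sees the original global value when the caller used my().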

Perl Libraries

The Perl 4.036 standard library has 31 files. These files have been replaced in Perl 5.0 by a set of standard modules (see the following section). This section describes the older system of libraries based on require(). The package mechanism by itself merely provides a way of segmenting the namespace into units. When this mechanism is combined with suites of subroutines stored in a file that can be included by means of require(), a library is created.

Creation of Libraries

A library is effectively a collection of subroutines in a package. Setting up a file as a library file is a fairly straightforward process. Place the subroutines in a separate file, and add a package declaration to the top of the file. The file name of the library file and the package name should be the same. Then add the line


1;

to the end of the file (so that it returns TRUE when included by the require() function). If you want any of the subroutines to be in the global namespace automatically, change the name of the subroutine to explicitly name the main package (for example, main'mysub).

The following example illustrates how to declare a Perl 4.0 library with a single subroutine, filtest:


 # Sample library file (Perl 4)

package filtest;

sub main'filtest {

     local($fil) = @_;

     -f $fil && print "File $fil is a normal file.\n";

     -d _ && print "File $fil is a directory.\n";

}

1;

In Perl 5.0, the new form main::mysub is preferred to main'mysub for specifying the package name explicitly, but in Perl 5.0 you should consider making a module rather than a library.

Invocation of Libraries

To use a library, you simply use require() to refer to the library name. Perl searches all the directories specified in the @INC special variable when it tries to locate this file. To include the sample library file specified in the preceding section, use the following code:


#!/usr/bin/perl -w

#

require "filtest";

&filtest("/usr/bin");

&filtest("/usr/etc/passwd");

Standard Perl 4.0 Library

Following are the files in the standard Perl 4.036 library, which have been superseded by Perl 5.0 modules:

abbrev.pl       getcwd.pl
assert.pl       getopt.pl
bigfloat.pl     getopts.pl
bigint.pl       importenv.pl
bigrat.pl       look.pl
cacheout.pl     newgetopt.pl
chat2.pl        open2.pl
complete.pl     perldb.pl
ctime.pl        pwd.pl
dumpvar.pl      shellwords.pl
exceptions.pl   stat.pl
fastcwd.pl      syslog.pl
find.pl         termcap.pl
finddepth.pl    timelocal.pl
flush.pl        validate.pl

Modules (Perl 5.0)

Perl 5.0 has a new structure for managing code that is designed for reuse. Modules have many more features than the package-based library in Perl 4.0, which simply provides a means of segmenting the namespace into packages.

You can create a Perl 5.0 module in two main conceptual ways. One way is to use the basic concept of a collection of subroutines, with added features that help control which subroutines are compulsory and which are optional. The other way is to use the new object-oriented facilities of Perl 5.0 to make a module become a definition of class, so that instances of that class implicitly call subroutines (methods) in the library. You can mix these two basic approaches to produce hybrid modules.

The object-oriented features expand the idea of a package to incorporate the idea of a class. Special subroutines act as constructors and destructors, creating objects that are members of the class and deleting the objects when the last reference to the object is gone. Other subroutines provide other ways of manipulating objects of that class. So a package can be simply a package, or it can be a class if it provides the associated subroutines to act as methods for the class objects. When you use the special @ISA array, one package class can inherit methods from another package class. The methods are simply subroutines written to deal with objects in the class.
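A very small sketch may make these terms concrete. The package names Counter and LoudCounter and their methods are hypothetical. One package supplies a constructor and a method, and a second package inherits both through @ISA.

```perl
#!/usr/bin/perl -w
# Sketch only: Counter, LoudCounter, and their methods are hypothetical.

package Counter;

sub new {                                  # constructor
    my($class, $start) = @_;
    my $self = { count => $start || 0 };
    return bless $self, $class;            # tie the data to the class
}

sub increment {                            # ordinary method
    my($self) = @_;
    return ++$self->{count};
}

package LoudCounter;

@ISA = ('Counter');                        # inherit new() and increment()

sub describe {
    my($self) = @_;
    return "Count is now $self->{count}";
}

package main;

my $c = LoudCounter->new(5);               # new() is found via @ISA
$c->increment;
print $c->describe, "\n";
```

No destructor is shown; Perl would call a DESTROY method, if the class defined one, when the last reference to the object disappeared.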

Explaining the conceptual background of object-oriented programming is beyond the scope of this book, but these concepts need to be mentioned so as to put the descriptions of the standard modules in context, because many of them use these features.

Standard Module List

This section lists the standard modules and pragmatic modules (Pragmas) in Perl 5.002. All these modules should be located in the Perl library path (@INC), should have the extension .pm, and should include their own documentation. Pragmas do not contain subroutines or classes, but act as compile-time directives through side effects that occur when they are referenced.

NOTE
Many other modules exist. See Appendix B, "Perl Web Reference," for information on other nonstandard Perl modules.

Some Perl modules are developed in C rather than in Perl. These modules are called extension modules. Because of the problems involved in ensuring that these modules work under all operating systems, they are not as well standardized as the standard modules listed in the following table.

Module Name             Description
AnyDBM_File             Provides access to external databases
AutoLoader              Special way of loading subroutines on demand
AutoSplit               Special way to set up modules for the use of AutoLoader
Benchmark               Time code for benchmarking
Carp                    Reports errors across modules
Config                  Reports compiler options used when Perl was installed
Cwd                     Functions to manipulate current directory
DB_File                 Provides access to Berkeley DB files
Devel::SelfStubber      Allows correct inheritance of autoloaded methods
diagnostics             Pragma enables diagnostic warnings
DynaLoader              Used by modules that link to C libraries
English                 Pragma allows the use of long special variable names
Env                     Allows access to environment variables
Exporter                Standard way for modules to export subroutines
ExtUtils::Liblist       Examines C libraries
ExtUtils::MakeMaker     Creates makefiles for extension modules
ExtUtils::Manifest      Helps maintain a MANIFEST file
ExtUtils::Miniperl      Used by makefiles generated by ExtUtils::MakeMaker
ExtUtils::Mkbootstrap   Used by makefiles generated by ExtUtils::MakeMaker
Fcntl                   Provides access to C fcntl.h
File::Basename          Parses file names according to various operating systems' rules
File::CheckTree         Performs multiple file tests
File::Find              Finds files according to criteria
File::Path              Creates/deletes directories
FileHandle              Allows object syntax for file handles
Getopt::Long            Allows POSIX-style command-line options
Getopt::Std             Allows single-letter command-line options
I18N::Collate           Allows POSIX locale rules for sorting 8-bit strings
integer                 Pragma uses integer arithmetic
IPC::Open2              Inter Process Communications (process with read/write)
IPC::Open3              Inter Process Communications (process with read/write/error)
less                    Pragma unimplemented
Net::Ping               Tests network node
overload                Allows overloading of operators (for example, special behavior, depending on object type)
POSIX                   Allows POSIX standard identifiers
Safe                    Can evaluate Perl code in safe memory compartments
SelfLoader              Allows specification of code to be autoloaded in module (alternative to the AutoLoader procedure)
sigtrap                 Pragma initializes some signal handlers
Socket                  Provides access to C socket.h
strict                  Pragma forces safe code
subs                    Pragma predeclares specified subroutine names
Text::Abbrev            Creates abbreviation table
Test::Harness           Runs the standard Perl tests

How to Create a Simple Module

Creating a module that is made up of a series of subroutines, in place of an old-style Perl library, is relatively simple. It is slightly more complicated to create the other style of module (one made up completely of methods associated with an object class), if only because you need to understand the object-oriented approach better. Even the simpler style of module uses one of the object-oriented features: every module should either use an import() method inherited from the Exporter class or define its own import() method.

The following example converts the example library used earlier in this chapter (in "Creation of Libraries") to a module:


package FilTest;



=head1 NAME



FilTest - test a file printing status



=head1 SYNOPSIS



     use FilTest;

     filtest1("/tmp/file");



     use FilTest qw(filtest2);

     filtest2("/tmp/file");



=head1 DESCRIPTION



This is an example module which provides two subroutines, each of which tests a file.

filtest1() is exported by default, filtest2() must be explicitly imported.



=cut



# Sample module



require Exporter;

@ISA = qw(Exporter);



@EXPORT = qw(filtest1);

@EXPORT_OK = qw(filtest2);



sub filtest1 {

     my($fil) = @_;

     -f $fil && print "File $fil is a normal file.\n";

     -d _ && print "File $fil is a directory.\n";

}

sub filtest2 {

     my($fil) = @_;

     -f $fil && print "File $fil is a normal file.\n";

     -d _ && print "File $fil is a directory.\n";

}

1;

The documentation for modules normally is built into the .pm file in POD format (as illustrated by the bare-bones documentation in the preceding example, from the first =head1 to the =cut). This documentation is not necessary for the module to work.

The module specifies that it is prepared to have the filtest2() subroutine imported by those who use the module, but because it is in the @EXPORT_OK list rather than the @EXPORT list, it will not be exported by default, but must be explicitly included. However, filtest1() is exported by default.

Module Use and Invocation

The standard way to include a module is with the use() function. Later, you can disable the effect of pragmatic functions (Pragmas) that act as compiler directives by using the no() syntax. The following example enables the use of the integer pragma and then disables it:


use integer;

....

no integer;
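A brief sketch shows the effect. The variable names are hypothetical. The same division is performed once while the pragma is in force and once after it has been switched off.

```perl
#!/usr/bin/perl -w
# Sketch only: $inside and $outside are hypothetical names.

use integer;
$inside = 7 / 2;          # integer arithmetic: the fraction is discarded
no integer;
$outside = 7 / 2;         # ordinary floating-point arithmetic

print "With integer:    $inside\n";
print "Without integer: $outside\n";
```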

Normal modules import symbols into the current package or provide access to symbols in the module. If you do not specify any arguments after the module name, the default behavior is to import all symbols listed in the module's @EXPORT array. If you specify the null list as an argument, no symbols are imported. The following paragraphs describe the various ways to import from the sample module.

The following defines filtest1() in the global namespace:


use FilTest;

filtest1("/etc/passwd");

The following explicitly asks to use filtest2() as well:


use FilTest qw(filtest2);

filtest1("/etc/passwd");

filtest2("/etc/passwd");

The following imports neither subroutine name to the current namespace:


use FilTest ();

Even when the null list is used to avoid importing any names, the subroutines in the module are still accessible via an explicit reference to the FilTest package in the subroutine name. The following example illustrates how to access the subroutines directly:


use FilTest ();

FilTest::filtest1("/etc/passwd");

FilTest::filtest2("/etc/passwd");
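Behind the scenes, a use statement is equivalent to a require followed by a call to the module's import() method, both carried out at compile time. The following sketch spells this out, using the standard Cwd module as a stand-in so that it runs without the FilTest file being installed.

```perl
#!/usr/bin/perl -w
# Sketch only: Cwd is used here as a stand-in for FilTest.

BEGIN {
    require Cwd;              # load the module at compile time
    Cwd->import('getcwd');    # then ask it to export getcwd()
}

print "Current directory: ", getcwd(), "\n";
```

Writing use Cwd (); corresponds to the same BEGIN block with the import() call left out, which is why no names are imported in that case.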

From Here

This chapter forms one part of the reference section of this book. The chapter attempts to describe all the features of the language in a way that can serve as an easy reference. You can see the other reference chapters as forming one unit with this chapter. You also may want to refer to the portion of Appendix B, "Perl Web Reference," that deals with other nonstandard Perl modules, in particular the CGI module.

The other chapters that comprise the reference section are: