This is an old document. Its content has been incorporated into the newer Perl lessons.
Later I found Introduction to Perl by Greg Johnson. It was so good that I created Perl Lessons, which is essentially that introduction and this introduction of mine merged together, modified, and updated; there is also an experimental version with frames. (I still keep this document available, since some people may prefer a short introduction.)
Elementary familiarity with using a Unix system is assumed. Knowing the C programming language is definitely an advantage, but I have tried to avoid assuming such knowledge.
This text is mostly based on the Internet-accessible hypertext documentation of Perl version 5. My contribution is an attempt to pick out the most essential things and to provide simple example Perl scripts.
Programs written in Perl are called Perl scripts, whereas the term "the perl program" refers to the system program named perl for executing Perl scripts. (What, confused already?)
If you have used shell scripts or awk or sed or similar utilities for various purposes, you will find that you can normally use Perl for those and many other purposes, and the code tends to be more compact. And if you haven't used such utilities but have started thinking you might have need for them, then perlhaps what you really need to learn is Perl instead of all kinds of futilities.
Perl is implemented as an interpreted language: a script is not compiled into a stand-alone executable program but is processed by the perl program each time it is run. Thus, the execution of a Perl script tends to use more CPU time than a corresponding C program, for instance. On the other hand, computers tend to get faster and faster, and writing something in Perl instead of C tends to save your time.
Hello world!
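It is customary to begin a programming language tutorial with a program that just prints "Hello world!". The reason is that in order to be able to do this, you need to know quite a lot of simple things, and language manuals often omit such details.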
So let us get started:
lk-hp-23 perl 195 % cat >hello
#!/usr/bin/perl
print "Hello world!\n";
lk-hp-23 perl 196 % chmod a+x hello
lk-hp-23 perl 197 % ./hello
Hello world!
lk-hp-23 perl 198 %
Explanations:
I use the cat command to create a file named hello containing a very simple Perl script; the input to cat is ended by typing Control-D at the beginning of a line. Normally, of course, one uses one's favourite editor (such as Emacs) to create Perl scripts.
The first line of the script, #!/usr/bin/perl, specifies that the script is to be executed by the perl program (and not by a shell, for example). Consider that line as an obligatory prelude. The part /usr/bin/perl is the full name (path name) of the perl program. In different installations, different names may be in use, but this name is a typical one.
The second line is a print statement; the string to be printed ends with \n, which stands for newline. In double-quoted Perl strings, control characters can be represented in this way, using the backslash character \ and a letter. This is the same convention which is used in the C programming language.
I use the chmod command to give execute access to the file containing the script. In Unix, files are usually created without execute access, so that access must be granted separately. In this case (a+x), all users get execute access to the file.
Finally, I execute the script. The prefix ./ means that I refer to the named file in my current working directory. (A simple name like hello might work too, if the current directory is in your command search path, but it could cause problems if the name happens to be the same as the name of a system command.)
Variables need not be declared: you can simply start using a variable. An attempt to use an uninitialized variable causes a zero, an empty string, or the truth value false (depending on the context) to be used. However, using the command-line switch -w you can ask the Perl interpreter to issue warnings, such as reports of uses of undefined values. The switch can also be written on the first line of the script, eg #!/usr/bin/perl -w
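The three cases can be seen side by side in the following sketch. It assumes a Perl 5 interpreter at /usr/bin/perl, as in the examples above; the subroutine names (next_count, greeting, flag_state) are made up for illustration, and my is used only to make each variable local to its subroutine, so that it is guaranteed to be undefined when used.

```perl
#!/usr/bin/perl -w
# -w on the line above asks perl to warn about uses of undefined values.

# $count is never assigned, so in numeric context it is taken as 0.
sub next_count {
    my $count;                # deliberately left undefined
    return $count + 1;        # -w reports: Use of uninitialized value ...
}

# $name is never assigned, so in string context it is taken as "".
sub greeting {
    my $name;                 # deliberately left undefined
    return "Hello, " . $name . "!";
}

# $flag is never assigned, so in boolean context it is false.
sub flag_state {
    my $flag;                 # deliberately left undefined
    return $flag ? "true" : "false";
}

print next_count(), "\n";     # 1
print greeting(), "\n";       # Hello, !
print flag_state(), "\n";     # false
```

The warnings go to standard error, so the output itself is unaffected; remove -w from the first line to see the same results silently.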
Perl has three data structures: scalars, arrays of scalars, and associative arrays of scalars, known as "hashes".
Scalar variable names always begin with the dollar sign, eg $myvar.
Names for arrays and array slices always begin with the commercial at sign, eg @myarray.
Names for hashes always begin with the percent sign, eg %myhash.
Let us also mention that subroutine names begin with the ampersand sign, eg &mysub, although this sign can often be omitted.
Intuitively, the special characters mentioned above correspond to English words as follows: you can read $ as "the", @ as "these" or "those", % as "these" or "those", and & as "do".
The case of letters is significant in variable names (as in Unix commands and in the C language), eg $foo and $Foo are distinct variables.
If you have an array, eg @myarr, you can form indexed variables by appending an index in brackets (as in the C language) but changing @ into $, eg $myarr[5]. The reason for the change is that the indexed variable is a scalar.
You can also form array slices, for example @myvar[5..10], which is an array (therefore denoted using @) consisting of those components of @myvar which have an index between 5 and 10, inclusively.
Array indexes are integer numbers, starting with 0 (as in C, but unlike in many other languages).
Hashes, on the other hand, are indexed by strings (called keys) rather than by integers. For hashes, the index is in braces, eg $myhash{'foobar'}. Notice that in this case, too, the indexed variable is a scalar and therefore begins with $.
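A small sketch of these naming rules, assuming a Perl 5 interpreter; the variable names and values are made up for illustration. In the hash constructor, => is simply a comma that also quotes a bare word on its left, a common way to write key/value pairs.

```perl
#!/usr/bin/perl -w
$myvar  = 42;                                  # a scalar
@myarr  = (10 .. 21);                          # an array: 10, 11, ..., 21
%myhash = ('foobar' => 'baz', 'answer' => 42); # a hash of key/value pairs

$sixth = $myarr[5];            # one array element is a scalar, hence $
@part  = @myarr[5..10];        # a slice is a list, hence @
$value = $myhash{'foobar'};    # one hash element is a scalar, hence $

print "$myvar\n";              # 42
print "$sixth\n";              # 15
print "@part\n";               # 15 16 17 18 19 20
print "$value\n";              # baz
```

Note that the slice @myarr[5..10] has six elements, since both ends of the range are included.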
Every variable kind (scalar, array, hash) has its own namespace. This means that $foo and @foo are two different variables. It also means that $foo[1] is a part of @foo, not a part of $foo. This may seem a bit weird, but that's okay, because it is weird.
Notice, in particular, that there are two important predefined variables, $_ and @_, and realize that eg $_[2] is a component of @_.
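The separate namespaces can be checked with a short sketch (Perl 5 assumed; $foo, @foo and the subroutine name third are illustrative). Inside a subroutine, @_ holds the arguments passed to it, so $_[2] is the third argument.

```perl
#!/usr/bin/perl -w
$foo = "a scalar";                # the scalar variable $foo
@foo = ("zero", "one", "two");    # the array @foo: a different variable

print "$foo\n";                   # a scalar
print "$foo[1]\n";                # one   ($foo[1] is element 1 of @foo)

# $_[2] is element 2 of @_, the subroutine's argument list.
sub third {
    return $_[2];
}
print third("a", "b", "c"), "\n"; # c
```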
The value of an array variable is effectively an ordered list of values. In Perl, you can also construct lists as data objects using a constructor in which the values are listed within parentheses, separated with commas, eg
(2, 3, 7, 42)
A list can, in particular, be assigned to an array variable, eg
@foo = (2, 3, 7, 42);
Lists are important in Perl, since many operations yield lists as their result.
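A few list operations, sketched under the same assumptions (a Perl 5 interpreter, illustrative variable names): an array used where a single value is expected yields its number of elements, reverse returns a new list, and a list of scalar variables can appear on the left side of an assignment.

```perl
#!/usr/bin/perl -w
@foo = (2, 3, 7, 42);           # a list assigned to an array

$count = @foo;                  # array in scalar context: number of elements
@backwards = reverse @foo;      # reverse yields a new list: (42, 7, 3, 2)
($first, $second) = @foo;       # list assignment: $first = 2, $second = 3

print "$count\n";               # 4
print "@backwards\n";           # 42 7 3 2
print "$first $second\n";       # 2 3
```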
The following script prints out its input so that each line begins with a running line number:
#!/usr/bin/perl
$line = 1;
while (<>) {
  print $line, " ", $_;
  $line = $line + 1;
}
The scalar variable $line is of course the line counter.
It is initialized to 1 in the beginning, and it is incremented by 1 within a loop which processes the input one line at a time.
The loop construct is of the form
while (<>) {
  process one line of input
}
and although it looks cryptic at first sight, it is really very convenient to use. You need not worry about actual input operations; just use the construct shown above, and use the predefined variable $_ to refer to the input line.
The print statement in our example contains three arguments: one for getting the line number printed, one for getting a blank printed, and one for getting the input line printed. We do not need an argument for getting a newline printed, since the value of the special variable $_ contains a trailing newline.
In fact, you could make your code even shorter: you could write the script as
#!/usr/bin/perl
$line = 1;
while (<>) {
  print $line++, " ", $_; }
Here the statement contains $line++ instead of just $line, since in Perl (as in C) you can increment a variable (after its old value has been used) by appending the operator ++ to it.
You might wish to have the line numbers right-adjusted, eg each in a fixed field of five characters, padded with blanks on the left. This is easy if you know the output formatting tools of the C language (%5d means an integer in a field of five characters, %s means a string): you could just replace the print statement with
printf "%5d %s", $line++, $_;
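To check the formatting without typing any input, the following sketch applies the same format to a list of lines held in memory. It uses sprintf, which takes the same arguments as printf but returns the formatted string instead of printing it; the subroutine name number_lines is hypothetical.

```perl
#!/usr/bin/perl -w
use strict;

# Number the given lines, each number right-adjusted in a field of
# five characters, exactly as printf "%5d %s" does in the script above.
sub number_lines {
    my @lines  = @_;
    my $line   = 1;
    my $result = "";
    foreach my $text (@lines) {          # visit each line in turn
        $result .= sprintf("%5d %s", $line++, $text);
    }
    return $result;
}

print number_lines("first\n", "second\n");
```

Each line of the result starts with four blanks and the line number, since the numbers here are one digit wide.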
Normally you want your script to read input from a file. Simply write the name of the file as a command-line argument when you give the script name as a command. Thus, for example, if you had written our simple script (the shorter version of it) into a file named lines, you could test it by using it as its own test data (confusing?) as follows:
lk-hp-23 perl 251 % ./lines lines
1 #!/usr/bin/perl
2 $line = 1;
3 while (<>) {
4   print $line++, " ", $_; }
lk-hp-23 perl 252 %
You can also write several file names as command-line arguments, eg
./lines foo bar zap
which would mean that the script lines takes as input the contents of files foo, bar, and zap as if you had concatenated the contents into a single file and given its name as argument.
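Perl also makes it easy to split input lines into fields. For instance, the statement
@_ = split;
first splits the current input line ($_) into whitespace-separated fields and then assigns the fields to components of the predefined array variable @_. (Old versions of Perl let you write just split; and assigned the fields to @_ implicitly, but that feature has been removed from modern versions of Perl, since 5.12, so the assignment must be written out.) You can then access the fields using indexed variables. The special variable $#_ contains information about the number of fields: the value of that variable is the number of fields minus one. (More generally, for any array variable @a, the variable $#a contains the last index of the array.)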
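The relationship between the last index and the number of elements holds for any array. A short sketch with a named array (Perl 5 assumed, names illustrative); split(' ', STRING) uses the same splitting rule as a plain split, which works on $_.

```perl
#!/usr/bin/perl -w
use strict;

my @fields = split(' ', "alpha beta gamma");  # three whitespace-separated fields
my $last   = $#fields;    # index of the last element: 2
my $count  = @fields;     # number of elements: 3, ie $#fields + 1
my @empty  = ();
my $none   = $#empty;     # an empty array has last index -1

print "$last $count $none\n";   # 2 3 -1
```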
Assume, for example, that you have some data where each line consists of blank-separated items (which might be strings or numbers) and you want to write a Perl script which picks up the second item from each line. (Such filtering is often needed to extract useful information from a large data file.) This is simple:
#!/usr/bin/perl
while (<>) {
  @_ = split;
  print $_[1], "\n";
}
Notice that you must use an index value of 1 to get the 2nd field, since array indexing begins at 0 in Perl.
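The same filtering can be packaged as a subroutine that splits into a named array instead of @_, which keeps @_ free for its usual role of holding subroutine arguments. A sketch, assuming Perl 5; the name second_field is hypothetical. For a line with fewer than two items it returns an undefined value, which the script above would print as an empty line.

```perl
#!/usr/bin/perl -w
use strict;

# Return the second whitespace-separated item of a line,
# or undef if the line has fewer than two items.
sub second_field {
    my ($line) = @_;
    my @fields = split(' ', $line);
    return $fields[1];            # index 1 is the 2nd field
}

print second_field("apple 3.14 pear\n"), "\n";   # 3.14
```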
Perl has several control structures; here we consider only if statements for branching and while statements for looping.
Within control structures you specify the actions to be conditionally or repeatedly executed as blocks. A block is simply a sequence of statements surrounded by braces. Notice that braces are always required (unlike in C).
The simplest if statement is of the form
if (expression) block
which means that the expression is evaluated, and if the result is true, the block is executed.
For example, the statement if($i < 10) {$j = 100;} sets the value of $j to 100 if the value of $i is less than 10. As mentioned above, braces are required (even if there is a single statement within them), and the parentheses around the condition expression are obligatory, too.
A two-branch if statement is of the form
if (expression) block1 else block2
which means that the expression is evaluated, and if the result is true, block1 is executed, otherwise block2 is executed.
The while statement is of the form
while (expression) block
which means that the expression is evaluated, and if the result is true, the block is executed; then the expression is re-evaluated, and the process is repeated until the expression evaluates to false.
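A sketch combining both statements, assuming Perl 5; the subroutine name describe_countdown is hypothetical. The while loop counts down, and a two-branch if decides for each number whether it is even or odd. The % operator gives the remainder of a division, and push appends an element to an array.

```perl
#!/usr/bin/perl -w
use strict;

# Count down from $n to 1, labelling each number as even or odd.
sub describe_countdown {
    my ($n) = @_;
    my @out;
    while ($n > 0) {                 # repeated while the condition is true
        if ($n % 2 == 0) {           # two-branch if: braces are required
            push @out, "$n even";
        } else {
            push @out, "$n odd";
        }
        $n = $n - 1;
    }
    return @out;
}

print join(", ", describe_countdown(3)), "\n";   # 3 odd, 2 even, 1 odd
```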
As a simple example of using the while statement, the following script splits input lines into fields (in the manner described above) and prints out the fields in reverse order.
#!/usr/bin/perl
while (<>) {
  @_ = split;
  $i = $#_;
  while ($i >= 0) {
    print $_[$i--], " ";
  }
  print "\n";
}
The control in the (inner) while loop is based on using an auxiliary variable $i, which is initialized to the index of the last field and decremented (using the C-style embedded decrement operator --) within the loop until it becomes negative, ie all fields, including the one with index 0, have been processed. The operator >= has the obvious meaning 'is greater than or equal to'.
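Perl also has a built-in reverse function, so the reversal can be written without an explicit loop. A sketch under the same assumptions (the name reverse_fields is hypothetical); unlike the script above, it joins the fields with single blanks and so leaves no trailing blank.

```perl
#!/usr/bin/perl -w
use strict;

# Return the whitespace-separated fields of a line in reverse order.
sub reverse_fields {
    my ($line) = @_;
    return join(" ", reverse(split(' ', $line)));
}

print reverse_fields("one two three\n"), "\n";   # three two one
```

The explicit loop is still worth understanding, but in everyday Perl the built-in functions are usually preferred for such tasks.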
Perl also has convenient tools for string manipulation. For example, the statement
tr /A-Z/a-z/;
converts upper-case letters to lower case. It can be read as follows: "translate all characters in the range from A to Z to the corresponding characters in the range from a to z". The operation is applied to the value of $_, ie the current input line. If you would like it to be applied to the value of a variable $foo, you should write
$foo =~ tr /A-Z/a-z/;
The syntax is odd-looking, but once you get accustomed to it, the Perl string manipulation tools are easy to use.
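A sketch of tr applied to named variables (Perl 5 assumed, names illustrative). tr also returns the number of characters it translated, which is a handy way to count them. Note that the ranges A-Z and a-z cover only the unaccented ASCII letters; for general lower-casing, the built-in function lc is the usual choice.

```perl
#!/usr/bin/perl -w
use strict;

my $foo = "Hello World";
$foo =~ tr/A-Z/a-z/;                    # translate upper-case ASCII letters
print "$foo\n";                         # hello world

my $text = "Perl Is FUN";
my $changed = ($text =~ tr/A-Z/a-z/);   # tr returns the number of characters translated
print "$changed $text\n";               # 5 perl is fun
```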
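As another example, assume that you wish to rename a set of files (eg all files whose names end with .for) so that the suffix is changed (eg to .f). In some operating systems this is easy, but in normal Unix command interpreters there is no direct way to do it. (A naive user might try mv *.for *.f but it does not work at all in the way you would like, since the shell replaces each pattern with the names of existing matching files before mv is even started.)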
No problem, it's easily done in Perl, for example as follows:
#!/usr/bin/perl
while (<*.for>) {
  $oldname = $_;
  s/\.for$/\.f/;
  rename $oldname, $_;
}
A previous version of this document had in this example s/.for/.f/; instead of s/\.for$/\.f/;.
Although the simpler version works in most cases, it is buggy, because the symbol . stands for any character, not just the period, and because there is no requirement that the string .for must appear at the end of the file name only. Thus, the code would rename eg zapfor.for to za.f.for. To refer to the period character, one must use "escape" notation by prefixing it with a backslash. Moreover, if the trailing $ (denoting the end of the string) is omitted, the code would apply to the first appearance of .for in the file name.
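To see the difference concretely, the following sketch applies both substitutions to a couple of sample file names. It assumes Perl 5; the subroutine names buggy_rename and fixed_rename are hypothetical, and nothing is actually renamed.

```perl
#!/usr/bin/perl -w
use strict;

# The substitution from the earlier version of the example.
sub buggy_rename {
    my ($name) = @_;
    $name =~ s/.for/.f/;         # . matches any character; no end anchor
    return $name;
}

# The corrected substitution: a literal period, anchored at the end.
sub fixed_rename {
    my ($name) = @_;
    $name =~ s/\.for$/\.f/;
    return $name;
}

foreach my $name ("prog.for", "zapfor.for") {
    print "$name: ", buggy_rename($name), " vs ", fixed_rename($name), "\n";
}
```

For prog.for both versions give prog.f, which is why the bug can go unnoticed; for zapfor.for the buggy version gives za.f.for.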
The while statement here is different from what we have seen before. It means that all file names matching the pattern within the angle brackets (here *.for) are processed and assigned, each in turn, to the variable $_. In fact, the meaning of $_ is not simply 'the current input line', as stated before, but more generally 'the current data being processed', and the context defines in each case what this exactly means.
Within the loop, the file name is copied to the variable $oldname and then modified using a construct which performs a substitution and which resembles the tr construct used in the preceding example. Finally, the rename operation is performed using a Perl built-in function, rename, which takes two file names as arguments and returns true if the renaming succeeded.
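Alternatively, we could also use the following:
system "mv $oldname $_";
which does the same operation by asking the Unix system to execute a system command. This is less efficient, since a separate process is started for each file, and less robust, since the command line is interpreted by the shell, so file names containing blanks or other special characters would cause trouble.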
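A somewhat more careful variant is sketched below, assuming it is run in the directory containing the files. It checks the value returned by rename (true on success; after a failure the reason is available in the special variable $!) and refuses to overwrite an existing file. The names new_name and rename_all are hypothetical, and glob("*.for") is the function form of the <*.for> construct used above. The actual renaming call is left commented out so that the script can be tried safely.

```perl
#!/usr/bin/perl -w
use strict;

# Compute the new name for a file ending in .for; return undef otherwise.
sub new_name {
    my ($old) = @_;
    my $new = $old;
    return undef unless $new =~ s/\.for$/.f/;   # s/// is false if nothing matched
    return $new;
}

# Rename every *.for file in the current directory, refusing to
# overwrite an existing file and reporting any failure of rename.
sub rename_all {
    foreach my $old (glob("*.for")) {
        my $new = new_name($old);
        next unless defined $new;
        if (-e $new) {                          # -e: does the file exist?
            warn "$new already exists; $old not renamed\n";
            next;
        }
        rename($old, $new) or warn "cannot rename $old to $new: $!\n";
    }
}

print new_name("prog.for"), "\n";   # prog.f
# rename_all();   # uncomment to actually rename files in the current directory
```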
As the next step, you could read above-mentioned Gary Major's Introduction to Perl, which contains more information about Perl operators, control structures, etc. It's very useful as a compact reference. There is also the course Introduction to Perl or, Learn Perl in Two Hours by MU Campus Computing.
See also Bary B. Floyd's Take 10 Minutes to Learn Perl, which is a set of annotated sample scripts.
To learn even more, check the Perl pages at Galaxy which contain a lot of links to information about Perl.
The Perl Language Home Page contains, in addition to other valuable information, the Perl FAQ.
An experienced Perl programmer who occasionally cannot remember the name of a function or its exact syntax might find it convenient to consult an HTMLified version of Johan Vromans' Perl 5 Desktop Reference.
Moreover, there is CPAN, the Comprehensive Perl Archive Network, which aims to be the Perl archive. One of the access points to CPAN is the following:
ftp://ftp.funet.fi/pub/languages/perl/CPAN/CPAN.html
The CPAN archive contains, among many other things, an extensive online Perl manual.
See also: Middle of Nowhere Perl Pages.
Jukka Korpela Last update: 1998-01-29