Perl Mini-Tutorial
Written 2/2007 by Wayne Pollock,
Perl was invented by Larry Wall to solve some Unix scripting problems. Other methods involve learning a variety of filter commands, some quite complex (such as awk and sed), and learning how to “glue” these pieces together with shell constructs. This is difficult, and passing values from one part of the script to another often involves complex quoting, named pipes, or temporary files. Perl was designed as a single scripting language that combines all the features (and then some) of the other filters and the shell. Now you only need to know a single filter command. While complex, Perl is forgiving of style. The motto is “there is more than one way to do it”.
In addition to the powerful built-in string, regular expression, and file processing capabilities in Perl, it is extensible with modules. A vast number have been written and tested, and are available through the Comprehensive Perl Archive Network (CPAN), discussed below.
Perl is so adept at these tasks that it became the standard scripting language for CGI programming (for websites). Perl's regular expression support is second to none, and its syntax is often used in other languages and tools (where it is referred to as Perl-compatible REs).
Fortunately you don’t have to learn all of Perl to create very useful “one-liners”. Perl is fully documented in a variety of formats, including man pages (see perltoc, perlintro, perlretut, and perlfaq) and the perldoc command (for example, perldoc -f func shows the documentation for the built-in function func).
Mention
Python and Ruby (show demos).
The following (very) brief intro to Perl is adapted from How to Set Up and Maintain a Web Site, 2nd edition, by Lincoln D. Stein, (C) 1997 Addison-Wesley, pages 469-472.
Perl supports three basic kinds of variables: simple variables known as scalars, array variables (which are lists of values), and hashes (also called associative arrays). The names of variables start with a character to indicate their type: $scalar, @array, and %hash. Variables are automatically initialized: a new scalar holds the undefined value (which acts as 0 or as an empty string when used), and new arrays and hashes are empty.
When referring to a single element of an array or hash, the leading character indicates the type of the element, not of the container: $ary[1] and $hash{'foo'} are both scalars. Notice how Perl uses square brackets to index into an array, and curly braces to return a value from a hash.
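A minimal sketch of the naming rules above. The variable names ($count, @names, %age) are made up for illustration, and the script adds “use strict” and “my” declarations, which the notes here don't otherwise use.

```perl
use strict;
use warnings;

my $count;                        # scalar, starts out undefined
my @names = ('moe', 'larry');     # array
my %age   = (moe => 40);          # hash

$count++;                         # undefined acts as 0 here, so $count is now 1

my $second = $names[1];           # one array element is a scalar: $ and [ ]
my $moe    = $age{'moe'};         # one hash value is a scalar:    $ and { }

print "$count $second $moe\n";    # prints: 1 larry 40
```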
Perl scripts allow blank lines and comments. A comment starts with a “#” and continues through the end of that line.
As in awk, strings and numbers are converted back and forth as needed.
Strings in single quotes are taken literally, while
with double quotes the string is scanned for variables and escape-sequences (e.g., “\n”) which get
replaced with their values.
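To make the conversion and quoting rules concrete, here is a small sketch. The variable names are hypothetical and chosen only for this example.

```perl
use strict;
use warnings;

my $text  = "42";                 # a string that looks like a number
my $sum   = $text + 8;            # used as a number: 50
my $label = $sum . "%";           # used as a string: "50%"

my $name   = 'World';
my $single = 'Hello, $name\n';    # single quotes: taken literally
my $double = "Hello, $name\n";    # double quotes: $name and \n are replaced

print "$sum $label\n";            # prints: 50 50%
print $single, "\n";              # prints: Hello, $name\n
print $double;                    # prints: Hello, World
```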
In Perl all statements end with a semicolon. So a simple (first) Perl script:
    #!/usr/bin/perl -Tw
    print "Hello, World!\n";
The options above enable extra checks (-T, “taint mode”) and warnings (-w).
Functions such as print can have parentheses around the argument list, but they are optional. So:
    print( "Hello, World!\n" );
and:
    print "Hello, World!\n";
also:
    $msg = "Hello, World!\n"; print $msg;
(Using single quotes would print the backslash-n literally.)
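Arrays hold ordered lists of values, using a zero-based index:
    @stooges = ( 'moe', 'larry', 'curly' );
    print "@stooges";            # elements separated by spaces; print @stooges; omits the separators
    print $stooges[0], "\n";     # first element: moe
    ($moe, $larry) = @stooges;   # copies the first two elements into scalars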
Hashes hold unordered collections of values, each indexed by a string key:
    %partner = ( "Laurel", "Hardy", "Abbot", "Costello" );
    %partner = ( "Laurel" => "Hardy", "Abbot" => "Costello" );   # same thing, easier to read
    $partner{"Adam"} = "Eve";
    print "$partner{'Abbot'}\n";
    print keys(%partner), "\n";   # all the keys, in no particular order
Perl flattens nested lists (the inner parentheses are redundant), so the following are equivalent:
    @list = ( 'a', ('b', 'c'), 'd' );
    @list = ( 'a', 'b', 'c', 'd' );
To generate arrays of arrays, you need to store a reference to the sub-array. These are generated by using square brackets instead of parentheses:
    @list = ( 'a', ['b', 'c'], 'd' );
Here print "@list\n"; shows a and d, but between them a reference to a list (something like ARRAY(0x...)), not “b c”! To dereference a reference, use curly braces around the reference, like this:
    print "$list[0] | @{$list[1]} | $list[2]\n";
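The following sketch expands on the array-of-arrays example. The arrow operator (->) and the ref function are not covered in these notes; they are standard Perl, shown here as an alternative way to get at one element and to check what a reference points to.

```perl
use strict;
use warnings;

my @list = ( 'a', ['b', 'c'], 'd' );

my $ref    = $list[1];        # a reference to the inner array
my @inner  = @{ $ref };       # dereference the whole inner array
my $first  = ${ $ref }[0];    # one element, curly-brace form
my $second = $list[1]->[1];   # one element, arrow form
my $kind   = ref $ref;        # what the reference points to

print "$list[0] | @inner | $list[2]\n";   # prints: a | b c | d
print "$first $second $kind\n";           # prints: b c ARRAY
```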
Besides the usual math operators (including “**” for exponentiation), Perl uses a period for string concatenation: "a" . "b" and an x for repetition: 'a' x 3 (= "aaa").
You can define a range in Perl with: @range = (1 .. 10); or for $i (1 .. 10).
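A short sketch of these operators in use. The variable names are made up for the example.

```perl
use strict;
use warnings;

my $power  = 2 ** 10;          # exponentiation: 1024
my $joined = "a" . "b";        # concatenation: "ab"
my $rule   = '-' x 5;          # repetition: "-----"
my @range  = (1 .. 5);         # range: (1, 2, 3, 4, 5)

my $total = 0;
for my $i (1 .. 10) {          # a range used directly in a loop
    $total += $i;
}

print "$power $joined $rule @range $total\n";
# prints: 1024 ab ----- 1 2 3 4 5 55
```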
For logical comparisons Perl uses the standard operators for numeric comparisons (“==”, “!=”, “<”, etc.) but the following for string comparisons: eq ne lt le gt ge cmp. (The $a cmp $b operator and its numerical equivalent $a <=> $b return -1, 0, or +1 when $a is less than, equal to, or greater than $b.) You can also test files with: -e file (exists), -r file (readable), -d file (directory), and others.
To test an expression for true or false, its value is (in effect) converted to a string. If the result is "0" or "" (or the value is undefined), it is false; otherwise it is true. So 0.0, which converts to "0", is false, but the string "0.0" is true.
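The sketch below tries out the comparison operators and the truth rule. Using sort to show the difference between <=> and cmp is my own choice of illustration (sort is a standard built-in, not otherwise covered here), and the list of test values is made up.

```perl
use strict;
use warnings;

my $less  = 2 <=> 10;        # numeric: 2 is less than 10, so -1
my $after = "2" cmp "10";    # string: "2" sorts after "1...", so 1

my @numeric = sort { $a <=> $b } (10, 9, 100);   # 9 10 100
my @string  = sort { $a cmp $b } (10, 9, 100);   # 10 100 9

# Collect the values that count as true.
my @true;
for my $value (0, 0.0, "0", "", "0.0", "00", "a") {
    push @true, $value if $value;
}

print "$less $after\n";      # prints: -1 1
print "@numeric\n";          # prints: 9 10 100
print "@string\n";           # prints: 10 100 9
print "@true\n";             # prints: 0.0 00 a
```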
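Like awk, Perl can break up input lines into fields you can play with or test. The current line is put into $_. You must ask Perl to break the line into fields, by running the split function. With no arguments, split breaks the current line ($_) into fields that are separated by white-space, and returns them as a list. (Old versions of Perl also put the fields into the array @_ when the result was not assigned; current versions do not, so save the result in an array of your own.) So to print the second field of each input line (the “-n” means run for each line):
    cat file | perl -ne '@f = split; print "$f[1]\n";'
The -a option does the split for you, putting the fields into the array @F:
    cat file | perl -ane 'print "$F[1]\n";'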
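The same idea inside a script, as a small sketch. The sample strings are made up; the point is only to show split with an explicit string and with the default ($_).

```perl
use strict;
use warnings;

# Split a given string on runs of white-space.
my $line   = "alpha  beta\tgamma\n";
my @fields = split ' ', $line;
print "$fields[1]\n";                 # prints: beta

# With no arguments, split works on $_ and splits on white-space,
# ignoring any leading and trailing white-space.
$_ = " one two  three ";
my @words = split;
print scalar(@words), " words, second is $words[1]\n";
# prints: 3 words, second is two
```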
Loops (while, until, for, foreach):
    foreach $i (1..5) {print "$i ";}   # uses $_ if no $i
    while (expr) {...}
    until (expr) {...}
    for (init; test; incr) {...}       # the keywords for and foreach are interchangeable
Conditionals: if (expr) { statement... } with elsif and else. Inside loops you can use last (= break) and next (= continue).
Statement modifiers: use statement if (cond); or statement unless (cond); or statement while (expr);
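A runnable sketch that exercises each of the constructs just listed. All the variable names and values are made up for illustration.

```perl
use strict;
use warnings;

my @out;
foreach my $i (1 .. 5) {
    next if $i == 2;           # next: skip to the following iteration
    last if $i == 5;           # last: leave the loop
    push @out, $i;             # @out ends up as (1, 3, 4)
}

my $n = 3;
my $countdown = '';
while ($n > 0)  { $countdown .= $n--; }   # builds "321", leaves $n at 0
until ($n >= 3) { $n++; }                 # counts $n back up to 3

my $sum = 0;
for (my $i = 0; $i < 4; $i++) { $sum += $i; }   # 0+1+2+3 = 6

my $kind;
if    ($sum < 5)  { $kind = 'small'; }
elsif ($sum < 10) { $kind = 'medium'; }
else              { $kind = 'large'; }

print "odd\n" unless $sum % 2 == 0;       # statement modifier; prints nothing here
print "@out | $countdown | $n | $sum | $kind\n";
# prints: 1 3 4 | 321 | 3 | 6 | medium
```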
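Input: <> means read one line from the files named on the command line (or from stdin if there are none); it returns the undefined value (which is false) on EOF. To read from a file use:
    open(NAME, "filename") or die("msg: $!\n");
    $line = <NAME>;
    while ( <NAME> ) { ... }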
Note that when the read is the whole condition of a while loop, the input goes into $_ if you don’t put it elsewhere. Some common idioms:
    while (<>) {    # reads a line into $_
        print;      # prints $_
    }
    if (/foo/)      # means if $_ matches /foo/, a.k.a. if ( $_ =~ m/foo/ )
                    # a.k.a. if ( m/foo/ )
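Here is a sketch of these input idioms that runs without needing a data file. It opens a filehandle on a string (an “in-memory file”, a standard Perl feature) and uses a lexical filehandle ($fh) with the three-argument form of open rather than the bareword NAME used above. The sample data and the chomp and push calls are additions for the example.

```perl
use strict;
use warnings;

my $data = "foo one\nbar two\nfood three\n";

# Open a read-only filehandle on the string instead of on a real file.
open(my $fh, '<', \$data) or die("cannot open in-memory file: $!\n");

my $first = <$fh>;             # read one line into a named variable

my @matches;
while (<$fh>) {                # each remaining line goes into $_
    chomp;                     # remove the trailing newline from $_
    push @matches, $_ if /foo/;
}
close($fh);

print $first;                  # prints: foo one
print "@matches\n";            # prints: food three
```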
“=~” means “bind to”. So value =~ m/foo/ means match value against /foo/. It is also used with s/// (the substitution operator), tr/// (the translate operator), and others. A match returns true (1) if it matched and false (an empty string) if not. So:
    $bar =~ s/a/b/    # changes the first a to b in $bar; returns the number of changes made (1 here), or false if none
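A small sketch of the three operators bound with =~. The sample word and variable names are made up; the /g modifier (change every occurrence, as in the search-and-replace one-liners later in these notes) is used once to show the return value.

```perl
use strict;
use warnings;

my $bar = "banana";

my $matched = ($bar =~ m/nan/) ? "yes" : "no";   # "yes"

my $changed = ($bar =~ s/a/b/);    # first a only: $bar is now "bbnana", returns 1
my $count   = ($bar =~ s/a/o/g);   # every remaining a: "bbnono", returns 2

my $upper = $bar;
$upper =~ tr/a-z/A-Z/;             # translate lower case to upper: "BBNONO"

print "$matched $changed $count $bar $upper\n";
# prints: yes 1 2 bbnono BBNONO
```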
A key benefit of Perl in shell scripts is the powerful regular expression language. Perl has some command line options that wrap a one-liner in one or another type of loop, allowing Perl to operate just like sed or awk.
Command Line Options
She-bang: #!/usr/bin/env perl or #!/usr/bin/perl -Tw
-c           Check the script for syntax errors.
-e 'script'  Run script; repeat for multiple scripts on one command line.
-i[ext]      Process input (“<>”) in place, by renaming each input file (adding ext to its name, e.g. -i.bak) and redirecting the output to a new file with the original name. If no ext is given, then the original isn’t saved.
-n           Puts a loop around the script: LINE: while(<>){script} (You can use LINE in next and last.) This is much like sed -n.
-p           Similar to -n, but this makes Perl act like sed (process then print each line). This is the same as the above loop, plus: continue{print or die "-p destination:$!\n"}
-T           Force taint checks even if not running suid/sgid (which does -T by default). This makes sure no un-processed user input can be used in dangerous ways. (Very useful for CGI!)
-w           Turns on several useful warnings.
-W           Turns on all possible warnings.
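To see what the -n and -p loops amount to, here is a sketch that writes them out by hand. It reads from a string through an in-memory filehandle and collects the output in variables instead of printing line by line, so it is an approximation of what Perl wraps around a one-liner, not the exact generated code. The sample input is made up.

```perl
use strict;
use warnings;

my $input = "cat\ncow\ndog\n";
my ($n_output, $p_output) = ('', '');

# Roughly what  perl -ne 'print if /^c/'  does: nothing is output
# unless the script itself asks for it.
open(my $in, '<', \$input) or die("cannot open input: $!\n");
LINE: while (<$in>) {
    $n_output .= $_ if /^c/;
}
close($in);

# Roughly what  perl -pe 's/o/0/'  does: the continue block runs after
# every pass through the loop, so every line is output, changed or not.
open($in, '<', \$input) or die("cannot open input: $!\n");
LINE: while (<$in>) {
    s/o/0/;
} continue {
    $p_output .= $_;
}
close($in);

print $n_output;     # prints the lines: cat, cow
print "--\n";
print $p_output;     # prints the lines: cat, c0w, d0g
```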
Using CPAN
Run cpan (or “perl -MCPAN -e shell”) once to configure it interactively. The defaults are usually good enough. To re-configure, run the cpan command “o conf init”. Cpan is best run as root, so installed modules can be automatically put into the correct places.
The cpan command can be run interactively (then you say “install foo” or “make foo”), or you can just run cpan moduleName. To install the latest version of cpan itself, run “cpan CPAN”. To be able to validate downloads, run (as root) “cpan Digest::SHA Module::Signature”.
Qu: I am not root, how can I install a module in a personal directory?
Ans: You need to use your own configuration, not the one for the root user. CPAN’s configuration script uses the system config (set by root) for all defaults, saving your choices to your ~/.cpan/CPAN/MyConfig.pm file. (Show.)
You can also manually initiate this process with the following command: perl -MCPAN -e 'mkmyconfig' or by running “mkmyconfig” from the CPAN shell, or even by using “o conf init”. You will need to configure the makepl_arg setting to install modules in your home directory, something like this:
    o conf makepl_arg "PREFIX=~/perl"
or:
    o conf makepl_arg "LIB=~/perl/lib \
        INSTALLMAN1DIR=~/man/man1 \
        INSTALLMAN3DIR=~/man/man3"
(Don’t forget to create these directories.) If you change individual settings with o conf, you make those settings permanent (like all “o conf” settings) with “o conf commit”. You will also have to add ~/man to the MANPATH environment variable and tell your Perl programs to look in ~/perl/lib, by including the following at the top of your Perl scripts:
    use lib "$ENV{HOME}/perl/lib";
or by setting the PERL5LIB environment variable.
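One way to check that the use lib line took effect is to look at @INC, the list of directories Perl searches for modules. A minimal sketch, assuming the same ~/perl/lib directory as above (the directory does not have to exist for the check to run):

```perl
use strict;
use warnings;
use lib "$ENV{HOME}/perl/lib";     # same line as in the text above

my $wanted = "$ENV{HOME}/perl/lib";
my $found  = grep { $_ eq $wanted } @INC;   # how many @INC entries match

print "Module search path:\n";
print "  $_\n" for @INC;
print $found ? "personal directory is searched\n"
             : "personal directory is missing\n";
```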
Examples
Search and replace strings in many files:
    perl -pi -e 's/text1/text2/g;' *.ext
or (keeping a backup copy of each file, with .bak added to its name):
    perl -pi.bak -e 's/text1/text2/g;' *.ext
or, for files selected by find:
    find / search_criteria | xargs \
        perl -pi -e 's/text1/text2/g'
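The same in-place edit can be done from inside a script by setting $^I (the variable behind the -i option) and @ARGV. The sketch below assumes nothing about your files: it creates its own scratch file in a temporary directory using the standard File::Temp module, and the file name sample.ext is made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Create a scratch file to edit (removed automatically at exit).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/sample.ext";
open(my $out, '>', $file) or die("cannot write $file: $!\n");
print $out "text1 and more text1\nunchanged line\n";
close($out);

{
    # Like  perl -pi.bak -e 's/text1/text2/g;' sample.ext
    local ($^I, @ARGV) = ('.bak', $file);
    while (<>) {
        s/text1/text2/g;
        print;                 # goes into the new copy of the file
    }
}

open(my $in, '<', $file) or die("cannot read $file: $!\n");
my @lines = <$in>;
close($in);

print @lines;                  # first line is now: text2 and more text2
print "backup kept\n" if -e "$file.bak";
```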
Show fix-style.pl.
Show Perl/Tk hellotk.pl (demo
via Knoppix, or install Windows Perl & Tk)
Show urldecode.
Show url2html.
Sending email with Perl:
    cpan Net::Cmd; cpan Net::Config; cpan Net::SMTP
Then show mail.pl.