Posted to docs-cvs@perl.apache.org by st...@apache.org on 2002/07/31 16:43:17 UTC
cvs commit: modperl-docs/src/docs/general/perl_reference perl_reference.pod
stas 2002/07/31 07:43:17
Added: src/docs/general/hardware hardware.pod
src/docs/general/multiuser multiuser.pod
src/docs/general/perl_myth perl_myth.pod
src/docs/general/perl_reference perl_reference.pod
Log:
give pods their own dirs
Revision Changes Path
1.1 modperl-docs/src/docs/general/hardware/hardware.pod
Index: hardware.pod
===================================================================
=head1 NAME
Choosing an Operating System and Hardware
=head1 Description
Before you use the techniques documented on this site to tune servers
and write code you need to consider the demands which will be placed on
the hardware and the operating system. There is no point in investing
a lot of time and money in configuration and coding only to find that
your server's performance is poor because you did not choose a
suitable platform in the first place.
While the tips below could apply to many web servers, they are aimed
primarily at administrators of mod_perl-enabled Apache servers.
Because hardware platforms and operating systems are developing
rapidly (even while you are reading this document), this discussion must
be in general terms.
=head1 Choosing an Operating System
First let's talk about Operating Systems (OSs).
Most of the time I prefer to use Linux or something from the *BSD
family. Although I am personally a Linux devotee, I do not want to
start yet another OS war.
I will try to talk about the characteristics and features you should
be looking for to support an Apache/mod_perl server; once you know
what you want from your OS, you can go out and find it. Visit the Web
sites of the operating systems you are interested in. You can gauge
users' opinions by searching the relevant discussions in newsgroups
and mailing list archives. Deja - http://deja.com and eGroups -
http://egroups.com are good examples. I will leave this research to
the reader.
=head2 Stability and Robustness
Probably the most important features in an OS are stability and
robustness. You are in an Internet business. You do not keep normal
9am to 5pm working hours like many conventional businesses you know.
You are open 24 hours a day. You cannot afford to be off-line, because
your customers will go shopping at another service like yours (unless
you have a monopoly :). If the OS of your choice crashes every day,
first do a little investigation. There might be a simple reason which
you can find and fix. However, there are OSs which won't work unless
you reboot them twice a day. You don't want to use an OS of that kind,
no matter how good the vendor's sales department is. Do not be swayed
by flashy advertisements; follow developers' advice instead.
Generally, people who have used the OS for some time can tell you a
lot about its stability. Ask them. Try to find people who are doing
similar things to what you are planning to do, they may even be using
the same software. There are often compatibility issues to resolve.
You may need to become familiar with patching and compiling your OS.
It's easy.
=head2 Memory Management
You want an OS with good memory management; some OSs are well known
as memory hogs. The same code can use twice as much memory on one OS
compared to another. If the size of each mod_perl process is 10MB and
you have tens of these running, it definitely adds up!
=head2 Memory Leaks
Some OSs and/or their libraries (e.g. C runtime libraries) suffer from
memory leaks. A leak is when some process requests a chunk of memory
for temporary storage, but then does not subsequently release it. The
chunk of memory is not then available for any purpose until the
process which requested it dies. We cannot afford such leaks. A
single mod_perl process sometimes serves thousands of requests before
it terminates. So if a leak occurs on every request, the memory
demands could become huge. Of course our code can be the cause of the
memory leaks as well (check out the C<Apache::Leak> module on CPAN).
Certainly, we can reduce the number of requests to be served over the
process' life, but that can degrade performance.
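To make the leak notion concrete, here is a minimal, hypothetical
Perl-level illustration (this shows application-level leakage, not a C
library leak): a circular reference keeps a data structure allocated
even after every ordinary variable referring to it has gone out of
scope.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $watcher;
{
    my $node = { payload => 'x' x 1024 };
    $node->{self} = $node;   # circular reference: refcount never drops to 0
    $watcher = $node;
    weaken($watcher);        # weak alias, only so we can observe the leak
}
# $node has gone out of scope, yet the hash is still allocated:
print defined $watcher ? "leaked\n" : "freed\n";   # prints "leaked"
```

Breaking the cycle (e.g. deleting the C<self> entry, or weakening one
link in it) lets the memory be reclaimed; tools such as
C<Apache::Leak> help spot processes that grow like this on every
request.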
=head2 Sharing Memory
We want an OS with good memory sharing capabilities. As we have seen,
if we preload the modules and scripts at server startup, they are
shared between the spawned children (at least for a part of a process'
life - memory pages can become "dirty" and cease to be shared). This
feature can reduce memory consumption a lot!
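As a sketch, preloading is typically done from a I<startup.pl> file
pulled in once by the parent server (e.g. via a C<PerlRequire>
directive); the modules named below are placeholders only -- preload
whatever your handlers and scripts actually use.

```perl
# startup.pl -- loaded once by the parent httpd, so the compiled
# code lands in memory pages shared (copy-on-write) with every child.
use strict;
use warnings;

use POSIX ();          # example modules only -- substitute the
use Data::Dumper ();   # modules your code really needs

# Sanity check: both modules are now compiled and recorded in %INC,
# so children inherit the op-tree instead of compiling private copies.
print( (exists $INC{'POSIX.pm'} && exists $INC{'Data/Dumper.pm'})
       ? "preloaded ok\n" : "preload failed\n" );

1;  # startup files must return a true value
```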
=head2 Cost and Support
If we are in a big business we probably do not mind paying another
$1000 for some fancy OS with bundled support. But if our resources
are low, we will look for cheaper and free OSs. Free does not mean
bad, it can be quite the opposite. Free OSs can have the best support
we can find. Some do. It is very easy to understand - most of the
people are not rich and will try to use a cheaper or free OS first if
it does the work for them. Since it really fits their needs, many
people keep using it and eventually know it well enough to be able to
provide support for others in trouble. Why would they do this for
free? One reason is for the spirit of the first days of the Internet,
when there was no commercial Internet and people helped each other,
because someone helped them in first place. I was there, I was
touched by that spirit and I am keen to keep that spirit alive.
But let's get back to our world. We are living in a material world,
and our bosses pay us to keep the systems running. So if you feel
that you cannot provide the support yourself and you do not trust the
available free resources, you must pay for an OS backed by a company,
and blame them for any problem. Your boss wants to be able to sue
someone if the project has a problem caused by an external product
that is being used in the project. If you buy a product and the
company selling it provides support, you have someone to sue, or at
least someone to put the blame on.
If we go with Open Source and it fails, do we have no one to sue?
Wrong: in recent years many companies have realized how good the Open
Source products are and have started to provide official support for
them. So your boss cannot just dismiss your suggestion of using an
Open Source operating system. You can get paid support just as with
any other commercial OS vendor.
Also remember that the less money you spend on OS and Software, the
more you will be able to spend on faster and stronger hardware.
=head2 Discontinued Products
The OSs in this hazard group tend to be developed by a single company
or organization.
You might find yourself in a position where you have invested a lot of
time and money into developing some proprietary software that is
bundled with the OS you chose (say writing a mod_perl handler which
takes advantage of some proprietary features of the OS and which will
not run on any other OS). Things are under control, the performance
is great and you sing with happiness on your way to work. Then, one
day, the company which supplies your beloved OS goes bankrupt (not
unlikely nowadays), or they produce a newer incompatible version and
they will not support the old one (happens all the time). You are
stuck with their early masterpiece, no support and no source code!
What are you going to do? Invest more money into porting the software
to another OS...
Everyone can be hit by this mini-disaster so it is better to check the
background of the company when making your choice. Even so you never
know what will happen tomorrow - in 1980, a company called Tektronix
did something similar to one of the Guide reviewers with its
microprocessor development system. The guy just had to buy another
system. He didn't buy it from Tektronix, of course. The second
system never really worked very well and the firm he bought it from
went bust before they ever got around to fixing it. So in 1982 he
wrote his own microprocessor development system software. It didn't
take long, it works fine, and he's still using it 18 years later.
Free and Open Source OSs are probably less susceptible to this kind of
problem. Development is usually distributed between many companies
and developers, so if a person who developed a really important part
of the kernel lost interest in continuing, someone else will pick the
falling flag and carry on. Of course if tomorrow some better project
shows up, developers might migrate there and finally drop the
development: but in practice people are often given support on older
versions and helped to migrate to current versions. Development tends
to be more incremental than revolutionary, so upgrades are less
traumatic, and there is usually plenty of notice of the forthcoming
changes so that you have time to plan for them.
Of course with the Open Source OSs you can have the source! So you
can always have a go yourself, but do not under-estimate the amounts
of work involved. There are many, many man-years of work in an OS.
=head2 OS Releases
Actively developed OSs generally try to keep pace with the latest
technology developments, and continually optimize the kernel and other
parts of the OS to become better and faster. Nowadays, Internet and
networking in general are the hottest topics for system developers.
Sometimes a simple OS upgrade to the latest stable version can save
you an expensive hardware upgrade. Also, remember that when you buy
new hardware, chances are that the latest software will make the most
of it.
If a new product supports an old one only by virtue of backwards
compatibility with previous products of the same family, you might not
reap all the benefits of the new product's features. Perhaps you would
get almost the same functionality for much less money by buying an
older model of the same product.
=head1 Choosing Hardware
Sometimes the most expensive machine is not the one which provides the
best performance. Your demands on the platform hardware are based on
many aspects and affect many components. Let's discuss some of them.
In the discussion we use terms that may be unfamiliar to some readers:
=over 4
=item *
Cluster - a group of machines connected together to perform one big or
many small computational tasks in a reasonable time. Clustering can
also be used to provide 'fail-over' where if one machine fails its
processes are transferred to another without interruption of service.
And you may be able to take one of the machines down for maintenance
(or an upgrade) and keep your service running - the main server will
simply not dispatch the requests to the machine that was taken down.
=item *
Load balancing - users are given the name of one of your machines but
perhaps it cannot stand the heavy load. You can use a clustering
approach to distribute the load over a number of machines. The
central server, which users access initially when they type the name
of your service, works as a dispatcher. It just redirects requests to
other machines. Sometimes the central server also collects the
results and returns them to the users. You can get the advantages of
clustering too.
There are many load balancing techniques. (See L<High-Availability
Linux Project|download::third_party/High_Availability_Linux_Project> for more info.)
=item *
NIC - Network Interface Card. A hardware component that connects your
machine to the network. It sends and receives packets; newer cards
can also encrypt and decrypt packets and digitally sign and verify
them. NICs come in different speed categories, varying from 10Mbps to
10Gbps and faster. The most common type of NIC implements the
Ethernet networking protocol.
=item *
RAM - Random Access Memory. It's the memory that you have in your
computer. (Comes in units of 8MB, 16MB, 64MB, 256MB, etc.)
=item *
RAID - Redundant Array of Inexpensive Disks.
An array of physical disks, usually treated by the operating system as
one single disk, and often forced to appear that way by the hardware.
The reason for using RAID is often simply to achieve a high data
transfer rate, but it may also be to get adequate disk capacity or
high reliability. Redundancy means that the system is capable of
continued operation even if a disk fails. There are various types of
RAID array and several different approaches to implementing them.
Some systems provide protection against failure of more than one drive
and some (`hot-swappable') systems allow a drive to be replaced
without even stopping the OS. See for example the Linux `HOWTO'
documents Disk-HOWTO, Module-HOWTO and Parallel-Processing-HOWTO.
=back
=head2 Machine Strength Demands According to Expected Site Traffic
If you are building a fan site and you want to amaze your friends with
a mod_perl guest book, any old 486 machine could do it. If you are in
a serious business, it is very important to build a scalable server.
If your service is successful and becomes popular, the traffic could
double every few days, and you should be ready to add more resources
to keep up with the demand. While webserver scalability can be
defined more precisely, the important thing is to make sure that you
can add more power to your webserver(s) without investing much
additional money in software development (you will need a little
software effort to connect your servers if you add more of them).
This means that you should choose hardware and OSs that can talk to
other machines and become part of a cluster.
On the other hand if you prepare for a lot of traffic and buy a
monster to do the work for you, what happens if your service doesn't
prove to be as successful as you thought it would be? Then you've
spent too much money, and meanwhile faster processors and other
hardware components have been released, so you lose.
Wisdom and prophecy, that's all it takes :)
=head3 Single Strong Machine vs Many Weaker Machines
Let's start with the claim that a four-year-old processor is still
very powerful and can be put to good use. Now let's say that for a
given amount of money you can probably buy either one new, very
strong machine or about ten older but very cheap machines. I claim
that with ten old machines connected into a cluster, and by deploying
load balancing, you will be able to serve about five times more
requests than with one single new machine.
Why is that? Because generally the performance improvement on a new
machine is marginal while the price is much higher. Ten machines will
do faster disk I/O than one single machine, even if the new disk is
quite a bit faster. Yes, you have more administration overhead, but
there is a chance you will have it anyway, for in a short time the new
machine you have just bought might not stand the load. Then you will
have to purchase more equipment and think about how to implement load
balancing and web server file system distribution anyway.
Why am I so convinced? Look at the busiest services on the Internet:
search engines, web-email servers and the like -- most of them use a
clustering approach. You may not always notice it, because they hide
the real implementation behind proxy servers.
=head2 Internet Connection
You have the best hardware you can get, but the service is still
crawling. Make sure you have a fast Internet connection -- not as
fast as your ISP claims it to be, but as fast as it should be. The ISP might
have a very good connection to the Internet, but put many clients on
the same line. If these are heavy clients, your traffic will have to
share the same line and your throughput will suffer. Think about a
dedicated connection and make sure it is truly dedicated. Don't trust
the ISP, check it!
The idea of having a connection to B<The Internet> is a little
misleading. Many Web hosting and co-location companies have large
amounts of bandwidth, but still have poor connectivity. The public
exchanges, such as MAE-East and MAE-West, frequently become
overloaded, yet many ISPs depend on these exchanges.
Private peering means that providers can exchange traffic much
quicker.
Also, if your Web site is of global interest, check that the ISP has
good global connectivity. If the Web site is going to be visited
mostly by people in a certain country or region, your server should
probably be located there.
Bad connectivity can directly influence your machine's performance.
Here is a story one of the developers told on the mod_perl mailing
list:
  What relationship has 10% packet loss on one upstream provider got
  to do with machine memory?

  Yes... a lot. For a nightmare week, the box was located downstream
  of a provider who was struggling with some serious bandwidth
  problems of his own... people were connecting to the site via this
  link, and packet loss was such that retransmits and TCP stalls were
  keeping httpd heavies around for much longer than normal... instead
  of blasting out the data at high or even modem speeds, they would
  be stuck at 1k/sec or stalled out... people would press stop and
  refresh, httpds would take 300 seconds to timeout on writes to
  no-one... it was a nightmare. Those problems didn't go away till I
  moved the box to a place closer to some decent backbones.

  Note that with a proxy, this only keeps a lightweight httpd tied
  up, assuming the page is small enough to fit in the buffers. If
  you are a busy internet site you always have some slow clients.
  This is a difficult thing to simulate in benchmark testing, though.
=head2 I/O Performance
If your service is I/O bound (does a lot of read/write operations to
disk) you need a very fast disk, especially if you need a relational
database, since databases are the main creators of I/O streams. So
you should not spend the money on a fancy video card and monitor! A
cheap card and a 14" monochrome monitor are perfectly adequate for a
Web server; you will probably access it by C<telnet> or C<ssh> most
of the time.
Look for disks with the best price/performance ratio. Of course, ask
around and avoid disks that have a reputation for headcrashes and
other disasters.
You must think about RAID or similar systems if you have an enormous
data set to serve (what is an enormous data set nowadays? Gigabytes,
Terabytes?) or if you expect really heavy web traffic.
Ok, you have a fast disk, what's next? You need a fast disk
controller. There may be one embedded on your computer's motherboard.
If the controller is not fast enough you should buy a faster one.
Don't forget that it may be necessary to disable the original
controller.
=head2 Memory
Memory should be well tested. Many memory test programs are
practically useless. Running a busy system for a few weeks without
ever shutting it down is a pretty good memory test. If you increase
the amount of RAM on a well-tested box, use well-tested RAM.
How much RAM do you need? Nowadays, the chances are that you will
hear: "Memory is cheap, the more you buy the better". But how much is
enough? The answer is pretty straightforward: I<you do not want your
machine to swap>. When the CPU needs to write something into memory,
but memory is already full, it takes the least frequently used memory
pages and swaps them out to disk. This means you have to bear the
time penalty of writing the data to disk. If another process then
references some of the data which happens to be on one of the pages
that has just been swapped out, the CPU swaps it back in again,
probably swapping out some other data that will be needed very shortly
by some other process. Carried to the extreme, the CPU and disk start
to I<thrash> hopelessly in circles, without getting any real work
done. The less RAM there is, the more often this scenario arises.
Worse, you can exhaust swap space as well, and then your troubles
really start...
How do you make a decision? You know the highest rate at which your
server expects to serve pages and how long it takes on average to
serve one. Now you can calculate how many server processes you need.
If you know the maximum size your servers can grow to, you know how
much memory you need. If your OS supports L<memory
sharing|general::hardware::hardware/Sharing_Memory>, you can make best use of this
feature by preloading the modules and scripts at server startup, and
so you will need less memory than you have calculated.
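The calculation above can be sketched in a few lines; every figure
below is an assumption for illustration -- substitute numbers measured
on your own server:

```perl
use strict;
use warnings;

my $peak_rate    = 40;    # requests/sec at peak (assumed)
my $service_time = 0.25;  # seconds to serve one request (assumed)
my $process_size = 10;    # MB per httpd child at its largest (assumed)
my $shared_size  = 4;     # MB of that shared via preloading (assumed)

# Concurrent servers needed = arrival rate x average service time
my $servers = $peak_rate * $service_time;

# Each child costs its unshared portion; the shared pages are paid once.
my $ram_mb = $servers * ($process_size - $shared_size) + $shared_size;

printf "about %d servers, %dMB of RAM (%dMB with 20%% headroom)\n",
       $servers, $ram_mb, $ram_mb * 1.2;
```

With these assumed figures the sketch arrives at 10 servers and 64MB,
or about 76MB once the 20% peak reserve discussed below is added.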
Do not forget that other essential system processes need memory as
well, so you should plan not only for the Web server, but also take
into account the other players. Remember that requests can be queued,
so you can afford to let your client wait for a few moments until a
server is available to serve it. Most of the time your server will
not have the maximum load, but you should be ready to bear the peaks.
You need to reserve at least 20% of free memory for peak situations.
Many sites have crashed a few moments after a big scoop about them was
posted and an unexpected number of requests suddenly came in. (This
is called the Slashdot effect, which was born at http://slashdot.org ).
If you are about to announce something cool, be aware of the possible
consequences.
=head2 CPU
Make sure that the CPU is operating within its specifications. Many
boxes are shipped with incorrect settings for CPU clock speed, power
supply voltage etc. Sometimes a cooling fan is not fitted. It may be
ineffective because a cable assembly fouls the fan blades. Like
faulty RAM, an overheating processor can cause all kinds of strange
and unpredictable things to happen. Some CPUs are known to have bugs
which can be serious in certain circumstances. Try not to get one of
them.
=head2 Bottlenecks
You might use the most expensive components, but still get bad
performance. Why? Let me introduce an annoying word: bottleneck.
A machine is an aggregate of many components. Almost any one of them
may become a bottleneck.
If you have a fast processor but a small amount of RAM, the RAM will
probably be the bottleneck. The processor will be under-utilized,
usually it will be waiting for the kernel to swap the memory pages in
and out, because memory is too small to hold the busiest pages.
If you have a lot of memory, a fast processor, a fast disk, but a slow
disk controller, the disk controller will be the bottleneck. The
performance will still be bad, and you will have wasted money.
Use a fast NIC that does not create a bottleneck. They are cheap. If
the NIC is slow, the whole service is slow. This is a most important
component, since webservers are much more often network-bound than
they are disk-bound!
=head3 Solving Hardware Requirement Conflicts
It may happen that the combination of software components which you
find yourself using gives rise to conflicting requirements for the
optimization of tuning parameters. If you can separate the components
onto different machines you may find that this approach (a kind of
clustering) solves the problem, at much less cost than buying faster
hardware, because you can tune the machines individually to suit the
tasks they should perform.
For example if you need to run a relational database engine and a
mod_perl server, it can be wise to put the two on different machines,
since an RDBMS needs a very fast disk while mod_perl processes need
lots of memory. Placing the two on different machines makes it easy
to optimize each machine separately and satisfy each software
component's requirements in the best way.
=head2 Conclusion
To use your money optimally you have to understand the hardware very
well, so you will know what to pick. Otherwise, you should hire a
knowledgeable hardware consultant and employ them on a regular basis,
since your needs will probably change as time goes by, and your
hardware will likewise have to adapt.
=head1 Maintainers
Maintainer is the person(s) you should contact with updates,
corrections and patches.
=over
=item *
Stas Bekman E<lt>stas (at) stason.orgE<gt>
=back
=head1 Authors
=over
=item *
Stas Bekman E<lt>stas (at) stason.orgE<gt>
=back
Only the major authors are listed above. For contributors see the
Changes file.
=cut
1.1 modperl-docs/src/docs/general/multiuser/multiuser.pod
Index: multiuser.pod
===================================================================
=head1 NAME
mod_perl for ISPs. mod_perl and Virtual Hosts
=head1 Description
mod_perl hosting by ISPs: fantasy or reality? This section covers some
topics that might be of interest to users looking for ISPs to host
their mod_perl-based website, and ISPs looking for a way to provide
such services.
Today, it is a reality: there are a number of ISPs hosting mod_perl,
although not as many as we would like. To see a list of ISPs that can
provide mod_perl hosting, see
L<ISPs supporting mod_perl|help::isps>.
=head1 ISPs providing mod_perl services - a fantasy or a reality
=over 4
=item *
You installed mod_perl on your box at home, and you fell in love with
it. So now you want to convert your CGI scripts (which currently run
on your favorite ISP's machine) to run under mod_perl. Then you
discover that your ISP has never heard of mod_perl, or refuses to
install it for you.
=item *
You are an old sailor in the ISP business, you have seen it all, you
know how many ISPs are out there and you know that the sales margins
are too low to keep you happy. You are looking for some new service
almost no one else provides, to attract more clients to become your
users and hopefully to have a bigger slice of the action than your
competitors.
=back
If you are a user asking for a mod_perl service, or an ISP
considering providing this service, this section should make things
clear for both of you.
An ISP has three choices:
=over 4
=item 1
ISPs probably cannot let users run scripts under mod_perl on the main
server. There are many reasons for this:
Scripts might leak memory, due to sloppy programming. There will not
be enough memory to run as many servers as required, and clients will
not be satisfied with the service because it will be slower.
The question of file permissions is a very important issue: any user
who is allowed to write and run a CGI script can at least read (if not
write) any other files that belong to the same user and/or group the
web server is running as. Note that L<it's impossible to run
C<suEXEC> and C<cgiwrap> extensions under
mod_perl 1.0|guide::install/Is_it_possible_to_run_mod_perl_enabled_Apache_as_suExec_>.
Another issue is the security of the database connections. If you use
C<Apache::DBI>, by hacking the C<Apache::DBI> code you can pick a
connection from the pool of cached connections, even if it was opened
by someone else, as long as your scripts are running on the same web
server.
Yet another security issue is a potential compromise of the system
via users' code running on the webserver. One possible solution here
is to use the chroot(1) or jail(8) mechanisms, which allow subsystems
to run isolated from the main system, so that if a subsystem gets
compromised the whole system is still safe.
There are many more things to be aware of so at this time you have to
say I<No>.
Of course as an ISP you can run mod_perl internally, without allowing
your users to map their scripts so that they will run under mod_perl.
If as a part of your service you provide scripts such as guest books,
counters etc., which are not available for user modification, you can
still have these scripts running very fast.
=item 2
But hey, why can't I let my users run their own servers, so I can
wash my hands of them and not have to worry about how dirty and
sloppy their code is (assuming that the users are running their
servers under their own usernames, to prevent them from stealing code
and data from each other)?
This option is fine as long as you are not concerned about your new
system's resource requirements. If you have even very limited
experience with mod_perl, you know that mod_perl-enabled Apache
servers, while freeing up your CPU and allowing you to run scripts
very much faster, have huge memory demands (5-20 times that of plain
Apache).
The size depends on the code length, the sloppiness of the
programming, and possible memory leaks the code might have, all
multiplied by the number of children each server spawns. A very
simple example: a server serving an average number of scripts,
demanding 10MB of memory, which spawns 10 children, already raises
your memory requirements by 100MB (the real requirement is actually
much smaller if your OS allows code sharing between processes and the
programmers exploit these features in their code). Now multiply the
average required size by the number of server users you intend to
have and you will get the total memory requirement.
Since ISPs never say I<No>, you'd better take the inverse approach -
think of the largest memory size you can afford then divide it by one
user's requirements as I have shown in this example, and you will know
how many mod_perl users you can afford :)
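The inverse approach reduces to one division; the figures here are
assumed purely for illustration:

```perl
use strict;
use warnings;

my $affordable_ram_mb = 1024;  # total RAM you can afford (assumed)
my $per_user_mb       = 100;   # e.g. 10MB/child x 10 children (assumed)

# How many full mod_perl setups fit in the RAM budget
my $max_users = int($affordable_ram_mb / $per_user_mb);
print "room for about $max_users mod_perl users\n";
```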
But can you tell how much memory your users may use? Their
requirements from a single server can be very modest, but do you know
how many servers they will run? After all, they have full control of
I<httpd.conf> -- and it has to be this way, since this is essential
for a user running mod_perl.
All this rambling about memory leads to a single question: is it
possible to prevent users from using more than X amount of memory? Or
another variation of the question: assuming you have as much memory
as you want, can you charge users for their average memory usage?
If the answer to either of the above questions is I<Yes>, you are all
set and your clients will praise your name for letting them run
mod_perl! There are tools to restrict resource usage (see for example
the man pages for C<ulimit(3)>, C<getrlimit(2)>, C<setrlimit(2)> and
C<sysconf(3)>; the last three have corresponding Perl modules:
C<BSD::Resource> and C<Apache::Resource>).
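For example, C<Apache::Resource> (which wraps C<BSD::Resource>) is
conventionally enabled from I<httpd.conf>; the limit values below are
only illustrative:

```apache
# Limit each child's data segment to 10MB (soft) / 16MB (hard)
PerlSetEnv PERL_RLIMIT_DATA 10:16
# Limit CPU seconds per child as well
PerlSetEnv PERL_RLIMIT_CPU  120
PerlModule Apache::Resource
# Apply the limits as each child process is born
PerlChildInitHandler Apache::Resource
```

A child that exceeds a hard limit is killed by the OS, so choose
values with some margin above normal usage.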
[ReaderMETA]: If you have experience with other resource limiting
techniques please share it with us. Thank you!
If you have chosen this option, you have to provide your client with:
=over 4
=item *
Shutdown and startup scripts installed together with the rest of your
daemon startup scripts (e.g. in the I</etc/rc.d> directory), so that
when you reboot your machine the user's server will be correctly shut
down and will be back online the moment your system starts up. Also
make sure to start each server under the username the server belongs
to, or you are going to be in big trouble!
=item *
Proxy services (in forward or httpd accelerator mode) for the user's
virtual host. Since the user will have to run their server on an
unprivileged port (E<gt>1024), you will have to forward all requests
from C<user.given.virtual.hostname:80> (which is what
C<user.given.virtual.hostname> implies, since 80 is the default port)
to C<your.machine.ip:port_assigned_to_user>. You will also have to
tell the users to code their scripts so that any self-referencing
URLs are of the form C<user.given.virtual.hostname>.
Letting the user run a mod_perl server immediately adds a requirement
for the user to be able to restart and configure their own server.
Only root can bind to port 80; this is why your users have to use
port numbers greater than 1024.
Another solution would be to use a setuid startup script, but think
twice before you go with it, since if users can modify the scripts
they will gain root access. For more information refer to the section
"L<SUID Start-up Scripts|general::control::control/SUID_Start_up_Scripts>".
=item *
Another problem you will have to solve is how to assign ports between
users. Since users can pick any port above 1024 to run their server,
you will have to lay down some rules here so that multiple servers do
not conflict.
A simple example will demonstrate the importance of this problem:
suppose I am a malicious user, or just a rival of some fellow who
runs his server at your ISP. All I need to do is find out what port
my rival's server is listening on (e.g. using C<netstat(8)>) and
configure my own server to listen on the same port. Although I am
unable to bind to this port while his server holds it, imagine what
will happen when you reboot your system and my startup script happens
to be run before my rival's! I get the port first, and now all
requests will be redirected to my server. I'll leave it to your
imagination what nasty things might happen then.
Of course the ugly things will quickly be revealed, but not before the
damage has been done.
Luckily there are special tools that can ensure that users who aren't
authorized to bind to certain ports (above 1024) won't be able to do
so. One such tool is called C<cbs> and its documentation can be
found at I<http://www.epita.fr/~flav/cbs/doc/html>.
=back
Basically you can preassign each user a port, without them having to
worry about finding a free one, as well as enforce C<MaxClients> and
similar values by implementing the following scenario:
For each user have two configuration files, the main file,
I<httpd.conf> (non-writable by user) and the user's file,
I<username.httpd.conf> where they can specify their own configuration
parameters and override the ones defined in I<httpd.conf>. Here is
what the main configuration file looks like:
  httpd.conf
  ----------
  # Global/default settings, the user may override some of these
  ...
  ...
  # Included so that user can set his own configuration
  Include username.httpd.conf
  # User-specific settings which will override any potentially
  # dangerous configuration directives in username.httpd.conf
  ...
  ...

  username.httpd.conf
  -------------------
  # Settings that your user would like to add/override,
  # like <Location> and PerlModule directives, etc.
Apache reads the global/default settings first. Then it reads the
I<Include>'d I<username.httpd.conf> file with whatever settings the
user has chosen, and finally it reads the user-specific settings that
we don't want the user to override, such as the port number. Even if
the user changes the port number in his I<username.httpd.conf> file,
Apache reads our settings last, so they take precedence. Note that
you can use L<Perl sections|guide::config/Apache_Configuration_in_Perl> to
make the configuration much easier.
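Port assignment itself can also be automated. Here is a minimal
sketch (the helper name and base port are hypothetical) that derives
each user's port from their numeric uid, which guarantees uniqueness:

```perl
#!/usr/bin/perl -w
use strict;

# Map a username to a unique unprivileged port. Since uids are
# unique on a system, base + uid can never collide between users.
sub port_for_user {
    my $username = shift;
    my $uid = getpwnam($username);    # numeric uid, undef if unknown
    die "unknown user: $username\n" unless defined $uid;
    return 2000 + $uid;               # 2000 is an arbitrary base above 1024
}

print port_for_user("root"), "\n";
```

Such a value can then be written into the user's read-only
I<httpd.conf> from a Perl section or a small generation script.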
=item 3
A much better, but costly solution is I<co-location>. Let the user
hook his (or your) stand-alone machine into your network, and forget
about this user. Of course either the user or you will have to
undertake all the system administration chores and it will cost your
client more money.
Who are the people who seek mod_perl support? They are people who run
serious projects/businesses. Money is usually not an obstacle. They
can afford a stand-alone box, thus achieving their goal of autonomy
whilst keeping their ISP happy.
=back
=head2 Virtual Servers Technologies
As we have just seen, one of the obstacles to using mod_perl in an
ISP environment is the problem of isolating customers who use the
same machine from each other. A number of virtual server technologies
(not to be confused with virtual hosts), both commercial and Open
Source, exist today. Here are some of them:
=over
=item * The User-mode Linux Kernel
http://user-mode-linux.sourceforge.net/
User-Mode Linux is a safe, secure way of running Linux versions and
Linux processes. Run buggy software, experiment with new Linux kernels
or distributions, and poke around in the internals of Linux, all
without risking your main Linux setup.
User-Mode Linux gives you a virtual machine that may have more
hardware and software virtual resources than your actual, physical
computer. Disk storage for the virtual machine is entirely contained
inside a single file on your physical machine. You can assign your
virtual machine only the hardware access you want it to have. With
properly limited access, nothing you do on the virtual machine can
change or damage your real computer, or its software.
So if you want to completely protect one user from another, and
yourself from your users, this might be yet another alternative to the
solutions suggested at the beginning of this chapter.
=item * VMWare Technology
Allows running a few instances of the same or different OSs on the
same machine. This technology comes in two flavors:
Open source: http://www.plex86.org/
Commercial: http://www.vmware.com/
So you may want to run a separate OS for each of your clients.
=item * freeVSD Technology
freeVSD (http://www.freevsd.org) is an open source project sponsored
by Idaya Ltd. The software enables ISPs to securely partition their
physical servers into many I<virtual servers>, each capable of running
popular hosting applications such as Apache, Sendmail and MySQL.
=item * S/390 IBM server
Quoting from: http://www.s390.ibm.com/linux/vif/
"The S/390 Virtual Image Facility enables you to run tens to hundreds
of Linux server images on a single S/390 server. It is ideally suited
for those who want to move Linux and/or UNIX workloads deployed on
multiple servers onto a single S/390 server, while maintaining the
same number of distinct server images. This provides centralized
management and operation of the multiple image environment, reducing
complexity, easing administration and lowering costs."
In short, this is a great solution for huge ISPs, as it allows you to
run hundreds of mod_perl servers while having only one box to
maintain. The drawback is the price :)
Check out this thread from the I<scalable> mailing list for more
details from those who know:
http://archive.develooper.com/scalable@arctic.org/msg00235.html
=back
=head1 Virtual Hosts in the guide
If you are about to use I<Virtual Hosts> you might want to read these
sections:
L<Apache Configuration in Perl|guide::config/Apache_Configuration_in_Perl>
L<Easing the Chores of Configuring Virtual Hosts with
mod_macro|guide::config/Configuring_Apache___mod_perl_with_mod_macro>
L<Is There a Way to Provide a Different startup.pl File for Each
Individual Virtual Host|guide::config/Is_There_a_Way_to_Provide_a_Different_startup_pl_File_for_Each_Individual_Virtual_Host>
L<Is There a Way to Modify @INC on a Per-Virtual-Host or Per-Location
Basis.|guide::config/Is_There_a_Way_to_Modify__INC_on_a_Per_Virtual_Host_or_Per_Location_Basis_>
L<A Script From One Virtual Host Calls a Script with the Same Path
From the Other Virtual Host|guide::config/A_Script_From_One_Virtual_Host_Calls_a_Script_with_the_Same_Path_From_the_Other_Virtual_Host>
=head1 Maintainers
Maintainer is the person(s) you should contact with updates,
corrections and patches.
=over
=item *
Stas Bekman E<lt>stas (at) stason.orgE<gt>
=back
=head1 Authors
=over
=item *
Stas Bekman E<lt>stas (at) stason.orgE<gt>
=back
Only the major authors are listed above. For contributors see the
Changes file.
=cut
1.1 modperl-docs/src/docs/general/perl_myth/perl_myth.pod
Index: perl_myth.pod
===================================================================
=head1 NAME
Popular Perl Complaints and Myths
=head1 Description
This document tries to explain the myths about Perl and overturn the
FUD certain bodies try to spread.
=head1 Abbreviations
=over 4
=item *
B<M> = Misconception or Myth
=item *
B<R> = Response
=back
=head2 Interpreted vs. Compiled
=over 4
=item M:
Each dynamic Perl page hit needs to load the Perl interpreter,
compile the script, and then run it. This dramatically decreases
performance and makes Perl an unscalable model, since so much
overhead is required to serve each page.
=item R:
This myth was true years ago, before the advent of mod_perl. mod_perl
loads the interpreter once into memory and never needs to load it
again. Each Perl program is only compiled once. The compiled version
is then kept in memory and used each time the program is run. In
this way there is no extra overhead when hitting a mod_perl page.
=back
=head3 Interpreted vs. Compiled (More Gory Details)
=over 4
=item R:
Compiled code always has the potential to be faster than interpreted
code. Ultimately, all interpreted code needs to be converted to
native instructions at some point, and this invariably has to be done
by a compiled application.

That said, an interpreted language CAN be faster than a comparable
native application in certain situations, given certain common
programming practices. For example, the allocation and de-allocation
of memory can be a relatively expensive process in a tightly scoped
compiled language, whereas interpreted languages typically use garbage
collectors, which don't need to do expensive deallocation in a tight
loop, instead waiting until additional memory is absolutely necessary,
or for a less computationally intensive period. Of course, using a
garbage collector in C would eliminate this edge, but while garbage
collectors are uncommon in C, Perl and most other interpreted
languages have them built in.
It is also important to point out that few people use the full
potential of their modern CPU with a single application. Modern CPUs
are not only more than fast enough to run interpreted code, many
processors include instruction sets designed to increase the
performance of interpreted code.
=back
=head2 Perl is overly memory intensive, making it unscalable
=over 4
=item M:
Each child process needs the Perl interpreter and all code in memory.
Even with mod_perl httpd processes tend to be overly large, slowing
performance, and requiring much more hardware.
=item R:
In mod_perl the interpreter is loaded into the parent process and
shared between the children. Also, when scripts are loaded into the
parent and the parent forks a child httpd process, that child shares
those scripts with the parent. So while the child may take 6MB of
memory, 5MB of that might be shared, meaning it only really uses 1MB
per child. Even 5MB of memory per child is not uncommon for web
applications in other languages.
Also, most modern operating systems support the concept of shared
libraries. Perl can be compiled as a shared library, enabling the bulk
of the perl interpreter to be shared between processes. Some
executable formats on some platforms (I believe ELF is one such
format) are able to share entire executable TEXT segments between
unrelated processes.
=back
=head3 More Tuning Advice:
=over 4
=item *
L<Stas Bekman's Performance Guide|guide::performance>
=back
=head2 Not enough support or tools to develop with Perl. (Myth)
=over 4
=item R:
Of all web applications and languages, Perl arguably has the most
support and tools. B<CPAN> is a central repository of Perl modules
which are freely downloadable and usually well supported. There are
literally thousands of modules which make building web apps in Perl
much easier. There are also countless mailing lists of extremely
responsive Perl experts who usually respond to questions within an
hour. There are also a number of Perl development environments to
make building Perl Web applications easier. Just to name a few, there
is C<Apache::ASP>, C<Mason>, C<embPerl>, C<ePerl>, etc...
=back
=head2 If Perl scales so well, how come no large sites use it? (myth)
=over 4
=item R:
Actually, many large sites DO use Perl for the bulk of their web
applications. Here are some, just as an example: B<e-Toys>,
B<CitySearch>, B<Internet Movie Database>( http://imdb.com ), B<Value
Click> ( http://valueclick.com ), B<Paramount Digital Entertainment>,
B<CMP> ( http://cmpnet.com ), B<HotBot Mail>/B<HotBot Homepages>, and
B<DejaNews> to name a few. Even B<Microsoft> has taken interest in
Perl via http://www.activestate.com/.
=back
=head2 Perl, even with mod_perl, is always slower than C.
=over 4
=item R:
The Perl engine is written in C. There is no point arguing that Perl
is faster than C because anything written in Perl could obviously be
re-written in C. The same holds true for arguing that C is faster than
assembly.
There are two issues to consider here. First of all, many times a web
application written in Perl B<CAN be faster> than one written in C,
thanks to the low level optimizations in the Perl compiler. In other
words, it's easier to write poorly written C than well written Perl.
Secondly, it's important to weigh all factors when choosing a
language to build a web application in. Time to market is often one
of the highest priorities in creating a web application. Development
in Perl can often be twice as fast as in C. This is mostly due to the
differences in the languages themselves, as well as the wealth of
free examples and modules which speed development significantly.
Perl's speedy development time can be a huge competitive advantage.
=back
=head2 Java does away with the need for Perl.
=over 4
=item M:
Perl had its place in the past, but now there's Java and Java will
kill Perl.
=item R:
Java and Perl are actually more complementary languages than
competitive. It's widely accepted that server side Java solutions
such as C<JServ>, C<JSP> and C<JRun> are far slower than mod_perl
solutions (see the next myth). Even so, Java is often used as the
front end for server side Perl applications. Unlike Perl, with Java
you can create advanced client side applications. Combined with the
strength of server side Perl, these client side Java applications can
be made very powerful.
=back
=head2 Perl can't create advanced client side applications
=over 4
=item R:
True. There are some client side Perl solutions like PerlScript in
MSIE 5.0, but all client side Perl requires the user to have the Perl
interpreter on their local machine. Most users do not have a Perl
interpreter on their local machine. Most Perl programmers who need to
create an advanced client side application use Java as their client
side programming language and Perl as the server side solution.
=back
=head2 ASP makes Perl obsolete as a web programming language.
=over 4
=item M:
With Perl you have to write individual programs for each set of pages.
With ASP you can write simple code directly within HTML pages. ASP is
the Perl killer.
=item R:
There are many solutions which allow you to embed Perl in web pages
just like ASP. In fact, you can actually use Perl IN ASP pages with
PerlScript. Other solutions include: C<Mason>, C<Apache::ASP>,
C<ePerl>, C<embPerl> and C<XPP>. Also, Microsoft and ActiveState have
worked very hard to make Perl run as well on NT as on Unix. You can
even create COM modules in Perl that can be used from within ASP
pages. Some other advantages Perl has over ASP: mod_perl is usually
much faster than ASP, Perl has much more example code and full
programs which are freely downloadable, and Perl is cross-platform,
able to run on Solaris, Linux, SCO, Digital Unix, Unix V, AIX, OS2,
VMS, MacOS, Win95-98 and NT, to name a few.
Also, benchmarks show that embedded Perl solutions outperform ASP/VB
on IIS by several orders of magnitude. Perl is a much easier language
for some to learn, especially those with a background in C or C++.
=back
=head1 Credits
Thanks to the mod_perl list for all of the good information and
criticism. I'd especially like to thank:
=over 4
=item *
Stas Bekman E<lt>stas@stason.orgE<gt>
=item *
Thornton Prime E<lt>thornton@cnation.comE<gt>
=item *
Chip Turner E<lt>chip@ns.zfx.comE<gt>
=item *
Clinton E<lt>clint@drtech.co.ukE<gt>
=item *
Joshua Chamas E<lt>joshua@chamas.comE<gt>
=item *
John Edstrom E<lt>edstrom@Poopsie.hmsc.orst.eduE<gt>
=item *
Rasmus Lerdorf E<lt>rasmus@lerdorf.on.caE<gt>
=item *
Nedim Cholich E<lt>nedim@comstar.netE<gt>
=item *
Mike Perry E<lt> http://www.icorp.net/icorp/feedback.htm E<gt>
=item *
Finally, I'd like to thank Robert Santos E<lt>robert@cnation.comE<gt>,
CyberNation's lead Business Development guy for inspiring this
document.
=back
=head1 Maintainers
Maintainer is the person(s) you should contact with updates,
corrections and patches.
=over
=item *
Contact the L<mod_perl docs list|maillist::docs-dev>
=back
=head1 Authors
=over
=item *
Adam Pisoni E<lt>adam@cnation.comE<gt>
=back
Only the major authors are listed above. For contributors see the
Changes file.
=cut
1.1 modperl-docs/src/docs/general/perl_reference/perl_reference.pod
Index: perl_reference.pod
===================================================================
=head1 NAME
Perl Reference
=head1 Description
This document was born because some users are reluctant to learn Perl
before jumping into mod_perl. I will try to cover some of the most
frequently asked pure Perl questions on the list.
Before you decide to skip this chapter make sure you know all the
information provided here. The rest of the Guide assumes that you
have read this chapter and understood it.
=head1 perldoc's Rarely Known But Very Useful Options
First of all, I want to stress that you cannot become a Perl hacker
without knowing how to read Perl documentation and search through it.
Books are good, but an easily accessible and searchable Perl reference
at your fingertips is a great time saver. It always has the up-to-date
information for the version of perl you're using.
Of course you can use the online Perl documentation on the Web. The two
major sites are http://www.perldoc.com and
http://theoryx5.uwinnipeg.ca/CPAN/perl/.
The C<perldoc> utility provides you with access to the documentation
installed on your system. To find out what Perl manpages are
available execute:
  % perldoc perl
To find what functions perl has, execute:
  % perldoc perlfunc
To learn the syntax and to find examples of a specific function, you
would execute (e.g. for C<open()>):
  % perldoc -f open
Note: In perl5.005_03 and earlier, there is a bug in this and the C<-q>
options of C<perldoc>. It won't call C<pod2man>, but will display the
section in POD format instead. Despite this bug it's still readable
and very useful.
The Perl FAQ (I<perlfaq> manpage) is in several sections. To search
through the sections for C<open> you would execute:
  % perldoc -q open
This will show you all the matching Question and Answer sections,
still in POD format.
To read the I<perldoc> manpage you would execute:
  % perldoc perldoc
=head1 Tracing Warnings Reports
Sometimes it's very hard to understand what a warning is complaining
about. You see the source code, but you cannot understand why some
specific snippet produces that warning. The mystery often results
from the fact that the code can be called from different places if
it's located inside a subroutine.
Here is an example:
  warnings.pl
  -----------
  #!/usr/bin/perl -w

  use strict;
  correct();
  incorrect();

  sub correct{
      print_value("Perl");
  }

  sub incorrect{
      print_value();
  }

  sub print_value{
      my $var = shift;
      print "My value is $var\n";
  }
In the code above, print_value() prints the passed value. Subroutine
correct() passes the value to print, but in subroutine incorrect() we
forgot to pass it. When we run the script:
  % ./warnings.pl
we get the warning:
  Use of uninitialized value at ./warnings.pl line 16.
Perl complains about an undefined variable C<$var> at the line that
attempts to print its value:
  print "My value is $var\n";
But how do we know why it is undefined? The reason here obviously is
that the calling function didn't pass the argument. But how do we know
who was the caller? In our example there are two possible callers, in
the general case there can be many of them, perhaps located in other
files.
We can use the caller() function, which tells who has called us, but
even that might not be enough: it's possible to have a longer sequence
of called subroutines, and not just two. For example, here it is sub
third() which is at fault, and putting a call to caller() in sub
second() would not help us very much:
  sub third{
      second();
  }

  sub second{
      my $var = shift;
      first($var);
  }

  sub first{
      my $var = shift;
      print "Var = $var\n";
  }
The solution is quite simple. What we need is a full call stack trace
leading to the call that triggered the warning.
The C<Carp> module comes to our aid with its cluck() function. Let's
modify the script by adding a couple of lines. The rest of the script
is unchanged.
  warnings2.pl
  ------------
  #!/usr/bin/perl -w

  use strict;
  use Carp ();
  local $SIG{__WARN__} = \&Carp::cluck;
  correct();
  incorrect();

  sub correct{
      print_value("Perl");
  }

  sub incorrect{
      print_value();
  }

  sub print_value{
      my $var = shift;
      print "My value is $var\n";
  }
Now when we execute it, we see:
  Use of uninitialized value at ./warnings2.pl line 19.
      main::print_value() called at ./warnings2.pl line 14
      main::incorrect() called at ./warnings2.pl line 7
Take a moment to understand the call stack trace. The deepest calls
are printed first. So the second line tells us that the warning was
triggered in print_value(); the third, that print_value() was
called by the subroutine incorrect().
  script => incorrect() => print_value()
We go into C<incorrect()> and indeed see that we forgot to pass the
variable. Of course when you write a subroutine like C<print_value> it
would be a good idea to check the passed arguments before starting
execution. We omitted that step to contrive an easily debugged example.
Sure, you say, I could find that problem by simple inspection of the
code!
Well, you're right. But I promise you that your task would be quite
complicated and time-consuming if your code ran to some thousands of
lines. In addition, under mod_perl, certain uses of the C<eval>
operator and "here documents" are known to throw off Perl's line
numbering, so the messages reporting warnings and errors can have
incorrect line numbers. (See L<Finding the Line Which Triggered the
Error or Warning|guide::debug/Finding_the_Line_Which_Triggered> for more
information).
Getting the trace helps a lot.
=head1 Variables Globally, Lexically Scoped And Fully Qualified
META: this material is new and requires polishing so read with care.
You will hear a lot about namespaces, symbol tables and lexical
scoping in Perl discussions, but little of it will make any sense
without a few key facts:
=head2 Symbols, Symbol Tables and Packages; Typeglobs
There are two important types of symbol: package global and lexical.
We will talk about lexical symbols later, for now we will talk only
about package global symbols, which we will refer to simply as
I<global symbols>.
The names of pieces of your code (subroutine names) and the names of
your global variables are symbols. Global symbols reside in one
symbol table or another. The code itself and the data do not; the
symbols are the names of pointers which point (indirectly) to the
memory areas which contain the code and data. (Note for C/C++
programmers: we use the term `pointer' in a general sense of one piece
of data referring to another piece of data not in a specific sense as
used in C or C++.)
There is one symbol table for each package (which is why I<global
symbols> are really I<package global symbols>).
You are always working in one package or another.
Like in C, where the first function you write must be called main(),
the first statement of your first Perl script is in package C<main::>,
which is the default package. Unless you say otherwise by using the
C<package> statement, your symbols are all in package C<main::>. You
should be aware straight away that files and packages are I<not
related>. You can have any number of packages in a single file; and a
single package can be in one file or spread over many files. However
it is very common to have a single package in a single file. To
declare a package you write:
  package mypackagename;
From this line on you are in package C<mypackagename>, and any
symbols you declare reside in that package. When you create a symbol
(variable, subroutine etc.) Perl uses the name of the package in which
you are currently working as a prefix to create the fully qualified
name of the symbol.
When you create a symbol, Perl creates a symbol table entry for that
symbol in the current package's symbol table (by default
C<main::>). Each symbol table entry is called a I<typeglob>. Each
typeglob can hold information on a scalar, an array, a hash, a
subroutine (code), a filehandle, a directory handle and a format, all
of which share that same name. So you see now that there are two
levels of indirection for a global variable: the symbol (the thing's
name) points to its typeglob, and the typeglob slot for the thing's
type (scalar, array, etc.) points to the data. If we had a scalar and
an array with the same name, their name would point to the same
typeglob, but for each type of data the typeglob points somewhere
different, so the scalar's data and the array's data are completely
separate and independent; they just happen to have the same name.
Most of the time, only one part of a typeglob is used (yes, it's a bit
wasteful). You will by now know that you distinguish between them by
using what the authors of the Camel book call a I<funny character>. So
if we have a scalar called `C<line>' we would refer to it in code as
C<$line>, and if we had an array of the same name, that would be
written, C<@line>. Both would point to the same typeglob (which would
be called C<*line>), but because of the I<funny character> (also known
as I<decoration>) perl won't confuse the two. Of course we might
confuse ourselves, so some programmers don't ever use the same name
for more than one type of variable.
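This can be demonstrated directly. In the following sketch (the name
C<line> is arbitrary) a scalar and an array share one typeglob yet
hold completely independent data:

```perl
#!/usr/bin/perl -w
use strict;
use vars qw($line @line);    # two variables, one symbol table entry: *line

$line = "a scalar";
@line = ("an", "array");

# *line{SCALAR} and *line{ARRAY} pull the two slots out of the
# same typeglob -- they point to different pieces of data
print ${ *line{SCALAR} }, "\n";          # a scalar
print scalar @{ *line{ARRAY} }, "\n";    # 2
```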
Every global symbol is in some package's symbol table. To refer to a
global symbol we could write the I<fully qualified> name,
e.g. C<$main::line>. If we are in the same package as the symbol we
can omit the package name, e.g. C<$line> (unless you use the C<strict>
pragma and then you will have to predeclare the variable using the
C<vars> pragma). We can also omit the package name if we have imported
the symbol into our current package's namespace. If we want to refer
to a symbol that is in another package and which we haven't imported
we must use the fully qualified name, e.g. C<$otherpkg::box>.
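Here is a short sketch (the package and variable names are made up)
showing the fully qualified form in action:

```perl
#!/usr/bin/perl -w
use strict;

package Counter;
use vars qw($count);    # declare the package global $Counter::count
$count = 5;             # inside its own package the short name suffices

package main;
# from another package we must use the fully qualified name
print $Counter::count, "\n";    # 5
```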
Most of the time you do not need to use the fully qualified symbol
name because most of the time you will refer to package variables from
within the package. This is much like C++ class variables. You can
work entirely within package C<main::> and never even know you are
using a package, nor that the symbols have package names. In a way,
this is a pity because you may fail to learn about packages and they
are extremely useful.
The exception is when you I<import> the variable from another package.
This creates an alias for the variable in the I<current> package, so
that you can access it without using the fully qualified name.
Whilst global variables are useful for sharing data, and are necessary
in some contexts, it is usually wisest to minimize their use and use
I<lexical variables>, discussed next, instead.
Note that when you create a variable, the low-level business of
allocating memory to store the information is handled automatically by
Perl. The interpreter keeps track of the chunks of memory to which the
pointers are pointing and takes care of undefining variables. When all
references to a variable have ceased to exist, the perl garbage
collector is free to take back the memory used, ready for
recycling. However, perl almost never returns memory it has already
used to the operating system during the lifetime of the process.
=head3 Lexical Variables and Symbols
The symbols for lexical variables (i.e. those declared using the
keyword C<my>) are the only symbols which do I<not> live in a symbol
table. Because of this, they are not available from outside the block
in which they are declared. There is no typeglob associated with a
lexical variable and a lexical variable can refer only to a scalar, an
array, a hash or a code reference. (Since perl-5.6 it can also refer
to a file glob).
If you need access to the data from outside the package then you can
return it from a subroutine, or you can create a global variable
(i.e. one which has a package prefix) which points or refers to it and
return that. The pointer or reference must be global so that you can
refer to it by a fully qualified name. But just as in C, try to avoid
having global variables. Using OO methods generally solves this
problem, by providing methods to get and set the desired value within
the object that can be lexically scoped inside the package and passed
by reference.
The phrase "lexical variable" is a bit of a misnomer; we are really
talking about "lexical symbols". The data can be referenced by a
global symbol too, and in such cases when the lexical symbol goes out
of scope the data will still be accessible through the global symbol.
This is perfectly legitimate and cannot be compared to the terrible
mistake of taking a pointer to an automatic C variable and returning
it from a function--when the pointer is dereferenced there will be a
segmentation fault. (Note for C/C++ programmers: having a function
return a pointer to an auto variable is a disaster in C or C++; the
perl equivalent, returning a reference to a lexical variable created
in a function is normal and useful.)
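As a brief illustration (the subroutine name is invented), returning
a reference to a lexical is the standard Perl way to keep private
state alive:

```perl
#!/usr/bin/perl -w
use strict;

sub make_counter {
    my $count = 0;               # lexical: no symbol table entry
    return sub { ++$count };     # the returned reference keeps it alive
}

my $counter = make_counter();
print $counter->(), $counter->(), $counter->(), "\n";    # 123
```

Each call to make_counter() creates a fresh C<$count>, so independent
counters never interfere with one another.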
=over
=item *
C<my()> vs. C<use vars>:
With use vars(), you are making an entry in the symbol table, and you
are telling the compiler that you are going to be referencing that
entry without an explicit package name.
With my(), NO ENTRY IS PUT IN THE SYMBOL TABLE. The compiler figures
out I<at compile time> which my() variables (i.e. lexical variables)
are the same as each other, and once you hit execute time you cannot
go looking those variables up in the symbol table.
=item *
C<my()> vs. C<local()>:
local() creates a temporally-limited package-based scalar, array, hash,
or glob -- when the scope of definition is exited at runtime, the
previous value (if any) is restored. References to such a variable
are *also* global... only the value changes. (Aside: that is what
causes variable suicide. :)
my() creates a lexically-limited non-package-based scalar, array, or
hash -- when the scope of definition is exited at compile-time, the
variable ceases to be accessible. Any references to such a variable
at runtime turn into unique anonymous variables on each scope exit.
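A tiny sketch (the names are made up) shows the difference in
behavior:

```perl
#!/usr/bin/perl -w
use strict;
use vars qw($name);

$name = "outer";
sub show { return $name }    # reads whatever $main::name holds right now

sub demo {
    local $name = "inner";   # temporarily replace the package value
    return show();           # show() sees "inner" down the call chain
}

print demo(), "\n";    # inner
print $name, "\n";     # outer -- the previous value was restored
```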
=back
=head2 Additional reading references
For more information see: L<Using global variables and sharing them
between
modules/packages|general::perl_reference::perl_reference/Using_Global_Variables_and_Shari>
and an article by Mark-Jason Dominus about how Perl handles variables
and namespaces, and the difference between C<use vars()> and C<my()> -
http://www.plover.com/~mjd/perl/FAQs/Namespaces.html .
=head1 my() Scoped Variable in Nested Subroutines
Before we proceed, let's assume that we want to develop the code under
the C<strict> pragma. We will use lexically scoped variables (with the
help of the my() operator) whenever possible.
=head2 The Poison
Let's look at this code:
  nested.pl
  ---------
  #!/usr/bin/perl

  use strict;

  sub print_power_of_2 {
      my $x = shift;

      sub power_of_2 {
          return $x ** 2;
      }

      my $result = power_of_2();
      print "$x^2 = $result\n";
  }

  print_power_of_2(5);
  print_power_of_2(6);
Don't let the weird subroutine names fool you: the print_power_of_2()
subroutine should print the square of the number passed to it. Let's
run the code and see whether it works:
  % ./nested.pl
  5^2 = 25
  6^2 = 25
Ouch, something is wrong. Maybe there is a bug in Perl and it doesn't
work correctly with the number 6? Let's try again using 5 and 7:
  print_power_of_2(5);
  print_power_of_2(7);
And run it:
  % ./nested.pl
  5^2 = 25
  7^2 = 25
Wow, does it work only for 5? How about using 3 and 5:
  print_power_of_2(3);
  print_power_of_2(5);
and the result is:
  % ./nested.pl
  3^2 = 9
  5^2 = 9
Now we start to understand--only the first call to the
print_power_of_2() function works correctly. This makes us think that
our code has some kind of memory of the results of the first
execution, or that it ignores the arguments in subsequent executions.
=head2 The Diagnosis
Let's follow the guidelines and use the C<-w> flag. Now execute the
code:
  % ./nested.pl
  Variable "$x" will not stay shared at ./nested.pl line 9.
  5^2 = 25
  6^2 = 25
We have never seen such a warning message before and we don't quite
understand what it means. The C<diagnostics> pragma will certainly
help us. Let's prepend this pragma before the C<strict> pragma in our
code:
  #!/usr/bin/perl -w

  use diagnostics;
  use strict;
And execute it:
  % ./nested.pl
  Variable "$x" will not stay shared at ./nested.pl line 10 (#1)

    (W) An inner (nested) named subroutine is referencing a lexical
    variable defined in an outer subroutine.

    When the inner subroutine is called, it will probably see the value
    of the outer subroutine's variable as it was before and during the
    *first* call to the outer subroutine; in this case, after the first
    call to the outer subroutine is complete, the inner and outer
    subroutines will no longer share a common value for the variable.
    In other words, the variable will no longer be shared.

    Furthermore, if the outer subroutine is anonymous and references a
    lexical variable outside itself, then the outer and inner
    subroutines will never share the given variable.

    This problem can usually be solved by making the inner subroutine
    anonymous, using the sub {} syntax.  When inner anonymous subs that
    reference variables in outer subroutines are called or referenced,
    they are automatically rebound to the current values of such
    variables.

  5^2 = 25
  6^2 = 25
Well, now everything is clear. We have the B<inner> subroutine
power_of_2() and the B<outer> subroutine print_power_of_2() in our
code.
When the inner power_of_2() subroutine is called for the first time,
it sees the value of the outer print_power_of_2() subroutine's C<$x>
variable. On subsequent calls the inner subroutine's C<$x> variable
won't be updated, no matter what new values are given to C<$x> in the
outer subroutine. There are two copies of the C<$x> variable, no
longer a single one shared by the two routines.
=head2 The Remedy
The C<diagnostics> pragma suggests that the problem can be solved by
making the inner subroutine anonymous.
An anonymous subroutine can act as a I<closure> with respect to
lexically scoped variables. Basically this means that if you define a
subroutine in a particular B<lexical> context at a particular moment,
then it will run in that same context later, even if called from
outside that context. The upshot of this is that when the subroutine
B<runs>, you get the same copies of the lexically scoped variables
which were visible when the subroutine was B<defined>. So you can
pass arguments to a function when you define it, as well as when you
invoke it.
Let's rewrite the code to use this technique:
anonymous.pl
--------------
#!/usr/bin/perl
use strict;
sub print_power_of_2 {
my $x = shift;
my $func_ref = sub {
return $x ** 2;
};
my $result = &$func_ref();
print "$x^2 = $result\n";
}
print_power_of_2(5);
print_power_of_2(6);
Now C<$func_ref> contains a reference to an anonymous subroutine,
which we later use when we need to get the power of two. Since it is
anonymous, the subroutine will automatically be rebound to the new
value of the outer scoped variable C<$x>, and the results will now be
as expected.
Let's verify:
% ./anonymous.pl
5^2 = 25
6^2 = 36
So we can see that the problem is solved.
=head1 Understanding Closures -- the Easy Way
In Perl, a closure is just a subroutine that refers to one or more
lexical variables declared outside the subroutine itself and must
therefore create a distinct clone of the environment on the way out.
And both named subroutines and anonymous subroutines can be closures.
Here's how to tell if a subroutine is a closure or not:
for (1..5) {
push @a, sub { "hi there" };
}
for (1..5) {
{
my $b;
push @b, sub { $b."hi there" };
}
}
print "anon normal:\n", join "\t\n",@a,"\n";
print "anon closure:\n",join "\t\n",@b,"\n";
which generates:
anon normal:
CODE(0x80568e4)
CODE(0x80568e4)
CODE(0x80568e4)
CODE(0x80568e4)
CODE(0x80568e4)
anon closure:
CODE(0x804b4c0)
CODE(0x8056b54)
CODE(0x8056bb4)
CODE(0x80594d8)
CODE(0x8059538)
Note how each code reference from the non-closure is identical, but
the closure form must generate distinct coderefs to point at the
distinct instances of the closure.
And now the same with named subroutines:
for (1..5) {
sub a { "hi there" };
push @a, \&a;
}
for (1..5) {
{
my $b;
sub b { $b."hi there" };
push @b, \&b;
}
}
print "normal:\n", join "\t\n",@a,"\n";
print "closure:\n",join "\t\n",@b,"\n";
which generates:
normal:
CODE(0x80568c0)
CODE(0x80568c0)
CODE(0x80568c0)
CODE(0x80568c0)
CODE(0x80568c0)
closure:
CODE(0x8056998)
CODE(0x8056998)
CODE(0x8056998)
CODE(0x8056998)
CODE(0x8056998)
We can see that both versions have generated the same code
reference. For the subroutine I<a> it's easy, since it doesn't use
any lexical variables defined outside it in the same lexical scope.
As for the subroutine I<b>, it's indeed a closure, but Perl won't
recompile it since it's a named subroutine (see the I<perlsub>
manpage). This is usually not what we want to happen in our code,
unless we are deliberately after this special effect, which is similar
to I<static> variables in C.
This is the underpinning of that famous I<"won't stay shared">
message: a named subroutine that uses a I<my> variable from an
enclosing scope is compiled only once, so it generates identical code
references and ignores any future changes to that lexical variable.
=head2 Mike Guy's Explanation of the Inner Subroutine Behavior
From: mjtg@cus.cam.ac.uk (M.J.T. Guy)
Newsgroups: comp.lang.perl.misc
Subject: Re: Lexical scope and embedded subroutines.
Date: 6 Jan 1998 18:22:39 GMT
Message-ID: <68...@lyra.csx.cam.ac.uk>
In article <68...@brokaw.wa.com>, Aaron Harsh <aj...@rtk.com>
wrote:
> Before I read this thread (and perlsub to get the details) I would
> have assumed the original code was fine.
>
> This behavior brings up the following questions:
> o Is Perl's behavior some sort of speed optimization?
No, but see below.
> o Did the Perl gods just decide that scheme-like behavior was less
> important than the pseudo-static variables described in perlsub?
This subject has been kicked about at some length on perl5-porters.
The current behaviour was chosen as the best of a bad job. In the
context of Perl, it's not obvious what "scheme-like behavior" means.
So it isn't an option. See below for details.
> o Does anyone else find Perl's behavior counter-intuitive?
*Everyone* finds it counterintuitive. The fact that it only generates
a warning rather than a hard error is part of the Perl Gods policy of
hurling thunderbolts at those so irreverent as not to use -w.
> o Did programming in scheme destroy my ability to judge a decent
> language
> feature?
You're still interested in Perl, so it can't have rotted your brain
completely.
> o Have I misremembered how scheme handles these situations?
Probably not.
> o Do Perl programmers really care how much Perl acts like scheme?
Some do.
> o Should I have stopped this message two or three questions ago?
Yes.
The problem to be solved can be stated as
"When a subroutine refers to a variable which is instantiated more
than once (i.e. the variable is declared in a for loop, or in a
subroutine), which instance of that variable should be used?"
The basic problem is that Perl isn't Scheme (or Pascal or any of the
other comparators that have been used).
In almost all lexically scoped languages (i.e. those in the Algol60
tradition), named subroutines are also lexically scoped. So the scope
of the subroutine is necessarily contained in the scope of any
external variable referred to inside the subroutine. So there's an
obvious answer to the "which instance?" problem.
But in Perl, named subroutines are globally scoped. (But in some
future Perl, you'll be able to write
my sub lex { ... }
to get lexical scoping.) So the solution adopted by other languages
can't be used.
The next suggestion most people come up with is "Why not use the most
recently instantiated variable?". This Does The Right Thing in many
cases, but fails when recursion or other complications are involved.
Consider:
sub outer {
inner();
outer();
my $trouble;
inner();
sub inner { $trouble };
outer();
inner();
}
Which instance of $trouble is to be used for each call of inner()?
And why?
The consensus was that an incomplete solution was unacceptable, so the
simple rule "Use the first instance" was adopted instead.
And it is more efficient than possible alternative rules. But that's
not why it was done.
Mike Guy
=head1 When You Cannot Get Rid of The Inner Subroutine
First you might wonder: why in the world would someone need to define
an inner subroutine? Well, for example, to reduce some of Perl's script
startup overhead you might decide to write a daemon that will compile
the scripts and modules only once, and cache the pre-compiled code in
memory. When some script is to be executed, you just tell the daemon
the name of the script to run and it will do the rest and do it much
faster since compilation has already taken place.
Seems like an easy task, and it is. The only problem is: once the
script is compiled, how do you execute it? Or let's put it the other
way: after it has been executed for the first time and stays compiled in
the daemon's memory, how do you call it again? If you could get all
developers to code their scripts so each has a subroutine called run()
that will actually execute the code in the script then we've solved
half the problem.
But how does the daemon know to refer to some specific script if they
all run in the C<main::> name space? One solution might be to ask the
developers to declare a package in each and every script, and for the
package name to be derived from the script name. However, since there
is a chance that there will be more than one script with the same name
residing in different directories, the directory has to be a part of
the package name too, in order to prevent namespace collisions. And
don't forget that the script may be moved from one directory to
another, so you would have to make sure that the package name is
corrected every time the script gets moved.
But why enforce these strange rules on developers, when we can arrange
for our daemon to do this work? For every script that the daemon is
about to execute for the first time, the script should be wrapped
inside a package whose name is constructed from the mangled path to
the script, and inside a subroutine called run(). For example if the
daemon is about to execute the script I</tmp/hello.pl>:
hello.pl
--------
#!/usr/bin/perl
print "Hello\n";
Prior to running it, the daemon will change the code to be:
wrapped_hello.pl
----------------
package cache::tmp::hello_2epl;
sub run{
#!/usr/bin/perl
print "Hello\n";
}
The package name is constructed from the prefix C<cache::>, each
directory separation slash is replaced with C<::>, and
non-alphanumeric characters are encoded so that, for example, C<.> (a
dot) becomes C<_2e> (an underscore followed by the ASCII code for a
dot in hex representation).
% perl -e 'printf "%x",ord(".")'
prints: C<2e>. This scheme is the same as URL encoding, except that
URL encoding uses the C<%> character instead of the underscore
(C<%2E>); since C<%> has a special meaning in Perl (the prefix of hash
variables) it couldn't be used here.
Now when the daemon is requested to execute the script
I</tmp/hello.pl>, all it has to do is to build the package name as
before based on the location of the script and call its run()
subroutine:
use cache::tmp::hello_2epl;
cache::tmp::hello_2epl::run();
We have just written a partial prototype of the daemon we wanted. The
only outstanding problem is how to pass the path to the script to the
daemon. This detail is left as an exercise for the reader.
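To make the compile step concrete, here is a hedged sketch of how such a daemon could wrap and compile a script. The function name compile_script() and all of its details are illustrative assumptions, not a real API; only the package-mangling scheme follows the description above:

```perl
#!/usr/bin/perl -w
use strict;

# Illustrative sketch: compile a script once, wrapped in a run()
# subroutine inside a package derived from its mangled path.
sub compile_script {
    my $path = shift;                    # e.g. "/tmp/hello.pl"
    (my $package = $path) =~ s{^/}{};
    # encode non-alphanumerics: "." becomes "_2e"
    $package =~ s/([^\w\/])/sprintf "_%02x", ord $1/ge;
    $package =~ s{/}{::}g;               # "/" becomes "::"
    $package = "cache::$package";        # e.g. cache::tmp::hello_2epl

    open my $fh, '<', $path or die "Can't open $path: $!";
    my $code = do { local $/; <$fh> };   # slurp the script
    eval "package $package; sub run { $code }";
    die "Failed to compile $path: $@" if $@;
    return $package;                     # caller invokes $package->run()
}
```

After compile_script() returns, re-running the script is just a matter of calling run() in the returned package, with no recompilation.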
If you are familiar with the C<Apache::Registry> module, you know that
it works in almost the same way. It uses a different package prefix
and the generic function is called handler() and not run(). The
scripts to run are passed through the HTTP protocol's headers.
Now you understand that there are cases where your normal subroutines
can become inner ones. For example, if your script was a simple:
simple.pl
---------
#!/usr/bin/perl
sub hello { print "Hello" }
hello();
Wrapped into a run() subroutine it becomes:
wrapped_simple.pl
-----------------
package cache::simple_2epl;
sub run{
#!/usr/bin/perl
sub hello { print "Hello" }
hello();
}
Therefore, hello() becomes an inner subroutine, and if you use my()
scoped variables that are defined and altered outside hello() but used
inside it, the code won't work as you expect from the second call
onward, as was explained in the previous section.
=head2 Remedies for Inner Subroutines
First of all, there is nothing to worry about as long as you don't
forget to turn warnings on. If you do happen to have the
"L<my() Scoped Variable in Nested
Subroutines|general::perl_reference::perl_reference/my_Scoped_Variable_in_Nested_S>"
problem, Perl will always alert you.
Given that you have a script that has this problem, what are the ways
to solve it? There are many of them and we will discuss some of them
here.
We will use the following code to show the different solutions.
multirun.pl
-----------
#!/usr/bin/perl -w
use strict;
for (1..3){
print "run: [time $_]\n";
run();
}
sub run{
my $counter = 0;
increment_counter();
increment_counter();
sub increment_counter{
$counter++;
print "Counter is equal to $counter !\n";
}
} # end of sub run
This code executes the run() subroutine three times. run() in turn
initializes the C<$counter> variable to 0 every time it is executed,
and then calls the inner subroutine increment_counter() twice.
increment_counter() prints C<$counter>'s value after incrementing
it. One might expect to see the following output:
run: [time 1]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 2]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 3]
Counter is equal to 1 !
Counter is equal to 2 !
But as we have already learned from the previous sections, this is not
what we are going to see. Indeed, when we run the script we see:
% ./multirun.pl
Variable "$counter" will not stay shared at ./multirun.pl line 18.
run: [time 1]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 2]
Counter is equal to 3 !
Counter is equal to 4 !
run: [time 3]
Counter is equal to 5 !
Counter is equal to 6 !
Obviously, the C<$counter> variable is not reinitialized on each
execution of run(). It retains its value from the previous execution,
and sub increment_counter() increments that.
One of the workarounds is to use globally declared variables, with the
C<vars> pragma.
multirun1.pl
-----------
#!/usr/bin/perl -w
use strict;
use vars qw($counter);
for (1..3){
print "run: [time $_]\n";
run();
}
sub run {
$counter = 0;
increment_counter();
increment_counter();
sub increment_counter{
$counter++;
print "Counter is equal to $counter !\n";
}
} # end of sub run
If you run this and the other solutions offered below, the expected
output will be generated:
% ./multirun1.pl
run: [time 1]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 2]
Counter is equal to 1 !
Counter is equal to 2 !
run: [time 3]
Counter is equal to 1 !
Counter is equal to 2 !
By the way, the warning we saw before has gone, and so has the
problem, since there is no C<my()> (lexically defined) variable used
in the nested subroutine.
Another approach is to use fully qualified variables. This is better,
since less memory will be used, but it adds a typing overhead:
multirun2.pl
-----------
#!/usr/bin/perl -w
use strict;
for (1..3){
print "run: [time $_]\n";
run();
}
sub run {
$main::counter = 0;
increment_counter();
increment_counter();
sub increment_counter{
$main::counter++;
print "Counter is equal to $main::counter !\n";
}
} # end of sub run
You can also pass the variable to the subroutine by value and make the
subroutine return it after it has been updated. This adds time and
memory overheads, so it may not be a good idea if the variable can be
very large or if execution speed is an issue.
Don't rely on the variable being small during the development of the
application; it can grow quite big in situations you don't
expect. For example, a very simple HTML form text entry field can
return a few megabytes of data if one of your users is bored and wants
to test how good your code is. It's not uncommon to see users
copy-and-paste 10Mb core dump files into a form's text fields and then
submit them for your script to process.
multirun3.pl
-----------
#!/usr/bin/perl -w
use strict;
for (1..3){
print "run: [time $_]\n";
run();
}
sub run {
my $counter = 0;
$counter = increment_counter($counter);
$counter = increment_counter($counter);
sub increment_counter{
my $counter = shift;
$counter++;
print "Counter is equal to $counter !\n";
return $counter;
}
} # end of sub run
Finally, you can use references to do the job. The version of
increment_counter() below accepts a reference to the C<$counter>
variable and increments its value after first dereferencing it. When
you use a reference, the variable you use inside the function is
physically the same bit of memory as the one outside the function.
This technique is often used to enable a called function to modify
variables in a calling function.
multirun4.pl
-----------
#!/usr/bin/perl -w
use strict;
for (1..3){
print "run: [time $_]\n";
run();
}
sub run {
my $counter = 0;
increment_counter(\$counter);
increment_counter(\$counter);
sub increment_counter{
my $r_counter = shift;
$$r_counter++;
print "Counter is equal to $$r_counter !\n";
}
} # end of sub run
Here is yet another, more obscure, use of references. We modify the
value of C<$counter> inside the subroutine by using the fact that
variables in C<@_> are aliases for the actual scalar parameters. Thus
if you call a function with two arguments, they are stored in
C<$_[0]> and C<$_[1]>. In particular, if the element C<$_[0]> is
updated, the corresponding argument is updated too (or an error occurs
if it is not updatable, as would be the case when calling the function
with a literal, e.g. I<increment_counter(5)>).
multirun5.pl
-----------
#!/usr/bin/perl -w
use strict;
for (1..3){
print "run: [time $_]\n";
run();
}
sub run {
my $counter = 0;
increment_counter($counter);
increment_counter($counter);
sub increment_counter{
$_[0]++;
print "Counter is equal to $_[0] !\n";
}
} # end of sub run
The approach given above should be properly documented of course.
Here is a solution that avoids the problem entirely by splitting the
code into two files; the first is really just a wrapper and loader,
the second file contains the heart of the code.
multirun6.pl
-----------
#!/usr/bin/perl -w
use strict;
require 'multirun6-lib.pl' ;
for (1..3){
print "run: [time $_]\n";
run();
}
Separate file:
multirun6-lib.pl
----------------
use strict ;
my $counter;
sub run {
$counter = 0;
increment_counter();
increment_counter();
}
sub increment_counter{
$counter++;
print "Counter is equal to $counter !\n";
}
1 ;
Now you have at least six workarounds to choose from.
For more information please refer to the perlref and perlsub manpages.
=head1 use(), require(), do(), %INC and @INC Explained
=head2 The @INC array
C<@INC> is a special Perl variable which is the equivalent of the
shell's C<PATH> variable. Whereas C<PATH> contains a list of
directories to search for executables, C<@INC> contains a list of
directories from which Perl modules and libraries can be loaded.
When you use(), require() or do() a filename or a module, Perl gets a
list of directories from the C<@INC> variable and searches them for
the file it was requested to load. If the file that you want to load
is not located in one of the listed directories, you have to tell Perl
where to find the file. You can either provide a path relative to one
of the directories in C<@INC>, or you can provide the full path to the
file.
=head2 The %INC hash
C<%INC> is another special Perl variable that is used to cache the
names of the files and the modules that were successfully loaded and
compiled by use(), require() or do() statements. Before attempting to
load a file or a module with use() or require(), Perl checks whether
it's already in the C<%INC> hash. If it's there, the loading and
therefore the compilation are not performed at all. Otherwise the file
is loaded into memory and an attempt is made to compile it. do() does
unconditional loading--no lookup in the C<%INC> hash is made.
If the file is successfully loaded and compiled, a new key-value pair
is added to C<%INC>. The key is the name of the file or module as it
was passed to one of the three functions we have just mentioned. If it
was found in any of the C<@INC> directories except C<".">, the value
is the full path to it in the file system.
The following examples will make it easier to understand the logic.
First, let's see the contents of C<@INC> on my system:
% perl -e 'print join "\n", @INC'
/usr/lib/perl5/5.00503/i386-linux
/usr/lib/perl5/5.00503
/usr/lib/perl5/site_perl/5.005/i386-linux
/usr/lib/perl5/site_perl/5.005
.
Notice the C<.> (current directory) is the last directory in the list.
Now let's load the module C<strict.pm> and see the contents of C<%INC>:
% perl -e 'use strict; print map {"$_ => $INC{$_}\n"} keys %INC'
strict.pm => /usr/lib/perl5/5.00503/strict.pm
Since C<strict.pm> was found in I</usr/lib/perl5/5.00503/> directory
and I</usr/lib/perl5/5.00503/> is a part of C<@INC>, C<%INC> includes
the full path as the value for the key C<strict.pm>.
Now let's create the simplest module in C</tmp/test.pm>:
test.pm
-------
1;
It does nothing, but returns a true value when loaded. Now let's load
it in different ways:
% cd /tmp
% perl -e 'use test; print map {"$_ => $INC{$_}\n"} keys %INC'
test.pm => test.pm
Since the file was found relative to C<.> (the current directory), the
relative path is inserted as the value. If we alter C<@INC>, by adding
I</tmp> to the end:
% cd /tmp
% perl -e 'BEGIN{push @INC, "/tmp"} use test; \
print map {"$_ => $INC{$_}\n"} keys %INC'
test.pm => test.pm
Here we still get the relative path, since the module was found first
relative to C<".">. The directory I</tmp> was placed after C<.> in the
list. If we execute the same code from a different directory, the
C<"."> directory won't match,
% cd /
% perl -e 'BEGIN{push @INC, "/tmp"} use test; \
print map {"$_ => $INC{$_}\n"} keys %INC'
test.pm => /tmp/test.pm
so we get the full path. We can also prepend the path with unshift(),
so it will be used for matching before C<"."> and therefore we will
get the full path as well:
% cd /tmp
% perl -e 'BEGIN{unshift @INC, "/tmp"} use test; \
print map {"$_ => $INC{$_}\n"} keys %INC'
test.pm => /tmp/test.pm
The code:
BEGIN{unshift @INC, "/tmp"}
can be replaced with the more elegant:
use lib "/tmp";
This is almost equivalent to our C<BEGIN> block and is the
recommended approach.
These approaches to modifying C<@INC> can be labor intensive, since
if you want to move the script around in the file-system you have to
modify the path. This can be painful, for example, when you move your
scripts from development to a production server.
There is a module called C<FindBin> which solves this problem in the
plain Perl world, but unfortunately it won't work under mod_perl,
since it's a module and, like any module, it's loaded only once. So
the first script to use it will have the correct settings, but the
rest of the scripts will not, if they are located in different
directories from the first.
For the sake of completeness, I'll present this module anyway.
If you use this module, you don't need to write a hard coded path. The
following snippet does all the work for you (the file is
I</tmp/load.pl>):
load.pl
-------
#!/usr/bin/perl
use FindBin ();
use lib "$FindBin::Bin";
use test;
print "test.pm => $INC{'test.pm'}\n";
In the above example C<$FindBin::Bin> is equal to I</tmp>. If we move
the script somewhere else, e.g. to I</tmp/new_dir>, then in the code
above C<$FindBin::Bin> equals I</tmp/new_dir>.
% /tmp/load.pl
test.pm => /tmp/test.pm
This is just like C<use lib> except that no hard coded path is
required.
You can use this workaround to make it work under mod_perl.
do 'FindBin.pm';
unshift @INC, "$FindBin::Bin";
require test;
#maybe test::import( ... ) here if need to import stuff
This has a slight overhead because it will load from disk and
recompile the C<FindBin> module on each request. So it may not be
worth it.
=head2 Modules, Libraries and Program Files
Before we proceed, let's define what we mean by I<module>,
I<library> and I<program file>.
=over
=item * Libraries
These are files which contain Perl subroutines and other code.
When these are used to break up a large program into manageable chunks
they don't generally include a package declaration; when they are used
as subroutine libraries they often do have a package declaration.
Their last statement must return true; a simple C<1;> statement at the
end ensures that.
They can be named in any way desired, but generally their extension is
I<.pl>.
Examples:
config.pl
----------
# No package so defaults to main::
$dir = "/home/httpd/cgi-bin";
$cgi = "/cgi-bin";
1;
mysubs.pl
----------
# No package so defaults to main::
sub print_header{
print "Content-type: text/plain\r\n\r\n";
}
1;
web.pl
------------
package web ;
# Call like this: web::print_with_class('loud',"Don't shout!");
sub print_with_class{
my( $class, $text ) = @_ ;
print qq{<span class="$class">$text</span>};
}
1;
=item * Modules
A module is a file which contains Perl subroutines and other code.
It generally declares a package name at its beginning.
Modules are generally used either as function libraries (for which
I<.pl> files are still used, though less commonly), or as object
libraries, where a module is used to define a class and its methods.
Its last statement must return true.
The naming convention requires it to have a I<.pm> extension.
Example:
MyModule.pm
-----------
package My::Module;
$My::Module::VERSION = 0.01;
sub new{ return bless {}, shift;}
END { print "Quitting\n"}
1;
=item * Program Files
Many Perl programs exist as a single file. Under Linux and other
Unix-like operating systems the file often has no suffix, since the
operating system can determine that it is a Perl script from the first
line (the shebang line); if it's Apache that executes the code, there
is a variety of ways to tell it how and when the file should be
executed. Under Windows a suffix is normally used, for example C<.pl>
or C<.plx>.
The program file will normally C<require()> any libraries and C<use()>
any modules it requires for execution.
It will contain Perl code but won't usually have any package names.
Its last statement may return anything or nothing.
=back
=head2 require()
require() reads a file containing Perl code and compiles it. Before
attempting to load the file it looks up the argument in C<%INC> to see
whether it has already been loaded. If it has, require() just returns
without doing a thing. Otherwise an attempt will be made to load and
compile the file.
require() has to find the file it has to load. If the argument is a
full path to the file, it just tries to read it. For example:
require "/home/httpd/perl/mylibs.pl";
If the path is relative, require() will attempt to search for the file
in all the directories listed in C<@INC>. For example:
require "mylibs.pl";
If there is more than one occurrence of the file with the same name in
the directories listed in C<@INC> the first occurrence will be used.
The file must return I<TRUE> as the last statement to indicate
successful execution of any initialization code. Since you never know
what changes the file will go through in the future, you cannot be
sure that the last statement will always return I<TRUE>. That's why
the suggestion is to put "C<1;>" at the end of the file.
Although you should use the real filename for most files, if the file
is a L<module|general::perl_reference::perl_reference/Modules__Libraries_and_Program_Files>, you may use the
following convention instead:
require My::Module;
This is equal to:
require "My/Module.pm";
If require() fails to load the file, either because it couldn't find
the file in question, the code failed to compile, or it didn't
return I<TRUE>, then the program will die(). To prevent this, the
require() statement can be enclosed in an eval() exception-handling
block, as in this example:
require.pl
----------
#!/usr/bin/perl -w
eval { require "/file/that/does/not/exists"};
if ($@) {
print "Failed to load, because : $@"
}
print "\nHello\n";
When we execute the program:
% ./require.pl
Failed to load, because : Can't locate /file/that/does/not/exists in
@INC (@INC contains: /usr/lib/perl5/5.00503/i386-linux
/usr/lib/perl5/5.00503 /usr/lib/perl5/site_perl/5.005/i386-linux
/usr/lib/perl5/site_perl/5.005 .) at require.pl line 3.
Hello
We see that the program didn't die(), because I<Hello> was
printed. This I<trick> is useful when you want to check whether a user
has some module installed, and if she hasn't, it's not critical:
perhaps the program can run without this module, with reduced
functionality.
If we remove the eval() part and try again:
require1.pl
-----------
#!/usr/bin/perl -w
require "/file/that/does/not/exists";
print "\nHello\n";
% ./require1.pl
Can't locate /file/that/does/not/exists in @INC (@INC contains:
/usr/lib/perl5/5.00503/i386-linux /usr/lib/perl5/5.00503
/usr/lib/perl5/site_perl/5.005/i386-linux
/usr/lib/perl5/site_perl/5.005 .) at require1.pl line 3.
The program just die()s in the last example, which is what you want in
most cases.
For more information refer to the perlfunc manpage.
=head2 use()
use(), just like require(), loads and compiles files containing Perl
code, but it works with
L<modules|general::perl_reference::perl_reference/Modules__Libraries_and_Program_Files> only and
is executed at compile time.
The only way to pass a module to load is by its module name and not
its filename. If the module is located in I<MyCode.pm>, the correct
way to use() it is:
use MyCode
and not:
use "MyCode.pm"
use() translates the passed argument into a file name replacing C<::>
with the operating system's path separator (normally C</>) and
appending I<.pm> at the end. So C<My::Module> becomes I<My/Module.pm>.
use() is exactly equivalent to:
BEGIN { require Module; Module->import(LIST); }
Internally it calls require() to do the loading and compilation
chores. When require() finishes its job, import() is called unless
C<()> is the second argument. The following pairs are equivalent:
use MyModule;
BEGIN {require MyModule; MyModule->import; }
use MyModule qw(foo bar);
BEGIN {require MyModule; MyModule->import("foo","bar"); }
use MyModule ();
BEGIN {require MyModule; }
The first pair imports the module's default symbols. This happens if
the module sets C<@EXPORT> to a list of symbols to be exported by
default. The module's manpage normally describes which symbols are
exported by default.
The second pair imports only the symbols passed as arguments.
The third pair describes the case where the caller does not want any
symbols to be imported.
C<import()> is not a builtin function, it's just an ordinary static
method call into the "C<MyModule>" package to tell the module to
import the list of features back into the current package. See the
Exporter manpage for more information.
When you write your own modules, always remember that it's better to
use C<@EXPORT_OK> instead of C<@EXPORT>, since the former doesn't
export symbols unless it was asked to. Exports pollute the namespace
of the module user. Also avoid short or common symbol names to reduce
the risk of name clashes.
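For instance, a module along these lines (the module and function names are made up for illustration) exports nothing by default and hands out its symbols only on request:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical module demonstrating @EXPORT_OK: nothing is exported
# unless the caller asks for it explicitly.
package My::Counter;
use Exporter ();
use vars qw(@ISA @EXPORT_OK);
@ISA       = qw(Exporter);
@EXPORT_OK = qw(increment);    # available, but only on request

my $counter = 0;
sub increment { return ++$counter }

package main;
# equivalent to: use My::Counter qw(increment);
My::Counter->import('increment');
print increment(), "\n";       # prints 1
```

A caller that writes C<use My::Counter;> with no arguments gets nothing imported and must use the fully qualified name C<My::Counter::increment()>.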
When functions and variables aren't exported you can still access them
using their fully qualified names, like C<$My::Module::bar> or
C<My::Module::foo()>. By convention you can use a leading underscore
on names to informally indicate that they are I<internal> and not for
public use.
There's a corresponding "C<no>" command that un-imports symbols
imported by C<use>, i.e., it calls C<Module-E<gt>unimport(LIST)>
instead of C<import()>.
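For example, the standard C<strict> pragma can be switched off for a small scope this way (a minimal sketch):

```perl
#!/usr/bin/perl -w
use strict;
use vars qw($greeting);
$greeting = "Hello";

{
    no strict 'refs';              # calls strict->unimport('refs')
    my $varname = "main::greeting";
    print ${$varname}, "\n";       # symbolic reference: prints "Hello"
}
# here 'use strict' is fully in force again
```

The effect of C<no strict 'refs'> is lexical: it ends at the closing brace of the block.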
=head2 do()
While do() behaves almost identically to require(), it reloads the
file unconditionally. It doesn't check C<%INC> to see whether the file
was already loaded.
If do() cannot read the file, it returns C<undef> and sets C<$!> to
report the error. If do() can read the file but cannot compile it, it
returns C<undef> and puts an error message in C<$@>. If the file is
successfully compiled, do() returns the value of the last expression
evaluated.
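Since do() skips the C<%INC> check, it is a natural fit for re-reading a configuration file that may change while the program runs. Here is a sketch of the error-handling idiom; the helper name load_file() is illustrative:

```perl
#!/usr/bin/perl -w
use strict;

# Illustrative helper: reload a file with do() and distinguish the
# failure cases described above.
sub load_file {
    my $file = shift;
    my $result = do $file;           # no %INC lookup -- always reloads
    unless (defined $result) {
        die "Couldn't read $file: $!"    if $!;
        die "Couldn't compile $file: $@" if $@;
    }
    return $result;                  # value of the file's last expression
}
```

Note that a file whose last expression legitimately evaluates to C<undef> is indistinguishable from a failure without further checks.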
=head1 Using Global Variables and Sharing Them Between Modules/Packages
It helps when you code your application in a structured way, using
Perl packages, but as you probably know, once you start using packages
it becomes much harder to share variables between them. A
configuration package comes to mind as a good example of a package
whose variables other modules will want to access.
Of course, using Object Oriented (OO) programming is the best way to
provide access to variables, through accessor methods. But if you are
not yet ready for OO techniques, you can still benefit from the
techniques we are going to talk about.
=head2 Making Variables Global
When you first wrote C<$x> in your code you created a (package) global
variable. It is visible everywhere in your program, although if used
in a package other than the package in which it was declared
(C<main::> by default), it must be referred to with its fully
qualified name, unless you have imported this variable with
import(). This will work only if you do not use the C<strict> pragma; but
you I<have> to use this pragma if you want to run your scripts under
mod_perl. Read L<The strict
pragma|guide::porting/The_strict_pragma> to find out why.
=head2 Making Variables Global With strict Pragma On
First you use:
use strict;
Then you use:
use vars qw($scalar %hash @array);
This declares the named variables as package globals in the current
package. They may be referred to within the same file and package
with their unqualified names; and in different files/packages with
their fully qualified names.
With Perl 5.6 and later you can use the C<our> operator instead:
our($scalar, %hash, @array);
If you want to share package global variables between packages, here
is what you can do.
=head2 Using Exporter.pm to Share Global Variables
Assume that you want to share the C<CGI.pm> object (I will use C<$q>)
between your modules. For example, you create it in C<script.pl>, but
you want it to be visible in C<My::HTML>. First, you make C<$q>
global.
script.pl:
----------------
use vars qw($q);
use CGI;
use lib qw(.);
use My::HTML qw($q); # My/HTML.pm is in the same dir as script.pl
$q = CGI->new;
My::HTML::printmyheader();
Note that we have imported C<$q> from C<My::HTML>. And C<My::HTML>
does the export of C<$q>:
My/HTML.pm
----------------
package My::HTML;
use strict;
BEGIN {
use Exporter ();
@My::HTML::ISA = qw(Exporter);
@My::HTML::EXPORT = qw();
@My::HTML::EXPORT_OK = qw($q);
}
use vars qw($q);
sub printmyheader{
# Whatever you want to do with $q... e.g.
print $q->header();
}
1;
So the C<$q> is shared between the C<My::HTML> package and
C<script.pl>. It will work vice versa as well, if you create the
object in C<My::HTML> but use it in C<script.pl>. You have true
sharing, since if you change C<$q> in C<script.pl>, it will be changed
in C<My::HTML> as well.
What if you need to share C<$q> between more than two packages? For
example you want My::Doc to share C<$q> as well.
You leave C<My::HTML> untouched, and modify I<script.pl> to include:
use My::Doc qw($q);
Then you add the same C<Exporter> code that we used in C<My::HTML>,
into C<My::Doc>, so that it also exports C<$q>.
One possible pitfall is when you want to use C<My::Doc> in both
C<My::HTML> and I<script.pl>. Only if you add
use My::Doc qw($q);
into C<My::HTML> will C<$q> be shared. Otherwise C<My::Doc> will not
share C<$q> any more. To make things clear here is the code:
script.pl:
----------------
use vars qw($q);
use CGI;
use lib qw(.);
use My::HTML qw($q); # My/HTML.pm is in the same dir as script.pl
use My::Doc qw($q); # Ditto
$q = CGI->new;
My::HTML::printmyheader();
My/HTML.pm
----------------
package My::HTML;
use strict;
BEGIN {
use Exporter ();
@My::HTML::ISA = qw(Exporter);
@My::HTML::EXPORT = qw();
@My::HTML::EXPORT_OK = qw($q);
}
use vars qw($q);
use My::Doc qw($q);
sub printmyheader{
# Whatever you want to do with $q... e.g.
print $q->header();
My::Doc::printtitle('Guide');
}
1;
My/Doc.pm
----------------
package My::Doc;
use strict;
BEGIN {
use Exporter ();
@My::Doc::ISA = qw(Exporter);
@My::Doc::EXPORT = qw();
@My::Doc::EXPORT_OK = qw($q);
}
use vars qw($q);
sub printtitle{
my $title = shift || 'None';
print $q->h1($title);
}
1;
=head2 Using the Perl Aliasing Feature to Share Global Variables
As the title says you can import a variable into a script or module
without using C<Exporter.pm>. I have found it useful to keep all the
configuration variables in one module C<My::Config>. But then I have
to export all the variables in order to use them in other modules,
which is bad for two reasons: it pollutes other packages' namespaces
with extra symbols, increasing the memory requirements; and it adds
the overhead of keeping track of which variables should be exported
from the configuration module and which imported, for each particular
hash C<%c> and exporting that. Here is an example of C<My::Config>:
package My::Config;
use strict;
use vars qw(%c);
%c = (
# All the configs go here
scalar_var => 5,
array_var => [qw(foo bar)],
hash_var => {
foo => 'Foo',
bar => 'BARRR',
},
);
1;
Now any package that wants to use the configuration variables must
either use fully qualified names like C<$My::Config::test>, which I
dislike, or import them as described in the previous section.
But hey, since we have only one variable to handle, we can make things
even simpler and save the loading of the C<Exporter.pm> package. We
will use the Perl aliasing feature for exporting and saving the
keystrokes:
package My::HTML;
use strict;
use lib qw(.);
# Global Configuration now aliased to global %c
use My::Config (); # My/Config.pm in the same dir as script.pl
use vars qw(%c);
*c = \%My::Config::c;
# Now you can access the variables from the My::Config
print $c{scalar_var};
print $c{array_var}[0];
print $c{hash_var}{foo};
Of course C<%c> is global in every package where you have created the
alias as described above, and if you change it anywhere it will affect
all the other packages that alias C<%My::Config::c>.
Note that aliases work with global or C<local()> variables only - you
cannot use my():
my *c = \%My::Config::c; # ERROR!
But you can write:
local *c = \%My::Config::c;
For more information about aliasing, refer to the Camel book, second
edition, pages 51-52.
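To see the aliasing in action without creating separate files, here is a self-contained sketch with both packages in one file (the C<color> key is made up for illustration):

```perl
use strict;
use warnings;

# The typeglob assignment makes %c in My::App the very same hash as
# %My::Config::c -- an alias, not a copy.
package My::Config;
use vars qw(%c);
%c = (color => 'blue');

package My::App;
use vars qw(%c);
*c = \%My::Config::c;                 # alias the whole hash

print "$c{color}\n";                  # blue
$c{color} = 'red';                    # writes through the alias ...
print "$My::Config::c{color}\n";      # ... so the original sees red
```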
=head2 Using Non-Hardcoded Configuration Module Names
You have just seen how to use a configuration module for configuration
centralization and an easy access to the information stored in this
module. However, there is somewhat of a chicken-and-egg problem--how
to let your other modules know the name of this file? Hardcoding the
name is brittle--if you have only a single project it should be fine,
but if you have several projects which use different configurations
and you want to reuse their code, you will have to find all instances
of the hardcoded name and replace them.
Another solution could be to use the same name for every configuration
module, like C<My::Config>, but put a different copy of it in each
location. This won't work under mod_perl, though, because of the
namespace collision: you cannot load different modules which use the
same name; only the first one will be loaded.
Luckily, there is another solution which allows us to stay flexible.
C<PerlSetVar> comes to the rescue. Just like with environment variables,
you can set server-wide Perl variables which can be retrieved from
any module and script. These statements are placed in the
I<httpd.conf> file. For example
PerlSetVar FooBaseDir /home/httpd/foo
PerlSetVar FooConfigModule Foo::Config
Now we require() the file where the above configuration will be used.
PerlRequire /home/httpd/perl/startup.pl
In the I<startup.pl> we might have the following code:
# retrieve the configuration module path
use Apache;
my $s = Apache->server;
my $base_dir = $s->dir_config('FooBaseDir') || '';
my $config_module = $s->dir_config('FooConfigModule') || '';
die "FooBaseDir and FooConfigModule aren't set in httpd.conf"
unless $base_dir and $config_module;
# build the real path to the config module
my $path = "$base_dir/$config_module";
$path =~ s|::|/|g;
$path .= ".pm";
# we have something like "/home/httpd/foo/Foo/Config.pm"
# now we can pull in the configuration module
require $path;
Now we know the module name and it's loaded, so for example if we need
to use some variables stored in this module to open a database
connection, we will do:
Apache::DBI->connect_on_init
("DBI:mysql:${$config_module.'::DB_NAME'}::${$config_module.'::SERVER'}",
${$config_module.'::USER'},
${$config_module.'::USER_PASSWD'},
{
PrintError => 1, # warn() on errors
RaiseError => 0, # don't die on error
AutoCommit => 1, # commit executes immediately
}
);
where a variable like:
${$config_module.'::USER'}
in our example is really:
$Foo::Config::USER
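Note that such symbolic references do not compile under C<use strict>; a minimal sketch of the lookup (with a made-up C<$USER> value) needs C<no strict 'refs'> in its enclosing scope:

```perl
use strict;
use warnings;

# A stand-in for the real configuration module, with a made-up value.
package Foo::Config;
use vars qw($USER);
$USER = 'stas';

package main;
my $config_module = 'Foo::Config';
my $user = do {
    no strict 'refs';                 # allow the symbolic dereference
    ${ $config_module . '::USER' };
};
print "$user\n";   # stas
```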
If you want to access these variables from within your code at run
time, then instead of accessing the server object C<$s>, use the
request object C<$r>:
my $r = shift;
my $base_dir = $r->dir_config('FooBaseDir') || '';
my $config_module = $r->dir_config('FooConfigModule') || '';
=head1 The Scope of the Special Perl Variables
Special Perl variables like C<$|> (buffering), C<$^T> (script's start
time), C<$^W> (warnings mode), C<$/> (input record separator), C<$\>
(output record separator) and many more are all true global variables;
they do not belong to any particular package (not even C<main::>) and
are universally available. This means that if you change them, you
change them anywhere across the entire program; furthermore you cannot
scope them with my(). However you can local()ise them which means that
any changes you apply will only last until the end of the enclosing
scope. In the mod_perl situation where the child server doesn't
usually exit, if in one of your scripts you modify a global variable
it will be changed for the rest of the process' life and will affect
all the scripts executed by the same process. Therefore localizing
these variables is highly recommended, I'd say mandatory.
We will demonstrate the case on the input record separator
variable. If you undefine this variable, the diamond operator
(readline) will suck in the whole file at once if you have enough
memory. Remembering this you should never write code like the example
below.
$/ = undef; # BAD!
open IN, "file" or die $!;
# slurp it all into a variable
$all_the_file = <IN>;
The proper way is to have a local() keyword before the special
variable is changed, like this:
local $/ = undef;
open IN, "file" or die $!;
# slurp it all into a variable
$all_the_file = <IN>;
But there is a catch. local() will propagate the changed value to
the code below it. The modified value will be in effect until the
script terminates, unless it is changed again somewhere else in the
script.
A cleaner approach is to enclose the whole of the code that is
affected by the modified variable in a block, like this:
{
local $/ = undef;
open IN, "file" or die $!;
# slurp it all into a variable
$all_the_file = <IN>;
}
That way when Perl leaves the block it restores the original value of
the C<$/> variable, and you don't need to worry elsewhere in your
program about its value being changed here.
Note that if you call a subroutine after you've set a global variable
but within the enclosing block, the global variable will be visible
with its new value inside the subroutine.
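Both points can be seen in a short sketch using the input record separator:

```perl
use strict;
use warnings;

# local() restores $/ on block exit, but a sub called from inside the
# block sees the localized value (dynamic scoping).
sub sep_state { defined $/ ? "defined" : "undef" }

print sep_state(), "\n";         # defined ("\n" by default)
{
    local $/;                    # undef within this dynamic scope
    print sep_state(), "\n";     # undef - visible inside the sub too
}
print sep_state(), "\n";         # defined again
```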
=head1 Compiled Regular Expressions
When using a regular expression that contains an interpolated Perl
variable, if it is known that the variable (or variables) will not
change during the execution of the program, a standard optimization
technique is to add the C</o> modifier to the regex pattern. This
directs the compiler to build the internal table once, for the entire
lifetime of the script, rather than every time the pattern is
executed. Consider:
my $pat = '^foo$'; # likely to be input from an HTML form field
foreach( @list ) {
print if /$pat/o;
}
This is usually a big win in loops over lists, or when using the
C<grep()> or C<map()> operators.
In long-lived mod_perl scripts, however, the variable may change with
each invocation and this can pose a problem. The first invocation of a
fresh httpd child will compile the regex and perform the search
correctly. However, all subsequent uses by that child will continue to
match the original pattern, regardless of the current contents of the
Perl variables the pattern is supposed to depend on. Your script will
appear to be broken.
There are two solutions to this problem:
The first is to use C<eval q//>, to force the code to be evaluated
each time. Just make sure that the eval block covers the entire loop
of processing, and not just the pattern match itself.
The above code fragment would be rewritten as:
my $pat = '^foo$';
eval q{
foreach( @list ) {
print if /$pat/o;
}
}
Just saying:
foreach( @list ) {
eval q{ print if /$pat/o; };
}
means that we recompile the regex for every element in the list even
though the regex doesn't change.
You can use this approach if you require more than one pattern match
operator in a given section of code. If the section contains only one
operator (be it an C<m//> or C<s///>), you can rely on the property of the
empty (null) pattern, which reuses the last successfully matched
pattern. This leads to the
second solution, which also eliminates the use of eval.
The above code fragment becomes:
my $pat = '^foo$';
"something" =~ /$pat/; # dummy match (MUST NOT FAIL!)
foreach( @list ) {
print if //;
}
The only gotcha is that the dummy match that boots the regular
expression engine must absolutely, positively succeed, otherwise the
pattern will not be cached, and the C<//> will match everything. If you
can't count on fixed text to ensure the match succeeds, you have two
possibilities.
If you can guarantee that the pattern variable contains no
meta-characters (things like *, +, ^, $...), you can use the dummy
match:
$pat =~ /\Q$pat\E/; # guaranteed if no meta-characters present
If there is a possibility that the pattern can contain
meta-characters, you should search for the pattern or the non-searchable
\377 character as follows:
"\377" =~ /$pat|^\377$/; # guaranteed if meta-characters present
Another approach: whether this technique pays off depends on the
complexity of the regex to which you apply it. One common case in
which a compiled regex is usually more efficient is "I<matching any
one of a group of patterns>" over and over again.
Maybe with a helper routine it's easier to remember. Here is one
slightly modified from Jeffrey Friedl's example in his book
"I<Mastering Regular Expressions>".
#####################################################
# Build_MatchMany_Function
# -- Input: list of patterns
# -- Output: A code ref which matches its $_[0]
# against ANY of the patterns given in the
# "Input", efficiently.
#
sub Build_MatchMany_Function {
my @R = @_;
my $expr = join '||', map { "\$_[0] =~ m/\$R[$_]/o" } ( 0..$#R );
my $matchsub = eval "sub { $expr }";
die "Failed in building regex @R: $@" if $@;
$matchsub;
}
Example usage:
@some_browsers = qw(Mozilla Lynx MSIE AmigaVoyager lwp libwww);
$Known_Browser = Build_MatchMany_Function(@some_browsers);
while (<ACCESS_LOG>) {
# ...
$browser = get_browser_field($_);
if ( ! &$Known_Browser($browser) ) {
print STDERR "Unknown Browser: $browser\n";
}
# ...
}
And of course you can use the qr() operator which makes the code even
more efficient:
my $pat = '^foo$';
my $re = qr($pat);
foreach( @list ) {
print if /$re/;
}
The qr() operator compiles the pattern when it is evaluated (e.g. once
per request), and the match then reuses the compiled version. Note
that C</o> should not be added here, since it would freeze the first
request's pattern just as before.
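For example, a helper that rebuilds the compiled pattern on each call stays correct even when the pattern changes between calls (a sketch with a made-up C<count_matches()> helper):

```perl
use strict;
use warnings;

# qr// compiles the pattern once per call; the match inside the grep
# reuses that compiled pattern, so a new pattern on the next call is
# honored correctly.
sub count_matches {
    my ($pat, @list) = @_;
    my $re = qr/$pat/;
    return scalar grep { /$re/ } @list;
}

print count_matches('^foo$', qw(foo bar food foo)), "\n";   # 2
print count_matches('^bar',  qw(foo bar food barn)), "\n";  # 2
```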
=head1 Exception Handling for mod_perl
Here are some guidelines for S<clean(er)> exception handling in
mod_perl, although the technique presented can be applied to all of
your Perl programming.
The reasoning behind this document is the current broken status of
C<$SIG{__DIE__}> in the perl core - see both the perl5-porters and the
mod_perl mailing list archives for details on this discussion. (It's
broken in at least Perl v5.6.0 and probably in later versions as
well). In short summary, $SIG{__DIE__} is a little bit too global, and
catches exceptions even when you want to catch them yourself, using
an C<eval{}> block.
=head2 Trapping Exceptions in Perl
To trap an exception in Perl we use the C<eval{}> construct. Many
people initially make the mistake of assuming that this is the same as
the C<eval EXPR> construct, which compiles and executes code at run
time, but that's not the case. C<eval{}> is compiled at compile time,
just like the rest of your code, and has next to zero run-time
penalty. For the
hardcore C programmers among you, it uses the C<setjmp/longjmp> POSIX
routines internally, just like C++ exceptions.
When in an eval block, if the code being executed die()'s for any
reason, an exception is thrown. This exception can be caught by
examining the C<$@> variable immediately after the eval block; if
C<$@> is true then an exception occurred and C<$@> contains the
exception in the form of a string. The full construct looks like
this:
eval {
# Some code here
}; # Note important semi-colon there
if ($@) # $@ contains the exception that was thrown
{
# Do something with the exception
}
else # optional
{
# No exception was thrown
}
Most of the time when you see these exception handlers there is no
else block, because it tends to be OK if the code didn't throw an
exception.
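For example, here is a minimal runnable instance of that construct, a hypothetical C<safe_div()> that converts a die() into an C<undef> return:

```perl
use strict;
use warnings;

# Trap the die() thrown inside the eval block; $@ holds the message.
sub safe_div {
    my ($x, $y) = @_;
    my $r = eval {
        die "division by zero\n" if $y == 0;
        $x / $y;                  # last expression is the eval's value
    };
    return $@ ? undef : $r;
}

print safe_div(10, 2), "\n";                              # 5
print defined safe_div(1, 0) ? "defined\n" : "undef\n";   # undef
```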
Perl's exception handling is similar to that of other languages, though it may
not seem so at first sight:
Perl Other language
------------------------------- ------------------------------------
eval { try {
# execute here // execute here
# raise our own exception: // raise our own exception:
die "Oops" if /error/; if(error==1){throw Exception.Oops;}
# execute more // execute more
} ; }
if($@) { catch {
# handle exceptions switch( Exception.id ) {
if( $@ =~ /Fail/ ) { Fail : fprintf( stderr, "Failed\n" ) ;
print "Failed\n" ; break ;
}
elsif( $@ =~ /Oops/ ) { Oops : throw Exception ;
# Pass it up the chain
die if $@ =~ /Oops/;
}
else { default :
# handle all other }
# exceptions here }
} // If we got here all is OK or handled
}
else { # optional
# all is well
}
# all is well or has been handled
=head2 Alternative Exception Handling Techniques
An often suggested method for handling global exceptions in mod_perl,
and other perl programs in general, is a B<__DIE__> handler, which can
be set up by either assigning a function name as a string to
C<$SIG{__DIE__}> (not particularly recommended, because of the
possible namespace clashes) or assigning a code reference to
C<$SIG{__DIE__}>. The usual way of doing so is to use an anonymous
subroutine:
$SIG{__DIE__} = sub { print "Eek - we died with:\n", $_[0]; };
The current problem with this is that C<$SIG{__DIE__}> is a global
setting in your script, so while you can potentially hide away your
exceptions in some external module, the execution of C<$SIG{__DIE__}>
is fairly magical, and interferes not just with your code, but with
all code in every module you import. Beyond the magic involved,
C<$SIG{__DIE__}> actually interferes with perl's normal exception
handling mechanism, the C<eval{}> construct. Witness:
$SIG{__DIE__} = sub { print "handler\n"; };
eval {
print "In eval\n";
die "Failed for some reason\n";
};
if ($@) {
print "Caught exception: $@";
}
The code unfortunately prints out:
In eval
handler
This isn't quite what you would expect, especially if that
C<$SIG{__DIE__}> handler is hidden away deep in some other module that
you didn't know about. There are workarounds, however. One is to
localize C<$SIG{__DIE__}> in every exception trap you write:
eval {
local $SIG{__DIE__};
...
};
Obviously this just doesn't scale - you don't want to be doing that
for every exception trap in your code, and it slows things down. A
second workaround is to check in your handler whether you are trying
to catch this exception:
$SIG{__DIE__} = sub {
die $_[0] if $^S;
print "handler\n";
};
However this won't work under C<Apache::Registry> - you're always in
an eval block there!
C<$^S> isn't totally reliable in certain Perl versions, e.g. 5.005_03
and 5.6.1 both do the wrong thing with it in certain situations.
Instead, you can use the caller() function to figure out whether the
handler was invoked within an eval() context:
$SIG{__DIE__} = sub {
my $in_eval = 0;
for(my $stack = 1; my $sub = (CORE::caller($stack))[3]; $stack++) {
$in_eval = 1 if $sub =~ /^\(eval\)/;
}
my_die_handler(@_) unless $in_eval;
};
The other problem with C<$SIG{__DIE__}> also relates to its global
nature. Because you might have more than one application running
under mod_perl, you can't be sure which has set a C<$SIG{__DIE__}>
handler when and for what. This can become extremely confusing when
you start scaling up from a set of simple registry scripts that might
rely on CGI::Carp for global exception handling (which uses
C<$SIG{__DIE__}> to trap exceptions) to having many applications
installed with a variety of exception handling mechanisms in place.
You should warn people about this danger of C<$SIG{__DIE__}> and
inform them of better ways to code. The following material is an
attempt to do just that.
=head2 Better Exception Handling
The C<eval{}> construct in itself is a fairly weak way to handle
exceptions as strings. There's no way to pass more information in your
exception, so you have to handle your exception in more than one place
- at the location the error occurred, in order to construct a sensible
error message, and again in your exception handler to de-construct
that string into something meaningful (unless of course all you want
your exception handler to do is dump the error to the browser). The
other problem is that you have no way of automatically detecting where
the exception occurred using the C<eval{}> construct. In a
C<$SIG{__DIE__}> handler you always have the caller()
function available to detect where the error occurred. But we can fix that...
A little known fact about exceptions in perl 5.005 is that you can
call die with an object. The exception handler receives that object in
C<$@>. This is how you are advised to handle exceptions now, as it
provides an extremely flexible and scalable exception solution,
potentially providing almost all of the power of Java exceptions.
[As a footnote here, the only thing really missing from
Java exceptions is a guaranteed C<finally> clause, although it's possible
to get about 98.62% of the way towards providing that using
C<eval{}>.]
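For the curious, here is one way to sketch such a C<finally>-like cleanup with C<eval{}> (the C<with_cleanup()> helper is made up for illustration):

```perl
use strict;
use warnings;

# Run the body inside eval{}, always run the cleanup, then re-throw
# any exception -- roughly what a "finally" clause would give us.
sub with_cleanup {
    my ($body, $cleanup) = @_;
    eval { $body->() };
    my $err = $@;
    $cleanup->();        # always runs, like "finally"
    die $err if $err;    # re-throw after cleanup
}

my @log;
eval {
    with_cleanup(sub { die "boom\n" }, sub { push @log, 'cleaned' });
};
print "@log $@";   # cleaned boom
```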
=head3 A Little Housekeeping
First though, before we delve into the details, a little housekeeping
is in order. Most, if not all, mod_perl programs consist of a main
routine that is entered, and then dispatches itself to a routine
depending on the parameters passed and/or the form values. In a normal
C program this is your main() function, in a mod_perl handler this is
your handler() function/method. The exception to this rule seems to be
Apache::Registry scripts, although the techniques described here can
be easily adapted.
In order for you to be able to use exception handling to its best
advantage you need to change your script to have some sort of global
exception handling. This is much more trivial than it sounds. If
you're using C<Apache::Registry> to emulate CGI you might consider
wrapping your entire script in one big eval block, but I would
discourage that. A better method would be to modularize your script
into discrete function calls, one of which should be a dispatch
routine:
#!/usr/bin/perl -w
# Apache::Registry script
eval {
dispatch();
};
if ($@) {
# handle exception
}
sub dispatch {
...
}
This is easier with an ordinary mod_perl handler as it is natural to
have separate functions, rather than a long run-on script:
MyHandler.pm
------------
sub handler {
my $r = shift;
eval {
dispatch($r);
};
if ($@) {
# handle exception
}
}
sub dispatch {
my $r = shift;
...
}
Now that the skeleton code is setup, let's create an exception class,
making use of Perl 5.005's ability to throw exception objects.
=head3 An Exception Class
This is a really simple exception class, that does nothing but contain
information. A better implementation would probably also handle its
own exception conditions, but that would be more complex, requiring
separate packages for each exception type.
My/Exception.pm
---------------
package My::Exception;
sub AUTOLOAD {
no strict 'refs', 'subs';
if ($AUTOLOAD =~ /.*::([A-Z]\w+)$/) {
my $exception = $1;
*{$AUTOLOAD} =
sub {
shift;
my ($package, $filename, $line) = caller;
push @_, caller => {
package => $package,
filename => $filename,
line => $line,
};
bless { @_ }, "My::Exception::$exception";
};
goto &{$AUTOLOAD};
}
else {
die "No such exception class: $AUTOLOAD\n";
}
}
1;
OK, so this is all highly magical, but what does it do? It creates a
simple package that we can import and use as follows:
use My::Exception;
die My::Exception->SomeException( foo => "bar" );
The exception class tracks exactly where we died from using the
caller() mechanism, it also caches exception classes so that
C<AUTOLOAD> is only called the first time (in a given process) an
exception of a particular type is thrown (particularly relevant under
mod_perl).
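Here is a trimmed, self-contained rerun of that idea with a usage demo (the C<Timeout> exception name and its C<seconds> field are made up):

```perl
use strict;
use warnings;

# AUTOLOAD installs a constructor for each new exception class on
# first use, records the caller, and blesses the arguments.
package My::Exception;
use vars qw($AUTOLOAD);
sub AUTOLOAD {
    no strict 'refs';
    if ($AUTOLOAD =~ /.*::([A-Z]\w+)$/) {
        my $exception = $1;
        *{$AUTOLOAD} = sub {
            shift;                                  # class name
            my ($package, $filename, $line) = caller;
            bless { @_, caller => { package  => $package,
                                    filename => $filename,
                                    line     => $line } },
                  "My::Exception::$exception";
        };
        goto &{$AUTOLOAD};
    }
    die "No such exception class: $AUTOLOAD\n";
}

package main;
eval { die My::Exception->Timeout(seconds => 30) };
if (ref $@ and $@->isa('My::Exception::Timeout')) {
    print "caught after ", $@->{seconds}, " seconds\n";
}
```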
=head2 Catching Uncaught Exceptions
What about exceptions that are thrown outside of your control? We can
fix this using one of two possible methods. The first is to override
die globally using the old magical C<$SIG{__DIE__}>, and the second
is the cleaner, non-magical method of overriding the core die()
function with your own die() function that throws an exception that
makes sense to your application.
=head3 Using $SIG{__DIE__}
Overloading using C<$SIG{__DIE__}> in this case is rather simple,
here's some code:
$SIG{__DIE__} = sub {
my $err = ref $_[0]
? $_[0]
: My::Exception->UnCaught(text => join('', @_));
die $err;
};
All this does is catch your exception and re-throw it. It's not as
dangerous as the C<$SIG{__DIE__}> usage we warned about earlier, because
we're actually re-throwing the exception, rather than catching it and
stopping there. Even though $SIG{__DIE__} is a global handler, because
we are simply re-throwing the exception we can let other applications
outside of our control catch it and not worry about
it.
There's only one slight bug left, and that's if some external
code catches the exception and tries to do string
comparisons on it, as in:
eval {
... # some code
die "FATAL ERROR!\n";
};
if ($@) {
if ($@ =~ /^FATAL ERROR/) {
die $@;
}
}
In order to deal with this, we can overload stringification for our
C<My::Exception::UnCaught> class:
{
package My::Exception::UnCaught;
use overload '""' => \&str;
sub str {
shift->{text};
}
}
We can now let other code happily continue. Note that there is a bug in
Perl 5.6 which may affect people here: Stringification does not occur
when an object is operated on by a regular expression (via the =~ operator).
A workaround is to stringify explicitly using double quotes; however,
that doesn't help the poor soul who is using other applications. This
bug has been fixed in later versions of Perl.
=head3 Overriding the Core die() Function
So what if we don't want to touch C<$SIG{__DIE__}> at all? We can
overcome this by overriding the core die function. This is slightly
more complex than implementing a C<$SIG{__DIE__}> handler, but is far
less magical, and is the right thing to do, according to the
L<perl5-porters mailing list|guide::help/Get_help_with_Perl>.
Overriding core functions has to be done from an external
package/module. So we're going to add that to our C<My::Exception>
module. Here are the relevant parts:
use vars qw/@ISA @EXPORT/;
use Exporter;
@EXPORT = qw/die/;
@ISA = 'Exporter';
sub die (@); # prototype to match CORE::die
sub import {
my $pkg = shift;
$pkg->export('CORE::GLOBAL', 'die');
Exporter::import($pkg,@_);
}
sub die (@) {
if (!ref($_[0])) {
CORE::die My::Exception->UnCaught(text => join('', @_));
}
CORE::die $_[0]; # only use the first element because it's an object
}
That wasn't so bad, was it? We're relying on Exporter's export()
function to do the hard work for us, exporting the die() function into
the C<CORE::GLOBAL> namespace. If we don't want to overload die() everywhere
this can still be an extremely useful technique. By just using Exporter's
default import() method we can export our new die() method into any package
of our choosing. This allows us to short-cut the long calling convention
and simply die() with a string, and let the system handle the actual
construction into an object for us.
Along with the above overloaded stringification, we now have a complete
exception system (well, mostly complete. Exception die-hards would argue that
there's no "finally" clause, and no exception stack, but that's another topic
for another time).
=head2 A Single UnCaught Exception Class
Until the Perl core gets its own base exception class (which will likely happen
for Perl 6, but not sooner), it is vitally important that you decide upon a
single base exception class for all of the applications that you install on
your server, and a single exception handling technique. The problem comes when
you have multiple applications all doing exception handling and all expecting a
certain type of "UnCaught" exception class. Witness the following application:
package Foo;
eval {
# do something
};
if ($@) {
if ($@->isa('Foo::Exception::Bar')) {
# handle "Bar" exception
}
elsif ($@->isa('Foo::Exception::UnCaught')) {
# handle uncaught exceptions
}
}
All will work well until someone installs application "TrapMe" on the
same machine, which installs its own UnCaught exception handler,
overloading CORE::GLOBAL::die or installing a $SIG{__DIE__} handler.
This is a case where using $SIG{__DIE__} might actually be
preferable, because you can change your handler() routine to look like
this:
sub handler {
my $r = shift;
local $SIG{__DIE__};
Foo::Exception->Init(); # sets $SIG{__DIE__}
eval {
dispatch($r);
};
if ($@) {
# handle exception
}
}
sub dispatch {
my $r = shift;
...
}
In this case the fact that C<$SIG{__DIE__}> can be local()ized
per handler has helped us, something we couldn't achieve by overloading
CORE::GLOBAL::die. However there is still a gotcha. If someone has
overloaded die() in one of the applications installed on your mod_perl
machine, you get the same problems still. So in short: Watch out, and
check the source code of anything you install to make sure it follows
your exception handling technique, or just uses die() with strings.
=head2 Some Uses
I'm going to come right out and say now: I abuse this system horribly!
I throw exceptions all over my code, not because I've hit an
"exceptional" bit of code, but because I want to get straight back out
of the current call stack, without having to have every single level of
function call check error codes. One way I use this is to return
Apache return codes:
# paranoid security check
die My::Exception->RetCode(code => 204);
Returns a 204 error code (C<HTTP_NO_CONTENT>), which is caught at my
top level exception handler:
if ($@->isa('My::Exception::RetCode')) {
return $@->{code};
}
That last return statement is in my handler() method, so that's the
return code that Apache actually sends. I have other exception
handlers in place for sending Basic Authentication headers and
Redirect headers out. I also have a generic C<My::Exception::OK>
class, which gives me a way to back out completely from where I am,
but register that as an OK thing to do.
Why do I go to these extents? After all, code like slashcode (the code
behind http://slashdot.org) doesn't need this sort of thing, so why
should my web site? Well it's just a matter of scalability and
programmer style really. There's a lot of literature out there about
exception handling, so I suggest doing some research.
=head2 Conclusions
Here I've demonstrated a simple and scalable (and useful) exception
handling mechanism, that fits perfectly with your current code, and
provides the programmer with an excellent means to determine what has
happened in his code. Some users might be worried about the overhead
of such code. However in use I've found accessing the database to be a
much more significant overhead, and this is used in some code
delivering to thousands of users.
For similar exception handling techniques, see the section "L<Other
Implementations|general::perl_reference::perl_reference/Other_Implementations>".
=head2 The My::Exception class in its entirety
  package My::Exception;

  use vars qw/@ISA @EXPORT $AUTOLOAD/;
  use Exporter;
  @ISA = 'Exporter';
  @EXPORT = qw/die/;

  sub die (@);

  sub import {
      my $pkg = shift;
      # allow "use My::Exception 'die';" to mean import locally only
      $pkg->export('CORE::GLOBAL', 'die') unless @_;
      Exporter::import($pkg, @_);
  }

  sub die (@) {
      if (!ref($_[0])) {
          CORE::die My::Exception->UnCaught(text => join('', @_));
      }
      CORE::die $_[0];
  }

  {
      package My::Exception::UnCaught;
      use overload '""' => sub { shift->{text} };
  }

  sub AUTOLOAD {
      no strict 'refs', 'subs';
      if ($AUTOLOAD =~ /.*::([A-Z]\w+)$/) {
          my $exception = $1;
          *{$AUTOLOAD} =
              sub {
                  shift;
                  my ($package, $filename, $line) = caller;
                  push @_, caller => {
                      package  => $package,
                      filename => $filename,
                      line     => $line,
                  };
                  bless { @_ }, "My::Exception::$exception";
              };
          goto &{$AUTOLOAD};
      }
      else {
          CORE::die "No such exception class: $AUTOLOAD\n";
      }
  }

  1;
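Assuming the package above is saved as F<My/Exception.pm> somewhere on
your C<@INC> path, typical usage looks like the following sketch. The
C<Timeout> class name is illustrative: any capitalized method name
becomes an exception class on first use, courtesy of C<AUTOLOAD>:

```perl
use My::Exception;        # import() overrides die() globally

# a plain string die gets wrapped into My::Exception::UnCaught
eval { die "something broke\n" };
print ref($@), "\n";      # My::Exception::UnCaught
print $@;                 # overloaded "" returns the original text

# AUTOLOAD builds My::Exception::Timeout on the fly
eval { die My::Exception->Timeout(text => 'took too long') };
if ($@->isa('My::Exception::Timeout')) {
    my $e = $@;
    print "caught: $e->{text} (thrown at line $e->{caller}{line})\n";
}
```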
=head2 Other Implementations
Some users might find it very useful to have the more C++/Java-like
interface of try/catch functions. These are available in several forms
that all work in slightly different ways. See the documentation for
each module for details:
=over
=item * Error.pm
Graham Barr's excellent OO styled "try, throw, catch" module (from
L<CPAN|download::third_party/Perl>). This should be considered your best option
for structured exception handling because it is well known and well
supported and used by a lot of other applications.
=item * Exception::Class and Devel::StackTrace
Both by Dave Rolsky, and both available from CPAN, of course.
C<Exception::Class> is a bit cleaner than the C<AUTOLOAD> method from
above as it can catch typos in exception class names, whereas the
method above will automatically create a new class for you. In
addition, it lets you create actual class hierarchies for your
exceptions, which can be useful if you want to create exception
classes that provide extra methods or data. For example, an exception
class for database errors could provide a method for returning the SQL
and bound parameters in use at the time of the error.
=item * Try.pm
By Tony Olekshy. Adds an unwind stack and some other interesting
features. Not on CPAN. Available at
http://www.avrasoft.com/perl/rfc/try-1136.zip
=back
=head1 Customized __DIE__ handler
As we saw in the previous sections it's a bad idea to do:
  require Carp;
  $SIG{__DIE__} = \&Carp::confess;
since it breaks error propagation within eval {} blocks. But starting
from Perl 5.6.x you can use another solution to trace errors. Suppose
you get an error:
  "exit" is not exported by the GLOB(0x88414cc) module at (eval 397) line 1
and you have no clue where it comes from. You can override the die()
function and plug the tracer inside:
  require Carp;
  use subs qw(CORE::GLOBAL::die);
  *CORE::GLOBAL::die = sub {
      if ($_[0] =~ /"exit" is not exported/) {
          local *CORE::GLOBAL::die = sub { CORE::die(@_) };
          Carp::confess(@_); # Carp uses die() internally!
      } else {
          CORE::die(@_); # could write &CORE::die to forward @_
      }
  };
Now we can test that it works properly without breaking error
propagation from within eval {} blocks:
  eval { foo(); }; warn $@ if $@;
  print "\n";
  eval { poo(); }; warn $@ if $@;

  sub foo { bar(); }
  sub bar { die qq{"exit" is not exported} }
  sub poo { tar(); }
  sub tar { die "normal exit" }
This prints:

  $ perl -w test
  Subroutine die redefined at test line 5.
  "exit" is not exported at test line 6
          main::__ANON__('"exit" is not exported') called at test line 17
          main::bar() called at test line 16
          main::foo() called at test line 12
          eval {...} called at test line 12
  normal exit at test line 5.
The C<local> in:

  local *CORE::GLOBAL::die = sub { CORE::die(@_) };

is important, so that you won't lose the overloaded
C<CORE::GLOBAL::die> after the first traced error.
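Here is a tiny self-contained demonstration of why C<local> matters
here: localizing a glob restores the previous definition when the
enclosing dynamic scope exits (the C<Demo::speak> name is made up for
the example, standing in for C<CORE::GLOBAL::die>):

```perl
#!/usr/bin/perl -w
use strict;

# the "tracing" version installed at startup
*Demo::speak = sub { return "traced" };

sub plain_for_a_while {
    # swap in a plain version, but only for this dynamic scope
    local *Demo::speak = sub { return "plain" };
    return Demo::speak();
}

print plain_for_a_while(), "\n";   # plain
print Demo::speak(), "\n";         # traced -- the original is back
```

Without C<local>, the plain passthrough would replace the tracing
version permanently, and any later error would go untraced.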
=head1 Maintainers
Maintainer is the person(s) you should contact with updates,
corrections and patches.
=over
=item *
Stas Bekman E<lt>stas (at) stason.orgE<gt>
=back
=head1 Authors
=over
=item *
Stas Bekman E<lt>stas (at) stason.orgE<gt>
=item *
Matt Sergeant E<lt>matt (at) sergeant.orgE<gt>
=back
Only the major authors are listed above. For contributors see the
Changes file.
=cut