You are viewing a plain text version of this content. The canonical link for it is here.
Posted to modperl-cvs@perl.apache.org by sb...@hyperreal.org on 1999/05/08 20:04:25 UTC
cvs commit: modperl-site/guide browserbugs.html databases.html dbm.html strategy.html
sbekman 99/05/08 11:04:25
Added: guide browserbugs.html databases.html dbm.html
strategy.html
Log:
forgot to cvs add new files...
Revision Changes Path
1.1 modperl-site/guide/browserbugs.html
Index: browserbugs.html
===================================================================
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>mod_perl guide: Workarounds for some known bugs in browsers.</TITLE>
<META NAME="GENERATOR" CONTENT="Mozilla/3.04Gold (X11; I; AIX 4.1) [Netscape]">
<META NAME="Author" CONTENT="Bekman Stas">
<META NAME="Description" CONTENT="All Apache/Perl related information: Hints,
Guidelines, Scenarios and Troubleshottings">
<META NAME="keywords" CONTENT="mod_perl modperl perl apache cgi webserver speed
fast guide mod_perl apache guide help info faq mod_perl installation cgi
troubleshooting help no sex speedup free open source OSS mod_perl apache guide">
</HEAD>
<LINK REL=STYLESHEET TYPE="text/css"
HREF="style.css" TITLE="refstyle">
<style type="text/css">
<!--
@import url(style.css);
-->
</style>
<BODY TEXT="#000000" BGCOLOR="#E0FFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#FF0000">
<A NAME="toc"></A>
<H1 ALIGN=CENTER>
<A HREF="http://perl.apache.org"><IMG SRC="images/mod_perl.gif" ALT="Mod Perl Icon" BORDER=0 HEIGHT=30 WIDTH=90 ALIGN=LEFT></A>
<A HREF="http://perl.apache.org"><IMG SRC="images/mod_perl.gif" ALT="Mod Perl Icon" BORDER=0 HEIGHT=30 WIDTH=90 ALIGN=RIGHT></A>
Workarounds for some known bugs in browsers.</H1>
<HR WIDTH="100%">
[ <A HREF="debug.html">Prev</A> | <A HREF="index.html">Main Page</A> | <A HREF="modules.html">Next</A> ]<HR><!-- INDEX BEGIN -->
<P><B><FONT SIZE=-1>Table of Contents:</FONT></B></P>
<UL>
<LI><A HREF="#Preventing_QUERY_STRING_to_get_c">Preventing QUERY_STRING to get corrupted with &entity key names.</A>
<LI><A HREF="#IE_4_x_does_not_re_post_data_to_">IE 4.x does not re-post data to a non-port-80 URL</A>
</UL>
<!-- INDEX END -->
<HR>
The <a href="http://www.modperl.com/">
<B>Writing Apache Modules with Perl and C</B></a>
book can be purchased online from <a
href="http://www.ora.com/catalog/wrapmod/">O'Reilly </a>
and <a
href="http://www.amazon.com/exec/obidos/ASIN/156592567X/writinapachemodu">
Amazon.com</a>.
<HR>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<P>
<CENTER><H1><A NAME="Preventing_QUERY_STRING_to_get_c">Preventing QUERY_STRING to get corrupted with &entity key names.</A></H1></CENTER>
<P>
In URL <CODE>http://my.site.com/foo.pl?foo=bar&reg=foobar</CODE> , some browsers will interpret <CODE>&reg</CODE> as a magic entity, and encode it as it was
<CODE>&reg;</CODE>, which will result in a corrupted <CODE>QUERY_STRING</CODE>. If you encounter this problem you should either avoid using such a keys
or separate parameter pairs with <CODE>;</CODE> instead of <CODE>&</CODE>. Both <CODE>CGI.pm</CODE> and
<CODE>Apache::Request</CODE> support a semicolon instead of an ampersand as a separator. So your URI
should look like:
<CODE>http://my.site.com/foo.pl?foo=bar;reg=foobar</CODE>.
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H1><A NAME="IE_4_x_does_not_re_post_data_to_">IE 4.x does not re-post data to a non-port-80 URL</A></H1></CENTER>
<P>
One problem with publishing 8080 port numbers is that I was told that IE
4.x has a bug when re-posting data to a non-port-80 url. It drops the port
designator, and uses port 80 anyway.
<P>
See <A HREF="././config.html#Publishing_port_numbers_differen">Publishing port numbers different from 80</A>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
The <a href="http://www.modperl.com/">
<B>Writing Apache Modules with Perl and C</B></a>
book can be purchased online from <a
href="http://www.ora.com/catalog/wrapmod/">O'Reilly </a>
and <a
href="http://www.amazon.com/exec/obidos/ASIN/156592567X/writinapachemodu">
Amazon.com</a>.
<HR>
[ <A HREF="debug.html">Prev</A> | <A HREF="index.html">Main Page</A> | <A HREF="modules.html">Next</A> ]
<CENTER><TABLE CELLSPACING=2 CELLPADDING=2 WIDTH="100%" >
<TR ALIGN=CENTER VALIGN=TOP>
<TD ALIGN=CENTER VALIGN=CENTER COLSPAN="3">
<HR>
</TD>
</TR>
<TR ALIGN=CENTER VALIGN=TOP>
<TD ALIGN=CENTER VALIGN=CENTER>
<B>
<FONT SIZE=-1>
Written by <A HREF="help.html#This_document_s_Author">Stas Bekman</A>.
<BR>Last Modified at 05/08/1999
</FONT>
</B>
</TD>
<TD>
<A HREF="http://perl.apache.org"><IMG SRC="images/mod_perl2.jpg" ALT="Mod Perl Icon" BORDER=0 HEIGHT=59 WIDTH=150></A>
</TD>
<TD>
<FONT SIZE=-2>
Use of the Camel for Perl is <BR>
a trademark of <A HREF="http://www.ora.com">O'Reilly & Associates</A>,<BR>
and is used by permission.
</FONT>
</TD>
</TR>
</TABLE></CENTER>
</BODY>
</HTML>
1.1 modperl-site/guide/databases.html
Index: databases.html
===================================================================
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>mod_perl guide: mod_perl and Relational Databases</TITLE>
<META NAME="GENERATOR" CONTENT="Mozilla/3.04Gold (X11; I; AIX 4.1) [Netscape]">
<META NAME="Author" CONTENT="Bekman Stas">
<META NAME="Description" CONTENT="All Apache/Perl related information: Hints,
Guidelines, Scenarios and Troubleshottings">
<META NAME="keywords" CONTENT="mod_perl modperl perl apache cgi webserver speed
fast guide mod_perl apache guide help info faq mod_perl installation cgi
troubleshooting help no sex speedup free open source OSS mod_perl apache guide">
</HEAD>
<LINK REL=STYLESHEET TYPE="text/css"
HREF="style.css" TITLE="refstyle">
<style type="text/css">
<!--
@import url(style.css);
-->
</style>
<BODY TEXT="#000000" BGCOLOR="#E0FFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#FF0000">
<A NAME="toc"></A>
<H1 ALIGN=CENTER>
<A HREF="http://perl.apache.org"><IMG SRC="images/mod_perl.gif" ALT="Mod Perl Icon" BORDER=0 HEIGHT=30 WIDTH=90 ALIGN=LEFT></A>
<A HREF="http://perl.apache.org"><IMG SRC="images/mod_perl.gif" ALT="Mod Perl Icon" BORDER=0 HEIGHT=30 WIDTH=90 ALIGN=RIGHT></A>
mod_perl and Relational Databases</H1>
<HR WIDTH="100%">
[ <A HREF="warnings.html">Prev</A> | <A HREF="index.html">Main Page</A> | <A HREF="dbm.html">Next</A> ]<HR><!-- INDEX BEGIN -->
<P><B><FONT SIZE=-1>Table of Contents:</FONT></B></P>
<UL>
<LI><A HREF="#Why_Relational_SQL_Databases">Why Relational (SQL) Databases</A>
<LI><A HREF="#Apache_DBI_Initiate_a_persist">Apache::DBI - Initiate a persistent database connection</A>
<UL>
<LI><A HREF="#Introduction">Introduction</A>
<LI><A HREF="#Configuration">Configuration</A>
<LI><A HREF="#Preopening_DBI_connections">Preopening DBI connections</A>
<LI><A HREF="#Debugging_Apache_DBI">Debugging Apache::DBI</A>
<LI><A HREF="#Problems_and_solutions">Problems and solutions</A>
<UL>
<LI><A HREF="#The_morning_bug">The morning bug</A>
<LI><A HREF="#Cannot_find_the_DBI_handler">Cannot find the DBI handler</A>
<LI><A HREF="#Apache_DBI_does_not_work">Apache:DBI does not work</A>
</UL>
<LI><A HREF="#Some_useful_code_snippets_to_be_">Some useful code snippets to be used with relational Databases</A>
<UL>
<LI><A HREF="#Turning_the_SQL_queries_writing_">Turning the SQL queries writing into an short and simple task</A>
<LI><A HREF="#My_DB_module">My::DB module</A>
<LI><A HREF="#My_DB_Module_s_Usage_Examples">My::DB Module's Usage Examples</A>
</UL>
</UL>
</UL>
<!-- INDEX END -->
<HR>
The <a href="http://www.modperl.com/">
<B>Writing Apache Modules with Perl and C</B></a>
book can be purchased online from <a
href="http://www.ora.com/catalog/wrapmod/">O'Reilly </a>
and <a
href="http://www.amazon.com/exec/obidos/ASIN/156592567X/writinapachemodu">
Amazon.com</a>.
<HR>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<P>
<CENTER><H1><A NAME="Why_Relational_SQL_Databases">Why Relational (SQL) Databases</A></H1></CENTER>
<P>
Nowadays millions of users surf the Internet. There are millions of
Terabytes of data laying around. To manipulate that data new smart
techniques and technologies were invented. One of the major inventions was
a relational database, which allows to search and modify huge data storages
in zero time. It uses <STRONG>SQL</STRONG> (Structured Query Language) to manipulate contents of these databases. Of
course once we started to use the web, we have found a need to write web
interfaces to these databases and CGI was and is the mostly used technology
for building such interfaces. The main limitation for a CGI script driving
a database versus application, is its statelessness - on every request the
CGI script has to initiate a connection to the database, when the request
is completed the connection is lost. <CODE>Apache::DBI</CODE> was written to remove this limitation. When you use it, you have a
persistent database connection over the process' life. As you understand
this possible only with CGI running under mod_perl enabled server, since
the child process does not quit when the request has been served. So when a
mod_perl script needs to _talk_ to a database, he starts _talking_ right
away, without initiating a database connection first, <CODE>Apache::DBI</CODE> worries to provide a valid connection immediately. Of course the are more
nuances, which will be talked about in the following sections.
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H1><A NAME="Apache_DBI_Initiate_a_persist">Apache::DBI - Initiate a persistent database connection</A></H1></CENTER>
<P>
This module initiates a persistent database connection. It is possible only
with mod_perl.
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H2><A NAME="Introduction">Introduction</A></H2></CENTER>
<P>
When loading the DBI module (do not confuse this with the Apache::DBI
module) it looks if the environment variable GATEWAY_INTERFACE starts with
'CGI-Perl' and if the module Apache::DBI has been loaded. In this case
every connect request will be forwarded to the Apache::DBI module. This
looks if a database handle from a previous connect request is already
stored and if this handle is still valid using the ping method. If these
two conditions are fulfilled it just returns the database handle. If there
is no appropriate database handle or if the ping method fails, a new
connection is established and the handle is stored for later re-use. In
other words when the script is run again from a child that has already (and
is still) connected, the host/username/password is checked against the
cache of open connections, and if one is available, uses that one. There is
no need to delete the disconnect statements from your code. They won't do
anything because the Apache::DBI module overloads the disconnect method
with a NOP (like an empty call).
<P>
You want to use this module if you are opening a <STRONG>few</STRONG> DB connections to the server. <CODE>Apache::DBI</CODE> will make them persistent per child, so if you have 10 children and each
opens 2 different connections you will have in total 20 opened persistent
connections. Thus after initial connect you will save up the connection
time for every connect request from your DBI module. Which is a huge
benefit for the mod_perl apache server with high traffic of users deploying
the relational DB.
<P>
As you understand you must NOT use this module if you are opening a special
connection for each of your users, since each of them will stay persistent
and in a short time the number of connections will be so big that your
machine will scream and die. If you want to use
<CODE>Apache::DBI</CODE> in both situations, as of this moment the only available solution is to run
2 mod_perl servers, one using
<CODE>Apache::DBI</CODE> and one not.
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H2><A NAME="Configuration">Configuration</A></H2></CENTER>
<P>
After installing this module, the configuration is simple - add to the
<CODE>httpd.conf</CODE> the following directive.
<P>
<PRE> PerlModule Apache::DBI
</PRE>
<P>
Note that it is important, to load this module before any other ApacheDBI
module !
<P>
You can skip preloading <CODE>DBI</CODE>, since <CODE>Apache::DBI</CODE> does that. But there is no harm if you leave it in.
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H2><A NAME="Preopening_DBI_connections">Preopening DBI connections</A></H2></CENTER>
<P>
If you want that when you call the script after server restart, the
connection will be already preopened, you should use
<CODE>connect_on_init()</CODE> method in the startup file to preload every connection you are going to
use. For example:
<P>
<PRE> Apache::DBI->connect_on_init
("DBI:mysql:myDB::myserver",
"username",
"passwd",
{
PrintError => 1, # warn() on errors
RaiseError => 0, # don't die on error
AutoCommit => 1, # commit executes immediately
}
);
</PRE>
<P>
As noted before, it is wise to you this method only if you only want all of
apache to be able to connect to the database server as one user (or few
users).
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H2><A NAME="Debugging_Apache_DBI">Debugging Apache::DBI</A></H2></CENTER>
<P>
If you are not sure this module is working as advertised, you should enable
the Debug mode in the startup script by:
<P>
<PRE> $Apache::DBI::DEBUG = 1;
</PRE>
<P>
Since now on you will see in the <CODE>error.log</CODE> file when Apache::DBI initializes a connection and when it just returns it
from its cache. Use the following command to see it in the real time (your
<CODE>error.log</CODE> file might be locate at a different path):
<P>
<PRE> tail -f /usr/local/apache/logs/error_log
</PRE>
<P>
I use <CODE>alias</CODE> (in <CODE>tcsh</CODE>) so I will not have to remember the path:
<P>
<PRE> alias err "tail -f /usr/local/apache/logs/error_log"
</PRE>
<P>
Another approach is to add to <CODE>httpd.conf</CODE> (which does the same):
<P>
<PRE> PerlModule Apache::DebugDBI
</PRE>
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H2><A NAME="Problems_and_solutions">Problems and solutions</A></H2></CENTER>
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H3><A NAME="The_morning_bug">The morning bug</A></H3></CENTER>
<P>
SQL server keeps the connection to the client open for a limited period of
time. So many developers were bitten by so called <STRONG>Morning
bug</STRONG> when every morning the first users to use the site were receiving:
<CODE>No Data Returned</CODE> message, but then everything worked as usual. The error caused by <CODE>Apache::DBI</CODE> returning a handle of the invalid connection (server closed it because of
timeout), and the script was dying on that error. The infamous and well
documented in the man page,
<CODE>ping()</CODE> method was introduced to solve this problem. But seems that people are
still being beaten by this problem. Another solution was found - to rise
the timeout parameter at the SQL server startup. Currently I startup <CODE>mySQL</CODE> server with <CODE>safe_mysql</CODE>
script, so I have updated it to use this option:
<P>
<PRE> nohup $ledir/mysqld [snipped other options] -O wait_timeout=172800
</PRE>
<P>
Where 172800 secs equal to 48 hours. And it works.
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H3><A NAME="Cannot_find_the_DBI_handler">Cannot find the DBI handler</A></H3></CENTER>
<P>
Q: I cannot find the handler name with which to manipulate my connection;
hence I seem to be unable to do anything to my database.
<P>
A: You did not use <CODE>DBI::connect()</CODE> as with usual DBI usage to get your <CODE>$dbh</CODE> database handler.
Using the <CODE>Apache::DBI</CODE> does not eliminate the need to write a proper <CODE>DBI</CODE> code. As the man page states - you should program as if you did not use <CODE>Apache::DBI</CODE> at all. <CODE>Apache::DBI</CODE> will override and return you a cached connection. And in case of <CODE>disconnect()</CODE> call it will be just ignored.
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H3><A NAME="Apache_DBI_does_not_work">Apache:DBI does not work</A></H3></CENTER>
<P>
Make sure you have it installed.
<P>
Make sure you configured mod_perl with EVERYTHING=1.
<P>
Use the example script eg/startup.pl (remove the comment from the line
<P>
<PRE> #use Apache::DebugDBI;
</PRE>
<P>
and adapt the connect string. Do not change anything in your scripts, for
using Apache::DBI.
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H2><A NAME="Some_useful_code_snippets_to_be_">Some useful code snippets to be used with relational Databases</A></H2></CENTER>
<P>
In this section you will find scripts, modules and code snippets to help
get yourself started to use relational Databases with mod_perl scripts.
Note that I work with <CODE>mysql</CODE> ( <A HREF="http://www.mysql.com">http://www.mysql.com</A> ), so the code
you will find will work out of box with mysql, if you use some other SQL
engine, it might work for you as well, or some changes should be applied.
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H3><A NAME="Turning_the_SQL_queries_writing_">Turning the SQL queries writing into an short and simple task</A></H3></CENTER>
<P>
Having to write many queries in my CGI scripts, made me to write a stand
alone module that saves me a lot of time in writing and debugging my code.
It also makes my scripts are much smaller and easier to read. I will
present the module here, afterwards examples will follow:
<P>
Notice the <CODE>DESTROY</CODE> block at the end of the module, which makes various cleanups and allows
this module to be used under <CODE>mod_cgi</CODE> as well.
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H3><A NAME="My_DB_module">My::DB module</A></H3></CENTER>
<P>
(note that you will not find it on CPAN)
<P>
<PRE> package My::DB;
use strict;
use 5.004;
use DBI;
use vars qw(%c);
%c =
(
# DB debug
#db_debug => 1,
db_debug => 0,
db => {
DB_NAME => 'foo',
SERVER => 'localhost',
USER => 'put_username_here',
USER_PASSWD => 'put_passwd_here',
},
);
use Carp qw(croak verbose);
#local $SIG{__WARN__} = \&Carp::cluck;
# untaint the path by explicit setting
local $ENV{PATH} = '/bin:/usr/bin';
#######
sub new {
my $proto = shift;
my $class = ref($proto) || $proto;
my $self = {};
# connect to the DB, Apache::DBI worries to cache the connections
# save into a dbh - Database handle object
$self->{dbh} = DBI->connect("DBI:mysql:$c{db}{DB_NAME}::$c{db}{SERVER}",
$c{db}{USER},
$c{db}{USER_PASSWD},
{
PrintError => 1, # warn() on errors
RaiseError => 0, # don't die on error
AutoCommit => 1, # commit executes immediately
}
)
or DBI->disconnect("Cannot connect to database: $DBI::errstr\n");
# we want to die on errors if in debug mode
$self->{dbh}->{RaiseError} = 1 if $c{'db_debug'};
# init the sth - Statement handle object
$self->{sth} = '';
bless ($self, $class);
$self;
} # end of sub new
######################################################################
###################################
### ###
### SQL Functions ###
### ###
###################################
######################################################################
# print debug messages
sub d{
# we want to print in debug mode the trace
print "<DT><B>".join("<BR>", @_)."</B>\n" if $c{'db_debug'};
} # end of sub d
######################################################################
# return a count of matched rows, by conditions
#
# $count = sql_count_matched($table_name,\@conditions);
#
# conditions must be an array so we can path more than one column with
# the same name.
#
# @conditions = ( column => ['comp_sign','value'],
# foo => ['>',15],
# foo => ['<',30],
# );
#
# The sub knows automatically to detect and quote strings
#
##########################
sub sql_count_matched{
my $self = shift;
my $table = shift || '';
my $r_conds = shift || [];
# we want to print in debug mode the trace
d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
# build the query
my $do_sql = "SELECT COUNT(*) FROM $table ";
my @where = ();
for(my $i=0;$i<@{$r_conds};$i=$i+2) {
push @where, join " ",
$$r_conds[$i],
$$r_conds[$i+1][0],
sql_quote(sql_escape($$r_conds[$i+1][1]));
}
# Add the where clause if we have one
$do_sql .= "WHERE ". join " AND ", @where if @where;
d("SQL: $do_sql");
# do query
$self->{sth} = $self->{dbh}->prepare($do_sql);
$self->{sth}->execute();
my ($count) = $self->{sth}->fetchrow_array;
d("Result: $count");
$self->{sth}->finish;
return $count;
} # end of sub sql_count_matched
######################################################################
# return a single (first) matched value or undef, by conditions and
# restrictions
#
# sql_get_matched_value($table_name,$column,\@conditions,\@restrictions);
#
# column is a name of the column
#
# conditions must be an array so we can path more than one column with
# the same name.
# @conditions = ( column => ['comp_sign','value'],
# foo => ['>',15],
# foo => ['<',30],
# );
# The sub knows automatically to detect and quote strings
#
# restrictions is a list of restrictions like ('order by email')
#
##########################
sub sql_get_matched_value{
my $self = shift;
my $table = shift || '';
my $column = shift || '';
my $r_conds = shift || [];
my $r_restr = shift || [];
# we want to print in debug mode the trace
d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
# build the query
my $do_sql = "SELECT $column FROM $table ";
my @where = ();
for(my $i=0;$i<@{$r_conds};$i=$i+2) {
push @where, join " ",
$$r_conds[$i],
$$r_conds[$i+1][0],
sql_quote(sql_escape($$r_conds[$i+1][1]));
}
# Add the where clause if we have one
$do_sql .= " WHERE ". join " AND ", @where if @where;
# restrictions (DONT put commas!)
$do_sql .= " ". join " ", @{$r_restr} if @{$r_restr};
d("SQL: $do_sql");
# do query
return $self->{dbh}->selectrow_array($do_sql);
} # end of sub sql_get_matched_value
######################################################################
# return a single row of first matched rows, by conditions and
# restrictions. The row is being inserted into @results_row array
# (value1,value2,...) or empty () if non matched
#
# sql_get_matched_row(\@results_row,$table_name,\@columns,\@conditions,\@restrictions);
#
# columns is a list of columns to be returned (username, fname,...)
#
# conditions must be an array so we can path more than one column with
# the same name.
# @conditions = ( column => ['comp_sign','value'],
# foo => ['>',15],
# foo => ['<',30],
# );
# The sub knows automatically to detect and quote strings
#
# restrictions is a list of restrictions like ('order by email')
#
##########################
sub sql_get_matched_row{
my $self = shift;
my $r_row = shift || {};
my $table = shift || '';
my $r_cols = shift || [];
my $r_conds = shift || [];
my $r_restr = shift || [];
# we want to print in debug mode the trace
d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
# build the query
my $do_sql = "SELECT ";
$do_sql .= join ",", @{$r_cols} if @{$r_cols};
$do_sql .= " FROM $table ";
my @where = ();
for(my $i=0;$i<@{$r_conds};$i=$i+2) {
push @where, join " ",
$$r_conds[$i],
$$r_conds[$i+1][0],
sql_quote(sql_escape($$r_conds[$i+1][1]));
}
# Add the where clause if we have one
$do_sql .= " WHERE ". join " AND ", @where if @where;
# restrictions (DONT put commas!)
$do_sql .= " ". join " ", @{$r_restr} if @{$r_restr};
d("SQL: $do_sql");
# do query
@{$r_row} = $self->{dbh}->selectrow_array($do_sql);
} # end of sub sql_get_matched_row
######################################################################
# return a ref to hash of single matched row, by conditions
# and restrictions. return undef if nothing matched.
# (column1 => value1, column2 => value2) or empty () if non matched
#
# sql_get_hash_ref($table_name,\@columns,\@conditions,\@restrictions);
#
# columns is a list of columns to be returned (username, fname,...)
#
# conditions must be an array so we can path more than one column with
# the same name.
# @conditions = ( column => ['comp_sign','value'],
# foo => ['>',15],
# foo => ['<',30],
# );
# The sub knows automatically to detect and quote strings
#
# restrictions is a list of restrictions like ('order by email')
#
##########################
sub sql_get_hash_ref{
my $self = shift;
my $table = shift || '';
my $r_cols = shift || [];
my $r_conds = shift || [];
my $r_restr = shift || [];
# we want to print in debug mode the trace
d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
# build the query
my $do_sql = "SELECT ";
$do_sql .= join ",", @{$r_cols} if @{$r_cols};
$do_sql .= " FROM $table ";
my @where = ();
for(my $i=0;$i<@{$r_conds};$i=$i+2) {
push @where, join " ",
$$r_conds[$i],
$$r_conds[$i+1][0],
sql_quote(sql_escape($$r_conds[$i+1][1]));
}
# Add the where clause if we have one
$do_sql .= " WHERE ". join " AND ", @where if @where;
# restrictions (DONT put commas!)
$do_sql .= " ". join " ", @{$r_restr} if @{$r_restr};
d("SQL: $do_sql");
# do query
$self->{sth} = $self->{dbh}->prepare($do_sql);
$self->{sth}->execute();
return $self->{sth}->fetchrow_hashref;
} # end of sub sql_get_hash_ref
######################################################################
# returns a reference to an array, matched by conditions and
# restrictions, which contains one reference to array per row. If
# there are no rows to return, returns a reference to an empty array:
# [
# [array1],
# ......
# [arrayN],
# ];
#
# $ref = sql_get_matched_rows_ary_ref($table_name,\@columns,\@conditions,\@restrictions);
#
# columns is a list of columns to be returned (username, fname,...)
#
# conditions must be an array so we can path more than one column with
# the same name. @conditions are being cancatenated with AND
# @conditions = ( column => ['comp_sign','value'],
# foo => ['>',15],
# foo => ['<',30],
# );
# results in
# WHERE foo > 15 AND foo < 30
#
# to make an OR logic use (then ANDed )
# @conditions = ( column => ['comp_sign',['value1','value2']],
# foo => ['=',[15,24] ],
# bar => ['=',[16,21] ],
# );
# results in
# WHERE (foo = 15 OR foo = 24) AND (bar = 16 OR bar = 21)
#
# The sub knows automatically to detect and quote strings
#
# restrictions is a list of restrictions like ('order by email')
#
##########################
sub sql_get_matched_rows_ary_ref{
my $self = shift;
my $table = shift || '';
my $r_cols = shift || [];
my $r_conds = shift || [];
my $r_restr = shift || [];
# we want to print in debug mode the trace
d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
# build the query
my $do_sql = "SELECT ";
$do_sql .= join ",", @{$r_cols} if @{$r_cols};
$do_sql .= " FROM $table ";
my @where = ();
for(my $i=0;$i<@{$r_conds};$i=$i+2) {
if (ref $$r_conds[$i+1][1] eq 'ARRAY') {
# multi condition for the same field/comparator to be ORed
push @where, map {"($_)"} join " OR ",
map { join " ",
$r_conds->[$i],
$r_conds->[$i+1][0],
sql_quote(sql_escape($_));
} @{$r_conds->[$i+1][1]};
} else {
# single condition for the same field/comparator
push @where, join " ",
$r_conds->[$i],
$r_conds->[$i+1][0],
sql_quote(sql_escape($r_conds->[$i+1][1]));
}
} # end of for(my $i=0;$i<@{$r_conds};$i=$i+2
# Add the where clause if we have one
$do_sql .= " WHERE ". join " AND ", @where if @where;
# restrictions (DONT put commas!)
$do_sql .= " ". join " ", @{$r_restr} if @{$r_restr};
d("SQL: $do_sql");
# do query
return $self->{dbh}->selectall_arrayref($do_sql);
} # end of sub sql_get_matched_rows_ary_ref
######################################################################
# insert a single row into a DB
#
# sql_insert_row($table_name,\%data,$delayed);
#
# data is hash of type (column1 => value1 ,column2 => value2 , )
#
# $delayed: 1 => do delayed insert, 0 or none passed => immediate
#
# * The sub knows automatically to detect and quote strings
#
# * The insert id delayed, so the user will not wait untill the insert
# will be completed, if many select queries are running
#
##########################
sub sql_insert_row{
my $self = shift;
my $table = shift || '';
my $r_data = shift || {};
my $delayed = (shift) ? 'DELAYED' : '';
# we want to print in debug mode the trace
d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
# build the query
my $do_sql = "INSERT $delayed INTO $table ";
$do_sql .= "(".join(",",keys %{$r_data}).")";
$do_sql .= " VALUES (";
$do_sql .= join ",", sql_quote(sql_escape( values %{$r_data} ) );
$do_sql .= ")";
d("SQL: $do_sql");
# do query
$self->{sth} = $self->{dbh}->prepare($do_sql);
$self->{sth}->execute();
} # end of sub sql_insert_row
######################################################################
# update rows in a DB by condition
#
# sql_update_rows($table_name,\%data,\@conditions,$delayed);
#
# data is hash of type (column1 => value1 ,column2 => value2 , )
#
# conditions must be an array so we can path more than one column with
# the same name.
# @conditions = ( column => ['comp_sign','value'],
# foo => ['>',15],
# foo => ['<',30],
# );
#
# $delayed: 1 => do delayed insert, 0 or none passed => immediate
#
# * The sub knows automatically to detect and quote strings
#
#
##########################
sub sql_update_rows{
my $self = shift;
my $table = shift || '';
my $r_data = shift || {};
my $r_conds = shift || [];
my $delayed = (shift) ? 'LOW_PRIORITY' : '';
# we want to print in debug mode the trace
d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
# build the query
my $do_sql = "UPDATE $delayed $table SET ";
$do_sql .= join ",",
map { "$_=".join "",sql_quote(sql_escape($$r_data{$_})) } keys %{$r_data};
my @where = ();
for(my $i=0;$i<@{$r_conds};$i=$i+2) {
push @where, join " ",
$$r_conds[$i],
$$r_conds[$i+1][0],
sql_quote(sql_escape($$r_conds[$i+1][1]));
}
# Add the where clause if we have one
$do_sql .= " WHERE ". join " AND ", @where if @where;
d("SQL: $do_sql");
# do query
$self->{sth} = $self->{dbh}->prepare($do_sql);
$self->{sth}->execute();
# my ($count) = $self->{sth}->fetchrow_array;
#
# d("Result: $count");
} # end of sub sql_update_rows
######################################################################
# delete rows from DB by condition
#
# sql_delete_rows($table_name,\@conditions);
#
# conditions must be an array so we can path more than one column with
# the same name.
# @conditions = ( column => ['comp_sign','value'],
# foo => ['>',15],
# foo => ['<',30],
# );
#
# * The sub knows automatically to detect and quote strings
#
#
##########################
sub sql_delete_rows{
my $self = shift;
my $table = shift || '';
my $r_conds = shift || [];
# we want to print in debug mode the trace
d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
# build the query
my $do_sql = "DELETE FROM $table ";
my @where = ();
for(my $i=0;$i<@{$r_conds};$i=$i+2) {
push @where, join " ",
$$r_conds[$i],
$$r_conds[$i+1][0],
sql_quote(sql_escape($$r_conds[$i+1][1]));
}
# Must be very carefull with deletes, imagine somehow @where is
# not getting set, "DELETE FROM NAME" deletes the contents of the table
warn("Attempt to delete a whole table $table from DB\n!!!"),return unless @where;
# Add the where clause if we have one
$do_sql .= " WHERE ". join " AND ", @where;
d("SQL: $do_sql");
# do query
$self->{sth} = $self->{dbh}->prepare($do_sql);
$self->{sth}->execute();
} # end of sub sql_delete_rows
######################################################################
# executes the passed query and returns a reference to an array which
# contains one reference per row. If there are no rows to return,
# returns a reference to an empty array.
#
# $r_array = sql_execute_and_get_r_array($query);
#
#
##########################
sub sql_execute_and_get_r_array{
my $self = shift;
my $do_sql = shift || '';
# we want to print in debug mode the trace
d( "[".(caller(2))[3]." - ".(caller(1))[3]." - ". (caller(0))[3]."]");
d("SQL: $do_sql");
$self->{dbh}->selectall_arrayref($do_sql);
} # end of sub sql_execute_and_get_r_array
#
#
# return current date formatted for a DATE field type
# YYYYMMDD
#
############
sub sql_date{
my $self = shift;
my ($sec,$min,$hour,$mday,$mon,$year) = localtime();
$year = ($year>70) ? sprintf "19%0.2d",$year : sprintf "20%0.2d",$year;
return sprintf "%0.4d%0.2d%0.2d",$year,++$mon,$mday;
} # end of sub sql_date
#
#
# return current date formatted for a DATE field type
# YYYYMMDDHHMMSS
#
############
sub sql_datetime{
my $self = shift;
my ($sec,$min,$hour,$mday,$mon,$year) = localtime();
$year = ($year>70) ? sprintf "19%0.2d",$year : sprintf "20%0.2d",$year;
return sprintf "%0.4d%0.2d%0.2d%0.2d%0.2d%0.2d",$year,++$mon,$mday,$hour,$min,$sec;
} # end of sub sql_datetime
# Quote the list of parameters , alldigits parameters are unquoted (int)
# print sql_quote("one",2,"three"); => 'one' 2 'three'
#############
sub sql_quote{ map{ /^(\d+|NULL)$/ ? $_ : "'$_'" } @_ }
# Escape the list of parameters (all unsafe chars like ",' are escaped )
# must make a copy of @_ since we might try to change the passed
# (Modification of a read-only value attempted)
##############
sub sql_escape{ my @a = @_; map { s/(['])/\\$1/g;$_} @a }
# DESTROY makes all kinds of cleanups if the fuctions were interuppted
# before their completion and haven't had a chance to make a clean up.
###########
sub DESTROY{
my $self = shift;
$self->{sth}->finish if defined $self->{sth} and $self->{sth};
$self->{dbh}->disconnect if defined $self->{dbh} and $self->{dbh};
} # end of sub DESTROY
# Don't remove
1;
</PRE>
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H3><A NAME="My_DB_Module_s_Usage_Examples">My::DB Module's Usage Examples</A></H3></CENTER>
<P>
In your code that wants to use <CODE>My::DB</CODE>, you have to create a
<CODE>My::DB</CODE> object first:
<P>
<PRE> use vars qw($db_obj);
my $db_obj = new My::DB or croak "Can't initialize My::DB object: $!\n";
</PRE>
<P>
From this moment, you can use any <CODE>My::DB</CODE>'s methods. I will start from a very simple query - I want to know where
the users are and produce statistics. <CODE>tracker</CODE> is the name of the table.
<P>
<PRE> # fetch the statistics of where users are
my $r_ary = $db_obj->sql_get_matched_rows_ary_ref
("tracker",
[qw(where_user_are)],
);
my %stats = ();
my $total = 0;
foreach my $r_row (@$r_ary){
$stats{$r_row->[0]}++;
$total++;
}
</PRE>
<P>
Now let's count how many users do we have (in <CODE>users</CODE> table):
<P>
<PRE> my $count = $db_obj->sql_count_matched("users");
</PRE>
<P>
Check whether user exists:
<P>
<PRE> my $username = 'stas';
my $exists = $db_obj->sql_count_matched
("users",
[username => ["=",$username]]
);
</PRE>
<P>
Check whether user online and get time since when she is online (<CODE>since</CODE> a column in the <CODE>tracker</CODE> table telling since when user is online):
<P>
<PRE> my @row = ();
$db_obj->sql_get_matched_row
(\@row,
"tracker",
['UNIX_TIMESTAMP(since)'],
[username => ["=",$username]]
);
if (@row) {
my $idle = int( (time() - $row[0]) / 60);
return "Current status: Is Online and idle for $idle minutes.";
}
</PRE>
<P>
A complex query. I do join of 2 tables, and want to get a reference to
array, which will store a slice of the matched query (<CODE>LIMIT
$offset,$hits</CODE>), sorted by <CODE>username</CODE> and each row in array_ref to include the fields from the <CODE>users</CODE> table, but only those listed in
<CODE>@verbose_cols</CODE>. Then we print it out.
<P>
<PRE> my $r_ary = $db_obj->sql_get_matched_rows_ary_ref
(
"tracker STRAIGHT_JOIN users",
[map {"users.$_"} @verbose_cols],
[],
["WHERE tracker.username=users.username",
"ORDER BY users.username",
"LIMIT $offset,$hits"],
);
foreach my $r_row (@$r_ary){
print ...
}
</PRE>
<P>
Another complex query. User checks checkboxes to be queried by, selects
from lists and types in match strings, we process input and build the <CODE>@where</CODE> array. Then we want to get the number of matches and the matched rows as
well.
<P>
<PRE> my @where = ();
# process chekoxes - we turn them into REGEXP
foreach (keys %search_keys) {
next unless defined $q->param($_) and $q->param($_);
my $regexp = "[".join("",$q->param($_))."]";
push @where, ($_ => ['REGEXP',$regexp]);
}
# Now add all the single answer , selected => exact macth
push @where,(country => ['=',$q->param('country')]) if $q->param('country');
# Now add all the typed params
foreach (qw(city state)) {
push @where,($_ => ['LIKE',$q->param($_)]) if $q->param($_);
}
# Do the count all matched query
my $total_matched_users = $db_obj->sql_count_matched
(
"users",
\@where,
);
# Now process the orderby
my $orderby = $q->param('orderby') || 'username';
# Do the query and fetch the data
my $r_ary = $db_obj->sql_get_matched_rows_ary_ref
(
"users",
\@display_columns,
\@where,
["ORDER BY $orderby",
"LIMIT $offset,$hits"],
);
</PRE>
<P>
<CODE>sql_get_matched_rows_ary_ref</CODE> knows to handle both <CODE>OR</CODE>ed and
<CODE>AND</CODE>ed params. This example shows how to use <CODE>OR</CODE> on parameters:
<P>
This snippet is an implementation of the watchdog. Users register usernames
of the people they want to know when these are going online, so we have to
make 2 queries - one to get a list of these usernames, second to query
whether any of these users is online. In the second query we use <CODE>OR</CODE> keyword.
<P>
<PRE> # check who we are looking for
$r_ary = $db_obj->sql_get_matched_rows_ary_ref
("watchdog",
[qw(watched)],
[username => ['=',$username)],
],
);
# put them into an array
my @watched = map {$_->[0]} @{$r_ary};
my %matched = ();
# Do user has some registered usernames?
if (@watched) {
# try to bring all the users who match (exactly) the usernames - put
# it into array and compare with a hash!
$r_ary = $db_obj->sql_get_matched_rows_ary_ref
("tracker",
[qw(username)],
[username => ['=',\@watched],
]
);
map {$matched{$_->[0]} = 1} @{$r_ary};
}
# Now %matched includes the usernames of the users who are being
# watched by $username and currently are online.
</PRE>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
The <a href="http://www.modperl.com/">
<B>Writing Apache Modules with Perl and C</B></a>
book can be purchased online from <a
href="http://www.ora.com/catalog/wrapmod/">O'Reilly </a>
and <a
href="http://www.amazon.com/exec/obidos/ASIN/156592567X/writinapachemodu">
Amazon.com</a>.
<HR>
[ <A HREF="warnings.html">Prev</A> | <A HREF="index.html">Main Page</A> | <A HREF="dbm.html">Next</A> ]
<CENTER><TABLE CELLSPACING=2 CELLPADDING=2 WIDTH="100%" >
<TR ALIGN=CENTER VALIGN=TOP>
<TD ALIGN=CENTER VALIGN=CENTER COLSPAN="3">
<HR>
</TD>
</TR>
<TR ALIGN=CENTER VALIGN=TOP>
<TD ALIGN=CENTER VALIGN=CENTER>
<B>
<FONT SIZE=-1>
Written by <A HREF="help.html#This_document_s_Author">Stas Bekman</A>.
<BR>Last Modified at 05/08/1999
</FONT>
</B>
</TD>
<TD>
<A HREF="http://perl.apache.org"><IMG SRC="images/mod_perl2.jpg" ALT="Mod Perl Icon" BORDER=0 HEIGHT=59 WIDTH=150></A>
</TD>
<TD>
<FONT SIZE=-2>
Use of the Camel for Perl is <BR>
a trademark of <A HREF="http://www.ora.com">O'Reilly & Associates</A>,<BR>
and is used by permission.
</FONT>
</TD>
</TR>
</TABLE></CENTER>
</BODY>
</HTML>
1.1 modperl-site/guide/dbm.html
Index: dbm.html
===================================================================
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>mod_perl guide: mod_perl and dbm files</TITLE>
<META NAME="GENERATOR" CONTENT="Mozilla/3.04Gold (X11; I; AIX 4.1) [Netscape]">
<META NAME="Author" CONTENT="Bekman Stas">
<META NAME="Description" CONTENT="All Apache/Perl related information: Hints,
Guidelines, Scenarios and Troubleshottings">
<META NAME="keywords" CONTENT="mod_perl modperl perl apache cgi webserver speed
fast guide mod_perl apache guide help info faq mod_perl installation cgi
troubleshooting help no sex speedup free open source OSS mod_perl apache guide">
</HEAD>
<LINK REL=STYLESHEET TYPE="text/css"
HREF="style.css" TITLE="refstyle">
<style type="text/css">
<!--
@import url(style.css);
-->
</style>
<BODY TEXT="#000000" BGCOLOR="#E0FFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#FF0000">
<A NAME="toc"></A>
<H1 ALIGN=CENTER>
<A HREF="http://perl.apache.org"><IMG SRC="images/mod_perl.gif" ALT="Mod Perl Icon" BORDER=0 HEIGHT=30 WIDTH=90 ALIGN=LEFT></A>
<A HREF="http://perl.apache.org"><IMG SRC="images/mod_perl.gif" ALT="Mod Perl Icon" BORDER=0 HEIGHT=30 WIDTH=90 ALIGN=RIGHT></A>
mod_perl and dbm files</H1>
<HR WIDTH="100%">
[ <A HREF="databases.html">Prev</A> | <A HREF="index.html">Main Page</A> | <A HREF="multiuser.html">Next</A> ]<HR><!-- INDEX BEGIN -->
<P><B><FONT SIZE=-1>Table of Contents:</FONT></B></P>
<UL>
<LI><A HREF="#Where_and_Why_to_use_dbm_files">Where and Why to use dbm files</A>
<LI><A HREF="#mod_perl_and_dbm">mod_perl and dbm</A>
<LI><A HREF="#Locking_dbm_handlers">Locking dbm handlers</A>
<LI><A HREF="#Tie_DB_Lock">Tie::DB_Lock</A>
<LI><A HREF="#Code_snippets">Code snippets</A>
</UL>
<!-- INDEX END -->
<HR>
The <a href="http://www.modperl.com/">
<B>Writing Apache Modules with Perl and C</B></a>
book can be purchased online from <a
href="http://www.ora.com/catalog/wrapmod/">O'Reilly </a>
and <a
href="http://www.amazon.com/exec/obidos/ASIN/156592567X/writinapachemodu">
Amazon.com</a>.
<HR>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<P>
<CENTER><H1><A NAME="Where_and_Why_to_use_dbm_files">Where and Why to use dbm files</A></H1></CENTER>
<P>
dbm files are the first implementations of the databases, which originated
on Unix systems, and currently being used in many Unix applications where
simple key-value pairs should be stored and manipulated. As of this writing
Berkeley DB is the most powerful dbm implementation. If you need a light
database, with easy API to work with this is a solution that should be
considered as a first one. Of course only if you are sure the DB you are
going to use will stay small, I would say under 5000-10000 records, but it
depends on your hardware, which can rise and lower the numbers above. It is
a much better solution over the flat file databases which become pretty
slow on insert, update and delete operations when the number of records
grows beyond 1000. The situation is even worse when we need to run sort on
this kind of DB.
<P>
dbm files are being manipulated much faster than their flat file brothers,
since almost never the whole DB is being read into a memory and because of
smart storage technique. You can use a <CODE>HASH</CODE>
algorithm which allows a <CODE>0(1)</CODE> complexity of search and update, fast insert and delete, but slow sort,
since you have to do it yourself. <CODE>BTREE</CODE> allows arbitrary key/value pairs to be stored in a sorted, balanced binary
tree, which allows us to get a sorted sequence of data pairs in <CODE>0(1)</CODE>, but much slower insert, update, delete operations. <CODE>RECNO</CODE> algorithm is more complicated one, and enables for both fixed-length and
variable-length flat text files to be manipulated using the same key/value
pair interface as in <CODE>HASH</CODE> and
<CODE>BTREE</CODE>. In this case the key will consist of a record (line) number. Most chances
you will want to use the <CODE>HASH</CODE> format, but your choice is very dependent on a kind of your application.
<P>
<STRONG>dbm</STRONG> databases are not limited for key and value pairs storages, but can store
more complicated structures with help of <CODE>MLDBM</CODE>
module. Which can dump and restore the whole symbol table of your script,
including arrays, hashes and other complicated data <CODE>HASH</CODE>
structures.
<P>
Another important thing to say, is that you cannot convert a dbm file from
one storing algorithm to another, by simply tying it using a wanted format.
The only way is to dump it into a flat file and then restore it using a new
format. You can use a script like:
<P>
<PRE> #!/usr/bin/perl -w
#
# This script gets as a parameter a Berkeley DB file(s) which is stored
# with DB_BTREE algorithm, and will backup it with .bak and create
# instead the db with the same records but stored with DB_HASH
# algorithm
#
# Usage: btree2hash.pl filename(s)
use strict;
use DB_File;
use File::Copy;
# Do checks
die "Usage: btree2hash.pl filename(s))\n" unless @ARGV;
foreach my $filename (@ARGV) {
die "Can't find $filename: $!\n" unless -e $filename and -r $filename;
# First backup the filename
move("$filename","$filename.btree")
or die "can't move $filename $filename.btree:$!\n";
my %hash;
my %btree;
# tie both dbs (db_hash is a fresh one!)
tie %btree , 'DB_File',"$filename.btree", O_RDWR|O_CREAT,
0660, $DB_BTREE or die "Can't tie %btree";
tie %hash , 'DB_File',"$filename" , O_RDWR|O_CREAT,
0660, $DB_HASH or die "Can't tie %hash";
# copy DB
%hash = %btree;
# untie
untie %btree ;
untie %hash ;
}
</PRE>
<P>
Note that some dbm implementations come with other conversion utilities as
well.
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H1><A NAME="mod_perl_and_dbm">mod_perl and dbm</A></H1></CENTER>
<P>
Where mod_perl enters into a picture? If you are using a read only dbm file
you can have it work faster if you keep it open (tied) all the time, so
when your CGI script wants to access the database it is already tied and
ready to be used. It will work as well with your dynamic dbm databases as
well but you need to use locking to avoid data corruptions. Of course this
feature can make a huge speedup to your CGIs, but you should be very
careful. What should be taken into account is a db locking, handling
possible <CODE>die()</CODE> cases and child quits. A stale lock can deactivate your whole site, if your
locking mechanism cannot handle dropped locks. You can enter a deadlock
situations if 2 processes are trying to acquire locks on 2 databases, but
get stuck because each has got hands on one of the 2 databases, and to
release it, each process needs the second one, which will never be freed,
because that is the condition for the first one to be released (possible
only if processes do not all ask for their DB files in the same order). If
you modify the DB you should be very careful to flush and synchronize it,
especially when your CGI unexpectedly dies. In general your application
should be tested very thoroughly before you put it into production to
handle important data.
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H1><A NAME="Locking_dbm_handlers">Locking dbm handlers</A></H1></CENTER>
<P>
Let's have a lock status as a global variable, so it will persist from
request to request. If we are requesting a lock - READ (shared) or WRITE
(exclusive), the current lock status is being obtained first.
<P>
If we get a READ lock request, it is granted as soon as file becomes or is
locked or already locked for READ. Lock status is READ now.
<P>
If we get a WRITE lock request, it is granted as soon as file becomes or is
unlocked. Lock status is WRITE now.
<P>
What happens to the WRITE lock request, is the most important. If the DB is
being READ locked, request that request to write will poll until there will
be no reading or writing process left. Lots of processes can successfully
read the file, since they do not block each other from doing so. This means
that a process that wants to write to the file (first obtaining an
exclusive lock) never gets a chance to squeeze in. The following diagram
represents a possible scenario where everybody read but no one can write:
<P>
<PRE> [-p1-] [--p1--]
[--p2--]
[---------p3---------]
[------p4-----]
[--p5--] [----p5----]
</PRE>
<P>
So you get a starving process, which most certainly will timeout the
request, and the DB will be not updated.
<P>
So you have another reason not to cache the dbm handle with dynamic dbm
files. But it will work perfect with the static DBM files without a need to
lock files at all. Ken Williams solved the above problem in his <CODE>Tie::DB_Lock</CODE> module, and I will present it in the next section.
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H1><A NAME="Tie_DB_Lock">Tie::DB_Lock</A></H1></CENTER>
<P>
<CODE>Tie::DB_Lock</CODE> - ties hashes to databases using shared and exclusive locks. A module by
Ken Williams. which solves the problem raised in the previous section.
<P>
The main difference from what I have described before is that
<CODE>Tie::DB_Lock</CODE> copies a dbm file on read so that reader processes do not have to keep the
file locked while they read it, and writers can still access it while
others are reading. It works best when you have lots of long-duration
reading, and a few short bursts of writing.
<P>
The drawback of this module is a heavy IO performed when every reader makes
a fresh copy of the DB. With big dbm files this can be quite a disadvantage
and slowdown. An improvement that can cut a number of files that are being
copied, would be to have only one copy of the dbm image that will be shared
by all the reader processes. So it would put the responsibility of copying
the read-only file on the writer, not the reader. It would take some care
to make sure it does not disturb readers when putting a new read-only copy
into place.
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H1><A NAME="Code_snippets">Code snippets</A></H1></CENTER>
<P>
I have discussed what can be achieved with mod_perl and dbm files, the cons
and pros. Now it is a time to show some code. I wrote a simple wrapper for
a <CODE>DB_File</CODE> module, and extended it to handle locking, and proper exits. Note that this
code still demands some testing, so be careful if you use it on your
production machine as is.
<P>
So the <CODE>DB_File::Wrap</CODE> (note that you will not find it on CPAN):
<P>
<PRE> package DB_File::Wrap;
require 5.004;
use strict;
BEGIN {
# RCS/CVS complient: must be all one line, for MakeMaker
$DB_File::Wrap::VERSION = do { my @r = (q$Revision: 1.1 $ =~ /\d+/g); sprintf "%d."."%02d" x $#r, @r };
}
use DB_File;
use Fcntl qw(:flock O_RDWR O_CREAT);
use Carp qw(croak carp verbose);
use IO::File;
use vars qw($debug);
#$debug = 1;
$debug = 0;
# my $db = DB_File::Wrap \%hash, $filename, [lockmode];
# from now one we can work with both %hash (tie API) and $db (direct API)
#########
sub new{
my $class = shift;
my $hr_hash = shift;
my $file = shift;
my $lock_mode = shift || '';
my $db_type = shift || 'HASH';
my $self;
$self = bless {
db_type => 'DB_File',
flags => O_RDWR|O_CREAT,
mode => 0660,
hash => $hr_hash,
how => $DB_HASH,
}, $class ;
# by default we tie with HASH alg and if requested with BTREE
$self->{'how'} = ($db_type eq 'BTREE') ? $DB_BTREE : $DB_HASH;
# tie the object
$self->{'db_obj'} = tie %{$hr_hash},
$self->{'db_type'},$file, $self->{'flags'},$self->{'mode'}, $self->{'how'}
or croak "Can't tie $file:$!\n"; ;
my $fd = $self->{'db_obj'}->fd;
croak "Can't get fd :$!" unless defined $fd and $fd;
$self->{'fh'}= new IO::File "+<&=$fd" or croak "[".__PACKAGE__."] Can't dup: $!";
# set the lock status to unlocked
$self->{'lock'} = 0;
# do the lock here if requested
$self->lock($lock_mode) if $lock_mode;
return $self;
} # end of sub new
# lock the fd either exclusive or shared lock (write/read)
# default is read (shared)
###########
sub lock{
my $self = shift;
my $lock_mode = shift || 'read';
# lock codes:
# 0 == not locked
# 1 == read locked
# 2 == write locked
if ($lock_mode eq 'write') {
# Get the exclusive write lock
unless (flock ($self->{'fh'}, LOCK_EX | LOCK_NB)) {
unless (flock ($self->{'fh'}, LOCK_EX)) { croak "exclusive flock: $!" }
}
# save the status of lock
$self->{'lock'} = 2;
} elsif ($lock_mode eq 'read'){
# Get the shared read lock
unless (flock ($self->{'fh'}, LOCK_SH | LOCK_NB)) {
unless (flock ($self->{'fh'}, LOCK_SH)) { croak "shared flock: $!" }
}
# save the status of lock
$self->{'lock'} = 1;
} else {
# incorrect mode
carp "Can't lock. Unknown mode: $lock_mode\n";
}
} # end of sub lock
# unlock
###########
sub unlock{
my $self = shift;
$self->{'db_obj'}->sync() if defined $self->{'db_obj'}; # flush
flock($self->{'fh'}, LOCK_UN);
$self->{'lock'} = 0;
}
# untie the hash
# and close all the handlers
# if wasn't unlocked, end() will unlock as well
###########
sub end{
my $self = shift;
# unlock if stilllocked
$self->unlock() if $self->{'lock'};
delete $self->{'db_obj'} if $self->{'db_obj'};
untie %{$self->{'hr_hash'}} if $self->{'hr_hash'};
$self->{'fh'}->close if $self->{'fh'};
}
# DESTROY makes all kinds of cleanups if the fuctions were interuppted
# before their completion and haven't had a chance to make a clean up.
###########
sub DESTROY{
my $self = shift;
# just to be sure that we properly closed everything
$self->end();
print "Destroying ".__PACKAGE__."\n" if $debug;
undef $self if $self;
}
####
END {
print "Calling the END from ".__PACKAGE__."\n" if $debug;
}
1;
</PRE>
<P>
And you use it :
<P>
<PRE> use DB_File::Wrap ();
</PRE>
<P>
A simple tie, READ lock and untie
<P>
<PRE> my %mydb = ();
my $db = new DB_File::Wrap \%mydb, $dbfile, 'read';
print $mydb{'stas'} if exists $mydb{'stas'};
# sync and untie
$db->end();
</PRE>
<P>
You can even skip the <CODE>end()</CODE> call, if leave the scope <CODE>$db</CODE> defined in:
<P>
<PRE> sub user_exists{
my $user = shift;
my $result = 0;
my %mydb = ();
my $db = new DB_File::Wrap \%mydb, $dbfile, 'read';
# if we match the username return 1
$result = 1 if $mydb{$user};
$result;
} # end of sub user_exists
</PRE>
<P>
Perform both, read and write operations:
<P>
<PRE> my %mydb = ();
my $db = new DB_File::Wrap \%mydb, $dbfile;
print $mydb{'stas'} if exists $mydb{'stas'};
# lock the db, we gonna change it!
$db->lock('write');
$mydb{'stas'} = 1;
# unlock the db for write
# sync and untie
$db->end();
</PRE>
<P>
If your CGI was interrupted in the middle, <CODE>DESTROY</CODE> block will worry to unlock the dbm file and flush the changes. Note that I
have got db corruptions even with this code on huge dbm files 10000+
records, so be careful when you use it. I thought that I have covered all
the possible failures but seems that not all of them. At the end I have
moved everything to work with <STRONG>mysql</STRONG>. So if you figure out where the problem is you are very welcome to tell me
about it.
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
The <a href="http://www.modperl.com/">
<B>Writing Apache Modules with Perl and C</B></a>
book can be purchased online from <a
href="http://www.ora.com/catalog/wrapmod/">O'Reilly </a>
and <a
href="http://www.amazon.com/exec/obidos/ASIN/156592567X/writinapachemodu">
Amazon.com</a>.
<HR>
[ <A HREF="databases.html">Prev</A> | <A HREF="index.html">Main Page</A> | <A HREF="multiuser.html">Next</A> ]
<CENTER><TABLE CELLSPACING=2 CELLPADDING=2 WIDTH="100%" >
<TR ALIGN=CENTER VALIGN=TOP>
<TD ALIGN=CENTER VALIGN=CENTER COLSPAN="3">
<HR>
</TD>
</TR>
<TR ALIGN=CENTER VALIGN=TOP>
<TD ALIGN=CENTER VALIGN=CENTER>
<B>
<FONT SIZE=-1>
Written by <A HREF="help.html#This_document_s_Author">Stas Bekman</A>.
<BR>Last Modified at 05/08/1999
</FONT>
</B>
</TD>
<TD>
<A HREF="http://perl.apache.org"><IMG SRC="images/mod_perl2.jpg" ALT="Mod Perl Icon" BORDER=0 HEIGHT=59 WIDTH=150></A>
</TD>
<TD>
<FONT SIZE=-2>
Use of the Camel for Perl is <BR>
a trademark of <A HREF="http://www.ora.com">O'Reilly & Associates</A>,<BR>
and is used by permission.
</FONT>
</TD>
</TR>
</TABLE></CENTER>
</BODY>
</HTML>
1.1 modperl-site/guide/strategy.html
Index: strategy.html
===================================================================
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>mod_perl guide: Picking the Right Strategy</TITLE>
<META NAME="GENERATOR" CONTENT="Mozilla/3.04Gold (X11; I; AIX 4.1) [Netscape]">
<META NAME="Author" CONTENT="Bekman Stas">
<META NAME="Description" CONTENT="All Apache/Perl related information: Hints,
Guidelines, Scenarios and Troubleshottings">
<META NAME="keywords" CONTENT="mod_perl modperl perl apache cgi webserver speed
fast guide mod_perl apache guide help info faq mod_perl installation cgi
troubleshooting help no sex speedup free open source OSS mod_perl apache guide">
</HEAD>
<LINK REL=STYLESHEET TYPE="text/css"
HREF="style.css" TITLE="refstyle">
<style type="text/css">
<!--
@import url(style.css);
-->
</style>
<BODY TEXT="#000000" BGCOLOR="#E0FFFF" LINK="#0000EE" VLINK="#551A8B" ALINK="#FF0000">
<A NAME="toc"></A>
<H1 ALIGN=CENTER>
<A HREF="http://perl.apache.org"><IMG SRC="images/mod_perl.gif" ALT="Mod Perl Icon" BORDER=0 HEIGHT=30 WIDTH=90 ALIGN=LEFT></A>
<A HREF="http://perl.apache.org"><IMG SRC="images/mod_perl.gif" ALT="Mod Perl Icon" BORDER=0 HEIGHT=30 WIDTH=90 ALIGN=RIGHT></A>
Picking the Right Strategy</H1>
<HR WIDTH="100%">
[ <A HREF="start.html">Prev</A> | <A HREF="index.html">Main Page</A> | <A HREF="scenario.html">Next</A> ]<HR><!-- INDEX BEGIN -->
<P><B><FONT SIZE=-1>Table of Contents:</FONT></B></P>
<UL>
<LI><A HREF="#Do_it_like_me_">Do it like me?!</A>
<LI><A HREF="#mod_perl_Deploying_Schemas_Overv">mod_perl Deploying Schemas Overview</A>
<LI><A HREF="#Stand_alone_mod_perl_Enabled_Apa">Stand alone mod_perl Enabled Apache Server</A>
<LI><A HREF="#One_Plain_and_One_mod_perl_enabl">One Plain and One mod_perl enabled Apache Servers</A>
<LI><A HREF="#One_light_non_Apache_and_One_mod">One light non-Apache and One mod_perl enabled Apache Servers</A>
<LI><A HREF="#Adding_a_Proxy_Server_in_http_Ac">Adding a Proxy Server in http Accelerator Mode</A>
<LI><A HREF="#squid_server">squid server</A>
<LI><A HREF="#apache_s_mod_proxy">apache's mod_proxy</A>
</UL>
<!-- INDEX END -->
<HR>
The <a href="http://www.modperl.com/">
<B>Writing Apache Modules with Perl and C</B></a>
book can be purchased online from <a
href="http://www.ora.com/catalog/wrapmod/">O'Reilly </a>
and <a
href="http://www.amazon.com/exec/obidos/ASIN/156592567X/writinapachemodu">
Amazon.com</a>.
<HR>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<P>
<CENTER><H1><A NAME="Do_it_like_me_">Do it like me?!</A></H1></CENTER>
<P>
There is no such a thing as a <STRONG>RIGHT</STRONG> strategy in web server business, though there are the wrong ones. Never
believe a person, saying: <EM>"Do it this way, this is the best!"</EM>. As the old saying says: <EM>"Trust but verify"</EM>. There are too many technologies out there to pick from, and it might take
an enormous investment of time and money to try to validate each one,
before deciding what is your favorite one. Keeping this idea in mind, I
will try to present different combinations of other technologies and
mod_perl or just stand alone mod_perl. How these things work together, what
are the good and what are the bad properties of each one, how hard to
install and maintain them, when a specific approach should be picked and
when it should be avoided.
<P>
To make myself clear, I would not talk about any technologies but the ones
useful in tandem with mod_perl.
<P>
Please let me stress it again: <STRONG>DO NOT</STRONG> blindly copy someone's setup and hope for a good. Choose what is best for
your situation.
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H1><A NAME="mod_perl_Deploying_Schemas_Overv">mod_perl Deploying Schemas Overview</A></H1></CENTER>
<P>
There are several different ways to build, configure and deploy your
mod_perl enabled site. Some of the ways are:
<OL>
<P><LI>
<P>
1 binary and 1 config file (one big binary for mod_perl)
<P><LI>
<P>
2 binaries and 2 config files (one big binary for mod_perl and one small
for static object like images)
<P><LI>
<P>
1 DSO style compile and 2 config files (Dynamic linking lets you compile
once and have a big and a small binary in memory BUT you have to deal with
a freshly made solution that has week documentation and is still subject to
change and is rather more complex)
<P><LI>
<P>
Any of the 3 approaches above plus proxy server in the http accelerator
mode.
</OL>
<P>
If you are a newbie, I would recommend you to start with the first option
and work on getting your feet wet with apache and mod_perl, then decide
whether to move to a second one for production work or move up to the more
state-of-the-art-yet-suspiciously-new DSO system (third) or/and proven to
be good forth option from above.
<P>
Option 1 will kill you in production if you load out a lot of static data
with a 2-12 MB webserver process. On the other hand while testing you will
have no other server interaction to mask or add to your errors.
<P>
Option 2 means keeping two compiles but lets you seriously tune the two
binaries to their exact purpose. On the other hand you have to deal with
proxying or fancy site design to keep the two servers in step. Choose
serving on multiple ports, multiple IPs, etc...
<P>
Option 3 (DSO) -- as mentioned above -- means playing with the bleeding
edge. Also the <CODE>mod_so</CODE> adds size and complexity to your binaries. The benefit being that modules
can be added and removed with out recompiling and even shared amongst
multiple servers. Again, it is bleeding edge and still sort of platform
wonky so YMMV. See
<A HREF="././scenario.html#mod_perl_server_as_DSO">mod_perl server as DSO</A>.
<P>
Option 4 (proxy in http accelerator mode) - should make you lots of good
things, once correctly configured and tuned.
<P>
The rest of this chapter discuss the pros and the cons of each of the
presented schemas. <A HREF="././scenario.html#">Real World Scenarios of Implementing Various Strategies</A> describes the implementation techniques of described schemas.
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H1><A NAME="Stand_alone_mod_perl_Enabled_Apa">Stand alone mod_perl Enabled Apache Server</A></H1></CENTER>
<P>
The first approach is to implement a straightforward mod_perl server. Just
take your plain apache server and add to it the mod_perl, like you add any
other apache module. You continue to run it at the port it was running
before. You probably want to try this before you proceed to more
sophisticated and complex techniques.
<P>
The Advantages:
<UL>
<P><LI>
<P>
Simplicity of installation. You just follow the installation instructions,
configure it, restart the server and you have finished.
<P><LI>
<P>
You do not have to think about using additional ports as we will see later.
<P>
You get a very fast server, you see an enormous speedup from the first
moment you have started to use it.
</UL>
<P>
The Disadvantages:
<UL>
<P><LI>
<P>
A mod_perl process' size is huge compared to the plain apache you have got
used to (starting from 4Mb at the startup and growing to 10Mb and more,
depending on what and how do you use it). You probably have a few tens of
children processes. The memory adds up. Your memory demands are growing in
an order of magnitude. But this is something you will have to live with, if
you are in the mod_perl boat. Memory is cheap nowadays and the performance
boost mod_perl gives to your services worths every 100Mb of RAM you add.
<P>
While you are calm to have these monsters to serve you scripts with the
monsters' speed, you should be very worried about them serving your static
objects, like images and html files. It means that if you will need more
processes to serve these - more memory, the real overhead depends on static
objects request rate. Remember that if your mod_perl code produces HTML
code, which includes images each one will turn into a static object request
(or only one request if you have the
<CODE>KeepAlive</CODE> directive turned <CODE>ON</CODE>). Having another plain webserver to serve the static objects solves this
not pleasant obstacle. Having a proxy server as a front end, caching the
static objects and freeing the mod_perl processes from this burden is
another solution. We will discuss both below.
<P><LI>
<P>
Another drawback that you cannot afford is having the huge process to wait
till the client with slow connection will receive the whole response. While
it might take a few milliseconds for your script to complete the request,
there is a chance it will be still busy for another 5 seconds or even
minutes if the request has come from the slow connection client. As in the
previous drawback, proxy comes to solve this problem. More on proxies
later.
<P>
Proxying dynamic content is not going to help much if all the clients are
on a fast local net. For example if you are an administrator of the
Intranet. On opposite, it can make things worse. But still remember that
some of your Intranet users might work from home through the slow modem
links.
</UL>
<P>
If you are new to mod_perl, this is probably the best way to get yourself
started.
<P>
Of course if your site is serving only the mod_perl scripts, this might be
the most perfect choice for you!
<P>
For implementation notes see :
<A HREF="././scenario.html#One_Plain_and_One_mod_perl_enabl">One Plain and One mod_perl enabled Apache Servers</A>
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H1><A NAME="One_Plain_and_One_mod_perl_enabl">One Plain and One mod_perl enabled Apache Servers</A></H1></CENTER>
<P>
As I have mentioned before, when running your scripts under mod_perl, you
will notice that the httpd processes consume a huge amount of memory, from
5M to 25M and more. That is the price you pay for the enormous speed
improvements under mod_perl.
<P>
It is an overkill to serve static objects like images and html documents
with these large processes. The best approach is to run two servers: a very
light plain apache server to serve static objects and a heavier mod_perl
enabled apache server to take care of requests to dynamic (generated)
objects (aka CGI).
<P>
Since now on, I will refer to these two servers as <STRONG>httpd_docs</STRONG>
(vanilla apache) and <STRONG>httpd_perl</STRONG> (mod_perl enabled apache).
<P>
The Advantages:
<UL>
<P><LI>
<P>
Heavy mod_perl processes serving only dynamic requests. So we need much
less servers - memory requirements are drastically going down. While this
is always correct for static HTML object, you might still have a problem
with static object requests from the generated HTML. You must use a fully
qualified URL and not a relative one! When you generate an HTML which has
relative references to images or other static object - client's browser
sets its base url to point to
<CODE>httpd_perl</CODE> server's. And all the image requests will go to the heavy <CODE>httpd_perl</CODE>. Be aware! You solve this by either using a full URLs to the <CODE>httpd_docs</CODE> server (the fastest) or rewriting the requests back to <CODE>httpd_docs</CODE> (much slower). This is not a problem if you hide the internal
implementations, so client sees only one server running on port <CODE>80</CODE>. (<A HREF="././config.html#Publishing_port_numbers_differen">Publishing port numbers different from 80</A>)
<P><LI>
<P>
<CODE>MaxClients</CODE>, <CODE>MaxRequestsPerChild</CODE> and related parameters now can be optimally tuned for both <CODE>httpd_docs</CODE> and <CODE>httpd_perl</CODE> servers, something we could not do before. This allows us to fine tune the
memory usage and get a better server performance.
<P>
Now we can spawn many light weighted <CODE>httpd_docs</CODE> servers and just a few heavy <CODE>httpd_perl</CODE> servers.
</UL>
<P>
The Disadvantages:
<UL>
<P><LI>
<P>
An administration overhead.
<UL>
<P><LI>
<P>
A need for 2 different configuration files, 2 log files sets and etc. We
need a special directories layout to manage these. While some directories
can be shared between the two servers, like <CODE>include</CODE>
directory, storing the apache include files (assuming that both are built
from the same source distribution), most of them should be separated and
configuration files updated to reflect the changes.
<P><LI>
<P>
A need for 2 sets of controlling scripts (startup/shutdown)
<P><LI>
<P>
If you are processing log files, now you probably will have to merge the 2
separate log files.
</UL>
<P><LI>
<P>
We still have the problem of mod_perl process spending its precious time to
server slow clients, when the request was processed long time ago, exactly
like in 1 server approach. Proxy solves this, and will be covered in the
next sections.
<P>
Exactly as with only one server approach, this is not a disadvantage if you
are on the fast local Intranet. Most chances that you do not want a
buffering server on your way.
</UL>
<P>
For implementation notes see :
<A HREF="././scenario.html#One_Plain_and_One_mod_perl_enabl">One Plain and One mod_perl enabled Apache Servers</A>
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H1><A NAME="One_light_non_Apache_and_One_mod">One light non-Apache and One mod_perl enabled Apache Servers</A></H1></CENTER>
<P>
If the only functionality needed from the the light server is static
objects serving, you can get away with non-apache servers, whose memory
footprint is even smaller. <CODE>thttpd</CODE> has been reported to be about 5 times faster then apache (especially under
a heavy load), since it is very simple and uses almost no memory (260k) and
does not spawn child processes.
<P>
The Advantages:
<UL>
<P><LI>
<P>
All the advantages of the 2 servers scenario.
<P><LI>
<P>
More memory saving. Apache is about 4 times bigger then <STRONG>thttpd</STRONG>, it if you spawn 30 children you use about 30M of memory, while <STRONG>thttpd</STRONG>
uses only 260k - 100 times less! You could use the saved 30M to run more
mod_perl servers.
<P><LI>
<P>
Reported to be about 5 times faster then plain apache serving static
objects.
</UL>
<P>
The Disadvantages:
<UL>
<P><LI>
<P>
Not that I know of :)
</UL>
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H1><A NAME="Adding_a_Proxy_Server_in_http_Ac">Adding a Proxy Server in http Accelerator Mode</A></H1></CENTER>
<P>
At the beginning there were 2 servers: one - plain apache server, which was
_very_ light, and configured to serve static objects, the other - mod_perl
enabled, which was _very_ heavy and aimed to serve mod_perl scripts. We
named them: <CODE>httpd_docs</CODE> and <CODE>httpd_perl</CODE>
appropriately. The two servers coexisted at the same <CODE>IP(DNS)</CODE>
by listening to different ports: 80 - for httpd_docs (e.g. <A
HREF="http://www.nowhere.com/images/test.gif">http://www.nowhere.com/images/test.gif</A>
) and 8080 for httpd_perl (e.g. <A
HREF="http://www.nowhere.com:8080/perl/test.pl">http://www.nowhere.com:8080/perl/test.pl</A>
). Note that I did not write <A
HREF="http://www.nowhere.com:80">http://www.nowhere.com:80</A> for the
first example, since port 80 is a default http port. (Note that later on, I
will be moving the httpd_docs server to port 81.)
<P>
Now I am going to convince you that you _want_ to use a proxy server (in
the http accelerator mode). The reasons are:
<UL>
<P><LI>
<P>
Allow serving of static objects from the proxy's cache (objects that
previously were entirely served by the httpd_docs server).
<P>
You get less I/O activity to read the object from the disk (proxy serves
the most ``popular'' objects from the RAM memory - of course you benefit
more if you allow the proxy server to consume more RAM). Since you do not
wait for the I/O to be completed you serve the static objects faster. (I/O
+1 points, faster static object serving +1 point)
<P><LI>
<P>
The proxy server acts as a sort of output buffer for the dynamic content.
The mod_perl server sends the entire response to the proxy and is then free
to deal with other requests. The proxy server is responsible for getting
the response to the browser. So if the transfer is over a slow link, the
mod_perl server is not waiting around for the data to move.
<P>
Using numbers is always more convincing :) Let's take a user connected to
your site with 28.8 kbps (bps == bits/sec) modem. It means that a speed of
user's link is 28.8/8 = 3.6 kbytes/sec. I assume an average generated HTML
page to be of 10kb (kb == kilobytes) and pretty average script that
generates this output in 0.5 secs. How much time the server will wait
before user will get the whole output down his way? A simple calculation
reveals pretty scaring numbers - it will have to wait for another 6 secs
(20kb/3.6kb), when it could serve another 12 (6/0.5) requests at the time
it was stalled. This very simple example shows us that we need 12 times
less children running, which means you will need 12 fold times less memory
(which is not quite true because some parts of the code are being shared).
But you know that nowadays scripts return pages which sometimes are being
blown up with javascript code and similar, which makes them of 100kb size
and download time to be of... (This calculation was left to the reader as
an exercise :)
<P>
To make your numbers of download time even worse let me remind you that
many users like to open many browser windows and do many things at once
(download files and visit _heavy_ sites). So the speed of 3.6kb/sec I was
talking about before, many times 5-10 times smaller.
<P>
(+3 point for memory cut up, I give it 3 times more points for the savings
we get with this most important functionality)
<P><LI>
<P>
Also we are going to hide the details of the server's implementation. Users
will never see ports in the URLs (more on that topic later). And you can
have a few boxes serving the requests, and only one serves as a front end,
which spreads the jobs between the servers in a way you configured it too.
So you can actually put down one server down for upgrade, but end user will
never notice that because the front end server will dispatch the jobs to
other servers. (of course this is pretty big topic, and it would not be
included in the scope of this document)
<P>
(+1 point for flexibility to change things at the background and making
user to bookmark only one URL)
<P><LI>
<P>
Of course there are drawbacks. Luckily, these are not functionality
drawbacks, but more of administration hassle. Yes, you add another program
to worry about, while proxies are generally stable you have to make sure to
prepare proper startup and shutdown scripts, which are being run at the
boot and reboot appropriately. May be a watchdog script running at the
crontab.
<P>
Proxy servers can be configured to be light or heavy, admin must decide
what gives the highest performance for his application. Proxy server like
squid is light in the concept of having only one process serving all
requests. But it can appear pretty heavy when it loads objects into memory
for faster serving.
<P>
(-1 point for administration overhead, very small but overhead. -1 point
for memory consuming)
</UL>
<P>
So let's sum up the points. We have 4 (6-2) points for the proxy server and
I believe I have succeeded to convince you, that you want it :)
<P>
If you are on the local network, then the big benefit of proxy buffering
the output and feeding a slow client is gone. You are probably better off
sticking with a straight mod_perl server in this case.
<P>
As of this writing 2 proxy implementations are known to be used in bundle
with mod_perl - <STRONG>squid</STRONG> proxy server and <STRONG>mod_proxy</STRONG> which is a part of the apache server. Let's compare the 2 of them.
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H1><A NAME="squid_server">squid server</A></H1></CENTER>
<P>
The Advantages:
<UL>
<P><LI>
<P>
Caching of static objects. So these are being served much faster taking
that your cache size is big enough to keep the most requested objects in
the cache.
<P><LI>
<P>
Buffering of dynamic content, by taking the burden of returning the
generated by mod_perl server content to the slow clients, thus freeing
mod_perl servers from wasteful waiting for the slow clients to download the
data. Freed servers immediately switch to serve other requests, thus your
number of required servers goes drastically down.
<P><LI>
<P>
Non-linear URL space / server setup. You can use Squid to play some tricks
with the URL space and/or domain based virtual server support.
</UL>
<P>
The Disadvantages:
<UL>
<P><LI>
<P>
Proxying dynamic content is not going to help much if all the clients are
on a fast local net. Also, a message on the squid mailing list implied that
squid only buffers in 16k chunks so it would not allow a mod_perl to
complete immediately if the output is larger.
<P><LI>
<P>
Speed. Squid is not very fast today when compared to plain file based web
servers available. Only if you are using a lot of dynamic features such as
mod_perl or similar speed is a reason to use Squid, and then only if the
application and server is designed with caching in mind.
<P><LI>
<P>
Memory usage. Squid uses quite a bit of memory.
<P><LI>
<P>
HTTP protocol level. Squid is pretty much a HTTP/1.0 server, which
seriously limits the deployment of HTTP/1.1 features.
<P><LI>
<P>
HTTP headers / dates, freshness. Your server will be giving out ``old''
pages, which might confuse downstream/client caches. Also chances are that
you will be giving out stale pages.
<P><LI>
<P>
Stability. Compared to plain web servers Squid is not the most stable.
</UL>
<P>
The presented pros and cons lead to an idea, that probably you might want
squid more for its dynamic content buffering features, if your server
serves mostly dynamic requests. So in this situation it is better to have a
plain apache server serving static objects, and squid proxying the mod_perl
enabled server only. At least when performance is the goal.
<P>
For implementation details see: <A HREF="././scenario.html#Running_1_webserver_and_squid_in">Running 1 webserver and squid in httpd accelerator mode</A> and
<A HREF="././scenario.html#Running_2_webservers_and_squid_i">Running 2 webservers and squid in httpd accelerator mode</A>
<P>
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
<CENTER><H1><A NAME="apache_s_mod_proxy">apache's mod_proxy</A></H1></CENTER>
<P>
I do not think the difference in speed between apache's <CODE>ProxyPass</CODE>
and squid is relevant for most sites, since the real value of what they do
is buffering for slow client connections. However squid runs as a single
process and probably consumes fewer system resources. The trade-off is that
mod_rewrite is easy to use if you want to spread parts of the site across
different back end servers, and mod_proxy knows how to fix up redirects
containing the back-end server's idea of the location. With squid you can
run a redirector process to proxy to more than one back end, but there is a
problem to fix redirects in a way that keeps the client's view of both
server names and port numbers in all cases. The difficult case being where
you have DNS aliases that map to the same IP address for an alias and you
want the redirect to use port 80 (when the server is really on a different
port) but you want it to keep the specific name the browser sent so it does
not change in the client's 'location' window.
<P>
The Advantages:
<UL>
<P><LI>
<P>
No additional server is needed. We keep the 1 plain plus 1 mod_perl enabled
apache servers. All you need is to enable the <CODE>mod_proxy</CODE> in
<CODE>httpd_docs</CODE> and add a few lines to <CODE>httpd.conf</CODE> file.
<P><LI>
<P>
<CODE>ProxyPass</CODE> and <CODE>ProxyPassReverse</CODE> directives allows you to hide the internal redirects, so if <CODE>http://nowhere.com/modperl/</CODE> is actually
<CODE>http://localhost:81/modperl/</CODE>, it will be absolutely transparent for user. <CODE>ProxyPass</CODE> redirects the request to the mod_perl server, and when it gets the respond, <CODE>ProxyPassReverse</CODE> rewrites the URL back to the original one, e.g:
<P>
<PRE> ProxyPass /modperl/ <A HREF="http://localhost:81/modperl/">http://localhost:81/modperl/</A>
ProxyPassReverse /modperl/ <A HREF="http://localhost:81/modperl/">http://localhost:81/modperl/</A>
</PRE>
<P><LI>
</UL>
<P>
The Disadvantages:
<UL>
<P><LI>
</UL>
<P>
For implementation see <A HREF="././scenario.html#Using_mod_proxy">Using mod_proxy</A>.
<P><B><FONT SIZE=-1><A HREF="#toc">[TOC]</A></FONT></B><HR WIDTH="100%"></P>
The <a href="http://www.modperl.com/">
<B>Writing Apache Modules with Perl and C</B></a>
book can be purchased online from <a
href="http://www.ora.com/catalog/wrapmod/">O'Reilly </a>
and <a
href="http://www.amazon.com/exec/obidos/ASIN/156592567X/writinapachemodu">
Amazon.com</a>.
<HR>
[ <A HREF="start.html">Prev</A> | <A HREF="index.html">Main Page</A> | <A HREF="scenario.html">Next</A> ]
<CENTER><TABLE CELLSPACING=2 CELLPADDING=2 WIDTH="100%" >
<TR ALIGN=CENTER VALIGN=TOP>
<TD ALIGN=CENTER VALIGN=CENTER COLSPAN="3">
<HR>
</TD>
</TR>
<TR ALIGN=CENTER VALIGN=TOP>
<TD ALIGN=CENTER VALIGN=CENTER>
<B>
<FONT SIZE=-1>
Written by <A HREF="help.html#This_document_s_Author">Stas Bekman</A>.
<BR>Last Modified at 05/08/1999
</FONT>
</B>
</TD>
<TD>
<A HREF="http://perl.apache.org"><IMG SRC="images/mod_perl2.jpg" ALT="Mod Perl Icon" BORDER=0 HEIGHT=59 WIDTH=150></A>
</TD>
<TD>
<FONT SIZE=-2>
Use of the Camel for Perl is <BR>
a trademark of <A HREF="http://www.ora.com">O'Reilly & Associates</A>,<BR>
and is used by permission.
</FONT>
</TD>
</TR>
</TABLE></CENTER>
</BODY>
</HTML>