Posted to modperl@perl.apache.org by Stas Bekman <sb...@stason.org> on 2000/06/03 02:54:37 UTC

[RFC: performance] Initializing DBI.pm

Here is a complete version. Comments are very welcome before it enters the
guide:

The first example is the C<DBI> module. As you know, C<DBI> works with
many database drivers that fall into the C<DBD::> category,
e.g. C<DBD::mysql>. If you want to minimize memory use after forking,
it's not enough to preload C<DBI>; you should also initialize C<DBI>
with the driver(s) that you are going to use (usually a single driver
is used).

You probably already know that under mod_perl you should use the
C<Apache::DBI> module to get connection persistence, unless you
open a separate connection for each user, in which case you should not
use this module. C<Apache::DBI> automatically loads C<DBI> and
overrides some of its methods, so you should continue coding as if
there were only a C<DBI> module.

Just as with module preloading, our goal is to find the startup
environment that leads to the smallest I<"difference"> between the
total and shared memory sizes reported, and therefore to a smaller
total memory usage.

And again, to keep the measurement easy, we will use only one
child process, so we will use this setting in I<httpd.conf>:

  MinSpareServers 1
  MaxSpareServers 1
  StartServers 1
  MaxClients 1
  MaxRequestsPerChild 100

We are going to run memory benchmarks on five different versions of
the I<startup.pl> file.  We always preload these modules:

  use GTop ();
  use Apache::DBI (); # preloads DBI as well

=over

=item option 1

Leave the file unmodified.

=item option 2

Install the MySQL driver (we will use the MySQL RDBMS for our test):

  DBI->install_driver("mysql");

=item option 3

Preload the MySQL driver module:

  use DBD::mysql;

=item option 4

Tell Apache::DBI to connect to the database when the child process
starts (ChildInitHandler). No driver is preloaded before the child gets
spawned!

  Apache::DBI->connect_on_init('DBI:mysql:test::localhost',
                             "",
                             "",
                             {
                              PrintError => 1, # warn() on errors
                              RaiseError => 0, # don't die on error
                              AutoCommit => 1, # commit executes
                                               # immediately
                             }
                            )
  or die "Cannot connect to database: $DBI::errstr\n";

=item option 5

Combine options 2 and 4: install the MySQL driver and tell Apache::DBI
to connect to the database when the child process starts.

=back
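
For reference, the combined variant (the one labeled I<install_driver &
connect_on_init> in the results) might look roughly like this in
I<startup.pl>. This is only a sketch assembled from the snippets above,
assuming the same test database and empty credentials; it is not the
exact file used for the benchmark.

```perl
# startup.pl (sketch): runs once in the parent process, before forking
use strict;
use Apache::DBI ();    # preloads DBI as well

# Load and initialize DBD::mysql in the parent, so the children
# inherit it
DBI->install_driver("mysql");

# Ask Apache::DBI to open the connection when each child starts
# (ChildInitHandler); DSN and attributes are the ones used above
Apache::DBI->connect_on_init('DBI:mysql:test::localhost', "", "",
    {
     PrintError => 1, # warn() on errors
     RaiseError => 0, # don't die on error
     AutoCommit => 1, # commit executes immediately
    });

1;
```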

Here is the C<Apache::Registry> test script that we have used:

  preload_dbi.pl
  --------------
  use strict;
  use GTop ();
  use DBI ();
    
  my $dbh = DBI->connect("DBI:mysql:test::localhost",
                         "",
                         "",
                         {
                          PrintError => 1, # warn() on errors
                          RaiseError => 0, # don't die on error
                          AutoCommit => 1, # commit executes
                                           # immediately
                         }
                        )
    or die "Cannot connect to database: $DBI::errstr\n";
  
  my $r = shift;
  $r->send_http_header('text/plain');
  
  my $do_sql = "show tables";
  my $sth = $dbh->prepare($do_sql);
  $sth->execute();
  my @data = ();
  while (my @row = $sth->fetchrow_array){
    push @data, @row;
  }
  print "Data: @data\n";
  $dbh->disconnect(); # NOP under Apache::DBI
  
  my $proc_mem = GTop->new->proc_mem($$);
  my $size  = $proc_mem->size;
  my $share = $proc_mem->share;
  my $diff  = $size - $share;
  printf "%8s %8s %8s\n", qw(Size Shared Diff);
  printf "%8d %8d %8d (bytes)\n",$size,$share,$diff;

The script opens a connection to the database I<'test'> and
issues a query to learn what tables the database has.  Once the data
is collected and printed, the connection would normally be closed,
but C<Apache::DBI> overrides disconnect() with an empty method.  After
the data is processed, the by now familiar code prints the memory
usage.

The server was restarted before each new test.

So here are the results of the five tests that were conducted, sorted
by the I<Diff> column:

=over

=item 1

After the first request:

  Version     Size   Shared     Diff        Test type
  --------------------------------------------------------------------
        1  3465216  2621440   843776  install_driver
        2  3461120  2609152   851968  install_driver & connect_on_init
        3  3465216  2605056   860160  preload driver
        4  3461120  2494464   966656  nothing added
        5  3461120  2482176   978944  connect_on_init

=item 2

After the second request (all the subsequent requests showed the same
results):

  Version     Size   Shared     Diff        Test type
  --------------------------------------------------------------------
        1  3469312  2609152   860160  install_driver
        2  3481600  2605056   876544  install_driver & connect_on_init
        3  3469312  2588672   880640  preload driver
        4  3477504  2482176   995328  nothing added
        5  3481600  2469888  1011712  connect_on_init

=back

So what do we conclude from these numbers?  First, we see
that only after the second request do we get the final memory footprint
for the specific request in question (if you pass different arguments,
the memory usage may be different).

But both tables show the same pattern of memory usage.  We can clearly
see that the real winner is the version of the I<startup.pl> file in
which the MySQL driver was installed (1).  Since we want to have a
connection ready for the first request made to the freshly spawned child
process, we generally use the second version (2), which uses somewhat
more memory but has almost the same number of shared memory pages.  The
third version only preloads the driver, which results in less shared
memory.  The last two versions are the one with nothing initialized (4)
and the one that uses only the connect_on_init() method (5).  The former
is a little better than the latter, but both are significantly worse
than the first two versions.

To remind you why we look for the smallest value in the I<Diff>
column, recall the real memory usage formula:

  RAM_dedicated_to_mod_perl = diff * number_of_processes
                            + largest_shared_size

Notice that the smaller the diff is, the more processes you can
have using the same amount of RAM.  Therefore every 100K difference
counts when you multiply it by the number of processes.  If we take
the numbers from versions (1) and (5) in the second table and assume
that we have 256M of memory dedicated to mod_perl processes, we get the
following numbers using a formula derived from the one above:

               RAM - largest_shared_size
  N_of_Procs = -------------------------
                        Diff

                268435456 - 2609152
  (ver 1)  N =  ------------------- = 309
                      860160

                268435456 - 2469888
  (ver 5)  N =  ------------------- = 262
                     1011712

So you can tell the difference (about 18% more child processes in the
first version).
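
The same arithmetic can be reproduced with a few lines of Perl. This is
just a sketch: max_procs() is a hypothetical helper written for this
illustration, and the inputs are the I<Shared> and I<Diff> values from
the second table together with the assumed 256M of RAM.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): how many child
# processes fit into $ram bytes, given the largest shared size and
# the per-process unshared size (the Diff column), all in bytes.
sub max_procs {
    my ($ram, $largest_shared, $diff) = @_;
    return int(($ram - $largest_shared) / $diff);
}

my $ram = 256 * 1024 * 1024;    # 256M dedicated to mod_perl

# Shared and Diff values taken from the second results table
my $v1 = max_procs($ram, 2609152, 860160);     # install_driver
my $v5 = max_procs($ram, 2469888, 1011712);    # connect_on_init

printf "ver 1: %d, ver 5: %d, gain: %.1f%%\n",
    $v1, $v5, 100 * ($v1 - $v5) / $v5;
```

Plugging in the numbers for any other pair of versions from the tables
works the same way.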


_____________________________________________________________________
Stas Bekman              JAm_pH     --   Just Another mod_perl Hacker
http://stason.org/       mod_perl Guide  http://perl.apache.org/guide 
mailto:stas@stason.org   http://perl.org     http://stason.org/TULARC
http://singlesheaven.com http://perlmonth.com http://sourcegarden.org

Re: [RFC: performance] Initializing DBI.pm

Posted by Eric Cholet <ch...@logilune.com>.

> > > I've not done much of either this last year, however, I'm hoping to get
> > > a new beta DBI release out this week. Maybe...
> >
> > Tim I hope you plan to integrate Doug's patch which makes it possible to use
> > DBI with Perl 5.6 -Dusethreads. Thanks!
>
> Of course. And I'll trust you'll all be doing my testing for me... :-)

Sure thing, although I doubt many mod_perl users are using Perl 5.6
-Dusethreads in production environments.

--
Eric

Re: [RFC: performance] Initializing DBI.pm

Posted by Tim Bunce <Ti...@ig.co.uk>.

On Mon, Jun 05, 2000 at 12:49:46AM +0200, Eric Cholet wrote:
> On Sun, Jun 04, 2000 at 08:58:11PM +0100, Tim Bunce wrote:
> > On Sun, Jun 04, 2000 at 10:57:57PM +0300, Stas Bekman wrote:
> > > 
> > > This all won't be possible without you and other great folks writing and
> > > maintaining this amaizing software... So the biggest thanks goes to you :) 
> > 
> > I've not done much of either this last year, however, I'm hoping to get
> > a new beta DBI release out this week. Maybe...
> 
> Tim I hope you plan to integrate Doug's patch which makes it possible to use
> DBI with Perl 5.6 -Dusethreads. Thanks!

Of course. And I'll trust you'll all be doing my testing for me... :-)

Tim.

Re: [RFC: performance] Initializing DBI.pm

Posted by Eric Cholet <ch...@logilune.com>.

On Sun, Jun 04, 2000 at 08:58:11PM +0100, Tim Bunce wrote:
> On Sun, Jun 04, 2000 at 10:57:57PM +0300, Stas Bekman wrote:
> > 
> > This all won't be possible without you and other great folks writing and
> > maintaining this amaizing software... So the biggest thanks goes to you :) 
> 
> I've not done much of either this last year, however, I'm hoping to get
> a new beta DBI release out this week. Maybe...

Tim I hope you plan to integrate Doug's patch which makes it possible to use
DBI with Perl 5.6 -Dusethreads. Thanks!

> 
> Tim.
> 

-- 
Eric Cholet

Re: [RFC: performance] Initializing DBI.pm

Posted by Tim Bunce <Ti...@ig.co.uk>.

On Sun, Jun 04, 2000 at 10:57:57PM +0300, Stas Bekman wrote:
> 
> This all won't be possible without you and other great folks writing and
> maintaining this amaizing software... So the biggest thanks goes to you :) 

I've not done much of either this last year, however, I'm hoping to get
a new beta DBI release out this week. Maybe...

Tim.

Re: [RFC: performance] Initializing DBI.pm

Posted by Stas Bekman <sb...@stason.org>.

On Sun, 4 Jun 2000, Tim Bunce wrote:

> On Sat, Jun 03, 2000 at 03:54:37AM +0300, Stas Bekman wrote:
> > Here is a complete version. comments are very welcome before it enters the
> > guide:
> > 
> > The first example is the C<DBI> module. As you know C<DBI> works with
> > many database drivers falling into the C<DBD::> category,
> > e.g. C<DBD::mysql>. It's not enough to preload C<DBI>, you should
> > initialize C<DBI> with driver(s) that you are going to use (usually a
> > single driver is used).
> 
> ... if you want to minimize memory use after forking.
> 
> I'd rather not create the impression that people "should" initialize
> drivers in other circumstances.

Perfect! Will correct this. Thanks!

> > You probably know already that under mod_perl you should use the
> > C<Apache::DBI> module to get the connection persistence, unless you
> > open a separate connection for each user--in this case you should not
> > use this module. C<Apache::DBI> automatically loads C<DBI> and
> > overrides all it's methods, so you should continue coding like there
> > is only a C<DBI> module.
> 
> s/all it's methods/some of its methods/.

right!

> > =item option 4
> > 
> > Tell Apache::DBI to connect to the database when the child process
> > starts (ChildInitHandler), no driver is preload before the child gets
> > spawned!
> > 
> >   Apache::DBI->connect_on_init('DBI:mysql:test::localhost',
> >                              "",
> >                              "",
> >                              {
> >                               PrintError => 1, # warn() on errors
> >                               RaiseError => 0, # don't die on error
> >                               AutoCommit => 1, # commit executes
> >                               # immediately
> >                              }
> >                             )
> >   or DBI->disconnect("Cannot connect to database: $DBI::errstr\n");
> 
> There's no DBI->disconnect method. Just die().

ok

> Thanks for doing all this work Stas. Much appreciated.

Thanks :) 

None of this would be possible without you and the other great folks writing
and maintaining this amazing software... So the biggest thanks go to you :)

Re: [RFC: performance] Initializing DBI.pm

Posted by Tim Bunce <Ti...@ig.co.uk>.

On Sat, Jun 03, 2000 at 03:54:37AM +0300, Stas Bekman wrote:
> Here is a complete version. comments are very welcome before it enters the
> guide:
> 
> The first example is the C<DBI> module. As you know C<DBI> works with
> many database drivers falling into the C<DBD::> category,
> e.g. C<DBD::mysql>. It's not enough to preload C<DBI>, you should
> initialize C<DBI> with driver(s) that you are going to use (usually a
> single driver is used).

... if you want to minimize memory use after forking.

I'd rather not create the impression that people "should" initialize
drivers in other circumstances.

> You probably know already that under mod_perl you should use the
> C<Apache::DBI> module to get the connection persistence, unless you
> open a separate connection for each user--in this case you should not
> use this module. C<Apache::DBI> automatically loads C<DBI> and
> overrides all it's methods, so you should continue coding like there
> is only a C<DBI> module.

s/all it's methods/some of its methods/.

> =item option 4
> 
> Tell Apache::DBI to connect to the database when the child process
> starts (ChildInitHandler), no driver is preload before the child gets
> spawned!
> 
>   Apache::DBI->connect_on_init('DBI:mysql:test::localhost',
>                              "",
>                              "",
>                              {
>                               PrintError => 1, # warn() on errors
>                               RaiseError => 0, # don't die on error
>                               AutoCommit => 1, # commit executes
>                               # immediately
>                              }
>                             )
>   or DBI->disconnect("Cannot connect to database: $DBI::errstr\n");

There's no DBI->disconnect method. Just die().

Thanks for doing all this work Stas. Much appreciated.

Tim.