Posted to modperl@perl.apache.org by Torsten Foertsch <to...@gmx.net> on 2005/11/30 11:35:17 UTC

Fwd: PAUSE indexer report OPI/Apache-DBI-Cache-0.06.tar.gz

Hi,

I have just uploaded Apache::DBI::Cache to CPAN. The module addresses the same 
problems as Apache::DBI but works differently.

While Apache::DBI caches connections at connect time this module caches them 
only at disconnect or DESTROY.

Apache::DBI does not distinguish between currently used and free connections. 
Hence, it cannot support multiple identical connections. This module does. 

Apache::DBI resets all connections at request cleanup. Apache::DBI::Cache 
intercepts disconnect or DESTROY events to do that.

Like Apache::DBI, it works with mp1 and mp2. It also provides handle 
statistics via Apache::Status or Apache2::Status, respectively. 
Apache::DBI::Cache can use BerkeleyDB as a shared memory implementation to 
provide statistics for the whole Apache process group instead of a single 
process.

Apache::DBI::Cache includes the DBD driver name in the caching key while 
Apache::DBI does not. Hence with Apache::DBI the following 2 DSNs can result 
in the same DBI handle: dbi:mysql:dbname=db and dbi:Pg:dbname=db
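
A minimal sketch of why the driver name matters in the caching key. This is not the code of either module: the helpers split_dsn, key_without_driver and key_with_driver are hypothetical, and the key layout is an assumption made only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: split a DSN of the form "dbi:Driver:rest"
# into the driver name and the driver-specific remainder.
sub split_dsn {
    my ($dsn) = @_;
    my ($driver, $rest) = $dsn =~ /^dbi:(\w+):(.*)$/i
        or die "cannot parse DSN '$dsn'";
    return ($driver, $rest);
}

# Key that ignores the driver: two different databases can collide.
sub key_without_driver {
    my ($dsn, $user) = @_;
    my (undef, $rest) = split_dsn($dsn);
    return join "\0", $rest, $user;
}

# Key that includes the driver: the collision goes away.
sub key_with_driver {
    my ($dsn, $user) = @_;
    my ($driver, $rest) = split_dsn($dsn);
    return join "\0", $driver, $rest, $user;
}

my @dsns = ('dbi:mysql:dbname=db', 'dbi:Pg:dbname=db');
my $collide  = key_without_driver($dsns[0], 'web') eq key_without_driver($dsns[1], 'web');
my $distinct = key_with_driver($dsns[0], 'web')    ne key_with_driver($dsns[1], 'web');

print "without driver: ", ($collide  ? "same key"       : "different keys"), "\n";
print "with driver:    ", ($distinct ? "different keys" : "same key"),       "\n";
```

With the driver left out, both DSNs reduce to "dbname=db" and map to one cache slot; with it included they stay apart.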

Apache::DBI::Cache has a plugin interface to treat different database types 
specially.

I wrote this module because Apache::DBI had changed the logic of our programs. 
Further, we had a great many DSNs, mostly MySQL, in various configuration files, 
all using different syntaxes to connect to a dozen databases on 2 database 
hosts. With Apache::DBI::Cache and the mysql plugin we could cut the 
number of connections down to a handful of permanent connections per Apache 
process.

There are still cases where Apache::DBI::Cache changes the logic of your 
program, but fewer than with Apache::DBI.


Torsten


----------  Forwarded Message  ----------

Subject: PAUSE indexer report OPI/Apache-DBI-Cache-0.06.tar.gz
Date: Wednesday 30 November 2005 10:58
From: PAUSE <up...@pause.perl.org>
To: torsten.foertsch@gmx.net, andreas.koenig@pause.perl.org

The following report has been written by the PAUSE namespace indexer.
Please contact modules@perl.org if there are any open questions.
  Id: mldistwatch 676 2005-11-24 20:38:34Z k

               User: OPI (Torsten Foertsch)
  Distribution file: Apache-DBI-Cache-0.06.tar.gz
    Number of files: 14
         *.pm files: 2
             README: Apache-DBI-Cache-0.06/README
           META.yml: Apache-DBI-Cache-0.06/META.yml
  Timestamp of file: Wed Nov 30 09:33:38 2005 UTC
   Time of this run: Wed Nov 30 09:58:38 2005 UTC

The following packages (grouped by status) have been found in the distro:

Status: Successfully indexed
============================

     module: Apache::DBI::Cache
    version: 0.06
    in file: Apache-DBI-Cache-0.06/lib/Apache/DBI/Cache.pm
     status: indexed

     module: Apache::DBI::Cache::db
    version: 0.06
    in file: Apache-DBI-Cache-0.06/lib/Apache/DBI/Cache.pm
     status: indexed

     module: Apache::DBI::Cache::mysql
    version: 0.05
    in file: Apache-DBI-Cache-0.06/lib/Apache/DBI/Cache/mysql.pm
     status: indexed

__END__

-------------------------------------------------------

Re: Fwd: PAUSE indexer report OPI/Apache-DBI-Cache-0.06.tar.gz

Posted by Perrin Harkins <pe...@elem.com>.
On Mon, 2005-12-05 at 20:44 +0100, Torsten Foertsch wrote:
> I would not say that. DESTROY is reliable in that it does exactly what it 
> should. It is called when the last reference to an object goes away.

The issue is that things don't always go out of scope when people think
they do.  We've all seen this with Apache::Session problems and I've
also seen it with Class::DBI using weak references to keep an "object
index."  People are often surprised to discover they have done something
that prevents an object from going out of scope.  It's just a side
effect of Perl's complexity.
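
A small, self-contained sketch of the kind of surprise described above: a closure that captures an object keeps it alive after its block has ended, so DESTROY runs later than one might expect. The Handle class is a hypothetical stand-in, not a DBI handle.

```perl
use strict;
use warnings;

my @log;    # records when DESTROY actually runs

{
    package Handle;
    sub new     { my ($class, $name) = @_; return bless { name => $name }, $class }
    sub DESTROY { my ($self) = @_; push @log, "DESTROY $self->{name}" }
}

my $callback;
{
    my $h = Handle->new('captured');
    # The closure captures $h, so the object outlives this block.
    $callback = sub { return $h->{name} };
}
push @log, 'left block 1';

{
    my $h = Handle->new('plain');    # nothing else refers to this one
}
push @log, 'left block 2';

undef $callback;    # only now does the last reference go away
push @log, 'dropped closure';

print "$_\n" for @log;
```

The "plain" object is destroyed as soon as its block ends, while the "captured" one survives until the closure itself is dropped.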

> See above, it was not my goal to make an application better than it is. If it 
> was developed with global handles, well ... so be it.

This is different from Apache::DBI though.  With Apache::DBI, the
rollback will get called as long as something called connect() for that
handle during the request.  With your module, if disconnect() and
DESTROY don't happen (because something died, but the handle didn't go
out of scope for some reason), no rollback will happen.  It probably
seems like more of an issue to me than to you because you trust DESTROY
to happen and I don't.

> You mean $dbh->{$PRIVATE} is wrong? Maybe because $dbh->{$PRIVATE}||=... would 
> not work? That has been avoided in the code. What else is wrong with that?

In general, it's not a good idea to access the internals of someone
else's class implementation.  DBI does allow it, but has certain rules
about namespaces.  See the section "General Interface Rules & Caveats"
in the DBI docs.  There are also other ways to store private attributes
without touching the object itself, like making your own container
object to hold the handle and the metadata.
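
A minimal sketch of the container idea, assuming a hypothetical HandleBox class and a MockDBH stand-in for whatever DBI->connect would return. The metadata sits next to the handle rather than inside the handle's own hash. (DBI's own convention for private attributes is a name beginning with "private_"; the container avoids touching the handle at all.)

```perl
use strict;
use warnings;

# Stand-in for a real database handle; in practice this would be
# whatever DBI->connect returned.
{
    package MockDBH;
    sub new  { return bless {}, shift }
    sub ping { return 1 }
}

# Hypothetical container: metadata lives next to the handle,
# never inside the handle's own hash.
{
    package HandleBox;
    sub new {
        my ($class, $dbh, %meta) = @_;
        return bless { dbh => $dbh, meta => {%meta} }, $class;
    }
    sub dbh { return $_[0]{dbh} }
    # Getter/setter for a single metadata entry.
    sub meta {
        my ($self, $key, @val) = @_;
        $self->{meta}{$key} = $val[0] if @val;
        return $self->{meta}{$key};
    }
}

my $box = HandleBox->new(MockDBH->new, dsn => 'dbi:mysql:dbname=db');
$box->meta(disconnected => 0);

print "dsn: ", $box->meta('dsn'), "\n";
print "handle untouched: ", (scalar(keys %{ $box->dbh }) ? "no" : "yes"), "\n";
```

The price of this design is that callers hold the container rather than the handle, so method calls have to go through `$box->dbh` or be delegated explicitly.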

- Perrin


Re: Fwd: PAUSE indexer report OPI/Apache-DBI-Cache-0.06.tar.gz

Posted by Torsten Foertsch <to...@gmx.net>.
On Monday 05 December 2005 18:57, Perrin Harkins wrote:
> On Mon, 2005-12-05 at 17:51 +0100, Torsten Foertsch wrote:
> > With Apache::DBI::Cache on the other hand handles are cached only when
> > they are free.
>
> Now I understand -- you are using the cache as a way to mark unused
> handles.  This is kind of confusing.  It would be easier to understand
> if you always kept them in the cache and just have a "in_use" attribute
> that you set for each one or something similar.  In fact you already
> seem to have one with your "disconnected" attribute.

I cannot cache the handle on connect, since then it would never be DESTROYed.
The "disconnected" attribute is used to prevent a double disconnect (to keep 
statistics correct).

> You actually could do all of this with a wrapper around Apache::DBI.  It
> could keep  track of in-use handles and create new ones when needed by
> adjusting a dummy attribute.

Yes, but then again I won't catch the DESTROY event. That brings back the 
request cleanup handler, which I would like to avoid.

> > There are 2 occasions when a handle can go out of use. Firstly,
> > when C<disconnect> is called or when the handle is simply forgotten. The
> > second event can be caught with a C<DESTROY> method.
>
> DESTROY is unreliable.  Scoping in Perl is extremely complicated and
> modules like Apache::Session that rely on DESTROY for anything are a
> source of constant problems on this list.  People accidentally create
> closures, accidentally return the object into a larger scope that keeps
> it around longer, put it in global variables, etc.  I would avoid this.

I would not say that. DESTROY is reliable in that it does exactly what it 
should. It is called when the last reference to an object goes away.
And as you said, DESTROY is used only as a last resort to put a handle back into 
the cache. Normally, disconnect would be called.
The module was developed to be less invasive than Apache::DBI. When an 
application runs without Apache::DBI and without Apache::DBI::Cache and there 
are closures that prevent handles from being forgotten, then with 
Apache::DBI::Cache that should remain the same. On the server where it was 
first used there were a lot of singleton DBI connections stored in global 
variables. In some cases reusing them for anything else led to errors. (I 
don't know why.)
If you need to store handles in global variables you can try 
C<undef_at_request_cleanup> to put them back into the cache at request 
cleanup. Here the PerlCleanupHandler is back ;-). If it works, fine; if not, go 
and use the global handle.

> > Now you can have as much identical connections to a DB server as you
> > need. For example you can connect 2 times with AutoCommit=>1 then start a
> > transaction on one handle and use the second for lookups.
>
> This sounds like a bad idea to me, since the second one won't be able to
> see things added by the first one.  There may be some other useful case
> for this though.

That was only an example, and in fact it can be useful. I have seen 
applications where a new set of tables was created for each month. The fact 
that a table did not exist simply meant 0 for each of its columns. If a 
select had to check something for a particular range of months, some 
tables might not exist. Within a transaction that would cause the whole 
transaction to abort.

> The only serious issue I see with this module is the way you handle
> rollbacks.  This will only do a rollback if you call disconnect.  What
> happens if your code hits an unexpected error and dies without calling
> disconnect?  No rollback, and potentially a transaction left open with
> questionable data and possibly locks.  (You can't rely on the object
> going out of scope for safety.)  Apache::DBI prevents this with its
> cleanup handler, although that is somewhat flawed as well if you connect
> with AutoCommit on and then turn it off.

See above, it was not my goal to make an application better than it is. If it 
was developed with global handles, well ... so be it. Oh, I forgot to say the 
module was not developed with Registry scripts in mind. I originally had a 
bunch of handcrafted mod_perl applications that created handles and 
disconnected them arbitrarily. Some used singletons, others connect/disconnect 
for each request. That led to 2 problems: a) the total number of connections to 
some MySQL databases was quite large (several thousand) and b) the frequent 
connect calls led to problems on a DNS server (as I was told).

> Hmm... It also does direct hash accesses on the $dbh object for storing
> private data.  That's a little scary.  The $dbh->{AutoCommit} stuff in
> DBI is special because it uses XS typeglob magic.  Doing your own hash
> accesses is not really safe.

You mean $dbh->{$PRIVATE} is wrong? Maybe because $dbh->{$PRIVATE}||=... would 
not work? That has been avoided in the code. What else is wrong with that? 
And how can it be circumvented?

Thanks, Perrin, for reviewing my code,
Torsten

Re: Fwd: PAUSE indexer report OPI/Apache-DBI-Cache-0.06.tar.gz

Posted by Perrin Harkins <pe...@elem.com>.
On Mon, 2005-12-05 at 17:51 +0100, Torsten Foertsch wrote:
> With Apache::DBI::Cache on the other hand handles are cached only when they 
> are free.

Now I understand -- you are using the cache as a way to mark unused
handles.  This is kind of confusing.  It would be easier to understand
if you always kept them in the cache and just had an "in_use" attribute
that you set for each one or something similar.  In fact you already
seem to have one with your "disconnected" attribute.

You actually could do all of this with a wrapper around Apache::DBI.  It
could keep  track of in-use handles and create new ones when needed by
adjusting a dummy attribute.

> There are 2 occasions when a handle can go out of use. Firstly, 
> when C<disconnect> is called or when the handle is simply forgotten. The 
> second event can be caught with a C<DESTROY> method.

DESTROY is unreliable.  Scoping in Perl is extremely complicated and
modules like Apache::Session that rely on DESTROY for anything are a
source of constant problems on this list.  People accidentally create
closures, accidentally return the object into a larger scope that keeps
it around longer, put it in global variables, etc.  I would avoid this.

> Now you can have as much identical connections to a DB server as you need. For 
> example you can connect 2 times with AutoCommit=>1 then start a transaction 
> on one handle and use the second for lookups.

This sounds like a bad idea to me, since the second one won't be able to
see things added by the first one.  There may be some other useful case
for this though.

> My module cannot provide C<all_handlers> since it does not know them all.

I think DBI will provide something equivalent soon.

The only serious issue I see with this module is the way you handle
rollbacks.  This will only do a rollback if you call disconnect.  What
happens if your code hits an unexpected error and dies without calling
disconnect?  No rollback, and potentially a transaction left open with
questionable data and possibly locks.  (You can't rely on the object
going out of scope for safety.)  Apache::DBI prevents this with its
cleanup handler, although that is somewhat flawed as well if you connect
with AutoCommit on and then turn it off.

Hmm... It also does direct hash accesses on the $dbh object for storing
private data.  That's a little scary.  The $dbh->{AutoCommit} stuff in
DBI is special because it uses XS typeglob magic.  Doing your own hash
accesses is not really safe.

- Perrin



Re: Fwd: PAUSE indexer report OPI/Apache-DBI-Cache-0.06.tar.gz

Posted by Torsten Foertsch <to...@gmx.net>.
Hi,

sorry for the late answer.

I'll try to explain what Apache::DBI and my module do. Maybe it will then be 
clear why my module cannot be compatible with Apache::DBI and why it 
therefore must be a separate module. In fact, at first I thought of creating a 
quick patch to Apache::DBI. It would have saved a lot of work.

With Apache::DBI, when DBI->connect is called the first time for a given DSN 
and attributes, a new DBI handle is created and stored in %Connected. When 
DBI->connect is called a second time with the same parameters, Apache::DBI 
finds a matching handle in %Connected and reuses it. Additionally, at request 
cleanup time Apache::DBI checks all handles created with AutoCommit=>0 and 
rolls back any pending transaction.
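
A minimal sketch of the cleanup step described above, not Apache::DBI's actual code. MockDBH stands in for a DBI handle, %connected plays the role of the connect-time cache, and cleanup() is a hypothetical hook meant to run once per request. As a simplification it looks at the AutoCommit flag at cleanup time rather than at connect time.

```perl
use strict;
use warnings;

# Stand-in for a DBI handle: only the AutoCommit flag and rollback()
# are modelled.
{
    package MockDBH;
    sub new {
        my ($class, %attr) = @_;
        return bless { AutoCommit => 1, rolled_back => 0, %attr }, $class;
    }
    sub rollback { $_[0]{rolled_back}++; return 1 }
}

my %connected;    # plays the role of a connect-time handle cache

# Hypothetical cleanup hook, meant to run once per request.
sub cleanup {
    for my $dbh (values %connected) {
        next if $dbh->{AutoCommit};    # nothing pending in autocommit mode
        $dbh->rollback;
    }
    return;
}

$connected{auto} = MockDBH->new(AutoCommit => 1);
$connected{txn}  = MockDBH->new(AutoCommit => 0);
cleanup();

print "$_: rolled back $connected{$_}{rolled_back} time(s)\n"
    for sort keys %connected;
```

Because the cache holds every handle, the hook can reach all of them even when the request code died without calling disconnect.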

It follows from this logic that there can be only one handle for a given set 
of attributes (and DSN). And this makes sense for Apache::DBI because it has 
no notion of whether a handle is really in use or not.

Further, Apache::DBI provides a C<all_handlers> subroutine that allows code 
outside Apache::DBI to access %Connected and thus all cached handles.

With Apache::DBI::Cache, on the other hand, handles are cached only when they 
are free. When DBI->connect is called the module tries to find a matching 
free one in the cache. If that fails a new one is created. Then the handle is 
returned to the caller and Apache::DBI::Cache does NOT hold any reference to 
the handle. There are 2 occasions when a handle can go out of use: either 
when C<disconnect> is called or when the handle is simply forgotten. The 
second event can be caught with a C<DESTROY> method. Apache::DBI::Cache 
catches these 2 events and puts the handle in the cache, preventing it from 
really getting disconnected. In the DESTROY case it adds a reference, so the 
handle will not be touched by the garbage collector.
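
A minimal sketch of the "cache only free handles" idea, under stated assumptions: PooledHandle is a stand-in for a database handle, pool_connect() is a hypothetical replacement for connect, and only the disconnect path is modelled (the DESTROY path is left out). It is not the module's implementation.

```perl
use strict;
use warnings;

my %free;           # cache key => list of handles that are currently unused
my $created = 0;    # how many "real" connections were opened

# Stand-in for a database handle. disconnect() hands the handle back
# to the cache instead of closing the connection.
{
    package PooledHandle;
    sub new {
        my ($class, $key) = @_;
        $created++;
        return bless { key => $key, id => $created }, $class;
    }
    sub disconnect {
        my ($self) = @_;
        push @{ $free{ $self->{key} } }, $self;
        return 1;
    }
}

# Hypothetical connect(): reuse a free handle if one matches, else
# create a new one. The cache keeps no reference to handles in use.
sub pool_connect {
    my ($key) = @_;
    my $list = $free{$key} || [];
    return @$list ? shift @$list : PooledHandle->new($key);
}

my $h1 = pool_connect('mysql:db');    # cache empty: new connection
my $h2 = pool_connect('mysql:db');    # $h1 is in use: second connection
$h1->disconnect;                      # $h1 goes back into the cache
my $h3 = pool_connect('mysql:db');    # reuses the handle freed above

print "h1=$h1->{id} h2=$h2->{id} h3=$h3->{id} created=$created\n";
```

Two identical connections can be in use at once, and the freed one is reused without opening a third.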

Now you can have as many identical connections to a DB server as you need. For 
example you can connect 2 times with AutoCommit=>1 then start a transaction 
on one handle and use the second for lookups. When the lookup fails (missing 
table, missing column etc.) it does not disturb your transaction. Of course, 
this could also be achieved with Apache::DBI by adding a dummy attribute. But 
consider a big installation with quite a number of independent developers: 
dummy attributes can be quite error-prone.

Further, maybe you want to keep something, even a transaction, open over the 
lifetime of a connection. This cannot be done with Apache::DBI. (I don't say 
that would be good practice.)

My module cannot provide C<all_handlers> since it does not know them all.

Torsten

On Sunday 04 December 2005 10:45, Enrico Sorcinelli wrote:
> On Wed, 30 Nov 2005 13:37:56 -0500
>
> Perrin Harkins <pe...@elem.com> wrote:
> > Hi Torsten,
> >
> > A few comments on the new module:
> > > While Apache::DBI caches connections at connect time this module caches
> > > them only at disconnect or DESTROY.
> >
> > Why?  I don't understand the value in doing this.
> >
> > > Apache::DBI does not distinguish between currently used and free
> > > connections. Hence, it cannot support multiple identical connections.
> > > This module does.
> >
> > To get multiple connections to the same database with Apache::DBI, you
> > just need to add something unique to the attributes hash in your connect
> > string.
> >
> > > Apache::DBI resets all connections at request cleanup.
> > > Apache::DBI::Cache intercepts disconnect or DESTROY events to do that.
> >
> > For the rollback you mean?  That's not good.  The purpose of the
> > automatic rollback in Apache::DBI is to reset the connection when your
> > code dies due to a bug.  There won't be any disconnect or DESTROY called
> > in that situation.
> >
> > > Apache::DBI::Cache includes the DBD driver name in the caching key
> > > while Apache::DBI does not. Hence with Apache::DBI the following 2 DSNs
> > > can result in the same DBI handle: dbi:mysql:dbname=db and
> > > dbi:Pg:dbname=db
> >
> > Sounds like a good idea.
> >
> > > I wrote this module because Apache::DBI had changed the logic of our
> > > programs.
> >
> > How so?
> >
> > > Further, we had really much DSNs mostly MySQL in various configuration
> > > files all using different syntaxes to connect to a dozen databases on 2
> > > database hosts.
> >
> > I think the normalization of connect strings is a good idea.  It could
> > be useful on its own, for people who don't use mod_perl.
> >
> > - Perrin
>
> Hi Torsten,
>
> as Perrin says, your module has some nice things and some that I don't
> understand too much me too.
>
> Apache::DBI is a well known and famous module largely used by mod_perl
> users. IMHO, trying to improve that instead of writing a new one would have
> been the better thing for the mod_perl community.
>
> Have you contacted Apache::DBI maintainer in order to integrate your work
> with that module?
>
> by
>
> 	- Enrico

Re: Fwd: PAUSE indexer report OPI/Apache-DBI-Cache-0.06.tar.gz

Posted by Enrico Sorcinelli <be...@perl.it>.
On Mon, 5 Dec 2005 11:47:59 +0100
Frank Maas <fr...@cheiron-it.nl> wrote:

> On Sun, Dec 04, 2005 at 10:45:03AM +0100, Enrico Sorcinelli wrote:
> > 
> > Have you contacted Apache::DBI maintainer in order to integrate your work with
> > that module?
> 
> I have the distinct impression that the maintainer is too busy to respond. 
> If I am not mistaken Philippe has tried this with no result. I as well 
> have approached Ask with no reply whatsoever. Please don't take this as an 
> impolite comment, just an observation.

Hi Frank,

On 20 Aug 2005, Philip M. Gollucci took over the maintenance of Apache::DBI :-)


	- Enrico

Re: Fwd: PAUSE indexer report OPI/Apache-DBI-Cache-0.06.tar.gz

Posted by Frank Maas <fr...@cheiron-it.nl>.
On Sun, Dec 04, 2005 at 10:45:03AM +0100, Enrico Sorcinelli wrote:
> 
> Have you contacted Apache::DBI maintainer in order to integrate your work with
> that module?

I have the distinct impression that the maintainer is too busy to respond. 
If I am not mistaken Philippe has tried this with no result. I as well 
have approached Ask with no reply whatsoever. Please don't take this as an 
impolite comment, just an observation.

Regards,

Frank


Re: Fwd: PAUSE indexer report OPI/Apache-DBI-Cache-0.06.tar.gz

Posted by Enrico Sorcinelli <be...@perl.it>.
On Wed, 30 Nov 2005 13:37:56 -0500
Perrin Harkins <pe...@elem.com> wrote:

> Hi Torsten,
> 
> A few comments on the new module:
> 
> > While Apache::DBI caches connections at connect time this module caches them 
> > only at disconnect or DESTROY.
> 
> Why?  I don't understand the value in doing this.
> 
> > Apache::DBI does not distinguish between currently used and free connections. 
> > Hence, it cannot support multiple identical connections. This module does.
> 
> To get multiple connections to the same database with Apache::DBI, you
> just need to add something unique to the attributes hash in your connect
> string.
> 
> > Apache::DBI resets all connections at request cleanup. Apache::DBI::Cache 
> > intercepts disconnect or DESTROY events to do that.
> 
> For the rollback you mean?  That's not good.  The purpose of the
> automatic rollback in Apache::DBI is to reset the connection when your
> code dies due to a bug.  There won't be any disconnect or DESTROY called
> in that situation.
> 
> > Apache::DBI::Cache includes the DBD driver name in the caching key while 
> > Apache::DBI does not. Hence with Apache::DBI the following 2 DSNs can result 
> > in the same DBI handle: dbi:mysql:dbname=db and dbi:Pg:dbname=db
> 
> Sounds like a good idea.
> 
> > I wrote this module because Apache::DBI had changed the logic of our programs.
> 
> How so?
>  
> > Further, we had really much DSNs mostly MySQL in various configuration files 
> > all using different syntaxes to connect to a dozen databases on 2 database 
> > hosts.
> 
> I think the normalization of connect strings is a good idea.  It could
> be useful on its own, for people who don't use mod_perl.
> 
> - Perrin


Hi Torsten,

as Perrin says, your module has some nice things and some that I don't
quite understand either.

Apache::DBI is a well known and famous module largely used by mod_perl users.
IMHO, trying to improve that instead of writing a new one would have been the
better thing for the mod_perl community.

Have you contacted Apache::DBI maintainer in order to integrate your work with
that module?

Bye

	- Enrico

Re: Fwd: PAUSE indexer report OPI/Apache-DBI-Cache-0.06.tar.gz

Posted by Perrin Harkins <pe...@elem.com>.
Hi Torsten,

A few comments on the new module:

> While Apache::DBI caches connections at connect time this module caches them 
> only at disconnect or DESTROY.

Why?  I don't understand the value in doing this.

> Apache::DBI does not distinguish between currently used and free connections. 
> Hence, it cannot support multiple identical connections. This module does.

To get multiple connections to the same database with Apache::DBI, you
just need to add something unique to the attributes hash in your connect
string.
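
A minimal sketch of that trick against a toy connect-time cache. cached_connect() and the key layout are hypothetical, a plain hash stands in for the handle, and the attribute name private_lookup is only an example of "something unique" (it follows DBI's "private_" naming convention for private attributes).

```perl
use strict;
use warnings;

my %connected;    # connect-time cache: one handle per key
my $opened = 0;   # how many "real" connections were opened

# Hypothetical connect() with connect-time caching. The key is built
# from the DSN, the user and the sorted attribute pairs.
sub cached_connect {
    my ($dsn, $user, $attr) = @_;
    $attr ||= {};
    my @parts = ($dsn, $user);
    push @parts, "$_=$attr->{$_}" for sort keys %$attr;
    my $key = join "\0", @parts;
    $connected{$key} ||= { id => ++$opened };    # plain hash as a fake handle
    return $connected{$key};
}

my $first  = cached_connect('dbi:mysql:dbname=db', 'web', { AutoCommit => 1 });
my $same   = cached_connect('dbi:mysql:dbname=db', 'web', { AutoCommit => 1 });
my $second = cached_connect('dbi:mysql:dbname=db', 'web',
                            { AutoCommit => 1, private_lookup => 1 });

print "first/same share a handle: ", ($first == $same    ? "yes" : "no"), "\n";
print "second is separate:        ", ($first != $second  ? "yes" : "no"), "\n";
```

Identical arguments return the cached handle; the extra attribute changes the key and so yields a second, independent handle.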

> Apache::DBI resets all connections at request cleanup. Apache::DBI::Cache 
> intercepts disconnect or DESTROY events to do that.

For the rollback you mean?  That's not good.  The purpose of the
automatic rollback in Apache::DBI is to reset the connection when your
code dies due to a bug.  There won't be any disconnect or DESTROY called
in that situation.

> Apache::DBI::Cache includes the DBD driver name in the caching key while 
> Apache::DBI does not. Hence with Apache::DBI the following 2 DSNs can result 
> in the same DBI handle: dbi:mysql:dbname=db and dbi:Pg:dbname=db

Sounds like a good idea.

> I wrote this module because Apache::DBI had changed the logic of our programs.

How so?
 
> Further, we had really much DSNs mostly MySQL in various configuration files 
> all using different syntaxes to connect to a dozen databases on 2 database 
> hosts.

I think the normalization of connect strings is a good idea.  It could
be useful on its own, for people who don't use mod_perl.

- Perrin