Posted to modperl@perl.apache.org by Brian Reichert <re...@numachi.com> on 2007/07/29 17:29:37 UTC

mod_perl2 and SDBM-tied hashes

I first posted this on the HTML_Mason mason_users list, as that's
the environment where I first saw this symptom.  I thought what I
was seeing was an artifact of Mason's caching behavior, but I've
since considered it possible that I'm instead getting bitten by a core
mod_perl2 behavior under apache 2.0.x.

So, I'm re-posting here; hopefully someone here has some insight. :)

------

Howdy; I have a weird symptom to report.  Dunno if it's pilot error,
or something odd with my tools.

Hopefully someone out there can offer some advice to clear up what
I'm seeing.  If there's any more data I can provide, I'd be happy
to provide it.

The background: I often use SDBM-tied hashes to share cheap slow
data across concurrent apps.  Not much luck under Mason.

I've cobbled together a Mason handler.pl that maintains such a tied
hash as a global variable.

I've written a component that lets me get/set a value in this tied
hash.

The symptom I see is that after a 'set', subsequent 'gets' show me
various results; sometimes the data comes back set, sometimes not.

I wrote a loop with curl to re-hit my component, with a one-second
sleep; the component logs what it sees.

Then, I hit my component from another window, to cause the set.

Here, HASH(foo) is my tied hash reference, followed by the PID of
the apache process handling the request, and a bit representing
whether or not my data is seen as set in the tied hash.  Two complete
loops over all of my httpd PIDs:

  mod_info is HASH(0x9e0f0e4) 8846 0 at /var/www/html/error_test line 35.
  mod_info is HASH(0x9e0f0e4) 8839 0 at /var/www/html/error_test line 35.
  mod_info is HASH(0x9e0f0e4) 8840 0 at /var/www/html/error_test line 35.
  mod_info is HASH(0x9e0f0e4) 8841 0 at /var/www/html/error_test line 35.
  mod_info is HASH(0x9e0f0e4) 8844 0 at /var/www/html/error_test line 35.
  mod_info is HASH(0x9e0f0e4) 8845 0 at /var/www/html/error_test line 35.
  mod_info is HASH(0x9e0f0e4) 8838 1 at /var/www/html/error_test line 35.
  mod_info is HASH(0x9e0f0e4) 8847 0 at /var/www/html/error_test line 35.
  mod_info is HASH(0x9e0f0e4) 8846 0 at /var/www/html/error_test line 35.
  mod_info is HASH(0x9e0f0e4) 8839 0 at /var/www/html/error_test line 35.
  mod_info is HASH(0x9e0f0e4) 8840 0 at /var/www/html/error_test line 35.
  mod_info is HASH(0x9e0f0e4) 8841 0 at /var/www/html/error_test line 35.
  mod_info is HASH(0x9e0f0e4) 8844 0 at /var/www/html/error_test line 35.
  mod_info is HASH(0x9e0f0e4) 8845 0 at /var/www/html/error_test line 35.
  mod_info is HASH(0x9e0f0e4) 8838 1 at /var/www/html/error_test line 35.
  mod_info is HASH(0x9e0f0e4) 8847 0 at /var/www/html/error_test line 35.

You can see that the hash ID never alters; seemingly, it's always
the same hash.  The logs make it seem that only one of my httpd
PIDs made the change.

If I perform these same experiments with command-line tools, the
multiple processes show the data being written to the disk within
a second or so, and all of the processes showing that tied hash
reflect the change quickly.

At the end of this message is a simple tool to demonstrate what I
expect, save as 'dbreader'.  Just takes key/value pairs on the
command line, writes to the tied hash, then loops, displaying the
contents of the hash.

In one window, 
  
  ./dbreader apples cherries grapes bananas

And in another,
 
  ./dbreader apples pineapples grapes peaches

I can have as many of these as I want running at the same time, and
it Just Works.  Somehow, running this under apache/mason/mod_perl
violates my expectations.

The tools on my system:

From RedHat:

  # uname -r
  2.6.9-42.0.10.EL

  # perl -v

  This is perl, v5.8.5 built for i386-linux-thread-multi

  (From Red Hat's perl-5.8.5-36.RHEL4)

From: http://www.openfusion.com.au/mrepo/centos4-i386/

  libapreq2-2.08-1.el4.i686.rpm
  mod_perl-2.0.3-1.of.el4.i686.rpm
  perl-libapreq2-2.08-1.el4.i686.rpm

From Dries:

  perl-HTML-Mason-1.3101-1.2.el4.rf.noarch.rpm
  (plus dependencies)

#--------------
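#!/usr/bin/perl -w

use strict;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;

my $dbclass = 'SDBM_File';
my $loc = 'testing';            # creates testing.dir and testing.pag
my %info = ();
my $hashref = { @ARGV };        # key/value pairs from the command line

tie(%info, $dbclass, $loc, O_RDWR|O_CREAT, 0640)
  or die "$0: can't tie $loc: $!";

# Replace the contents of the tied hash with the command-line pairs.
%info = %{$hashref};

# Dump the current contents of the tied hash once a second, forever.
while (1)
{
  print "----\n";
  foreach (sort keys %info) { print "$_: $info{$_}\n"; }
  sleep 1;
}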

#--------------


-- 
Brian Reichert				<re...@numachi.com>
55 Crystal Ave. #286			Daytime number: (603) 434-6842
Derry NH 03038-1725 USA			BSD admin/developer at large	

Re: mod_perl2 and SDBM-tied hashes

Posted by Brian Reichert <re...@numachi.com>.
On Sun, Jul 29, 2007 at 01:49:56PM -0700, Perrin Harkins wrote:
> The dbm implementation you're using will not always write everything
> to disk until you untie it.  To make this transparent, you can use
> MLDBM::Sync, which unties and reties on every request.  This is
> necessary for read/write sharing on all dbms except BerkeleyDB.

Interesting; thanks for the pointer....

> - Perrin


Re: mod_perl2 and SDBM-tied hashes

Posted by Perrin Harkins <pe...@elem.com>.
On 7/29/07, Brian Reichert <re...@numachi.com> wrote:
> The symptom I see that after a 'set', subsequent 'gets' show me
> various results; sometimes the data comes back set, sometimes not.

The dbm implementation you're using will not always write everything
to disk until you untie it.  To make this transparent, you can use
MLDBM::Sync, which unties and reties on every request.  This is
necessary for read/write sharing on all dbms except BerkeleyDB.

- Perrin
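
A rough sketch of the untie-and-retie idea described above, using only the core SDBM_File and Fcntl modules rather than MLDBM::Sync itself. The path `/tmp/sdbm_sync_demo`, the separate lock file, and the `with_db` helper are assumptions made for illustration; this is not how MLDBM::Sync is implemented, only one way to get a similar effect by hand.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Fcntl qw(:DEFAULT :flock);
use SDBM_File;

my $db   = '/tmp/sdbm_sync_demo';   # hypothetical database path
my $lock = "$db.lock";              # hypothetical lock file

# Tie the hash, run the callback against it, then untie, so that
# buffered pages are flushed and the next caller re-reads the file.
sub with_db {
    my ($lock_mode, $code) = @_;

    sysopen(my $lfh, $lock, O_RDWR|O_CREAT, 0640)
        or die "can't open $lock: $!";
    flock($lfh, $lock_mode) or die "can't lock $lock: $!";

    my %h;
    tie(%h, 'SDBM_File', $db, O_RDWR|O_CREAT, 0640)
        or die "can't tie $db: $!";

    my $result = $code->(\%h);

    untie %h;
    close $lfh;                     # closing the handle releases the lock
    return $result;
}

# A 'set' under an exclusive lock, then a 'get' under a shared lock.
with_db(LOCK_EX, sub { $_[0]{apples} = 'cherries' });
my $seen = with_db(LOCK_SH, sub { $_[0]{apples} });
print "apples: $seen\n";
```

Under mod_perl the same pattern would mean calling something like `with_db` inside each request instead of keeping the tied hash as a long-lived global; the cost is a tie and untie per access.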

Re: mod_perl2 and SDBM-tied hashes

Posted by Brian Reichert <re...@numachi.com>.
On Sun, Jul 29, 2007 at 03:42:19PM -0400, Jonathan Vanasco wrote:
> any reason why you're using sdbm ?  you might be better off with bdb,  
> since it has that shared memory cache feature.

My initial experiments worked with SDBM, so I ran with it.  :)  I
suppose I could re-test with DB_File, if that's what you're referring
to...

> you generally don't want to use worker under modperl, and you
> generally do want to use prefork.

So I had supposed.

> The RedHat RPMs tend to be outdated,  especially for modperl.  You're  
> often best suited building apache and modperl from source.

That I know, but I'm trying to limit the set of RPMs I'm building
internally (we have an internal distribution model that is entirely
RPM-based.)


Re: mod_perl2 and SDBM-tied hashes

Posted by Jonathan Vanasco <jv...@2xlp.com>.
On Jul 29, 2007, at 12:15 PM, Brian Reichert wrote:

> But, that contradicts the behavior I see with my command-line tool demo:
> distinct processes with distinct tied hashes can successfully share data
> through the sdbm.  :/

any reason why you're using sdbm ?  you might be better off with bdb,
since it has that shared memory cache feature.


>> It's not the same hash, it's a hash at the same memory location in each of
>> your processes. If your process is deterministic and the hash is created
>> either in the apache parent or at the same point after forking, then it will
>> get the same memory address in each child.

are you sure about that?   i thought they were different in each
child, and I thought if you access it via copy-on-write it'll move to
a new space.


> If it's a factor:
>
> RedHat's apache2 RPM defaults to the prefork MPM.  If I try to use
> the worker MPM, I get a 'free(): invalid pointer' error.

you generally don't want to use worker under modperl, and you
generally do want to use prefork.

The RedHat RPMs tend to be outdated,  especially for modperl.  You're
often best suited building apache and modperl from source.



Re: mod_perl2 and SDBM-tied hashes

Posted by Brian Reichert <re...@numachi.com>.
On Sun, Jul 29, 2007 at 11:46:20AM -0400, Malcolm J Harwood wrote:
> Your data isn't being shared between the processes, so you're only getting the 
> data back if your request happens to hit the same apache process.

But, that contradicts the behavior I see with my command-line tool demo:
distinct processes with distinct tied hashes can successfully share data
through the sdbm.  :/

> If I recall correctly, tied hashes aren't written to their data source until 
> they are deleted.

My command-line tool demo shows that the data is written very shortly after I
change the value associated with a key.

> If you are using a global under mod_perl, then that isn't 
> until the child terminates - remember mod_perl is a persistent perl 
> environment, your script does not exit and clean up at the end of the 
> request - any globals persist across requests.

This I agree with, and is the behavior I want.

> It's not the same hash, it's a hash at the same memory location in each of 
> your processes. If your process is deterministic and the hash is created 
> either in the apache parent or at the same point after forking, then it will 
> get the same memory address in each child.

Ok, that makes sense; it was a weak interpretation on my part. :)

I wonder if there's a way to derive a memory address from a perl
object, so I can test distinction more thoroughly...
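
One possible way to do that, sketched with `refaddr` from the core Scalar::Util module. The variables `$mod_info` and `$other` are stand-ins invented for this example. The address is a virtual address inside one process, so it can tell two hashes apart within a process, but it cannot show that two different httpd children share anything; logging it together with `$$` keeps that distinction visible.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $mod_info = {};    # stand-in for the global hash reference
my $other    = {};    # a second, unrelated hash

# refaddr returns the address as a plain number: the same value that
# is shown in hex inside the "HASH(0x...)" string form of a reference.
printf "pid %d: %s is at 0x%x\n", $$, $mod_info, refaddr($mod_info);

# Within one process, equal addresses mean the same underlying hash.
print "same hash\n"      if refaddr($mod_info) == refaddr($mod_info);
print "different hash\n" if refaddr($mod_info) != refaddr($other);
```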

> Does it work under plain mod_perl, without Mason?

I haven't implemented that yet; I suppose I'll have to wrangle that...

> Remember Mason does its own perl code generation, and if you aren't careful 
> it's easy to make unintentional closures

Yup; I still don't know if Mason is a factor...

(Thanks for feedback, BTW...)

If it's a factor:

RedHat's apache2 RPM defaults to the prefork MPM.  If I try to use
the worker MPM, I get a 'free(): invalid pointer' error.


Re: mod_perl2 and SDBM-tied hashes

Posted by Malcolm J Harwood <mj...@liminalflux.net>.
On Sunday 29 July 2007, Brian Reichert wrote:

> The background: I often use SDBM-tied hashes to share cheap slow
> data across concurrent apps.  Not much luck under Mason.
>
> I've cobbled together a Mason handler.pl that maintains such a tied
> hash as a global variable.
>
> I've written a component that lets me get/set a value in this tied
> hash.
>
> The symptom I see that after a 'set', subsequent 'gets' show me
> various results; sometimes the data comes back set, sometimes not.

Your data isn't being shared between the processes, so you're only getting the 
data back if your request happens to hit the same apache process.

If I recall correctly, tied hashes aren't written to their data source until 
they are deleted. If you are using a global under mod_perl, then that isn't 
until the child terminates - remember mod_perl is a persistent perl 
environment, your script does not exit and clean up at the end of the 
request - any globals persist across requests.

> Here, HASH(foo) is my tied hash reference, followed by the PID of
> the apache process handling the request, and a bit representing
> whether or not my data is seen as set in the tied hash.  Two complete
> loops over all of my httpd PIDs:

>   mod_info is HASH(0x9e0f0e4) 8846 0 at /var/www/html/error_test line 35.
>   mod_info is HASH(0x9e0f0e4) 8839 0 at /var/www/html/error_test line 35.
>   mod_info is HASH(0x9e0f0e4) 8840 0 at /var/www/html/error_test line 35.
>   mod_info is HASH(0x9e0f0e4) 8841 0 at /var/www/html/error_test line 35.
>   mod_info is HASH(0x9e0f0e4) 8844 0 at /var/www/html/error_test line 35.
>   mod_info is HASH(0x9e0f0e4) 8845 0 at /var/www/html/error_test line 35.
>   mod_info is HASH(0x9e0f0e4) 8838 1 at /var/www/html/error_test line 35.
>   mod_info is HASH(0x9e0f0e4) 8847 0 at /var/www/html/error_test line 35.
>   mod_info is HASH(0x9e0f0e4) 8846 0 at /var/www/html/error_test line 35.
>   mod_info is HASH(0x9e0f0e4) 8839 0 at /var/www/html/error_test line 35.
>   mod_info is HASH(0x9e0f0e4) 8840 0 at /var/www/html/error_test line 35.
>   mod_info is HASH(0x9e0f0e4) 8841 0 at /var/www/html/error_test line 35.
>   mod_info is HASH(0x9e0f0e4) 8844 0 at /var/www/html/error_test line 35.
>   mod_info is HASH(0x9e0f0e4) 8845 0 at /var/www/html/error_test line 35.
>   mod_info is HASH(0x9e0f0e4) 8838 1 at /var/www/html/error_test line 35.
>   mod_info is HASH(0x9e0f0e4) 8847 0 at /var/www/html/error_test line 35.
>
> You can see that the hash ID never alters; seemingly, it's always
> the same hash.  The logs make it seem that only one of my httpd
> PIDs made the change.

It's not the same hash, it's a hash at the same memory location in each of 
your processes. If your process is deterministic and the hash is created 
either in the apache parent or at the same point after forking, then it will 
get the same memory address in each child.

If you look at the above, process 8838 has the altered data, and none of the 
other processes do.
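
A small stand-alone sketch of that effect, outside Apache: a hash created before `fork` reports the same address in the child, yet a change made in the child never reaches the parent. The `$info` hash and its `set` key are invented for the example, and a plain hash is used instead of a tied one to keep it short.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my $info = { set => 0 };          # created before the fork, like a startup global
my $parent_addr = refaddr($info);

pipe(my $rd, my $wr) or die "pipe: $!";
defined(my $pid = fork()) or die "fork: $!";

if ($pid == 0) {                  # child process
    close $rd;
    $info->{set} = 1;             # changes only the child's copy of the hash
    print {$wr} refaddr($info), " $info->{set}\n";
    close $wr;
    exit 0;
}

# Parent: read what the child reported, then compare with its own view.
close $wr;
my ($child_addr, $child_set) = split ' ', scalar <$rd>;
waitpid($pid, 0);

print "same address: ", ($child_addr == $parent_addr ? "yes" : "no"), "\n";
print "child saw set=$child_set, parent sees set=$info->{set}\n";
```

This covers only the case where the hash is created in the parent before forking; each child starts with an identical copy of the parent's address space, so the address matches even though the data is private to each process.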


> If I perform these same experiments with command-line tools, the
> multiple processes show the data being written to the disk within
> a second or so, and all of the processes showing that tied hash
> reflect the change quickly.

> At the end of this message is a simple tool to demonstrate what I
> expect, save as 'dbreader'.  Just takes key/value pairs on the
> command line, writes to the tied hash, then loops, displaying the
> contents of the hash.

Does it work under plain mod_perl, without Mason?
Remember Mason does its own perl code generation, and if you aren't careful 
it's easy to make unintentional closures



-- 
"Ever had a long talk with ambassador Delenn, Commander?"
"Yes, from time to time. Why?"
"She and the universe .. seem to have a special relationship."
"Don't we all?"
- Sheridan and Ivanova in Babylon 5:"A Distant Star"