Posted to modperl@perl.apache.org by Jeff McCarrell <jm...@akamai.com> on 2010/04/21 02:18:33 UTC

non-stop generational modperl config update strategies?

Hello modperl-folk:

In a nutshell: I would like to ask the community for pointers on how to
evolve our successfully deployed application, which gets restarted once an
hour to reload configuration state, to a model where it is continuously
running.

Background:
We have a long-successfully-deployed mp2 application that has to scale well.
Our clusters are currently fronted by hardware load balancers.
To effect a configuration change, which is currently done hourly,
a process removes a cluster slave host from the hardware load balancers'
rotation,
pushes the new config state down to the slave,
and restarts apache2.
After apache2 restarts, the mgmt process puts that host back into the load
balancer rotation.

We are running the prefork MPM with 32-512 httpd children on fairly beefy
linux boxes with 4-8 cores each
using the standard unix copy-on-write model:
load the config state into perl data structures in the parent once via a
PerlPostConfigHandler
then fork as many children as the prefork MPM decides it needs to handle the
load.
The configuration pushes are scheduled via wall clock time;
currently it is deterministic when a configuration change will be fully
propagated.
All configuration changes are pushed to all slaves.
There are currently a few hundred slaves covering most of North America.
This model has worked and is working fairly well for us to date.

However, the clock-work nature of our scheme ultimately limits our
scalability.
And in thinking about what it would take to overcome our current
limitations,
I would like to be able to reload the configuration state from new data
without an apache restart.
Put another way, I would like to be able to load next generation
configuration state into new httpd children,
then kill off the previous generation as they complete their current
requests,
and start using the new instances, all while servicing requests,
albeit at perhaps a reduced rate while the configuration state is being
swapped.
 
I don't know how to do this in the prefork MPM;
and what I am proposing more-or-less breaks the load-parent / fork model.

The configuration data is not too large: let's say 10 MB or so,
but large enough that I prefer models that share this state among all httpd
children.

At this point, I'd be willing to consider other MPMs if they help me get
there.
I am willing to trade memory and CPU to achieve a non-stop apache instance.

So modperl experts: any pointers on prior art here?
I'd love to hear about strategies that have been shown to work in real life.


TIA,

-- jeff


Re: non-stop generational modperl config update strategies?

Posted by Perrin Harkins <ph...@gmail.com>.
Hi Jeff,

On Tue, Apr 20, 2010 at 8:18 PM, Jeff McCarrell <jm...@akamai.com> wrote:
> I would like to be able to reload the configuration state from new data
> without an apache restart.

Given that you say it's only 10 MB, my best advice to you is to stop
worrying about sharing, even if it means buying a few thousand dollars
more RAM.  It's the simplest thing by far and might save you a lot of
debugging time.  Then you can just have child processes check the
timestamp on your config file or however you push it out and reload
the data as needed in a cleanup handler.
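
A minimal sketch of the timestamp check described above, assuming the
config is pushed as a single file of key=value lines. The names
load_config, reload_if_changed and current_config are hypothetical, as is
the file format; under mod_perl the check would be called from a
PerlCleanupHandler so that it runs after the response has been sent. Each
child holds its own copy of the data, which is the trade-off being
suggested.

```perl
use strict;
use warnings;

# Per-process cache: every httpd child keeps its own private copy.
my %cache = (path => undef, mtime => 0, data => undef);

# Hypothetical loader for "key=value" lines; replace with the real format.
sub load_config {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!";
    my %cfg;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;    # skip comments and blank lines
        my ($key, $value) = split /=/, $line, 2;
        $cfg{$key} = $value;
    }
    close $fh;
    return \%cfg;
}

# Reload only when the file's mtime differs from the one last loaded.
# Returns 1 if a reload happened, 0 otherwise.
sub reload_if_changed {
    my ($path) = @_;
    my $mtime = (stat($path))[9];
    return 0 unless defined $mtime;        # file missing: keep the old data
    return 0 if defined($cache{path})
             && $cache{path} eq $path
             && $cache{mtime} == $mtime;
    my $data = load_config($path);         # if this dies, old data stays
    @cache{qw(path mtime data)} = ($path, $mtime, $data);
    return 1;
}

# Accessor used by request handlers.
sub current_config { return $cache{data} }
```

Note that mtime has one-second resolution on many filesystems, so two
pushes within the same second would not be told apart by this check alone.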

If you're determined to share it, I don't think you'll find anything
significantly better than what you have.  There's no way to share
configuration via IPC without eventually copying it into local perl
memory.  If you try to use the threaded MPM you may succeed in sharing
this structure but you will have totally lost copy-on-write and will
probably end up using more RAM in the end.  It's not that hard to try
though, so maybe you'd like to give it a shot.

Alternatively, you may decide this data is not all needed all the time
and you can store it in something very fast like BerkeleyDB and only
read the pieces you need when you need them from the apache children.

- Perrin

Re: non-stop generational modperl config update strategies?

Posted by Jeff McCarrell <jm...@akamai.com>.
Thanks to Perrin and Torsten for their input.

I will be investigating Torsten's MMapDB, and will report results in a few
weeks.

-- jeff


Re: non-stop generational modperl config update strategies?

Posted by Torsten Förtsch <to...@gmx.net>.
On Wednesday 21 April 2010 17:19:22 Perrin Harkins wrote:
> 2010/4/21 Torsten Förtsch <to...@gmx.net>:
> > no, MMapDB creates read-only variables that reference the mmapped block.
> > It manipulates SvPVX directly
> 
> Very interesting!  I'll have to try it out as a storage backend for CHI.

One really cool thing I have just recently thought about would be an interface 
to get the file offsets of a value. Then one could create file buckets with 
$r->sendfile($filename, $offset, $len);. Even better would be a perl interface 
to ap_send_fd to circumvent the filename=>filedescriptor race condition.

It itches me to implement that but ENOTIME.

Torsten Förtsch

-- 
Need professional modperl support? Hire me! (http://foertsch.name)

Like fantasy? http://kabatinte.net

Re: non-stop generational modperl config update strategies?

Posted by Cosimo Streppone <co...@streppone.it>.
On Wed, 21 Apr 2010 17:10:00 +0200, Torsten Förtsch  
<to...@gmx.net> wrote:

> On Wednesday 21 April 2010 16:59:01 Perrin Harkins wrote:
>> In both cases you have the same drawback: it's impossible to read
>> anything from the shared data without copying the data you read into
>> perl variables.
>> [...]
>>
> no, MMapDB creates read-only variables that reference the mmapped block.  
> It manipulates SvPVX directly:
>
> SvPV_set(sv, pointer);
> SvLEN_set(sv, 0);         /* this makes sure perl won't try to free() the space */
> [...]
> You can then pass around references to that variable and nothing will be
> copied.

Cool!

So, if I understand correctly: using something like Cache::FastMmap
creates a copy of your strings/values/... in your process memory.
See the "fc_read" function. Is this correct?

http://cpansearch.perl.org/src/ROBM/Cache-FastMmap-1.35/Cache-FastMmap-CImpl/CImpl.xs

I guess pretty much anything else works that way, not just Cache::FastMmap.

-- 
Cosimo

Re: non-stop generational modperl config update strategies?

Posted by Perrin Harkins <ph...@gmail.com>.
2010/4/21 Torsten Förtsch <to...@gmx.net>:
> no, MMapDB creates read-only variables that reference the mmapped block. It
> manipulates SvPVX directly

Very interesting!  I'll have to try it out as a storage backend for CHI.

- Perrin

Re: non-stop generational modperl config update strategies?

Posted by Torsten Förtsch <to...@gmx.net>.
On Wednesday 21 April 2010 16:59:01 Perrin Harkins wrote:
> In both cases you have the same drawback: it's impossible to read
> anything from the shared data without copying the data you read into
> perl variables.  A shared database only saves memory if you don't need
> all of the data to handle a request.
> 
no, MMapDB creates read-only variables that reference the mmapped block. It 
manipulates SvPVX directly:

sv=newSV(0);
SvUPGRADE(sv, SVt_PV);
SvPOK_only(sv);
SvPV_set(sv, pointer);
SvLEN_set(sv, 0);         /* this makes sure perl won't try to free() the space */
SvCUR_set(sv, length);
SvREADONLY_on(sv);

You can then pass around references to that variable and nothing will be 
copied.

Torsten Förtsch


Re: non-stop generational modperl config update strategies?

Posted by Perrin Harkins <ph...@gmail.com>.
2010/4/21 Torsten Förtsch <to...@gmx.net>:
> The mmapdb databases (single files) can be prepared off-line and then be
> copied to the destination. Then the old database (still mapped by its users)
> is invalidated by setting a flag. At the time of the next access the new
> version will be mmapped. MMapDB is completely lock-free. So, deadlocks as with
> BerkeleyDB cannot occur.

Actually, this is the same approach I would use with BerkeleyDB:
build the database offline, watch for a newer file in a directory,
switch to the new one when it arrives.  This is a read-only
application, so there's no need for locks at all.
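
The build-offline, watch-and-switch steps above can be sketched roughly as
follows. This is only an illustration under stated assumptions: the
functions publish_generation, newest_generation, switch_if_newer and
current_generation, and the config.NNNNNNNNNN.db naming scheme, are all
hypothetical, and the sketch does not use the BerkeleyDB or MMapDB APIs. It
shows only the file handling: the writer renames a finished file into
place, and each reader picks the highest-numbered generation in a single
directory.

```perl
use strict;
use warnings;
use File::Temp ();

# Writer side: write the new generation to a temp file in the same
# directory, then rename() it into place. rename() within one filesystem
# is atomic, so readers never see a half-written file.
sub publish_generation {
    my ($dir, $generation, $bytes) = @_;
    my $final = sprintf '%s/config.%010d.db', $dir, $generation;
    my $tmp = File::Temp->new(DIR => $dir, UNLINK => 0);
    binmode $tmp;
    print {$tmp} $bytes or die "write failed: $!";
    close($tmp) or die "close failed: $!";
    chmod(0644, $tmp->filename) or die "chmod failed: $!";
    rename($tmp->filename, $final) or die "rename failed: $!";
    return $final;
}

# Reader side: path of the highest-numbered generation, or undef if none.
# Zero-padded numbers make string order equal to numeric order.
sub newest_generation {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "cannot open $dir: $!";
    my @gens = grep { /^config\.\d{10}\.db\z/ } readdir($dh);
    closedir $dh;
    @gens = sort @gens;
    return @gens ? "$dir/$gens[-1]" : undef;
}

# Per-process state: the generation this child is currently using.
my $current = '';

# Switch to a newer generation if one has arrived. Returns 1 on a switch.
sub switch_if_newer {
    my ($dir) = @_;
    my $newest = newest_generation($dir);
    return 0 if !defined($newest) || $newest le $current;
    $current = $newest;    # real code would open the new database here
    return 1;
}

sub current_generation { return $current }
```

Because the data is read-only once published, nothing here takes a lock;
old generations can be deleted once no process still has them open.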

In both cases you have the same drawback: it's impossible to read
anything from the shared data without copying the data you read into
perl variables.  A shared database only saves memory if you don't need
all of the data to handle a request.

- Perrin

Re: non-stop generational modperl config update strategies?

Posted by Torsten Förtsch <to...@gmx.net>.
On Wednesday 21 April 2010 02:18:33 Jeff McCarrell wrote:
> I am willing to trade memory and CPU to achieve a non-stop apache instance.
> 
> So modperl experts: any pointers on prior art here?
> I'd love to hear about strategies that have been shown to work in real
>  life.
> 
I use Apache2::Translation (AT) + MMapDB. Of course it depends upon your app. 
AT can configure anything that can be configured at runtime. MMapDB stores 
data in shared mem. As long as you don't need to reload perl modules, create 
new IP-based VHosts, open additional logfiles or similar you probably don't 
need to restart.

The mmapdb databases (single files) can be prepared off-line and then be 
copied to the destination. Then the old database (still mapped by its users)
is invalidated by setting a flag. At the time of the next access the new 
version will be mmapped. MMapDB is completely lock-free. So, deadlocks as with 
BerkeleyDB cannot occur. It even saves the UTF8 bit of strings.

Torsten Förtsch
