Posted to modperl@perl.apache.org by Sam Horrocks <sa...@daemoninc.com> on 2000/12/21 11:50:28 UTC

Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

 > Gunther Birznieks wrote:
 > > Sam just posted this to the speedycgi list just now.
 > [...]
 > > >The underlying problem in mod_perl is that apache likes to spread out
 > > >web requests to as many httpd's, and therefore as many mod_perl interpreters,
 > > >as possible using an LRU selection process for picking httpd's.
 > 
 > Hmmm... this doesn't sound right.  I've never looked at the code in
 > Apache that does this selection, but I was under the impression that the
 > choice of which process would handle each request was an OS dependent
 > thing, based on some sort of mutex.
 > 
 > Take a look at this: http://httpd.apache.org/docs/misc/perf-tuning.html
 > 
 > Doesn't that appear to be saying that whichever process gets into the
 > mutex first will get the new request?

 I would agree that whichever process gets into the mutex first will get
 the new request.  That's exactly the problem I'm describing.  What you
 are describing here is first-in, first-out behaviour which implies LRU
 behaviour.

 Processes 1, 2, 3 are running.  1 finishes and requests the mutex, then
 2 finishes and requests the mutex, then 3 finishes and requests the mutex.
 So when the next three requests come in, they are handled in the same order:
 1, then 2, then 3 - this is FIFO or LRU.  This is bad for performance.
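
 To see the difference, here's a toy simulation (illustrative only - this
 is not Apache code, just the queueing behaviour):

 # FIFO hand-out: the process that has been waiting longest (the least
 # recently used one) gets the next request, so requests rotate through
 # every process.
 my @waiting = (1, 2, 3);            # pids, in the order they finished
 for my $request ('a' .. 'f') {
     my $pid = shift @waiting;       # FIFO: oldest waiter wins
     print "request $request -> process $pid\n";
     push @waiting, $pid;            # finishes, rejoins the end of the queue
 }
 # Prints 1, 2, 3, 1, 2, 3.  With a LIFO (pop instead of shift), it would
 # print 3, 3, 3, ... - the most recently used process gets reused.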

 > In my experience running
 > development servers on Linux it always seemed as if the requests
 > would continue going to the same process until a request came in when
 > that process was already busy.

 No, they don't.  They go round-robin (or LRU as I say it).

 Try this simple test script:

 use CGI;
 my $cgi = CGI->new;
 print $cgi->header();
 print "mypid=$$\n";

 With mod_perl you constantly get different pids.  With mod_speedycgi you
 usually get the same pid.  This is a really good way to see the LRU/MRU
 difference that I'm talking about.

 Here's the problem - the mutex in apache is implemented using a lock
 on a file.  It's left up to the kernel to decide which process to give
 that lock to.
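
 Roughly, each idle httpd child does something like this (a Perl sketch of
 the idea only - the real code is C in Apache's http_main.c, and $listener
 and handle_request below are stand-ins):

 use Fcntl qw(:flock);

 open my $mutex, '>>', '/var/run/httpd.accept.lock' or die $!;
 while (1) {
     flock $mutex, LOCK_EX;          # every idle child blocks here
     my $conn = $listener->accept;   # the child holding the lock accepts
     flock $mutex, LOCK_UN;          # release, then do the slow part unlocked
     handle_request($conn);          # when done, loop back and wait again
 }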

 Now, if you're writing a unix kernel and implementing this file locking code,
 what implementation would you use?  Well, this is a general purpose thing -
 you have 100 or so processes all trying to acquire this file lock.  You could
 give out the lock randomly or in some ordered fashion.  If I were writing
 the kernel I would give it out in a round-robin fashion (or the
 least-recently-used process as I referred to it before).  Why?  Because
 otherwise one of those processes may starve waiting for this lock - it may
 never get the lock unless you do it in a fair (round-robin) manner.

 The kernel doesn't know that all these httpd's are exactly the same.
 The kernel is implementing a general-purpose file-locking scheme and
 it doesn't know whether one process is more important than another.  If
 it's not fair about giving out the lock a very important process might
 starve.

 Take a look at fs/locks.c (I'm looking at linux 2.3.46).  In there is the
 comment:

 /* Insert waiter into blocker's block list.
  * We use a circular list so that processes can be easily woken up in
  * the order they blocked. The documentation doesn't require this but
  * it seems like the reasonable thing to do.
  */
 static void locks_insert_block(struct file_lock *blocker, struct file_lock *waiter)

 > As I understand it, the implementation of "wake-one" scheduling in the
 > 2.4 Linux kernel may affect this as well.  It may then be possible to
 > skip the mutex and use unserialized accept for single socket servers,
 > which will definitely hand process selection over to the kernel.

 If the kernel implemented the queueing for multiple accepts using a LIFO
 instead of a FIFO and apache used this method instead of file locks,
 then that would probably solve it.

 Just found this on the net on this subject:
    http://www.uwsg.iu.edu/hypermail/linux/kernel/9704.0/0455.html
    http://www.uwsg.iu.edu/hypermail/linux/kernel/9704.0/0453.html

 > > >The problem is that at a high concurrency level, mod_perl is using lots
 > > >and lots of different perl-interpreters to handle the requests, each
 > > >with its own un-shared memory.  It's doing this due to its LRU design.
 > > >But with SpeedyCGI's MRU design, only a few speedy_backends are being used
 > > >because as much as possible it tries to use the same interpreter over and
 > > >over and not spread out the requests to lots of different interpreters.
 > > >Mod_perl is using lots of perl-interpreters, while speedycgi is only using
 > > >a few.  mod_perl is requiring that lots of interpreters be in memory in
 > > >order to handle the requests, whereas speedy only requires a small number
 > > >of interpreters to be in memory.
 > 
 > This test - building up unshared memory in each process - is somewhat
 > suspect since in most setups I've seen, there is a very significant
 > amount of memory being shared between mod_perl processes.

 My message and testing concern un-shared memory only.  If all of your memory
 is shared, then there shouldn't be a problem.

 But a point I'm making is that with mod_perl you have to go to great
 lengths to write your code so as to avoid unshared memory.  My claim is that
 with mod_speedycgi you don't have to concern yourself as much with this.
 You can concentrate more on the application and less on performance tuning.

 > Regardless,
 > the explanation here doesn't make sense to me.  If we assume that each
 > approach is equally fast (as Sam seems to say earlier in his message)
 > then it should take an equal number of speedycgi and mod_perl processes
 > to handle the same concurrency.

 I don't assume that each approach is equally fast under all loads.  They
 were about the same at concurrency level 1, but at higher concurrency levels
 they weren't.

 I am saying that since SpeedyCGI uses MRU to allocate requests to perl
 interpreters, it winds up using a lot fewer interpreters to handle the
 same number of requests.

 On a single-CPU system of course at some point all the concurrency has
 to be serialized. mod_speedycgi and mod_perl take different approaches
 before getting to that point.  mod_speedycgi tries to use as
 small a number of unix processes as possible, while mod_perl tries to
 use a very large number of unix processes.
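
 The difference in allocation policy is easy to sketch (illustrative only,
 not actual SpeedyCGI source; spawn_backend below is a stand-in):

 my @idle;                        # stack of idle backend interpreters

 sub get_backend {                # MRU: reuse the most recently used one
     return pop @idle if @idle;
     return spawn_backend();      # only fork when nothing is idle
 }

 sub release_backend {
     my ($pid) = @_;
     push @idle, $pid;            # goes on top, first in line for reuse
 }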

 > That leads me to believe that what's really happening here is that
 > Apache is pre-forking a bit over-zealously in response to a sudden surge
 > of traffic from ab, and thus has extra unused processes sitting around
 > waiting, while speedycgi is avoiding this situation by waiting for
 > someone to try and use the processes before forking them (i.e. no
 > pre-forking).  The speedycgi way causes a brief delay while new
 > processes fork, but doesn't waste memory.  Does this sound like a
 > plausible explanation to folks?

 I don't think it's pre-forking.  When I ran my tests I would always run
 them twice, and take the results from the second run.  The first run
 was just to "prime the pump".

 I tried reducing MinSpareServers, and this did help mod_perl get a higher
 concurrency number, but it would still run into a wall where speedycgi
 would not.
 
 > This is probably all a moot point on a server with a properly set
 > MaxClients and Apache::SizeLimit that will not go into swap.

 Please let me know what you think I should change.  So far my
 benchmarks only show one trend, but if you can tell me specifically
 what I'm doing wrong (and it's something reasonable), I'll try it.

 I don't think SizeLimit is the answer - my process isn't growing.  It's
 using the same 50k of un-shared memory over and over.

 I believe that with speedycgi you don't have to lower the MaxClients
 setting, because it's able to handle a larger number of clients, at
 least in this test.  In other words, if with mod_perl you had to turn
 away requests, but with mod_speedycgi you did not, that would just
 prove that speedycgi is more scalable.

 Now you could tell me "don't use unshared memory", but that's outside
 the bounds of the test.   The whole test concerns unshared memory.
 
 > I would
 > expect mod_perl to have the advantage when all processes are
 > fully-utilized because of the shared memory.

 Maybe.  There must be a benchmark somewhere that would show off
 mod_perl's advantages in shared memory.  Maybe a 100,000 line perl
 program or something like that - it would have to be something where
 mod_perl is using *lots* of shared memory, because keep in mind that
 there are still going to be a whole lot fewer SpeedyCGI processes than
 there are mod_perl processes, so you would really have to go overboard
 in the shared-memory department.

 > It would be cool if speedycgi could somehow use a parent process
 > model and get the shared memory benefits too.

 > Speedy seems like it
 > might be more attractive to ISPs, and it would be nice to increase
 > interoperability between the two projects.

 Thanks.  And please, I'm not trying to start a speedy vs mod_perl war.
 My original message was only to the speedycgi list, but now that it's
 on mod_perl I think I have to reply there too.

 But, there is a need for a little good PR on speedycgi's side, and I
 was looking for that.  I would rather just see mod_perl fixed if that's
 possible.  But the last time I brought up this issue (maybe a year ago)
 I was unable to convince the people on the mod_perl list that this
 problem even existed.

 Sam

Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

Posted by Gunther Birznieks <gu...@extropia.com>.
At 09:16 PM 12/21/00 +0100, Stas Bekman wrote:
[much removed]

>So the moment mod_perl 2.0 hits the shelves, this possible benefit
>of speedycgi over mod_perl becomes irrelevant. I think this more or less
>summarizes this thread.
I think you are right about the summarization. However, I also think it's 
unfair for people here to pin too many hopes on mod_perl 2.0.

First Apache 2.0 has to be fully released. It's still in Alpha! Then, 
mod_perl 2.0 has to be released. I haven't seen any realistic timelines 
that indicate to me that these will be released and stable for production 
use in only a few months time. And Apache 2.0 has been worked on for years. 
I first saw a talk on Apache 2.0's architecture at the first ApacheCon 2 
years ago! To be fair, back then they were using Mozilla's NPR which I 
think they learned from, threw away, and rewrote from scratch after all (to 
become APR). But still, the point is that it's been a long time and 
probably will be a while yet.

Who in their right mind would pin their business or production database on 
the hope that mod_perl 2.0 comes out in a few months? I don't think anyone 
would. Sam has a solution that works now, is open source, and provides some 
benefits for the kinds of web applications that mod_perl and Apache are less 
efficient at.

As people interested in Perl, we should be embracing these alternatives, not 
telling people to wait for new versions of software that may not come out soon.

If there is a problem with mod_perl advocacy, it's precisely that it is too 
mod_perl-centric. Mod_perl is a niche crowd with a high learning curve. I 
think the technology mod_perl offers is great, but as has been said before, 
the problem is that people are going to PHP, away from Perl. If more people 
had easy ways to implement their simple apps in Perl yet be as fast as PHP, 
fewer people would go to PHP.

Those Perl people would eventually discover mod_perl's power as they 
require it, and then they would take the step to "upgrade" to the power of 
handlers away from the "missing link".

But without that "missing link" to make it easy for people to move from 
PHP to Perl, Perl will miss something very crucial to maintaining its 
standing as the "de facto language for Web applications".

3 years ago, I think it would have been accurate to say Perl apps drove 95% 
of the dynamic web. Sadly, I believe (anecdotally) that this is no longer true.

SpeedyCGI is not "THE" missing link, but I see it as a crucial part of this 
link between newbies and mod_perl. This is why I believe that mod_perl and 
its documentation should have a section (even if tiny) on this stuff, so 
that people will know that if they find mod_perl too hard, that there are 
alternatives that are less powerful, yet provide at least enough power to 
beat PHP.

I also see SpeedyCGI as already being more ISP-friendly than mod_perl for 
hosting casual users of Perl. Different apps use a different backend engine 
by default, so the problem of virtual hosts screwing each other over by 
accident is gone for the casual user. There is still room for improvement 
(e.g. memory is likely still an issue with different backends)...

Anyway, these are just my feelings. I really shouldn't be spending time on 
posting this as I have some deadlines to meet. But I felt these were still 
important points to make, and ones I think some people here may be missing. :)



Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

Posted by Stas Bekman <st...@stason.org>.
Folks, your discussion is not short of wrong statements that could easily be
disproved, but I don't find that useful. Instead please read:

http://perl.apache.org/~dougm/modperl_2.0.html#new

To quote the most relevant part:

"With 2.0, mod_perl has much better control over which PerlInterpreters
are used for incoming requests. The intepreters are stored in two linked
lists, one for available interpreters one for busy. When needed to handle
a request, one is taken from the head of the available list and put back
into the head of the list when done. This means if you have, say, 10
interpreters configured to be cloned at startup time, but no more than 5
are ever used concurrently, those 5 continue to reuse Perls allocations,
while the other 5 remain much smaller, but ready to go if the need
arises."

Of course you should read the rest.
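
A rough Perl sketch of the behaviour described in that quote (illustrative
only - the real implementation is C inside mod_perl 2.0, and new_interpreter
below is a stand-in):

  # Take interpreters from the head of the "available" list and return them
  # to the head, so a small hot subset serves the load while the rest stay
  # untouched and small.
  my @available = map { new_interpreter() } 1 .. 10;
  my @busy;

  sub checkout {
      my $perl = shift @available;    # head of the available list
      push @busy, $perl;
      return $perl;
  }

  sub checkin {
      my ($perl) = @_;
      @busy = grep { $_ != $perl } @busy;
      unshift @available, $perl;      # back onto the head
  }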

So the moment mod_perl 2.0 hits the shelves, this possible benefit
of speedycgi over mod_perl becomes irrelevant. I think this more or less
summarizes this thread.

And Gunther, nobody is trying to stop people from expressing their opinions
here; it's just that different people express their feelings in different
ways - that's the way an open list goes... :) so please keep on forwarding
things that you find interesting. I don't think anybody here is relieved
when you are busy and not posting, as you seem to suggest -- I believe that
your posts are very interesting and you shouldn't discourage yourself from
keeping on doing that. Those who don't like your posts don't have to read
them.

Hope you are all having fun and getting ready for the holidays :) I'm
going to buy my ski equipment soonish!

_____________________________________________________________________
Stas Bekman              JAm_pH     --   Just Another mod_perl Hacker
http://stason.org/       mod_perl Guide  http://perl.apache.org/guide 
mailto:stas@stason.org   http://apachetoday.com http://logilune.com/
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/  



Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

Posted by Jeremy Howard <jh...@fastmail.fm>.
Joe Schaefer wrote:
> "Jeremy Howard" <jh...@fastmail.fm> writes:
> > I don't know if Speedy fixes this, but one problem with mod_perl v1 is that
> > if, for instance, a large POST request is being uploaded, this takes a whole
> > perl interpreter while the transaction is occurring. This is at least one
> > place where a Perl interpreter should not be needed.
> >
> > Of course, this could be overcome if an HTTP Accelerator is used that takes
> > the whole request before passing it to a local httpd, but I don't know of
> > any proxies that work this way (AFAIK they all pass the packets as they
> > arrive).
>
> I posted a patch to modproxy a few months ago that specifically
> addresses this issue.  It has a ProxyPostMax directive that changes
> its behavior to a store-and-forward proxy for POST data (it also enabled
> keepalives on the browser-side connection if they were enabled on the
> frontend server.)
>
FYI, this patch is at:

  http://www.mail-archive.com/modperl@apache.org/msg11072.html



Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

Posted by Gunther Birznieks <gu...@extropia.com>.
At 10:17 PM 12/22/2000 -0500, Joe Schaefer wrote:
>"Jeremy Howard" <jh...@fastmail.fm> writes:
>
>[snipped]
>I posted a patch to modproxy a few months ago that specifically
>addresses this issue.  It has a ProxyPostMax directive that changes
>its behavior to a store-and-forward proxy for POST data (it also enabled
>keepalives on the browser-side connection if they were enabled on the
>frontend server.)
>
>It does this by buffering the data to a temp file on the proxy before
>opening the backend socket.  It's straightforward to make it buffer to
>a portion of RAM instead- if you're interested I can post another patch
>that does this also, but it's pretty much untested.
Cool! Are these patches now incorporated in the core mod_proxy if we 
download it off the web? Or do we troll through the mailing list to find 
the patch?

(Similar question about the forwarding of remote user patch someone posted 
last year).

Thanks,
     Gunther


Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

Posted by Joe Schaefer <jo...@sunstarsys.com>.
"Jeremy Howard" <jh...@fastmail.fm> writes:

> Perrin Harkins wrote:
> > What I was saying is that it doesn't make sense for one to need fewer
> > interpreters than the other to handle the same concurrency.  If you have
> > 10 requests at the same time, you need 10 interpreters.  There's no way
> > speedycgi can do it with fewer, unless it actually makes some of them
> > wait.  That could be happening, due to the fork-on-demand model, although
> > your warmup round (priming the pump) should take care of that.

A backend server can realistically handle multiple frontend requests, since
the frontend server must stick around until the data has been delivered
to the client (at least that's my understanding of the lingering-close
issue that was recently discussed at length here). Hypothetically speaking,
if a "FastCGI-like"[1] backend can deliver it's content faster than the 
apache (front-end) server can "proxy" it to the client, you won't need as 
many to handle the same (front-end) traffic load.

As an extreme hypothetical example, say that over a 5 second period you
are barraged with 100 modem requests that typically would take 5s each to 
service.  This means (sans lingerd :) that at the end of your 5 second 
period, you have 100 active apache children around.

But if new requests during that 5 second interval were only received at 
20/second, and your "FastCGI-like" server could deliver the content to
apache in one second, you might only have forked 50-60 "FastCGI-like" new 
processes to handle all 100 requests (forks take a little time :).

Moreover, an MRU design allows the transient effects of a short burst 
of abnormally heavy traffic to dissipate quickly, and IMHO that's its 
chief advantage over LRU.  To return to this hypothetical, suppose 
that immediately following this short burst, we maintain a sustained 
traffic of 20 new requests per second. Since it takes 5 seconds to 
deliver the content, that amounts to a sustained concurrency level 
of 100. The "Fast-CGI like" backend may have initially reacted by forking 
50-60 processes, but with MRU only 20-30 processes will actually be 
handling the load, and this reduction would happen almost immediately 
in this hypothetical.  This means that the remaining transient 20-30 
processes could be quickly killed off or _moved to swap_ without adversely 
affecting server performance.

Again, this is all purely hypothetical - I don't have benchmarks to
back it up ;)
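
For what it's worth, the steady-state part of that hypothetical is just
Little's law (concurrency = arrival rate x time in system), using the
numbers from the example above:

  # concurrency = arrival rate * time each request occupies a process
  my $arrival_rate  = 20;  # new requests per second
  my $frontend_time = 5;   # seconds an apache child is tied up per request
  my $backend_time  = 1;   # seconds a "FastCGI-like" backend is tied up

  printf "frontend children needed: %d\n", $arrival_rate * $frontend_time; # 100
  printf "backend processes needed: %d\n", $arrival_rate * $backend_time;  #  20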

> I don't know if Speedy fixes this, but one problem with mod_perl v1 is that
> if, for instance, a large POST request is being uploaded, this takes a whole
> perl interpreter while the transaction is occurring. This is at least one
> place where a Perl interpreter should not be needed.
> 
> Of course, this could be overcome if an HTTP Accelerator is used that takes
> the whole request before passing it to a local httpd, but I don't know of
> any proxies that work this way (AFAIK they all pass the packets as they
> arrive).

I posted a patch to modproxy a few months ago that specifically 
addresses this issue.  It has a ProxyPostMax directive that changes 
its behavior to a store-and-forward proxy for POST data (it also enabled 
keepalives on the browser-side connection if they were enabled on the 
frontend server.)

It does this by buffering the data to a temp file on the proxy before 
opening the backend socket.  It's straightforward to make it buffer to 
a portion of RAM instead- if you're interested I can post another patch 
that does this also, but it's pretty much untested.
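
The same idea sketched in Perl (the real patch is C inside mod_proxy; the
function and variable names below are made up for illustration):

  use strict;
  use File::Temp qw(tempfile);
  use IO::Socket::INET;

  # Spool the body from a (possibly slow) client into a temp file first,
  # and only open the backend socket once the whole body has arrived.
  sub spool_and_forward {
      my ($client, $content_length, $headers, $host, $port) = @_;

      my ($spool) = tempfile(UNLINK => 1);
      my ($buf, $remaining) = ('', $content_length);
      while ($remaining > 0) {
          my $n = read($client, $buf, $remaining > 8192 ? 8192 : $remaining)
              or last;
          print $spool $buf;
          $remaining -= $n;
      }

      # Body fully buffered - now talk to the backend at full speed.
      my $backend = IO::Socket::INET->new(PeerAddr => $host, PeerPort => $port)
          or die "backend connect failed: $!";
      print $backend $headers;
      seek $spool, 0, 0;
      while (read($spool, $buf, 8192)) {
          print $backend $buf;
      }
      return $backend;   # caller relays the backend's response to the client
  }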


[1] I've never used SpeedyCGI, so I've refrained from specifically discussing 
    it. Also, a mod_perl backend server using Apache::Registry can be viewed as 
    "FastCGI-like" for the purpose of my argument.

-- 
Joe Schaefer


Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

Posted by Jeremy Howard <jh...@fastmail.fm>.
Perrin Harkins wrote:
> What I was saying is that it doesn't make sense for one to need fewer
> interpreters than the other to handle the same concurrency.  If you have
> 10 requests at the same time, you need 10 interpreters.  There's no way
> speedycgi can do it with fewer, unless it actually makes some of them
> wait.  That could be happening, due to the fork-on-demand model, although
> your warmup round (priming the pump) should take care of that.
>
I don't know if Speedy fixes this, but one problem with mod_perl v1 is that
if, for instance, a large POST request is being uploaded, this takes a whole
perl interpreter while the transaction is occurring. This is at least one
place where a Perl interpreter should not be needed.

Of course, this could be overcome if an HTTP Accelerator is used that takes
the whole request before passing it to a local httpd, but I don't know of
any proxies that work this way (AFAIK they all pass the packets as they
arrive).



Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

Posted by "Keith G. Murphy" <ke...@mindspring.com>.
Perrin Harkins wrote:
> 
> 
> Keith Murphy pointed out that I was seeing the result of persistent HTTP
> connections from my browser.  Duh.
> 
I must mention that, having seen your postings here over a long period,
anytime I can make you say "duh", my week is made.  Maybe the whole
month.

That issue can be confusing.  It was especially so for me when IE did
it, and Netscape did not...

Let's make everyone switch to IE, and mod_perl looks good again!  :-b

Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

Posted by Perrin Harkins <pe...@primenet.com>.
On Thu, 21 Dec 2000, Ken Williams wrote:
> So in a sense, I think you're both correct.  If "concurrency" means
> the number of requests that can be handled at once, both systems are
> necessarily (and trivially) equivalent.  This isn't a very useful
> measurement, though; a more useful one is how many children (or
> perhaps how much memory) will be necessary to handle a given number of
> incoming requests per second, and with this metric the two systems
> could perform differently.

Yes, well put.  And that actually brings me back around to my original
hypothesis, which is that once you reach the maximum number of
interpreters that can be run on the box before swapping, it no longer
makes a difference if you're using LRU or MRU.  That's because all
interpreters are busy all the time, and the RAM for lexicals has already
been allocated in all of them.  At that point, it's a question of which
system can fit more interpreters in RAM at once, and I still think
mod_perl would come out on top there because of the shared memory.  Of
course most people don't run their servers at full throttle, and at less
than total saturation I would expect speedycgi to use less RAM and
possibly be faster.

So I guess I'm saying exactly the opposite of the original assertion:
mod_perl is more scalable if you define "scalable" as maximum requests per
second on a given machine, but speedycgi uses fewer resources at less than
peak loads which would make it more attractive for ISPs and other people
who use their servers for multiple tasks.

This is all hypothetical and I don't have time to experiment with it until
after the holidays, but I think the logic is correct.

- Perrin

Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

Posted by Ken Williams <ke...@forum.swarthmore.edu>.
perrin@primenet.com (Perrin Harkins) wrote:
>Hi Sam,
[snip]
>>  I am saying that since SpeedyCGI uses MRU to allocate requests to perl
>>  interpreters, it winds up using a lot fewer interpreters to handle the
>>  same number of requests.
>
>What I was saying is that it doesn't make sense for one to need fewer
>interpreters than the other to handle the same concurrency.  If you have
>10 requests at the same time, you need 10 interpreters.  There's no way
>speedycgi can do it with fewer, unless it actually makes some of them
>wait.

Well, there is one way, though it's probably not a huge factor.  If
mod_perl indeed manages the child-farming in such a way that too much
memory is used, then each process might slow down as memory becomes
scarce, especially if you start swapping.  Then if each request takes
longer, your child pool is more saturated with requests, and you might
have to fork a few more kids.

So in a sense, I think you're both correct.  If "concurrency" means the
number of requests that can be handled at once, both systems are
necessarily (and trivially) equivalent.  This isn't a very useful
measurement, though; a more useful one is how many children (or perhaps
how much memory) will be necessary to handle a given number of incoming
requests per second, and with this metric the two systems could perform
differently.


  -------------------                            -------------------
  Ken Williams                             Last Bastion of Euclidity
  ken@forum.swarthmore.edu                            The Math Forum

Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

Posted by Perrin Harkins <pe...@primenet.com>.
Hi Sam,

>  Processes 1, 2, 3 are running.  1 finishes and requests the mutex, then
>  2 finishes and requests the mutex, then 3 finishes and requests the mutex.
>  So when the next three requests come in, they are handled in the same order:
>  1, then 2, then 3 - this is FIFO or LRU.  This is bad for performance.

Thanks for the explanation; that makes sense now.  So, I was right that
it's OS dependent, but most OSes use a FIFO approach which leads to LRU
selection in the mutex.

Unfortunately, I don't see that being fixed very simply, since it's not
really Apache doing the choosing.  Maybe it will be possible to do
something cool with the wake-one stuff in Linux 2.4 when that comes out.

By the way, how are you doing it?  Do you use a mutex routine that works
in LIFO fashion?

>  > In my experience running
>  > development servers on Linux it always seemed as if the requests
>  > would continue going to the same process until a request came in when
>  > that process was already busy.
> 
>  No, they don't.  They go round-robin (or LRU as I say it).

Keith Murphy pointed out that I was seeing the result of persistent HTTP
connections from my browser.  Duh.

>  But a point I'm making is that with mod_perl you have to go to great
>  lengths to write your code so as to avoid unshared memory.  My claim is that
>  with mod_speedycgi you don't have to concern yourself as much with this.
>  You can concentrate more on the application and less on performance tuning.

I think you're overstating the case a bit here.  It's really easy to take
advantage of shared memory with mod_perl - I just add a 'use Foo' to my
startup.pl!  It can be hard for newbies to understand, but there's nothing
difficult about implementing it.  I often get 50% or more of my
application shared in this way.  That's a huge savings.
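
For example (Foo and Bar::Baz below are placeholders for whatever your
application actually uses):

  # startup.pl - pulled in via "PerlRequire /path/to/startup.pl" in httpd.conf.
  # Everything here is compiled once in the parent, before the fork, so the
  # module code ends up shared (copy-on-write) by all the children.
  use strict;
  use Foo ();
  use Bar::Baz ();
  use CGI ();
  CGI->compile(':all');   # precompile CGI.pm's autoloaded methods too
  1;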

>  I don't assume that each approach is equally fast under all loads.  They
>  were about the same with concurrency level-1, but higher concurrency levels
>  they weren't.

Well, certainly not when mod_perl started swapping...

Actually, there is a reason why MRU could lead to better performance (as
opposed to just saving memory): caching of allocated memory.  The first
time Perl sees lexicals it has to allocate memory for them, so if you
re-use the same interpreter you get to skip this step and that should give
some kind of performance benefit.
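
As a trivial illustration (a hypothetical Apache::Registry-style script):

  # The first request to hit this process allocates ~50k for @big; later
  # requests to the *same* process reuse that memory from Perl's allocator
  # instead of asking the OS for more, so reusing one interpreter saves both
  # the allocation work and the process growth.
  my @big = (1) x 50_000;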

>  I am saying that since SpeedyCGI uses MRU to allocate requests to perl
>  interpreters, it winds up using a lot fewer interpreters to handle the
>  same number of requests.

What I was saying is that it doesn't make sense for one to need fewer
interpreters than the other to handle the same concurrency.  If you have
10 requests at the same time, you need 10 interpreters.  There's no way
speedycgi can do it with fewer, unless it actually makes some of them
wait.  That could be happening, due to the fork-on-demand model, although
your warmup round (priming the pump) should take care of that.

>  I don't think it's pre-forking.  When I ran my tests I would always run
>  them twice, and take the results from the second run.  The first run
>  was just to "prime the pump".

That seems like it should do it, but I still think you could only have
more processes handling the same concurrency on mod_perl if some of the
mod_perl processes are idle or some of the speedycgi requests are waiting.

>  > This is probably all a moot point on a server with a properly set
>  > MaxClients and Apache::SizeLimit that will not go into swap.
> 
>  Please let me know what you think I should change.  So far my
>  benchmarks only show one trend, but if you can tell me specifically
>  what I'm doing wrong (and it's something reasonable), I'll try it.

Try setting MinSpareServers as low as possible and setting MaxClients to a
value that will prevent swapping.  Then set ab for a concurrency equal to
your MaxClients setting.
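
Something along these lines (the numbers are made up - plug in figures from
your own box):

  # Rough rule of thumb for a swap-free MaxClients:
  #   MaxClients <= RAM you can give Apache / unshared size of one child
  my $ram_for_apache_mb     = 400;  # e.g. a 512MB box minus OS, database, etc.
  my $unshared_per_child_mb = 8;    # per-child *unshared* memory, from top/ps
  printf "MaxClients ~ %d\n",
      int($ram_for_apache_mb / $unshared_per_child_mb);   # ~50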

>  I believe that with speedycgi you don't have to lower the MaxClients
>  setting, because it's able to handle a larger number of clients, at
>  least in this test.

Maybe what you're seeing is an ability to handle a larger number of
requests (as opposed to clients) because of the performance benefit I
mentioned above.  I don't know how hard ab tries to make sure you really
have n simultaneous clients at any given time.

>  In other words, if with mod_perl you had to turn
>  away requests, but with mod_speedycgi you did not, that would just
>  prove that speedycgi is more scalable.

Are the speedycgi+Apache processes smaller than the mod_perl
processes?  If not, the maximum number of concurrent requests you can
handle on a given box is going to be the same.

>  Maybe.  There must a benchmark somewhere that would show off of
>  mod_perl's advantages in shared memory.  Maybe a 100,000 line perl
>  program or something like that - it would have to be something where
>  mod_perl is using *lots* of shared memory, because keep in mind that
>  there are still going to be a whole lot fewer SpeedyCGI processes than
>  there are mod_perl processes, so you would really have to go overboard
>  in the shared-memory department.

Well, I get tons of use out of shared memory without even trying.  If you
can find a way to implement it in speedycgi, I think it would be very
beneficial to your users.

>  I would rather just see mod_perl fixed if that's
>  possible.

Because this has more to do with the OS than Apache and is already fixed
in mod_perl 2, I doubt anyone will feel like messing with it before that
gets released.  Your experiment demonstrates that the MRU approach has
value, so I'll be looking forward to trying it out with mod_perl 2.

- Perrin

Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

Posted by Gunther Birznieks <gu...@extropia.com>.
I think you could actually make speedycgi even better for shared memory 
usage by creating a special directive that would tell speedycgi to preload 
a series of modules, and then have speedycgi fork that "master" preloaded 
backend process and hand control over to the forked copy whenever a new 
backend process needs to be launched.

Then speedy would potentially have the best of both worlds.
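
Something like this, hypothetically (SpeedyCGI has no such directive today;
the names below are made up):

  #!/usr/bin/perl
  # "Master" backend: preload the heavy modules once, then fork the real
  # backends from it so the compiled module code is shared copy-on-write.
  use strict;

  use CGI ();      # whatever the directive told us to preload
  use Foo::Bar (); # placeholder for application modules

  my $backends = 10;
  for (1 .. $backends) {
      defined(my $pid = fork) or die "fork failed: $!";
      if ($pid == 0) {
          run_backend_loop();   # hypothetical per-backend request loop
          exit 0;
      }
  }
  wait() for 1 .. $backends;    # master just reaps its children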

Sorry I cross posted your thing. But I do think it is a problem of mod_perl 
also, and I am happily using speedycgi in production on at least one 
commercial site where mod_perl could not be installed so easily because of 
infrastructure issues.

I believe your mechanism of allocating requests to MRU perl interpreters is 
actually also accomplished by ActiveState's PerlEx (based on 
Apache::Registry but using multithreaded IIS and a pool of interpreters). A 
similar method will be used in Apache 2.0, when Apache is multithreaded and 
can therefore control within program logic which Perl interpreter gets 
called from a pool of Perl interpreters.

It just isn't so feasible right now in Apache 1.x to do this. And sometimes 
people forget that mod_perl came about primarily for writing handlers in 
Perl, not as an application environment, although it is very good for the 
latter as well.

I think SpeedyCGI needs more advocacy from the mod_perl group because put 
simply speedycgi is way easier to set up and use than mod_perl and will 
likely get more PHP people using Perl again. If more people rely on Perl 
for their fast websites, then you will get more people looking for more 
power, and by extension more people using mod_perl.

Whoops... here we go with the advocacy thing again.

Later,
    Gunther

At 02:50 AM 12/21/2000 -0800, Sam Horrocks wrote:
[full quote of Sam's message above snipped]

__________________________________________________
Gunther Birznieks (gunther.birznieks@extropia.com)
eXtropia - The Web Technology Company
http://www.extropia.com/