Posted to modperl@perl.apache.org by Sam Horrocks <sa...@daemoninc.com> on 2001/01/04 13:56:34 UTC

Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

Sorry for the late reply - I've been out for the holidays.

 > By the way, how are you doing it?  Do you use a mutex routine that works
 > in LIFO fashion?

 Speedycgi uses separate backend processes that run the perl interpreters.
 The frontend processes (the httpd's that are running mod_speedycgi)
 communicate with the backends, sending over the request and getting the output.

 Speedycgi uses some shared memory (an mmap'ed file in /tmp) to keep track
 of the backends and frontends.  This shared memory contains the queue.
 When backends become free, they add themselves at the front of this queue.
 When the frontends need a backend they pull the first one from the front
 of this list.
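
 If it helps, the bookkeeping is basically just a stack of idle backends.
 Here's a tiny sketch of the idea (this is only an illustration, not the
 actual SpeedyCGI code, and it leaves out the mmap'ed file and the locking):

     my @idle;                      # front of the array = front of the queue

     sub backend_becomes_free {     # a backend just finished a request
         my ($backend) = @_;
         unshift @idle, $backend;   # it adds itself at the front
     }

     sub frontend_needs_backend {   # a frontend has a request to hand off
         return shift @idle;        # take the first (most recently used) one
     }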

 > 
 > >  I am saying that since SpeedyCGI uses MRU to allocate requests to perl
 > >  interpreters, it winds up using a lot fewer interpreters to handle the
 > >  same number of requests.
 > 
 > What I was saying is that it doesn't make sense for one to need fewer
 > interpreters than the other to handle the same concurrency.  If you have
 > 10 requests at the same time, you need 10 interpreters.  There's no way
 > speedycgi can do it with fewer, unless it actually makes some of them
 > wait.  That could be happening, due to the fork-on-demand model, although
 > your warmup round (priming the pump) should take care of that.

 What you say would be true if you had 10 processors and could get
 true concurrency.  But on single-cpu systems you usually don't need
 10 unix processes to handle 10 requests concurrently, since they get
 serialized by the kernel anyways.  I'll try to show how mod_perl handles
 10 concurrent requests, and compare that to mod_speedycgi so you can
 see the difference.

 For mod_perl, let's assume we have 10 httpd's, h1 through h10,
 when the 10 concurrent requests come in.  h1 has acquired the mutex,
 and h2-h10 are waiting (in order) on the mutex.  Here's how the cpu
 actually runs the processes:

    h1 accepts
    h1 releases the mutex, making h2 runnable
    h1 runs the perl code and produces the results
    h1 waits for the mutex

    h2 accepts
    h2 releases the mutex, making h3 runnable
    h2 runs the perl code and produces the results
    h2 waits for the mutex

    h3 accepts
    ...

 This is pretty straightforward.  Each of h1-h10 run the perl code
 exactly once.  They may not run exactly in this order since a process
 could get pre-empted, or blocked waiting to send data to the client,
 etc.  But regardless, each of the 10 processes will run the perl code
 exactly once.

 Here's the mod_speedycgi example - it too uses httpd's h1-h10, and they
 all take turns running the mod_speedycgi frontend code.  But the backends,
 where the perl code is, don't have to all be run fairly - they use MRU
 instead.  I'll use b1 and b2 to represent 2 speedycgi backend processes,
 already queued up in that order.

 Here's a possible speedycgi scenario:

    h1 accepts
    h1 releases the mutex, making h2 runnable
    h1 sends a request to b1, making b1 runnable

    h2 accepts
    h2 releases the mutex, making h3 runnable
    h2 sends a request to b2, making b2 runnable

    b1 runs the perl code and sends the results to h1, making h1 runnable
    b1 adds itself to the front of the queue

    h3 accepts
    h3 releases the mutex, making h4 runnable
    h3 sends a request to b1, making b1 runnable

    b2 runs the perl code and sends the results to h2, making h2 runnable
    b2 adds itself to the front of the queue

    h1 produces the results it got from b1
    h1 waits for the mutex

    h4 accepts
    h4 releases the mutex, making h5 runnable
    h4 sends a request to b2, making b2 runnable

    b1 runs the perl code and sends the results to h3, making h3 runnable
    b1 adds itself to the front of the queue

    h2 produces the results it got from b2
    h2 waits for the mutex

    h5 accepts
    h5 releases the mutex, making h6 runnable
    h5 sends a request to b1, making b1 runnable

    b2 runs the perl code and sends the results to h4, making h4 runnable
    b2 adds itself to the front of the queue

 This may be hard to follow, but hopefully you can see that the 10 httpd's
 just take turns using b1 and b2 over and over.  So, the 10 concurrent
 requests end up being handled by just two perl backend processes.  Again,
 this is simplified.  If the perl processes get blocked, or pre-empted,
 you'll end up using more of them.  But generally, the LIFO will cause
 SpeedyCGI to sort-of settle into the smallest number of processes needed for
 the task.

 The difference between the two approaches is that the mod_perl
 implementation forces unix to use 10 separate perl processes, while the
 mod_speedycgi implementation sort-of decides on the fly how many
 different processes are needed.
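
 To put a number on it, here's a throwaway simulation (mine, written for
 this email - it isn't taken from either package) of handing workers out of
 a pool when only two requests ever overlap at a time.  The only difference
 between the two runs is whether a finished worker goes back on the front of
 the free list (MRU) or on the back (LRU):

     #!/usr/bin/perl -w
     use strict;

     sub distinct_workers_used {
         my ($policy) = @_;
         my @free = (1 .. 10);               # ten idle workers, like h1-h10
         my %used;
         for my $wave (1 .. 100) {           # 100 waves of 2 overlapping requests
             my @busy = splice(@free, 0, 2); # check out two workers
             $used{$_} = 1 for @busy;
             if ($policy eq 'MRU') { unshift @free, @busy }  # front of the queue
             else                  { push    @free, @busy }  # back of the queue
         }
         return scalar keys %used;
     }

     printf("MRU used %d workers, LRU used %d workers\n",
            distinct_workers_used('MRU'), distinct_workers_used('LRU'));

 It prints "MRU used 2 workers, LRU used 10 workers" - same offered load, but
 the MRU policy never touches more than two perl processes.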

 > >  Please let me know what you think I should change.  So far my
 > >  benchmarks only show one trend, but if you can tell me specifically
 > >  what I'm doing wrong (and it's something reasonable), I'll try it.
 > 
 > Try setting MinSpareServers as low as possible and setting MaxClients to a
 > value that will prevent swapping.  Then set ab for a concurrency equal to
 > your MaxClients setting.

 I previously had set MinSpareServers to 1 - it did help mod_perl get
 to a higher level, but didn't change the overall trend.

 I found that setting MaxClients to 100 stopped the paging.  At concurrency
 level 100, both mod_perl and mod_speedycgi showed similar rates with ab.
 Even at higher levels (300), they were comparable.
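
 For reference, that test setup amounts to something like the following
 (the values and URL here are just illustrative, not a recommendation):

     # httpd.conf
     MinSpareServers 1
     MaxClients      100    # low enough that the interpreters fit in RAM

     # drive it at a matching concurrency
     ab -n 10000 -c 100 http://localhost/perl/hello_world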

 But, to show that the underlying problem is still there, I then changed
 the hello_world script and doubled the amount of un-shared memory.
 And of course the problem then came back for mod_perl, although speedycgi
 continued to work fine.  I think this shows that mod_perl is still
 using quite a bit more memory than speedycgi to provide the same service.

 > >  I believe that with speedycgi you don't have to lower the MaxClients
 > >  setting, because it's able to handle a larger number of clients, at
 > >  least in this test.
 > 
 > Maybe what you're seeing is an ability to handle a larger number of
 > requests (as opposed to clients) because of the performance benefit I
 > mentioned above.
 
 I don't follow.
 
 > I don't know how hard ab tries to make sure you really
 > have n simultaneous clients at any given time.

 I do know that the ab "-c" option does seem to have an effect on the
 tests I've been running.

 > >  In other words, if with mod_perl you had to turn
 > >  away requests, but with mod_speedycgi you did not, that would just
 > >  prove that speedycgi is more scalable.
 > 
 > Are the speedycgi+Apache processes smaller than the mod_perl
 > processes?  If not, the maximum number of concurrent requests you can
 > handle on a given box is going to be the same.

 The size of the httpds running mod_speedycgi, plus the size of speedycgi
 perl processes is significantly smaller than the total size of the httpd's
 running mod_perl.

 The reason for this is that only a handful of perl processes are required by
 speedycgi to handle the same load, whereas mod_perl uses a perl interpreter
 in all of the httpds.
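
 As a rough back-of-the-envelope illustration (the numbers here are invented
 for the example, not taken from my benchmark): if each perl interpreter
 carries 10MB of un-shared memory and a bare httpd is about 1MB, then 100
 mod_perl httpds come to roughly 100 x 10MB = 1000MB, while 100 mod_speedycgi
 httpds plus, say, 10 busy perl backends come to roughly
 100 x 1MB + 10 x 10MB = 200MB.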

Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

Posted by Perrin Harkins <pe...@primenet.com>.
Hi Sam,

I think we're talking in circles here a bit, and I don't want to
diminish the original point, which I read as "MRU process selection is a
good idea for Perl-based servers."  Your tests showed that this was
true.

Let me just try to explain my reasoning.  I'll define a couple of my
base assumptions, in case you disagree with them.

- Slices of CPU time doled out by the kernel are very small - so small
that processes can be considered concurrent, even though technically
they are handled serially.
- A set of requests can be considered "simultaneous" if they all arrive
and start being handled in a period of time shorter than the time it
takes to service a request.

Operating on these two assumptions, I say that 10 simultaneous requests
will require 10 interpreters to service them.  There's no way to handle
them with fewer, unless you queue up some of the requests and make them
wait.

I also say that if you have a top limit of 10 interpreters on your
machine because of memory constraints, and you're sending in 10
simultaneous requests constantly, all interpreters will be used all the
time.  In that case it makes no difference to the throughput whether you
use MRU or LRU.

>  What you say would be true if you had 10 processors and could get
>  true concurrency.  But on single-cpu systems you usually don't need
>  10 unix processes to handle 10 requests concurrently, since they get
>  serialized by the kernel anyways.

I think the CPU slices are smaller than that.  I don't know much about
process scheduling, so I could be wrong.  I would agree with you if we
were talking about requests that were coming in with more time between
them.  Speedycgi will definitely use fewer interpreters in that case.

>  I found that setting MaxClients to 100 stopped the paging.  At concurrency
>  level 100, both mod_perl and mod_speedycgi showed similar rates with ab.
>  Even at higher levels (300), they were comparable.

That's what I would expect if both systems have a similar limit of how
many interpreters they can fit in RAM at once.  Shared memory would help
here, since it would allow more interpreters to run.

By the way, do you limit the number of SpeedyCGI processes as well?  It
seems like you'd have to, or they'd start swapping too when you throw
too many requests in.

>  But, to show that the underlying problem is still there, I then changed
>  the hello_world script and doubled the amount of un-shared memory.
>  And of course the problem then came back for mod_perl, although speedycgi
>  continued to work fine.  I think this shows that mod_perl is still
>  using quite a bit more memory than speedycgi to provide the same service.

I'm guessing that what happened was you ran mod_perl into swap again. 
You need to adjust MaxClients when your process size changes
significantly.

>  > >  I believe that with speedycgi you don't have to lower the MaxClients
>  > >  setting, because it's able to handle a larger number of clients, at
>  > >  least in this test.
>  >
>  > Maybe what you're seeing is an ability to handle a larger number of
>  > requests (as opposed to clients) because of the performance benefit I
>  > mentioned above.
> 
>  I don't follow.

When not all processes are in use, I think Speedy would handle requests
more quickly, which would allow it to handle n requests in less time
than mod_perl.  Saying it handles more clients implies that the requests
are simultaneous.  I don't think it can handle more simultaneous
requests.

>  > Are the speedycgi+Apache processes smaller than the mod_perl
>  > processes?  If not, the maximum number of concurrent requests you can
>  > handle on a given box is going to be the same.
> 
>  The size of the httpds running mod_speedycgi, plus the size of speedycgi
>  perl processes is significantly smaller than the total size of the httpd's
>  running mod_perl.
> 
>  The reason for this is that only a handful of perl processes are required by
>  speedycgi to handle the same load, whereas mod_perl uses a perl interpreter
>  in all of the httpds.

I think this is true at lower levels, but not when the number of
simultaneous requests gets up to the maximum that the box can handle. 
At that point, it's a question of how many interpreters can fit in
memory.  I would expect the size of one Speedy + one httpd to be about
the same as one mod_perl/httpd when no memory is shared.  With sharing,
you'd be able to run more processes.

- Perrin

Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

Posted by Les Mikesell <le...@home.com>.
----- Original Message -----
From: "Sam Horrocks" <sa...@daemoninc.com>
To: "Perrin Harkins" <pe...@primenet.com>
Cc: "Gunther Birznieks" <gu...@extropia.com>; "mod_perl list"
<mo...@apache.org>; <sp...@newlug.org>
Sent: Thursday, January 04, 2001 6:56 AM
Subject: Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts
that contain un-shared memory


>  >
>  > Are the speedycgi+Apache processes smaller than the mod_perl
>  > processes?  If not, the maximum number of concurrent requests you can
>  > handle on a given box is going to be the same.
>
>  The size of the httpds running mod_speedycgi, plus the size of speedycgi
>  perl processes is significantly smaller than the total size of the httpd's
>  running mod_perl.

That would be true if you only ran one mod_perl'd httpd.  But can you
give a better comparison to the usual setup for a busy site, where
you run a lightweight non-mod_perl front end and let mod_rewrite
decide what gets proxied through to the larger mod_perl'd backend,
letting Apache decide how many backends you need to have
running?
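
(For concreteness, I mean the sort of front-end config below - the hostname
and port are placeholders:)

    # lightweight front-end httpd.conf (mod_rewrite + mod_proxy)
    RewriteEngine On
    RewriteRule   ^/perl/(.*)$  http://backend.example.com:8080/perl/$1  [P]
    ProxyPassReverse  /perl/    http://backend.example.com:8080/perl/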

>  The reason for this is that only a handful of perl processes are required by
>  speedycgi to handle the same load, whereas mod_perl uses a perl interpreter
>  in all of the httpds.

I always see at least a 10:1 ratio of front-end to back-end httpd's when
serving over the internet.  One effect that is difficult to benchmark is that
clients connecting over the internet are often slow and will hold up the
process that is delivering the data even though the processing has been
completed.  The proxy approach provides some buffering and allows the backend
to move on more quickly.  Does speedycgi do the same?

      Les Mikesell
        lesmikesell@home.com