Posted to modperl@perl.apache.org by Gunther Birznieks <gu...@extropia.com> on 2000/12/21 12:38:45 UTC

Re: Fwd: [speedycgi] Speedycgi scales better than mod_perl with scripts that contain un-shared memory

I think you could actually make speedycgi even better for shared-memory usage 
by creating a special directive that tells speedycgi to preload a series of 
modules into a "master" backend process. Whenever a new backend is needed, 
speedycgi would then fork that preloaded master and hand control over to the 
forked child.

Then speedy would potentially have the best of both worlds.
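
Roughly, I'm picturing something like this - a toy sketch only, with a made-up
module list, to illustrate the fork-from-a-preloaded-master idea rather than
any actual speedycgi code:

    #!/usr/bin/perl
    # Toy sketch of a "preload then fork" master backend.
    use strict;

    # Pretend these are the heavy modules a site would want compiled once:
    use CGI ();
    use Data::Dumper ();

    # Everything above is now compiled in the master.  Fork a backend per
    # request; each child inherits the compiled code as copy-on-write pages.
    for my $request (1 .. 3) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            print "backend $$ would handle request $request here\n";
            exit 0;
        }
    }
    1 while waitpid(-1, 0) > 0;   # reap the children

The win is that code compiled before the fork stays in pages shared between
the master and every child it hands out.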

Sorry I cross-posted your message. But I do think it is a problem for mod_perl 
as well, and I am happily using speedycgi in production on at least one 
commercial site where mod_perl could not be installed easily because of 
infrastructure issues.

I believe your mechanism of handing requests to the most-recently-used perl 
interpreter is also what ActiveState's PerlEx does (it is based on 
Apache::Registry but uses multithreaded IIS and a pool of interpreters). A 
similar method will be used in Apache 2.0, which will be multithreaded and can 
therefore control within program logic which Perl interpreter gets called from 
a pool of Perl interpreters.
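
The core of that MRU trick is tiny. Just to illustrate - this is a toy sketch,
not PerlEx or Apache internals - the interpreter pool only has to behave like a
stack instead of a queue:

    use strict;

    my @idle = ('interp1', 'interp2', 'interp3');   # free interpreters

    sub checkout_mru { pop @idle }           # most recently returned one
    sub checkout_lru { shift @idle }         # longest-idle one (spreads work out)
    sub checkin      { push @idle, $_[0] }   # finished interpreters go on the end

    for my $req (1 .. 5) {
        my $interp = checkout_mru();
        print "request $req handled by $interp\n";
        checkin($interp);
    }

Under light load the pop/push pair keeps handing out the same interpreter, so
only that one stays warm; a shift-based (LRU) checkout would cycle through all
of them and keep them all resident.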

It just isn't feasible right now in Apache 1.x to do this. And sometimes 
people forget that mod_perl came about primarily for writing handlers in 
Perl, not as an application environment, although it is very good for the 
latter as well.

I think SpeedyCGI needs more advocacy from the mod_perl group because, put 
simply, speedycgi is way easier to set up and use than mod_perl and will 
likely get more PHP people using Perl again. If more people rely on Perl 
for their fast websites, then you will get more people looking for more 
power, and by extension more people using mod_perl.

Whoops... here we go with the advocacy thing again.

Later,
    Gunther

At 02:50 AM 12/21/2000 -0800, Sam Horrocks wrote:
>  > Gunther Birznieks wrote:
>  > > Sam just posted this to the speedycgi list just now.
>  > [...]
>  > > >The underlying problem in mod_perl is that apache likes to spread out
>  > > >web requests to as many httpd's, and therefore as many mod_perl
>  > > >interpreters, as possible, using an LRU selection process for picking
>  > > >httpd's.
>  >
>  > Hmmm... this doesn't sound right.  I've never looked at the code in
>  > Apache that does this selection, but I was under the impression that the
>  > choice of which process would handle each request was an OS dependent
>  > thing, based on some sort of mutex.
>  >
>  > Take a look at this: http://httpd.apache.org/docs/misc/perf-tuning.html
>  >
>  > Doesn't that appear to be saying that whichever process gets into the
>  > mutex first will get the new request?
>
>  I would agree that whichever process gets into the mutex first will get
>  the new request.  That's exactly the problem I'm describing.  What you
>  are describing here is first-in, first-out behaviour which implies LRU
>  behaviour.
>
>  Processes 1, 2, 3 are running.  1 finishes and requests the mutex, then
>  2 finishes and requests the mutex, then 3 finishes and requests the mutex.
>  So when the next three requests come in, they are handled in the same order:
>  1, then 2, then 3 - this is FIFO or LRU.  This is bad for performance.
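>
>  You can see the same thing with a few lines of Perl - this is just a
>  simulation of the idle list, not anything from apache or speedy:
>
>      use strict;
>      my @fifo = (1, 2, 3);   # idle processes, longest-idle first
>      my @lifo = (1, 2, 3);
>      for my $req (1 .. 6) {
>          my $f = shift @fifo; push @fifo, $f;   # FIFO/LRU: oldest idle process
>          my $l = pop @lifo;   push @lifo, $l;   # LIFO/MRU: most recent one
>          print "request $req: fifo -> process $f, lifo -> process $l\n";
>      }
>
>  The fifo column cycles through all three processes while the lifo column
>  keeps reusing process 3.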
>
>  > In my experience running
>  > development servers on Linux it always seemed as if the requests
>  > would continue going to the same process until a request came in when
>  > that process was already busy.
>
>  No, they don't.  They go round-robin (or LRU as I say it).
>
>  Try this simple test script:
>
>  #!/usr/bin/perl
>  use strict;
>  use CGI;
>  my $cgi = CGI->new;
>  print $cgi->header();
>  print "mypid=$$\n";   # report which interpreter process handled this hit
>
>  With mod_perl you constantly get different pids.  With mod_speedycgi you
>  usually get the same pid.  This is a really good way to see the LRU/MRU
>  difference that I'm talking about.
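>
>  (If you want to put that script under load, ab works fine - something
>  roughly like this, where the URLs are just placeholders for wherever the
>  two versions are installed:
>
>      ab -n 1000 -c 10 http://localhost/perl/mypid.cgi
>      ab -n 1000 -c 10 http://localhost/speedy/mypid.cgi
>
>  and then watch which pids you get when you fetch the script yourself
>  during the run.)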
>
>  Here's the problem - the mutex in apache is implemented using a lock
>  on a file.  It's left up to the kernel to decide which process to give
>  that lock to.
>
>  Now, if you're writing a unix kernel and implementing this file locking
>  code, what implementation would you use?  Well, this is a general purpose
>  thing - you have 100 or so processes all trying to acquire this file lock.
>  You could
>  give out the lock randomly or in some ordered fashion.  If I were writing
>  the kernel I would give it out in a round-robin fashion (or the
>  least-recently-used process as I referred to it before).  Why?  Because
>  otherwise one of those processes may starve waiting for this lock - it may
>  never get the lock unless you do it in a fair (round-robin) manner.
>
>  The kernel doesn't know that all these httpd's are exactly the same.
>  The kernel is implementing a general-purpose file-locking scheme and
>  it doesn't know whether one process is more important than another.  If
>  it's not fair about giving out the lock a very important process might
>  starve.
>
>  Take a look at fs/locks.c (I'm looking at linux 2.3.46).  In there is the
>  comment:
>
>  /* Insert waiter into blocker's block list.
>   * We use a circular list so that processes can be easily woken up in
>   * the order they blocked. The documentation doesn't require this but
>   * it seems like the reasonable thing to do.
>   */
>  static void locks_insert_block(struct file_lock *blocker,
>                                 struct file_lock *waiter)
>
>  > As I understand it, the implementation of "wake-one" scheduling in the
>  > 2.4 Linux kernel may affect this as well.  It may then be possible to
>  > skip the mutex and use unserialized accept for single socket servers,
>  > which will definitely hand process selection over to the kernel.
>
>  If the kernel implemented the queueing for multiple accepts using a LIFO
>  instead of a FIFO and apache used this method instead of file locks,
>  then that would probably solve it.
>
>  Just found this on the net on this subject:
>     http://www.uwsg.iu.edu/hypermail/linux/kernel/9704.0/0455.html
>     http://www.uwsg.iu.edu/hypermail/linux/kernel/9704.0/0453.html
>
>  > > >The problem is that at a high concurrency level, mod_perl is using lots
>  > > >and lots of different perl-interpreters to handle the requests, each
>  > > >with its own un-shared memory.  It's doing this due to its LRU design.
>  > > >But with SpeedyCGI's MRU design, only a few speedy_backends are being
>  > > >used because as much as possible it tries to use the same interpreter
>  > > >over and over and not spread out the requests to lots of different
>  > > >interpreters.  Mod_perl is using lots of perl-interpreters, while
>  > > >speedycgi is only using a few.  mod_perl is requiring that lots of
>  > > >interpreters be in memory in order to handle the requests, whereas
>  > > >speedy only requires a small number of interpreters to be in memory.
>  >
>  > This test - building up unshared memory in each process - is somewhat
>  > suspect since in most setups I've seen, there is a very significant
>  > amount of memory being shared between mod_perl processes.
>
>  My message and testing concern un-shared memory only.  If all of your
>  memory is shared, then there shouldn't be a problem.
>
>  But a point I'm making is that with mod_perl you have to go to great
>  lengths to write your code so as to avoid unshared memory.  My claim is that
>  with mod_speedycgi you don't have to concern yourself as much with this.
>  You can concentrate more on the application and less on performance tuning.
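>
>  To be concrete, the un-shared memory in my test comes from something of
>  this shape - not the exact script, but the same idea - a lexical that
>  dirties pages the parent can never share:
>
>      use CGI;
>      my $cgi = CGI->new;
>      print $cgi->header();
>      my $filler = 'x' x 50_000;   # ~50k of per-process, un-shared memory
>      print "mypid=$$, filler length=", length($filler), "\n";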
>
>  > Regardless,
>  > the explanation here doesn't make sense to me.  If we assume that each
>  > approach is equally fast (as Sam seems to say earlier in his message)
>  > then it should take an equal number of speedycgi and mod_perl processes
>  > to handle the same concurrency.
>
>  I don't assume that each approach is equally fast under all loads.  They
>  were about the same at concurrency level 1, but at higher concurrency
>  levels they weren't.
>
>  I am saying that since SpeedyCGI uses MRU to allocate requests to perl
>  interpreters, it winds up using a lot fewer interpreters to handle the
>  same number of requests.
>
>  On a single-CPU system of course at some point all the concurrency has
>  to be serialized. mod_speedycgi and mod_perl take different approaches
>  before getting to that point.  mod_speedycgi tries to use as
>  small a number of unix processes as possible, while mod_perl tries to
>  use a very large number of unix processes.
>
>  > That leads me to believe that what's really happening here is that
>  > Apache is pre-forking a bit over-zealously in response to a sudden surge
>  > of traffic from ab, and thus has extra unused processes sitting around
>  > waiting, while speedycgi is avoiding this situation by waiting for
>  > someone to try and use the processes before forking them (i.e. no
>  > pre-forking).  The speedycgi way causes a brief delay while new
>  > processes fork, but doesn't waste memory.  Does this sound like a
>  > plausible explanation to folks?
>
>  I don't think it's pre-forking.  When I ran my tests I would always run
>  them twice, and take the results from the second run.  The first run
>  was just to "prime the pump".
>
>  I tried reducing MinSpareServers, and this did help mod_perl get a higher
>  concurrency number, but it would still run into a wall where speedycgi
>  would not.
>
>  > This is probably all a moot point on a server with a properly set
>  > MaxClients and Apache::SizeLimit that will not go into swap.
>
>  Please let me know what you think I should change.  So far my
>  benchmarks only show one trend, but if you can tell me specifically
>  what I'm doing wrong (and it's something reasonable), I'll try it.
>
>  I don't think SizeLimit is the answer - my process isn't growing.  It's
>  using the same 50k of un-shared memory over and over.
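>
>  (For reference, the setup I understand you to mean is the usual one - cap
>  MaxClients in httpd.conf and load Apache::SizeLimit from startup.pl,
>  something like this, with placeholder numbers:
>
>      # in startup.pl
>      use Apache::SizeLimit;
>      $Apache::SizeLimit::MAX_PROCESS_SIZE = 12000;   # limit in KB
>
>      # in httpd.conf:
>      #   MaxClients       30
>      #   PerlFixupHandler Apache::SizeLimit
>
>  - but since my processes aren't growing, that limit would never kick in.)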
>
>  I believe that with speedycgi you don't have to lower the MaxClients
>  setting, because it's able to handle a larger number of clients, at
>  least in this test.  In other words, if with mod_perl you had to turn
>  away requests, but with mod_speedycgi you did not, that would just
>  prove that speedycgi is more scalable.
>
>  Now you could tell me "don't use unshared memory", but that's outside
>  the bounds of the test.   The whole test concerns unshared memory.
>
>  > I would
>  > expect mod_perl to have the advantage when all processes are
>  > fully-utilized because of the shared memory.
>
>  Maybe.  There must be a benchmark somewhere that would show off
>  mod_perl's advantages in shared memory.  Maybe a 100,000 line perl
>  program or something like that - it would have to be something where
>  mod_perl is using *lots* of shared memory, because keep in mind that
>  there are still going to be a whole lot fewer SpeedyCGI processes than
>  there are mod_perl processes, so you would really have to go overboard
>  in the shared-memory department.
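>
>  (The way you'd stack the deck for mod_perl in that kind of benchmark is the
>  usual startup.pl preload - the module names here are just stand-ins for a
>  large application:
>
>      # startup.pl, pulled in from httpd.conf with: PerlRequire conf/startup.pl
>      use strict;
>      use CGI ();
>      use DBI ();
>      use Date::Manip ();   # stand-in for a big pile of application code
>      1;
>
>  - everything compiled there lives in the apache parent and gets shared by
>  every child.)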
>
>  > It would be cool if speedycgi could somehow use a parent process
>  > model and get the shared memory benefits too.
>
>  > Speedy seems like it might be more attractive to ISPs, and it would be
>  > nice to increase interoperability between the two projects.
>
>  Thanks.  And please, I'm not trying to start a speedy vs mod_perl war.
>  My original message was only to the speedycgi list, but now that it's
>  on mod_perl I think I have to reply there too.
>
>  But, there is a need for a little good PR on speedycgi's side, and I
>  was looking for that.  I would rather just see mod_perl fixed if that's
>  possible.  But the last time I brought up this issue (maybe a year ago)
>  I was unable to convince the people on the mod_perl list that this
>  problem even existed.
>
>  Sam

__________________________________________________
Gunther Birznieks (gunther.birznieks@extropia.com)
eXtropia - The Web Technology Company
http://www.extropia.com/