Posted to dev@httpd.apache.org by Ian Holsman <ia...@cnet.com> on 2001/11/17 07:22:36 UTC

worker MPM Question

I've been doing some quick performance runs again, and have noticed (well.. other people did before
me) that the worker MPM tends to concentrate all its work in a few processes, leaving
others doing nothing.

Is this by design?

..Ian


Re: worker MPM Question

Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Sat, Nov 17, 2001 at 11:26:15AM -0800, Aaron Bannert wrote:
> On Sat, Nov 17, 2001 at 11:09:54AM -0800, Brian Pane wrote:
> > My hypothesis is that the inactive processes are never able to lock
> > the accept mutex.  If you truss one of the processes that never does
> > anything, what do you see?
> 
> That's a good point. Since the accept listener loop is tight, it may
> be starving out the other processes. Solaris is safe from kernel bugs
> associated with the thundering herd, right? Since there are fewer
> processes sitting on that accept mutex, I suggest you experiment with
> simply turning off the mutex entirely (whatever happened to having an
> AcceptMutex none option?). Having 6 processes race out of poll/select and
> into accept may be lighter weight than having the same number contending
> for a cross-process mutex, especially for a server already getting pummeled
> by incoming requests.

No.  There's a starvation case when you have multiple listeners.
Marc's posted the details on this before.  We must have an accept
lock when we have multiple listeners.  

If we only have one listener, then we can live with the thundering herd.
But not for multiple listeners.  -- justin
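
(For reference, a minimal sketch of the serialized-accept pattern being
discussed, using an fcntl() file lock as the cross-process accept mutex.
The helper names and the lock choice are illustrative, not httpd's actual
code; the point is that the lock keeps exactly one process inside
select()/accept() at a time, which is what avoids the multiple-listener
starvation case.)

    /* Sketch: one listener loop per child process, serialized by a
     * cross-process fcntl() lock.  Illustrative only; httpd's real
     * code lives in the MPMs and APR.  Assumes lock_fd and the
     * listeners[] array were set up in the parent before fork(). */
    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/select.h>
    #include <sys/socket.h>

    #define NUM_LISTENERS 2

    int lock_fd;                        /* opened O_CREAT before fork() */
    int listeners[NUM_LISTENERS];       /* listening sockets */

    static void accept_mutex_on(void)
    {
        struct flock l = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        while (fcntl(lock_fd, F_SETLKW, &l) < 0 && errno == EINTR)
            ;                           /* retry if interrupted */
    }

    static void accept_mutex_off(void)
    {
        struct flock l = { .l_type = F_UNLCK, .l_whence = SEEK_SET };
        fcntl(lock_fd, F_SETLK, &l);
    }

    void listener_loop(void)
    {
        for (;;) {
            int i, csd = -1;

            accept_mutex_on();          /* one process past here at a time */
            while (csd < 0) {
                fd_set rfds;
                int maxfd = -1;

                FD_ZERO(&rfds);
                for (i = 0; i < NUM_LISTENERS; i++) {
                    FD_SET(listeners[i], &rfds);
                    if (listeners[i] > maxfd)
                        maxfd = listeners[i];
                }
                if (select(maxfd + 1, &rfds, NULL, NULL, NULL) <= 0)
                    continue;           /* signal or spurious wakeup */
                for (i = 0; i < NUM_LISTENERS; i++) {
                    if (FD_ISSET(listeners[i], &rfds)) {
                        csd = accept(listeners[i], NULL, NULL);
                        break;
                    }
                }
            }
            accept_mutex_off();

            close(csd);                 /* a real server hands csd to a worker */
        }
    }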


Re: worker MPM Question

Posted by Aaron Bannert <aa...@clove.org>.
On Sat, Nov 17, 2001 at 11:09:54AM -0800, Brian Pane wrote:
> My hypothesis is that the inactive processes are never able to lock
> the accept mutex.  If you truss one of the processes that never does
> anything, what do you see?

That's a good point. Since the accept listener loop is tight, it may
be starving out the other processes. Solaris is safe from kernel bugs
associated with the thundering herd, right? Since there are fewer
processes sitting on that accept mutex, I suggest you experiment with
simply turning off the mutex entirely (whatever happened to having an
AcceptMutex none option?). Having 6 processes race out of poll/select and
into accept may be lighter weight than having the same number contending
for a cross-process mutex, especially for a server already getting pummeled
by incoming requests.

-aaron
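
(For comparison, the unserialized variant Aaron describes -- what an
"AcceptMutex none" option would amount to -- is each child simply blocking
in accept() on the shared socket and letting the kernel pick a winner.
A minimal sketch, assuming a single listening socket created before fork();
per Justin's reply above, this is only safe with one listener.)

    /* Sketch: unserialized accept on a single shared listener.
     * Every child blocks in accept(); when a connection arrives the
     * kernel wakes one (or, on some kernels, all -- the "thundering
     * herd") and exactly one accept() succeeds. */
    #include <errno.h>
    #include <unistd.h>
    #include <sys/socket.h>

    void unserialized_loop(int listen_fd)
    {
        for (;;) {
            int csd = accept(listen_fd, NULL, NULL);
            if (csd < 0) {
                if (errno == EINTR || errno == ECONNABORTED)
                    continue;           /* transient; try again */
                break;                  /* real error: give up */
            }
            /* a real server hands csd off to a worker thread */
            close(csd);
        }
    }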

Re: worker MPM Question

Posted by Brian Pane <bp...@pacbell.net>.
Ian Holsman wrote:

>
> I've been doing some quick performance runs again, and have noticed
> (well.. other people did before me) that the worker MPM tends to
> concentrate all its work in a few processes, leaving
> others doing nothing.


My hypothesis is that the inactive processes are never able to lock
the accept mutex.  If you truss one of the processes that never does
anything, what do you see?

--Brian

Re: worker MPM Question

Posted by Jeff Trawick <tr...@attglobal.net>.
Ian Holsman <ia...@cnet.com> writes:

> I've been doing some quick performance runs again, and have noticed
> (well.. other people did before me) that the worker MPM tends to
> concentrate all its work in a few processes, leaving
> others doing nothing.

Do you have reason to believe that some process is accepting a
connection when it has no worker available to process it, while some
other process with idle workers isn't ever getting the accept mutex?
(I suppose the size of the queue between the listener thread and the
worker threads is supposed to avoid this; not sure.)

Since all listener threads for processes whose work queue isn't full
are blocked on a mutex, we're at the mercy of the OS for deciding who
gets it (no guarantee of round-robin).

I guess the potential problem on a system with fewer kernel
dispatchable units than threads (M:N) is that a process with
supposedly idle threads and space in the queue will get a connection,
but there won't be a free kernel dispatchable unit immediately.

But if M=N, then what is the problem with idle processes?

Does this make any sense?

-- 
Jeff Trawick | trawick@attglobal.net | PGP public key at web site:
       http://www.geocities.com/SiliconValley/Park/9289/
             Born in Roswell... married an alien...
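
(The queue Jeff refers to, between the listener thread and the worker
threads, is in outline a bounded buffer guarded by a mutex and condition
variables. A minimal sketch follows; the names are hypothetical, not the
worker MPM's actual fd_queue code. Note how a full queue blocks the
listener, which in turn keeps that process from re-acquiring the accept
mutex.)

    /* Sketch: bounded queue of accepted sockets, one listener pushing,
     * many workers popping.  Hypothetical names and capacity. */
    #include <pthread.h>

    #define QUEUE_CAP 64

    typedef struct {
        int             fds[QUEUE_CAP];
        int             head, tail, count;
        pthread_mutex_t lock;
        pthread_cond_t  not_empty, not_full;
    } fd_queue;

    void queue_init(fd_queue *q)
    {
        q->head = q->tail = q->count = 0;
        pthread_mutex_init(&q->lock, NULL);
        pthread_cond_init(&q->not_empty, NULL);
        pthread_cond_init(&q->not_full, NULL);
    }

    /* Called by the listener thread after accept(). */
    void queue_push(fd_queue *q, int fd)
    {
        pthread_mutex_lock(&q->lock);
        while (q->count == QUEUE_CAP)       /* full: listener blocks here */
            pthread_cond_wait(&q->not_full, &q->lock);
        q->fds[q->tail] = fd;
        q->tail = (q->tail + 1) % QUEUE_CAP;
        q->count++;
        pthread_cond_signal(&q->not_empty);
        pthread_mutex_unlock(&q->lock);
    }

    /* Called by each worker thread. */
    int queue_pop(fd_queue *q)
    {
        int fd;
        pthread_mutex_lock(&q->lock);
        while (q->count == 0)               /* empty: workers block here */
            pthread_cond_wait(&q->not_empty, &q->lock);
        fd = q->fds[q->head];
        q->head = (q->head + 1) % QUEUE_CAP;
        q->count--;
        pthread_cond_signal(&q->not_full);
        pthread_mutex_unlock(&q->lock);
        return fd;
    }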

Re: worker MPM Question

Posted by Aaron Bannert <aa...@clove.org>.
On Fri, Nov 16, 2001 at 10:22:36PM -0800, Ian Holsman wrote:
> 
> I've been doing some quick performance runs again, and have noticed (well..
> other people did before me) that the worker MPM tends to concentrate all
> its work in a few processes, leaving
> others doing nothing.
> 
> Is this by design?

No, it's not, but it may be an unfortunate side effect of the current
processing model and how Solaris migrates LWPs to other CPUs. You're
probably running on that 6-CPU machine, and given the short processing
times of the workers before they return to idle, it is likely that
Solaris (accurately) predicts that it would cost more to migrate to a
new CPU than to just keep it on the same one. How many child processes?
Threads per process? Attempted req/sec, and the max that you can get? What is
the application (static page, SSI, other filters/proxy/etc)?
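
(For anyone trying to reproduce this, the knobs being asked about are the
worker MPM's process and thread directives. An illustrative httpd.conf
fragment; the values below are examples only, not tuning advice.)

    <IfModule worker.c>
        StartServers         4
        MaxClients         150
        MinSpareThreads     25
        MaxSpareThreads     75
        ThreadsPerChild     25
        MaxRequestsPerChild  0
    </IfModule>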

It's also possible that we're still having mutex contention problems,
so let me know if you're still seeing that. I have an account on nagoya
now that I'm going to play with (the 6-CPU monster up in Santa Clara
that runs jakarta.apache.org), so maybe something good will come of that.

I also posted the alternative model patch a while back. I doubt it still
applies cleanly, but it may be worth a try.

If you give me some more details about what kinds of bottlenecks you're
seeing (a limit on req/sec, hitting a maximum rate of context switches,
rather than network/IO limits -- which we had a hard time factoring out
before, IIRC), I might be able to pin down something in the code that's
causing it.

-aaron