Posted to dev@httpd.apache.org by Manoj Kasichainula <ma...@raleigh.ibm.com> on 1999/06/12 00:24:11 UTC

Management of thread pools

There are a few ways I can think of so far if we want to allow each
child in the hybrid server to create and destroy threads on the fly:

1. Use the existing server-wide child shutdown pipe to also send
thread-create/thread-destroy messages. This will keep the level of
infrastructure in parent-child communication down, but for the
SINGLE_LISTEN_UNSERIALIZED_ACCEPT case, every message on this pipe
wakes up every thread.

2. Create a pipe from the parent to each child. This will give
us fine control of how many threads each child has, but it is an
added pipe that has to be watched by each process, and (this is an
incredibly minor thing) we've limited our number of children to the
number of file descriptors allowed per-process.

3. Let each child manage its own thread pool, based on how busy it is,
without any guidance from the parent. This appeals to me, but I
shudder at what the behavior of the server might be with this change.
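
A minimal sketch of what option 2 might look like, assuming a one-byte
command format. The names pool_cmd, send_pool_cmd and apply_pool_cmd are
hypothetical, as is the wire format; the message above does not define one.
The parent would hold the write end of one pipe per child, and each child
would watch its read end alongside its listening sockets.

```c
#include <unistd.h>

/* Hypothetical one-byte commands; no wire format is given in the thread. */
enum pool_cmd { POOL_ADD_THREAD = '+', POOL_DROP_THREAD = '-' };

/* Parent side: send one command down a child's control pipe.
 * Returns 0 on success, -1 on error. */
static int send_pool_cmd(int write_fd, enum pool_cmd cmd)
{
    unsigned char byte = (unsigned char)cmd;
    return write(write_fd, &byte, 1) == 1 ? 0 : -1;
}

/* Child side: read one command and return the adjusted thread target.
 * The target never drops below 1. On EOF, error, or an unknown byte
 * the target is returned unchanged. */
static int apply_pool_cmd(int read_fd, int target_threads)
{
    unsigned char byte;

    if (read(read_fd, &byte, 1) != 1)
        return target_threads;
    if (byte == POOL_ADD_THREAD)
        return target_threads + 1;
    if (byte == POOL_DROP_THREAD && target_threads > 1)
        return target_threads - 1;
    return target_threads;
}
```

The cost noted in option 2 shows up directly: the parent needs one write_fd
per child, so the per-process descriptor limit caps the number of children.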

-- 
Manoj Kasichainula - manojk@raleigh.ibm.com
IBM, Apache Development

Re: 2.0 Scoreboard (was Re: Management of thread pools)

Posted by Manoj Kasichainula <ma...@io.com>.

On Sat, Jun 19, 1999 at 02:08:21AM -0700, Dean Gaudet wrote:
> The thing I'm wondering about is how well the requests will balance
> across the processes... that should be an interesting problem. 

With one thread per process accepting connections, and cross-process
serialization, requests tended to cluster in a single process in
Ryan's and my testing. Once we switched to the poll-accept model, we
had to add intraprocess mutexes. Then, it seemed that requests
clustered less, but it seemed to vary based on speed of incoming
connections. I have a feeling you'll be trying to eliminate mutexes
completely; I haven't tested the behavior in that situation yet.

-- 
Manoj Kasichainula - manojk at io dot com - http://www.io.com/~manojk/
"Tandems are good if you need hardware which sucks reliably, 24x365."
  -- Malcolm Ray.

Re: 2.0 Scoreboard (was Re: Management of thread pools)

Posted by Dean Gaudet <dg...@arctic.org>.

On Fri, 18 Jun 1999, Manoj Kasichainula wrote:

> On Fri, Jun 18, 1999 at 05:28:58PM -0700, Dean Gaudet wrote:
> > A similar trick can be done for processes... every second the parent wakes
> > up and does a non-blocking select() on the listening fd, if it's marked
> > ready for listen, then all the children are busy, so spawn another child. 
> 
> Hmmmm. That's a cool idea. This only lets us set a spare thread limit
> of 1, though. Is that enough padding?

Just do exponential spawning... you still need a limiter though --
something to bring things back down...

ah that's easy actually, just timeout the select/accept loops in the
children, if they don't accept anything in N seconds then let them die
off.

Bet that'd do it.
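
A rough sketch of the two halves of that idea, under stated assumptions:
the cap MAX_SPAWN_RATE and both function names are hypothetical, and the
thread gives no value for N. The parent doubles its spawn count each second
the listener still shows a backlog; a child exits once it has gone
idle_limit seconds without accepting anything.

```c
#include <time.h>

#define MAX_SPAWN_RATE 32  /* assumed cap; the thread names no number */

/* Parent side: how many children to spawn on the next one-second check.
 * Doubles while the listener keeps showing a backlog and resets to 1
 * as soon as it does not. */
static int next_spawn_rate(int rate, int listener_busy)
{
    if (!listener_busy || rate < 1)
        return 1;
    return (rate * 2 > MAX_SPAWN_RATE) ? MAX_SPAWN_RATE : rate * 2;
}

/* Child side: nonzero once the child has gone idle_limit seconds
 * without accepting a connection. */
static int child_should_exit(time_t last_accept, time_t now, int idle_limit)
{
    return difftime(now, last_accept) >= (double)idle_limit;
}
```

In a real child, the timeout passed to select() would play the role of
idle_limit, so no separate clock check would be needed.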

> > How do you shrink the number of children?  Dunno, don't have a trick for
> > that yet. 
> > 
> > The 1.x scoreboard is a multicpu nightmare -- it causes cache lines to be
> > whacked around from CPU to CPU... it just doesn't scale well. 
> 
> Don't most decent MP systems snoop the bus to pick up writes to RAM?

Well we don't align things on cache lines, so it's highly possible that a
scoreboard entry straddles a cache line and so two processors end up
fighting for it... even though it all works right, this isn't optimal
because it makes those mem accesses expensive. 

> Another way to eliminate the idle counting necessity of the scoreboard
> is to have a fixed number of processes, and to make each process run
> its own thread pool. Each child would just add and delete threads
> without changing the number of processes or caring at all about what
> the other processes are up to. This might cause thrashing, though,
> with threads just dying in one process and starting in another.

I'm totally planning on having a fixed number of processes in the mpm I'm
working on... I don't think it makes sense to vary in two dimensions. 
I'll fix the number of processes and vary the number of threads.  The
thing I'm wondering about is how well the requests will balance across the
processes... that should be an interesting problem. 

Dean

Re: 2.0 Scoreboard (was Re: Management of thread pools)

Posted by Manoj Kasichainula <ma...@io.com>.

On Fri, Jun 18, 1999 at 05:28:58PM -0700, Dean Gaudet wrote:
> I showed a method by which one process can maintain a pool of threads
> large enough to service the current demand without a scoreboard... all it
> takes is a simple counter that goes along with the request queue.

Right, I'm not worried about that part.

> A similar trick can be done for processes... every second the parent wakes
> up and does a non-blocking select() on the listening fd, if it's marked
> ready for listen, then all the children are busy, so spawn another child. 

Hmmmm. That's a cool idea. This only lets us set a spare thread limit
of 1, though. Is that enough padding?

> How do you shrink the number of children?  Dunno, don't have a trick for
> that yet. 
> 
> The 1.x scoreboard is a multicpu nightmare -- it causes cache lines to be
> whacked around from CPU to CPU... it just doesn't scale well. 

Don't most decent MP systems snoop the bus to pick up writes to RAM?

For shrinking the number of children, we could check the scoreboard
rarely, say once per minute or 5 minutes. That's all we really need
for the shrinking case. If we give each process its own cache
line (can we determine cache line length programmatically?), and the
parent process checks the scoreboard rarely, the problem you describe
should be virtually nonexistent, right? Well, it'll do until we come
up with something better.
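
A minimal sketch of the one-cache-line-per-process layout, assuming a
64-byte line and a C11 compiler. The type process_slot and its fields are
hypothetical, not the 1.x scoreboard layout. On the question of finding
the line length programmatically: glibc exposes it through sysconf() as
_SC_LEVEL1_DCACHE_LINESIZE, but that name is not in POSIX and may report 0
on some systems, so the sketch falls back to the assumed constant.

```c
#include <stdint.h>
#include <unistd.h>

#define CACHE_LINE 64  /* assumed; common on x86, not universal */

/* One slot per process. Aligning the first member to CACHE_LINE aligns
 * the whole struct and pads its size to a multiple of CACHE_LINE, so two
 * processes never write to the same line. */
typedef struct {
    _Alignas(CACHE_LINE) volatile int status;
    unsigned long access_count;
} process_slot;

/* Ask the OS for the L1 data cache line size where that is possible;
 * otherwise return the compile-time assumption. */
static long cache_line_size(void)
{
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
    if (n > 0)
        return n;
#endif
    return CACHE_LINE;
}
```

The struct alignment is fixed at compile time, so a runtime answer from
cache_line_size() is only useful as a sanity check or for sizing a
dynamically allocated scoreboard.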

> It's important to separate the two uses of the scoreboard.

I guess the purpose of my message didn't quite come through. :)

Another way to eliminate the idle counting necessity of the scoreboard
is to have a fixed number of processes, and to make each process run
its own thread pool. Each child would just add and delete threads
without changing the number of processes or caring at all about what
the other processes are up to. This might cause thrashing, though,
with threads just dying in one process and starting in another.

-- 
Manoj Kasichainula - manojk at io dot com - http://www.io.com/~manojk/
"Cheese is a useful thing, too. Should WebDAV have cheese?"
  -- Larry Masinter

Re: 2.0 Scoreboard (was Re: Management of thread pools)

Posted by Dean Gaudet <dg...@arctic.org>.

On Fri, 18 Jun 1999, Manoj Kasichainula wrote:

> We've talked about tearing a lot of the stuff out of the current
> scoreboard and putting it into some sort of query function. If we do
> this, the best things to put in the scoreboard are stats needed to
> manage the server pool, because we don't want to require the parent to
> query every child every second (right?)

Did you take a look at http_event.c in the select/pthread hybrid I posted? 
I showed a method by which one process can maintain a pool of threads
large enough to service the current demand without a scoreboard... all it
takes is a simple counter that goes along with the request queue.

A similar trick can be done for processes... every second the parent wakes
up and does a non-blocking select() on the listening fd, if it's marked
ready for listen, then all the children are busy, so spawn another child. 
How do you shrink the number of children?  Dunno, don't have a trick for
that yet. 
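
The parent-side check could be sketched like this. The function name
listener_has_backlog is hypothetical; select() with a zero timeout is the
standard way to poll without blocking. For a listening socket, "readable"
means at least one connection is sitting in the accept queue.

```c
#include <stddef.h>
#include <sys/select.h>

/* Returns 1 if listen_fd is readable right now, 0 if not, -1 on error.
 * A zero timeout makes select() return immediately instead of waiting. */
static int listener_has_backlog(int listen_fd)
{
    fd_set readable;
    struct timeval no_wait = {0, 0};
    int n;

    FD_ZERO(&readable);
    FD_SET(listen_fd, &readable);
    n = select(listen_fd + 1, &readable, NULL, NULL, &no_wait);
    if (n < 0)
        return -1;
    return FD_ISSET(listen_fd, &readable) ? 1 : 0;
}
```

A result of 1 only says that nobody had accepted the connection at that
instant, so treating it as "all the children are busy" is a heuristic
rather than a guarantee.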

The 1.x scoreboard is a multicpu nightmare -- it causes cache lines to be
whacked around from CPU to CPU... it just doesn't scale well. 

> Another option would be a less major tweak. The scoreboard would keep
> per-thread stats instead of per-request stats. Then, each thread can
> interact with the scoreboard as it does now, with no synchronization.
> The scoreboard wouldn't be humongous in the async request handler
> model Dean's doing, but it wouldn't be small either.

The scoreboard shows requests in progress, well, actually, connections in
progress (gotta stop confusing the two).  The proper index is (pid_t,
conn_rec *) ... threads don't enter into the picture really. 

I would be totally happy if getting the "scoreboard contents", the stats
in essence, was something similarly expensive as running "ps".  It's
really the same operation.  And the MPM has to be involved -- because it
has to communicate with any other processes around.

This may even be as complex as having a global conn_rec * list, a mutex for
it, and a pipe to each process through which any process can slurp up the
data to display for mod_status.  I don't mind complexity that's invoked
only on every hit to mod_status... I do mind multi-cpu shared data that's
tweaked all the time...

It's important to separate the two uses of the scoreboard.  I mean,
currently we count the child states to figure out #idle, and such.  If we
had atomic increment and decrement we wouldn't need to count... and for
the purpose of maintaining the pool of processes/threads/whatever all you
need is #idle, no other stat.  Everything else the scoreboard does is for
human consumption (and slows things down :)
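
For illustration, the #idle counter could be kept with atomic increment
and decrement roughly as follows. This is a sketch using C11
<stdatomic.h>, which postdates this thread; the function names are
hypothetical. It covers threads within one process. Sharing the counter
across processes would also require placing it in shared memory, which is
not shown.

```c
#include <stdatomic.h>

/* Number of workers currently waiting for a connection.
 * Static storage, so it starts at zero. */
static atomic_int idle_workers;

/* Called by a worker just before it waits for a connection. */
static void worker_became_idle(void)
{
    atomic_fetch_add(&idle_workers, 1);
}

/* Called by a worker as soon as it has picked up a connection. */
static void worker_became_busy(void)
{
    atomic_fetch_sub(&idle_workers, 1);
}

/* Pool manager: read #idle directly, with no scan over per-child state. */
static int idle_count(void)
{
    return atomic_load(&idle_workers);
}
```

The pool manager reads a single integer instead of walking every
scoreboard slot, which is the saving described above.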

Dean

2.0 Scoreboard (was Re: Management of thread pools)

Posted by Manoj Kasichainula <ma...@io.com>.

On Mon, Jun 14, 1999 at 11:36:22AM -0400, Ryan Bloom wrote:
> 
> This idea of having a dynamic thread-pool still seems like a bad idea to
> me.  I have no problem if it can be done well, but I don't think Unix
> threads are up to that task yet.

The idea is just to try it out; if it doesn't work, it doesn't work.

The reason I asked the question in the first place is that it will be
important for deciding how a rearchitected scoreboard will work.

We've talked about tearing a lot of the stuff out of the current
scoreboard and putting it into some sort of query function. If we do
this, the best things to put in the scoreboard are stats needed to
manage the server pool, because we don't want to require the parent to
query every child every second (right?)

So, we need to decide what these stats are. And this depends on what
the proper way to manage the threads and processes is.

If we have the parent decide globally what to start and stop, then we
have to give it info on how busy the processes are. But, if the server
doesn't keep per-thread stats, then every thread will have to
serialize on touching a process-busy state field in the scoreboard.

That's the benefit of having each process manage its own thread-pool;
there's less scoreboard activity, and less cross-process serialization
overhead.

Another option would be a less major tweak. The scoreboard would keep
per-thread stats instead of per-request stats. Then, each thread can
interact with the scoreboard as it does now, with no synchronization.
The scoreboard wouldn't be humongous in the async request handler
model Dean's doing, but it wouldn't be small either.

-- 
Manoj Kasichainula - manojk at io dot com - http://www.io.com/~manojk/
"'Why do you blow on people?' I don't know." -- Benny Hinn

Re: Management of thread pools

Posted by Ryan Bloom <rb...@raleigh.ibm.com>.

This idea of having a dynamic thread-pool still seems like a bad idea to
me.  I have no problem if it can be done well, but I don't think Unix
threads are up to that task yet.

When we first started this project, we had one thread which started all
the other threads, and then went away.  We did this, because it made
signal handling MUCH easier (it removed all of the signal handling holes;
we still have at least one).  This thread_starter thread was taken out of
the server, because while it was in the server we had a HUGE leak of some
sort.  I believe that the problem was if the thread_starter thread was
used, it was possible that our server process would start to chew up 10
Megs of memory.

This unexplained 10 Meg problem was never actually nailed down and fixed,
but removing the thread_starter thread has made the problem go away.  Now,
I don't know if there was a bug in the Apache code, or the thread library
code.  What I do know, is that there was a bug somewhere.

We should not be investing great amounts of time in this, if having a
dynamic thread pool is going to cause big problems in the future.  I
will also note that the thread_starter thread used to take care of
exiting by itself, it was not canceled.  And this problem was only seen to
the best of my knowledge on Linux.

I may be very wrong about this, and I haven't done any digging around in
that code to find the bug, but it really seems like the obvious place to
start before we try to do the same thing again.  :)

Just MHO,

Ryan

On Fri, 11 Jun 1999, Manoj Kasichainula wrote:

> There are a few ways I can think of so far if we want to allow each
> child in the hybrid server to create and destroy threads on the fly:
> 
> 1. Use the existing server-wide child shutdown pipe to also send
> thread-create/thread-destroy messages. This will keep the level of
> infrastructure in parent-child communication down, but for the
> SINGLE_LISTEN_UNSERIALIZED_ACCEPT case, every message on this pipe
> wakes up every thread.
> 
> 2. Create a pipe from the parent to each child. This will give
> us fine control of how many threads each child has, but it is an
> added pipe that has to be watched by each process, and (this is an
> incredibly minor thing) we've limited our number of children to the
> number of file descriptors allowed per-process.
> 
> 3. Let each child manage its own thread pool, based on how busy it is,
> without any guidance from the parent. This appeals to me, but I
> shudder at what the behavior of the server might be with this change.
> 
> -- 
> Manoj Kasichainula - manojk@raleigh.ibm.com
> IBM, Apache Development
> 

_______________________________________________________________________
Ryan Bloom		rbb@raleigh.ibm.com
4205 S Miami Blvd	
RTP, NC 27709		It's a beautiful sight to see good dancers 
			doing simple steps.  It's a painful sight to
			see beginners doing complicated patterns.