Posted to dev@httpd.apache.org by dean gaudet <de...@arctic.org> on 2001/09/10 22:30:03 UTC

Re: [PATCH] worker MPM: reuse transaction pools

this is the wrong way to fix this problem.

i can't imagine any reason why creating a pool should be slow --
rather than band-aid around it, i think it'd be better to track down that
problem first.  it should be as simple as a couple of pointer operations.

freelists are a feature of modern memory allocators -- including per-cpu
optimisations.  the right fix is probably to start relying on libc more.
(see apr-dev threads a few months ago where i was advocating getting rid
of apr freelists entirely.)

fwiw -- freelist implementations are almost always better off LIFO rather
than FIFO.  basically the most recently freed object is more likely to be
mapped in the TLB and have valid cache lines.  older objects will incur
TLB and cache misses.
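
(to illustrate the LIFO point: a LIFO freelist is just a stack -- freeing
pushes at the head and allocating pops the same head, so the block handed
back is the one most recently touched.  a generic sketch, not APR's
allocator:)

#include <stdlib.h>

/* Generic sketch only, not APR code.  The freelist is a singly linked
 * stack: block_free() pushes at the head and block_alloc() pops the
 * same head, so the block returned is the most recently freed one and
 * is likely still warm in the TLB and cache.  (A real allocator would
 * keep one of these per thread or per CPU, or protect it with a lock.) */
struct block {
    struct block *next;
    char data[8192];
};

static struct block *freelist = NULL;   /* head of the LIFO stack */

static struct block *block_alloc(void)
{
    struct block *b = freelist;
    if (b != NULL) {
        freelist = b->next;             /* pop: a couple of pointer ops */
        return b;
    }
    return malloc(sizeof(*b));          /* empty: fall back to libc */
}

static void block_free(struct block *b)
{
    b->next = freelist;                 /* push at the head, not the tail */
    freelist = b;
}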

you can take these comments as a veto, but i know the patch has been
committed already.

-dean

On Mon, 27 Aug 2001, Aaron Bannert wrote:

> This patch implements a resource pool of context pools -- a queue of
> available pools that the listener thread can pull from when accepting
> a request. The worker thread that picks up that request then uses
> that pool for the lifetime of that transaction, clear()ing the pool
> and releasing it back to what I'm calling the "pool_queue" (har har).
> This replaces the prior implementation that would create and destroy
> a transaction pool for each and every request.
>
> I'm seeing a small performance improvement with this patch, but I suspect
> the fd_queue code could be improved for better parallelism. I also
> suspect that with better testing this algorithm may prove more scalable.
>
> -aaron
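
(For reference, a rough sketch of the kind of pool_queue described above,
in modern-APR style; the names, fixed capacity, and locking are
illustrative guesses, not the committed patch:)

#include "apr_pools.h"
#include "apr_thread_mutex.h"

#define POOL_QUEUE_CAPACITY 64          /* illustrative; >= number of workers */

/* A queue of idle, already-cleared transaction pools.  The listener
 * pops one before accept(); the worker clears it and pushes it back
 * when the transaction is finished. */
typedef struct {
    apr_pool_t         *pools[POOL_QUEUE_CAPACITY];
    int                 nelts, head, tail;
    apr_thread_mutex_t *lock;
} pool_queue_t;

static apr_pool_t *pool_queue_pop(pool_queue_t *q, apr_pool_t *parent)
{
    apr_pool_t *p = NULL;

    apr_thread_mutex_lock(q->lock);
    if (q->nelts > 0) {
        p = q->pools[q->head];
        q->head = (q->head + 1) % POOL_QUEUE_CAPACITY;
        q->nelts--;
    }
    apr_thread_mutex_unlock(q->lock);

    if (p == NULL)
        apr_pool_create(&p, parent);    /* queue empty: make a fresh pool */
    return p;
}

static void pool_queue_push(pool_queue_t *q, apr_pool_t *p)
{
    apr_pool_clear(p);                  /* cheap reset instead of destroy */
    apr_thread_mutex_lock(q->lock);
    q->pools[q->tail] = p;              /* capacity assumed never exceeded */
    q->tail = (q->tail + 1) % POOL_QUEUE_CAPACITY;
    q->nelts++;
    apr_thread_mutex_unlock(q->lock);
}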


Re: [PATCH] worker MPM: reuse transaction pools

Posted by Aaron Bannert <aa...@clove.org>.
On Mon, Sep 10, 2001 at 03:13:48PM -0700, Brian Pane wrote:
> I suppose there are a total of four candidate designs, right?  It looks
> like you're varying two independent dimensions:
>   * condition variable strategy: one CV, or one per thread
>   * pool strategy: worker-managed, or listener-managed.
> so there are four possible permutations.  (The other two
> permutations would look like hybrids of "short and sweet" and
> "time-space tradeoff."  I don't know whether they're interesting
> algorithms or not, but it's worth noting that these hybrids are
> a possibility if you find in testing that neither of the first
> two designs stands out as the clear winner.)
> In the "pool strategy" dimension, I think a worker-managed pool design (as
> used in "time-space tradeoff") is a better choice than a 
> listener-managed one.
> With worker-managed, you can clear and re-use the pool, which is cheaper 
> than
> destroying and recreating it.  And it's very easy to eliminate locking for
> subpool creation by adding a thread-private free list (you could dig up my
> old patch for this, or Sander's patch which is a more general-purpose 
> solution).
> 
> In the "CV strategy" dimension, I'll wait for benchmark results before I try
> to guess which approach is fastest. :-)


Actually, the two dimensions you describe here are not entirely
independent. In both of my models above there is a global CV for managing
the worker queue. You are correct about the pool strategy, but that is
pretty much the only dimension that actually varies between them.

In order to reuse pools, we have to have a way for the listener to
get hold of the transaction pool before it calls accept(). The
product of accept() is the socket, which must then make it back to the
appropriate worker thread, and that worker must somehow be awakened to
begin processing the accepted socket.

The "short and sweet" model is much more a producer-consumer: the listener
thread produces (pool, socket) pairs that are then consumed by the worker
threads.

The "time-space tradeoff" model implements a resource-pool of
waiting worker threads; the listener plucks out the next available
worker thread, calls accept() on it's behalf, and sends it on it's merry
way.


I'm beginning to realize that I will end up comparing the overhead of
creating/destroying a transaction pool once per request ("short and sweet")
against the overhead of managing a reusable transaction pool plus the added
cost of an extra CV/mutex per thread ("time-space tradeoff").

(Hope that makes more sense to you, it definitely helped me :)
-aaron

Re: [PATCH] worker MPM: reuse transaction pools

Posted by Brian Pane <bp...@pacbell.net>.
Aaron Bannert wrote:

[...]

>1) "short and sweet"
>
> - single listener
>   - creates a new transaction pool
>   - uses that pool for the next accept()
>   - push()es the newly accepted socket and pool on the fd_queue
>
> - multiple workers
>   - waiting in pop() on the fd_queue
>   - performs process_socket() with the socket and pool received from pop()
>   - destroys the pool
>
> Notes: this is almost identical to what was in CVS before the patch
>        discussed below. The only change I would make would be to
>        remove one of the two condition variables.
>
>2) "time-space tradeoff"
>
> - single listener
>   - pop()s the next available worker-thread
>   - uses the pool from this worker-thread to make the call to accept()
>   - signals a condition in that worker-thread after the accept()
>
> - multiple workers
>   - creates a transaction pool
>     - push()es thread info (pool, socket-pointer, etc)
>     - waits on signal from listener
>     - (check that listener woke us up)
>     - perform process_socket() with the socket set from the listener
>     - clear the pool
>
I suppose there are a total of four candidate designs, right?  It looks
like you're varying two independent dimensions:
  * condition variable strategy: one CV, or one per thread
  * pool strategy: worker-managed, or listener-managed.
so there are four possible permutations.  (The other two
permutations would look like hybrids of "short and sweet" and
"time-space tradeoff."  I don't know whether they're interesting
algorithms or not, but it's worth noting that these hybrids are
a possibility if you find in testing that neither of the first
two designs stands out as the clear winner.)
In the "pool strategy" dimension, I think a worker-managed pool design (as
used in "time-space tradeoff") is a better choice than a 
listener-managed one.
With worker-managed, you can clear and re-use the pool, which is cheaper 
than
destroying and recreating it.  And it's very easy to eliminate locking for
subpool creation by adding a thread-private free list (you could dig up my
old patch for this, or Sander's patch which is a more general-purpose 
solution).
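
(One way to picture the thread-private free list idea, sketched at the
pool level rather than inside the allocator; this is not the patch
referred to above, and the names are made up:)

#include "apr_pools.h"

#define PRIVATE_POOLS 4                 /* per-thread cache size, illustrative */

/* Each worker thread owns a small private stack of cleared subpools, so
 * handing out a transaction pool normally needs no locking at all; only
 * the occasional fallback to apr_pool_create() touches shared state. */
typedef struct {
    apr_pool_t *cache[PRIVATE_POOLS];
    int         ncached;
} worker_pools_t;                       /* one instance per worker thread */

static apr_pool_t *worker_pool_get(worker_pools_t *wp, apr_pool_t *parent)
{
    apr_pool_t *p;

    if (wp->ncached > 0)
        return wp->cache[--wp->ncached];  /* LIFO: reuse the warmest pool */
    apr_pool_create(&p, parent);
    return p;
}

static void worker_pool_put(worker_pools_t *wp, apr_pool_t *p)
{
    if (wp->ncached < PRIVATE_POOLS) {
        apr_pool_clear(p);                /* reset rather than destroy */
        wp->cache[wp->ncached++] = p;
    }
    else {
        apr_pool_destroy(p);
    }
}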

In the "CV strategy" dimension, I'll wait for benchmark results before I try
to guess which approach is fastest. :-)

--Brian



Re: [PATCH] worker MPM: reuse transaction pools

Posted by Aaron Bannert <aa...@clove.org>.
The patch in question was nothing more than an attempt to further improve
worker, and was perhaps a little premature. I have, however, spent some
time developing two possible alternative implementations of the worker
MPM in addition to what is currently in CVS (which means that I don't
expect what's in there to stay for much longer). I am hoping to gain
access to some SMP boxes to test out my theories, but until then I will
describe the two scenarios here (and perhaps also prepare patches for
everyone to run on their favorite SMP box at their leisure):

1) "short and sweet"

 - single listener
   - creates a new transaction pool
   - uses that pool for the next accept()
   - push()es the newly accepted socket and pool on the fd_queue

 - multiple workers
   - waiting in pop() on the fd_queue
   - performs process_socket() with the socket and pool received from pop()
   - destroys the pool

 Notes: this is almost identical to what was in CVS before the patch
        discussed below. The only change I would make would be to
        remove one of the two condition variables.
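
(A rough sketch of design 1 with the single shared condition variable;
fd_queue_*, do_accept() and process_socket() are stand-in names in
modern-APR style, not the actual worker MPM code:)

#include "apr_pools.h"
#include "apr_network_io.h"
#include "apr_thread_mutex.h"
#include "apr_thread_cond.h"

#define FD_QUEUE_CAPACITY 64    /* illustrative; sized so push never overflows */

/* Stand-ins assumed to exist elsewhere. */
apr_socket_t *do_accept(apr_socket_t *lsd, apr_pool_t *ptrans);
void process_socket(apr_pool_t *ptrans, apr_socket_t *csd);

typedef struct {
    apr_socket_t *sd;
    apr_pool_t   *p;
} fd_queue_elem_t;

typedef struct {
    fd_queue_elem_t     elems[FD_QUEUE_CAPACITY];
    int                 nelts, head, tail;
    apr_thread_mutex_t *lock;
    apr_thread_cond_t  *not_empty;      /* the single shared CV */
} fd_queue_t;

static void fd_queue_push(fd_queue_t *q, apr_socket_t *sd, apr_pool_t *p)
{
    apr_thread_mutex_lock(q->lock);
    q->elems[q->tail].sd = sd;
    q->elems[q->tail].p  = p;
    q->tail = (q->tail + 1) % FD_QUEUE_CAPACITY;
    q->nelts++;
    apr_thread_cond_signal(q->not_empty);   /* wake one waiting worker */
    apr_thread_mutex_unlock(q->lock);
}

static void fd_queue_pop(fd_queue_t *q, apr_socket_t **sd, apr_pool_t **p)
{
    apr_thread_mutex_lock(q->lock);
    while (q->nelts == 0)
        apr_thread_cond_wait(q->not_empty, q->lock);
    *sd = q->elems[q->head].sd;
    *p  = q->elems[q->head].p;
    q->head = (q->head + 1) % FD_QUEUE_CAPACITY;
    q->nelts--;
    apr_thread_mutex_unlock(q->lock);
}

/* listener: create a pool, accept into it, hand both to a worker */
static void listener_loop(apr_socket_t *lsd, apr_pool_t *pconf, fd_queue_t *q)
{
    for (;;) {
        apr_pool_t   *ptrans;
        apr_socket_t *csd;

        apr_pool_create(&ptrans, pconf);
        csd = do_accept(lsd, ptrans);
        fd_queue_push(q, csd, ptrans);
    }
}

/* worker: block in pop(), process, then destroy the pool */
static void worker_loop(fd_queue_t *q)
{
    for (;;) {
        apr_socket_t *csd;
        apr_pool_t   *ptrans;

        fd_queue_pop(q, &csd, &ptrans);
        process_socket(ptrans, csd);
        apr_pool_destroy(ptrans);
    }
}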

2) "time-space tradeoff"

 - single listener
   - pop()s the next available worker-thread
   - uses the pool from this worker-thread to make the call to accept()
   - signals a condition in that worker-thread after the accept()

 - multiple workers
   - creates a transaction pool
     - push()es thread info (pool, socket-pointer, etc)
     - waits on signal from listener
     - (check that listener woke us up)
     - perform process_socket() with the socket set from the listener
     - clear the pool

 Notes: This adds some complication to the fd_queue code, but it removes
        most of the complication that was a problem in the patch below.
        My implementation was able to remove much of the arithmetic from
        the critical sections of the fd_queue, possibly decreasing contention
        in this part of the code (especially good for scalability). Compared
        to what we have in CVS, it brings us from 4 shared condition variables
        down to 1, and adds a new CV to each thread (sort of like a
        time-space tradeoff). Your point about LIFO vs. FIFO is noted, and
        I'll make a minor modification to test out that theory.
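
(And a rough sketch of design 2 with a per-worker condition variable;
worker_queue_* would be a locked stack of idle workers along the lines
of the fd_queue above, and again every name here is a stand-in, not the
actual implementation:)

#include "apr_pools.h"
#include "apr_network_io.h"
#include "apr_thread_mutex.h"
#include "apr_thread_cond.h"

typedef struct worker_slot {
    apr_pool_t         *ptrans;     /* this worker's reusable transaction pool */
    apr_socket_t       *csd;        /* filled in by the listener */
    apr_thread_mutex_t *lock;
    apr_thread_cond_t  *wakeup;     /* this worker's private CV */
} worker_slot_t;

/* Stand-ins assumed to exist elsewhere: a locked LIFO of idle workers,
 * plus the same do_accept()/process_socket() helpers as before. */
typedef struct worker_queue_t worker_queue_t;
void worker_queue_push(worker_queue_t *idle, worker_slot_t *w);
worker_slot_t *worker_queue_pop(worker_queue_t *idle);
apr_socket_t *do_accept(apr_socket_t *lsd, apr_pool_t *ptrans);
void process_socket(apr_pool_t *ptrans, apr_socket_t *csd);

/* listener: pick an idle worker, accept into its pool, wake just it */
static void listener_loop(apr_socket_t *lsd, worker_queue_t *idle)
{
    for (;;) {
        worker_slot_t *w = worker_queue_pop(idle);

        apr_thread_mutex_lock(w->lock);
        w->csd = do_accept(lsd, w->ptrans);
        apr_thread_cond_signal(w->wakeup);
        apr_thread_mutex_unlock(w->lock);
    }
}

/* worker: w->ptrans was created once before entering this loop */
static void worker_loop(worker_slot_t *w, worker_queue_t *idle)
{
    for (;;) {
        apr_thread_mutex_lock(w->lock);
        w->csd = NULL;
        worker_queue_push(idle, w);          /* advertise ourselves as idle */
        while (w->csd == NULL)               /* guard against spurious wakeups */
            apr_thread_cond_wait(w->wakeup, w->lock);
        apr_thread_mutex_unlock(w->lock);

        process_socket(w->ptrans, w->csd);
        apr_pool_clear(w->ptrans);           /* reuse the pool, don't destroy it */
    }
}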


I had hoped that I would be able to test out these two implementations on
some big hardware before posting such lengthy design descriptions to the
list, but if it would interest others I would be willing to prepare a patch
illustrating the two designs above. I welcome any critical dissection of
them.

-aaron


On Mon, Sep 10, 2001 at 01:30:03PM -0700, dean gaudet wrote:
> this is the wrong way to fix this problem.
> 
> i can't imagine any reason why creating a pool should be slow --
> rather than band-aid around it, i think it'd be better to track down that
> problem first.  it should be as simple as a couple of pointer operations.
> 
> freelists are a feature of modern memory allocators -- including per-cpu
> optimisations.  the right fix is probably to start relying on libc more.
> (see apr-dev threads a few months ago where i was advocating getting rid
> of apr freelists entirely.)
> 
> fwiw -- freelist implementations are almost always better off LIFO rather
> than FIFO.  basically the most recently freed object is more likely to be
> mapped in the TLB and have valid cache lines.  older objects will incur
> TLB and cache misses.
> 
> you can take these comments as a veto, but i know the patch has been
> committed already.
> 
> -dean
> 
> On Mon, 27 Aug 2001, Aaron Bannert wrote:
> 
> > This patch implements a resource pool of context pools -- a queue of
> > available pools that the listener thread can pull from when accepting
> > a request. The worker thread that picks up that request then uses
> > that pool for the lifetime of that transaction, clear()ing the pool
> > and releasing it back to what I'm calling the "pool_queue" (har har).
> > This replaces the prior implementation that would create and destroy
> > a transaction pool for each and every request.
> >
> > I'm seeing a small performance improvement with this patch, but I suspect
> > the fd_queue code could be improved for better parallelism. I also
> > suspect that with better testing this algorithm may prove more scalable.
> >
> > -aaron