Posted to dev@httpd.apache.org by Brian Pane <br...@cnet.com> on 2002/11/25 16:12:43 UTC

Re: request for comments: multiple-connections-per-thread MPM design

On Mon, 2002-11-25 at 00:20, Manoj Kasichainula wrote:
> On Sat, Nov 23, 2002 at 06:40:58PM -0800, Brian Pane wrote:
> > Here's an outline of my latest thinking on how to build a
> > multiple-connections-per-thread MPM for Apache 2.2.  I'm
> > eager to hear feedback from others who have been researching
> > this topic.
> 
> You prodded me into finally writing up a proposal that's been bouncing
> around in my head for a while now. That was in a separate message; this
> will be suggestions for your proposal.
> 
> > 1. Listener thread
> >       A Listener thread accept(2)s a connection, creates
> >       a conn_rec for it, and sends it to the Reader thread.
> 
> Some (Most?) protocols have the server initiate the protocol
> negotiation instead of the client, so the listener needs to be able to
> pass off to the writer thread as well.
> 
> > * Limiting the Reader and Writer pools to one thread each will
> >   simplify the design and implementation.  But will this impair
> >   our ability to take advantage of lots of CPUs?
> 
> I was actually wondering why the reader and writer were separate
> threads.

It was a combination of several factors that convinced me
to make them separate:
* Take advantage of multiple CPUs more easily
* Simplify the application logic
* Reduce the number of file descriptors that each poll call
  is handling (important on platforms where we don't have
  an efficient poll mechanism)

> What gets more complex with a thread pool > 1? I know we'd have to add a
> mutex around the select+(read|write), but is there something else?

If you split the pollset into 'n' sections and have 'n'
threads each handling reads or writes on one section of
it, it can be hard to balance the load.  Some threads
will end up with very active connections, while others
will have mostly idle connections.

The alternative is to have 'n' threads that take turns
handling the entire pollset.  That doesn't offer as much
concurrency, so I'm not sure if it's worth the extra
complexity.  But it would be easy to test.
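
A minimal sketch of the "take turns" variant, for concreteness: plain
poll(2) plus a pthread mutex, with every thread in the pool running the
same loop.  The fixed-size pollset and handle_ready() here are just
placeholders for illustration, not code from any existing MPM.

#include <poll.h>
#include <pthread.h>

#define MAX_FDS 1024

static struct pollfd pollset[MAX_FDS];  /* filled in by the accept logic (not shown) */
static int num_fds;
static pthread_mutex_t poll_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Placeholder: do the non-blocking read or write for one ready connection. */
static void handle_ready(struct pollfd *pfd)
{
    (void)pfd;
}

/* Only one thread at a time is inside poll(), so the whole pollset is
 * shared instead of being split into per-thread sections. */
static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        struct pollfd ready;
        int claimed = -1;

        pthread_mutex_lock(&poll_mutex);
        if (poll(pollset, num_fds, -1) > 0) {
            for (int i = 0; i < num_fds; i++) {
                if (pollset[i].fd >= 0 && pollset[i].revents) {
                    ready = pollset[i];
                    pollset[i].fd = -1;   /* negative fds are ignored by poll */
                    claimed = i;
                    break;
                }
            }
        }
        pthread_mutex_unlock(&poll_mutex);

        if (claimed >= 0) {
            handle_ready(&ready);              /* I/O runs with no mutex held */

            pthread_mutex_lock(&poll_mutex);
            pollset[claimed].fd = ready.fd;    /* re-arm the descriptor */
            pthread_mutex_unlock(&poll_mutex);
        }
    }
    return NULL;
}

The wart is visible in the last few lines: re-arming a descriptor (or
adding a new one) has to wait for whichever thread is currently blocked
in poll() to return, which is exactly the latency problem discussed
further down in this thread; the usual escape hatch is the wakeup pipe
described below.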


> > * Can we eliminate the listener thread?  It would be faster to just
> >   have the Reader thread include the listen socket(s) in its pollset.
> >   But if we did that, we'd need some new way to synchronize the
> >   accept handling among multiple child processes, because we can't
> >   have the Reader thread blocking on an accept mutex when it has
> >   existing connections to watch.
> 
> You could dispense with the listener thread in the single-process case
> and just use an intraprocess mutex around select+(accept|read|write)

Right, with the accept/read/write all handled by the same
thread (or thread pool), the handoff problem goes away.

> > * Is there a more efficient way to interrupt a thread that's
> >   blocked in a poll call?  That's a crucial step in the Listener-to-
> >   Reader and Request Processor-to-Writer handoffs.  Writing a byte
> >   to a pipe requires two extra syscalls (a read and a write) per
> >   handoff.  Sending a signal to the target thread is the only
> >   other solution I can think of at the moment, but that's bad
> >   because the target thread might be in the middle of a read
> >   or write call, rather than a poll, at the moment when we hit
> >   it with a signal, so the read or write will fail with EINTR.
> 
> For Linux 2.6, file notifications could be done entirely in userland in
> the case where no blocking is needed, using "futexes".

Thanks!  I'll check out futexes.
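
For reference, the pipe handoff described above is essentially the classic
self-pipe trick.  A bare-bones sketch (the names wakeup_pipe, wakeup_poller,
and drain_wakeups are made up here for illustration):

#include <unistd.h>
#include <fcntl.h>

static int wakeup_pipe[2];   /* [0] = read end, sits in the pollset; [1] = write end */

/* Called once at startup. */
static int wakeup_init(void)
{
    if (pipe(wakeup_pipe) == -1) {
        return -1;
    }
    /* Non-blocking on both ends so a burst of wakeups can never block anyone. */
    fcntl(wakeup_pipe[0], F_SETFL, fcntl(wakeup_pipe[0], F_GETFL) | O_NONBLOCK);
    fcntl(wakeup_pipe[1], F_SETFL, fcntl(wakeup_pipe[1], F_GETFL) | O_NONBLOCK);
    return 0;
}

/* Listener or Request Processor: interrupt the thread blocked in poll(). */
static void wakeup_poller(void)
{
    char byte = 0;
    (void)write(wakeup_pipe[1], &byte, 1);
}

/* Poller: when wakeup_pipe[0] shows up readable in the pollset, drain it
 * and rebuild the pollset before polling again. */
static void drain_wakeups(void)
{
    char buf[64];
    while (read(wakeup_pipe[0], buf, sizeof(buf)) > 0) {
        ;
    }
}

The two extra syscalls per handoff are visible here: one write() by the
thread doing the handoff, and one read() by the poller when it wakes up.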

> 
> But if you want to avoid the extra system calls, you could put a mutex
> around maintenance of the pollset and just let the various threads dork
> with it directly.
> 
> I do keep mentioning this mutex around the select/poll :). Is there a
> performance reason that you're trying to avoid it? In my past skimmings,
> I've seen you post a lot of benchmarks and such, so maybe you've studied
> this.

The real reason I don't like the mutex around the poll is that
it would add too much latency if we had to wait for the current
poll to complete before adding a new descriptor.  When the
Listener accepts a new connection, or a Request Processor creates
a new response brigade, it needs to get the corresponding socket
added to the pollset immediately, which really requires interrupting
the current poll.

Brian



Re: request for comments: multiple-connections-per-thread MPM design

Posted by Glenn <gs...@gluelogic.com>.
On Thu, Dec 12, 2002 at 12:39:17AM -0800, Manoj Kasichainula wrote:
...
> > Add a descriptor (pipe, socket, whatever) to the pollset and use
> > it to indicate the need to generate a new pollset.  The thread that sends
> > info down this descriptor could be programmed to wait a short amount of
> > time between sending triggers, so as not to cause the select() to return
> > too, too often, but short enough not to delay the handling of new
> > connections too long.
> 
> But what's a good value?
...
> Hmmm, if the poll is waiting on fds for any length of time, it should be
> ok to interrupt it, because by definition it's not doing anything else.
> 
> So maybe the way to go is to forget about waiting the 0.1 s to interrupt
> poll. Just notify it immediately when there's a fd waiting to be polled.
> If no other fds have work to provide, we add the new fds to the poll set
> and continue.
> Otherwise, just run through all the other fds that need handling first,
> then pick off all the fds that are waiting for polling and add them to
> the fd set.
> 
> So (again using terms from my proposal):
> 
> submit_ticket would push fds into a queue and write to new_event_pipe if
> the queue was empty when pushing.
> 
> get_next_event would do something like:
> 
> if (previous_poll_fds_remaining) {
>     pick one off, call event handler for it
> }
> else {
>     clean out new_event_queue and put values into new poll set
>     poll(pollfds, io_timeout);
>     call event handler for one of the returned pollfds
> }
...

+1 on concept with comments:
Each time poll returns to handle ready fds, it should skip new_event_pipe
(it should not send that fd to an event handler), and it should check
new_event_queue for fds to add to the pollset before it returns to polling.

It should always be doing useful work or should be blocking in select(),
because it will always have at least one fd -- its end of new_event_pipe --
in its pollset.
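
Sketching that loop in C for concreteness (the queue, pipe, and handler
names below are stand-ins for the pieces named in the proposal, with
trivial stubs so only the shape of the loop is shown):

#include <poll.h>

#define MAX_FDS 1024

static struct pollfd pollset[MAX_FDS];  /* pollset[0] is the read end of new_event_pipe */
static int num_fds = 1;
static const int io_timeout = -1;       /* block until some fd is ready */

/* Stubs standing in for the real queue, pipe, and event-handler code. */
static int  new_event_queue_pop(int *fd) { (void)fd; return 0; }
static void drain_new_event_pipe(void) { }
static void call_event_handler(struct pollfd *pfd) { (void)pfd; }

static void poller_loop(void)
{
    for (;;) {
        /* Before polling again, fold in any fds submitted since last time. */
        int fd;
        while (num_fds < MAX_FDS && new_event_queue_pop(&fd)) {
            pollset[num_fds].fd = fd;
            pollset[num_fds].events = POLLIN;   /* or POLLOUT on the writer side */
            num_fds++;
        }

        if (poll(pollset, num_fds, io_timeout) <= 0) {
            continue;    /* timeout or EINTR */
        }

        /* Slot 0 is new_event_pipe: drain it, but never hand it to an event
         * handler; it exists only to break this thread out of poll(). */
        if (pollset[0].revents & POLLIN) {
            drain_new_event_pipe();
        }
        for (int i = 1; i < num_fds; i++) {
            if (pollset[i].revents) {
                call_event_handler(&pollset[i]);
            }
        }
    }
}

Because pollset[0] is always present, the thread is always either handling
ready fds or blocked in poll(), as described above.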


Coding to interrupt the poll immediately is the first thing to do, and
then a max short delay can be added to submit_ticket only if necessary.

As you said, the max short delay would only affect the unbusy case where
the poll is waiting on all current members of the pollset.  The short
delay had been suggested to prevent interrupting select() before select()
had a chance to do any useful work.  We won't know if this is a real or
imagined problem until it is tested.  It sounds like it won't be a
performance problem, although using the max short timer of even 0.05s might
slightly reduce the CPU usage of these threads when under heavy load.

-Glenn

Re: request for comments: multiple-connections-per-thread MPM design

Posted by Manoj Kasichainula <ma...@io.com>.
Took too long to respond. Oh well, no one else did either...

On Tue, Nov 26, 2002 at 01:14:10AM -0500, Glenn wrote:
> On Mon, Nov 25, 2002 at 08:36:59PM -0800, Manoj Kasichainula wrote:
> > BTW, ISTR Ryan commenting a while back that cross-thread signalling
> > isn't reliable, and it scares me in general, so I'd lean towards the
> > pipe.
> > 
> > I'm pondering what else could be done about this; having to muck with a
> > pipe doesn't feel like the right thing to do.
> 
> Why not?

Good question. I'm still waffling on this.

> Add a descriptor (pipe, socket, whatever) to the pollset and use
> it to indicate the need to generate a new pollset.  The thread that sends
> info down this descriptor could be programmed to wait a short amount of
> time between sending triggers, so as not to cause the select() to return
> too, too often, but short enough not to delay the handling of new
> connections too long.

But what's a good value? Any value picked is going to be too annoying.
0.1 s means delaying lots of threads up to a tenth of a second. And
there would be good reasons for wanting to lower that value, and to not
lower that value. Which would mean it would need to be a tunable
parameter depending on network and CPU characteristics, and needing a
tunable parameter for this just seems ooky. 

But just picking a good value and sticking with it might not be too bad.
The correct thing to do would be to code it up and test, but I'd rather
have a reasonable idea of the chances for success first. :)

In the perfect case, each poll call would return immediately with lots
of file descriptors ready for work, and they would all get farmed out.
Then before the next poll runs, there are more file descriptors ready to
be polled. 

Hmmm, if the poll is waiting on fds for any length of time, it should be
ok to interrupt it, because by definition it's not doing anything else.

So maybe the way to go is to forget about waiting the 0.1 s to interrupt
poll. Just notify it immediately when there's a fd waiting to be polled.
If no other fds have work to provide, we add the new fds to the poll set
and continue.

Otherwise, just run through all the other fds that need handling first,
then pick off all the fds that are waiting for polling and add them to
the fd set.

So (again using terms from my proposal):

submit_ticket would push fds into a queue and write to new_event_pipe if
the queue was empty when pushing.

get_next_event would do something like:

if (previous_poll_fds_remaining) {
    pick one off, call event handler for it
}
else {
    clean out new_event_queue and put values into new poll set
    poll(pollfds, io_timeout);
    call event handler for one of the returned pollfds
}

Something was bothering me about this earlier, and I can't remember what
it is. Maybe it's that when the server isn't busy, a single ticket
submission will make 2 threads (the ticket submitter and the thread
holding the poll mutex) do stuff. Maybe even 3 threads since a new
thread could take the poll mutex. But since this is the unbusy case,
it's not quite so bad.
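
The submit_ticket half is small.  Roughly, assuming a mutex-protected
queue and the new_event_pipe already described (the names come from the
proposal; the code itself is just an illustrative sketch):

#include <pthread.h>
#include <unistd.h>

#define QUEUE_MAX 256

static int queue_fds[QUEUE_MAX];
static int queue_len;
static pthread_mutex_t queue_mutex = PTHREAD_MUTEX_INITIALIZER;
static int new_event_pipe[2];        /* created with pipe() at startup */

/* Hand an fd to the poller thread.  Only the transition from "empty" to
 * "non-empty" wakes the poller; later submissions are picked up when the
 * poller drains the queue anyway, so they skip the extra write(). */
static int submit_ticket(int fd)
{
    int was_empty;

    pthread_mutex_lock(&queue_mutex);
    if (queue_len == QUEUE_MAX) {
        pthread_mutex_unlock(&queue_mutex);
        return -1;                   /* queue full; caller must retry */
    }
    was_empty = (queue_len == 0);
    queue_fds[queue_len++] = fd;
    pthread_mutex_unlock(&queue_mutex);

    if (was_empty) {
        char byte = 0;
        (void)write(new_event_pipe[1], &byte, 1);
    }
    return 0;
}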


Re: request for comments: multiple-connections-per-thread MPM design

Posted by Glenn <gs...@gluelogic.com>.
On Mon, Nov 25, 2002 at 08:36:59PM -0800, Manoj Kasichainula wrote:
> On Mon, Nov 25, 2002 at 07:12:43AM -0800, Brian Pane wrote:
> > The real reason I don't like the mutex around the poll is that
> > it would add too much latency if we had to wait for the current
> > poll to complete before adding a new descriptor.  When the
> > Listener accepts a new connection, or a Request Processor creates
> > a new response brigade, it needs to get the corresponding socket
> > added to the pollset immediately, which really requires interrupting
> > the current poll.
> 
> Hmmm. That's a problem that needs solving even without the mutex though
> (and it affects the design I proposed yesterday as well).  When you're
> adding a new fd to the reader or writer, you have to write to a pipe or
> send a signal. The mutex shouldn't affect that. 
> 
> BTW, ISTR Ryan commenting a while back that cross-thread signalling
> isn't reliable, and it scares me in general, so I'd lean towards the
> pipe.
> 
> I'm pondering what else could be done about this; having to muck with a
> pipe doesn't feel like the right thing to do.

Why not?  Add a descriptor (pipe, socket, whatever) to the pollset and use
it to indicate the need to generate a new pollset.  The thread that sends
info down this descriptor could be programmed to wait a short amount of
time between sending triggers, so as not to cause the select() to return
too, too often, but short enough not to delay the handling of new
connections too long.  And the select()er thread would need to add a quick
step to check for this special descriptor instead of treating them all as
external requests.  It would also need to somehow signal the other thread
each time select() returned so that waiting descriptors could be added
immediately.

Or am I smoking what Manoj is smoking?

-Glenn

Re: request for comments: multiple-connections-per-thread MPM design

Posted by Manoj Kasichainula <ma...@io.com>.
On Mon, Nov 25, 2002 at 08:36:59PM -0800, Me at IO wrote:
> I'm just guessing here, but I imagine most CPU effort wouldn't be
> expended in the actual kernel<->user transitions that are polls and
> non-blocking I/O.  And the meat of those operations could be handled by
> other CPUs at the kernel level. So that separation onto multiple
> CPUs might not help much.

Eh, I was on crack when I wrote this. You want an I/O thread per CPU
when you can get it.

Re: request for comments: multiple-connections-per-thread MPM design

Posted by Manoj Kasichainula <ma...@io.com>.
On Mon, Nov 25, 2002 at 07:12:43AM -0800, Brian Pane wrote:
> On Mon, 2002-11-25 at 00:20, Manoj Kasichainula wrote:
> > I was actually wondering why the reader and writer were separate
> > threads.
> 
> It was a combination of several factors that convinced me
> to make them separate:
> * Take advantage of multiple CPUs more easily

Yeah, but as you noticed, once you get more than 2 CPUs, you have the
same problem.

I'm just guessing here, but I imagine most CPU effort wouldn't be
expended in the actual kernel<->user transitions that are polls and
non-blocking I/O.  And the meat of those operations could be handled by
other CPUs at the kernel level. So that separation onto multiple
CPUs might not help much.

> * Reduce the number of file descriptors that each poll call
>   is handling (important on platforms where we don't have
>   an efficient poll mechanism)

Has anyone read or benchmarked whether 2 threads polling 500 fds is
faster than 1 thread polling 1000?

> > For Linux 2.6, file notifications could be done entirely in userland in
> > the case where no blocking is needed, using "futexes".
> 
> Thanks!  I'll check out futexes.

Note that futexes are just Fast Userspace mUTEXes. Those are already in the
kernel (according to some threads I read yesterday anyway). But I
believe the part about file notification using them is still in
discussion.

> > But if you want to avoid the extra system calls, you could put a mutex
> > around maintenance of the pollset and just let the various threads dork
> > with it directly.
> > 
> > I do keep mentioning this mutex around the select/poll :). Is there a
> > performance reason that you're trying to avoid it? In my past skimmings,
> > I've seen you post a lot of benchmarks and such, so maybe you've studied
> > this.
> 
> The real reason I don't like the mutex around the poll is that
> it would add too much latency if we had to wait for the current
> poll to complete before adding a new descriptor.  When the
> Listener accepts a new connection, or a Request Processor creates
> a new response brigade, it needs to get the corresponding socket
> added to the pollset immediately, which really requires interrupting
> the current poll.

Hmmm. That's a problem that needs solving even without the mutex though
(and it affects the design I proposed yesterday as well).  When you're
adding a new fd to the reader or writer, you have to write to a pipe or
send a signal. The mutex shouldn't affect that. 

BTW, ISTR Ryan commenting a while back that cross-thread signalling
isn't reliable, and it scares me in general, so I'd lean towards the
pipe.

I'm pondering what else could be done about this; having to muck with a
pipe doesn't feel like the right thing to do. Perhaps I should actually
look at other people's code to see what they do. Other designs have
threads for disk I/O and such, so there should be a way. I believe
Windows doesn't have this problem, or at least hides it better, because
completion ports are independent entities that don't interact with each
other as far as the user is concerned.