Posted to dev@httpd.apache.org by Ryan Bloom <rb...@raleigh.ibm.com> on 1999/02/19 19:17:56 UTC

new acceptor logic.

A while back, I suggested we use a file descriptor queue of length N,
where N is the number of worker threads, to handle serving requests.
Basically, an acceptor thread would accept the connection and place the
new socket descriptor in the fdqueue, where a worker thread would wake
up and serve the request.
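
For concreteness, here is a minimal sketch of the sort of fdqueue I
mean (the names and details are illustrative, not our actual code):

    #include <pthread.h>

    typedef struct {
        int *data;                  /* ring buffer of socket descriptors */
        int cap;                    /* capacity == number of worker threads */
        int head, tail, count;
        pthread_mutex_t lock;
        pthread_cond_t not_empty;   /* signalled when a socket is pushed */
        pthread_cond_t not_full;    /* signalled when a socket is popped */
    } fd_queue;

    /* Acceptor thread: push a newly accepted socket. */
    void fd_queue_push(fd_queue *q, int sd)
    {
        pthread_mutex_lock(&q->lock);
        while (q->count == q->cap)
            pthread_cond_wait(&q->not_full, &q->lock);
        q->data[q->tail] = sd;
        q->tail = (q->tail + 1) % q->cap;
        q->count++;
        pthread_cond_signal(&q->not_empty);
        pthread_mutex_unlock(&q->lock);
    }

    /* Worker thread: pop a socket, sleeping only while the queue is empty. */
    int fd_queue_pop(fd_queue *q)
    {
        int sd;
        pthread_mutex_lock(&q->lock);
        while (q->count == 0)
            pthread_cond_wait(&q->not_empty, &q->lock);
        sd = q->data[q->head];
        q->head = (q->head + 1) % q->cap;
        q->count--;
        pthread_cond_signal(&q->not_full);
        pthread_mutex_unlock(&q->lock);
        return sd;
    }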

We implemented this, and found a nasty bug in the logic.  The acceptor
queue is fast.  So fast, in fact, that the initial process never
relinquishes the accept_mutex long enough for another process to grab
it.  So, what happens is that if I connect to the server while all of my
servers are busy serving long-lived requests, the parent process spawns
a new server that never does any accepting.  Because my fdqueue is
currently empty (all the sockets that were in there have been removed
and are currently being served), my accept thread continues to accept
new connections, putting the new sockets in the queue.  If none of the
original N requests finishes for five minutes, the new request isn't
handled for five minutes.  And we don't move on to the next process
until the fdqueue has been filled for the second time.

This is bad.

So, here is the new solution, for everybody's critique.  Have the
fdqueue be dynamic in length.  The fdqueue is always exactly as large as
the number of idle worker threads.  If the acceptor has no place to put
new sockets, it goes to sleep, and another process will take over.
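
In terms of the sketch above, the change amounts to replacing the fixed
capacity bound in the push with a dynamic one (again illustrative only:
this assumes a num_idle field added to the queue, which the workers
maintain and signal not_full on whenever they go idle, and it elides
the accept_mutex handling around the accept loop):

    /* Acceptor thread: block unless an idle worker can take the socket.
     * While this blocks, the acceptor stops accepting, so the
     * accept_mutex can pass to another process in the meantime. */
    void fd_queue_push(fd_queue *q, int sd)
    {
        pthread_mutex_lock(&q->lock);
        while (q->count >= q->num_idle)   /* dynamic bound, not q->cap */
            pthread_cond_wait(&q->not_full, &q->lock);
        q->data[q->tail] = sd;
        q->tail = (q->tail + 1) % q->cap;
        q->count++;
        pthread_cond_signal(&q->not_empty);
        pthread_mutex_unlock(&q->lock);
    }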

Thoughts?

Ryan

_______________________________________________________________________
Ryan Bloom		rbb@raleigh.ibm.com
4205 S Miami Blvd	
RTP, NC 27709		It's a beautiful sight to see good dancers 
			doing simple steps.  It's a painful sight to
			see beginners doing complicated patterns.	


Re: new acceptor logic.

Posted by Ryan Bloom <rb...@raleigh.ibm.com>.
> OK, we can probably agree that situations where there isn't always a new
> connection ready to be accepted (or fd already accepted) when a worker
> thread finishes a previous connection are not interesting -- the load
> isn't high enough to saturate the server, so we don't care about sleeping.

Agreed.

> Am I still missing something?

I'm not sure we are talking about the same case.  I agree that your
method will work better than mine when we are only accepting on one
socket, and I do plan to implement it at some point.  However, what
about multiple sockets?  I see two options:

1)  Using a counting mutex, we let N threads in to accept on N sockets.
Inside the mutex, we grab another mutex and look for a socket to listen
on in an array of sockets.  When we find one, we update the array of
sockets, release the mutex, accept on the socket, and serve the request.

2)  All threads block on one mutex.  When a thread gets the mutex, it
does a select to find a socket that has a connection coming in, then
does the accept and serves the request (sketched below).
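
Roughly, option 2 would look like this (a sketch only; listen_fds,
num_listen, and serve_request are stand-ins for whatever we actually
end up with):

    #include <sys/select.h>
    #include <sys/socket.h>
    #include <pthread.h>

    extern int listen_fds[];             /* the listening sockets */
    extern int num_listen;
    extern pthread_mutex_t accept_mutex;
    extern void serve_request(int sd);   /* hypothetical request handler */

    void *worker(void *arg)
    {
        for (;;) {
            int i, sd = -1;
            fd_set rfds;

            pthread_mutex_lock(&accept_mutex);
            FD_ZERO(&rfds);
            for (i = 0; i < num_listen; i++)
                FD_SET(listen_fds[i], &rfds);
            /* this is the select we are worried about paying for */
            if (select(FD_SETSIZE, &rfds, NULL, NULL, NULL) > 0) {
                for (i = 0; i < num_listen; i++) {
                    if (FD_ISSET(listen_fds[i], &rfds)) {
                        sd = accept(listen_fds[i], NULL, NULL);
                        break;
                    }
                }
            }
            pthread_mutex_unlock(&accept_mutex);

            if (sd >= 0)
                serve_request(sd);
        }
        return NULL;
    }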

On every system we have looked at, select is a very expensive operation
and causes a performance hit.  We haven't done any really accurate
benchmarking yet, but that is in the plan.

I will implement both of these as soon as I have a chance, and then we can
do some benchmarking to determine the best method.  I believe we have to
compare algorithms that will accept connections on multiple sockets.

Ryan


Re: new acceptor logic.

Posted by Dean Gaudet <dg...@arctic.org>.

On Mon, 22 Feb 1999, Ryan Bloom wrote:

> This is also a performance hit.  What you are suggesting is a pool of
> threads where each thread does an accept and then serves the request.
> When it leaves the accept, another thread does the accept, and when the
> initial thread finishes serving the request, it goes back to blocking on
> the mutex.  Basically, the hit comes in that for every request, we are
> waking up a thread and then having the thread sleep.  Our performance
> people have found that having threads wake up and go to sleep is one of
> the places where most of the time is wasted.

OK, we can probably agree that situations where there isn't always a new
connection ready to be accepted (or fd already accepted) when a worker
thread finishes a previous connection are not interesting -- the load
isn't high enough to saturate the server, so we don't care about sleeping.

So consider the case where the load is high enough that when a worker
thread finishes a connection there is another connection ready to be
accepted, or in your case an fd waiting in the queue.

In your case, to get the fd into the queue you had to do a context switch
into the acceptor thread, do a few accept()s, and switch back to a worker
thread.

In my case, the worker thread falls right into accept() and gets the fd,
and that extra context switch never happens.

The only time threads sleep in my situation is when there isn't enough
work to do.  Presumably you have an upper limit on the number of threads
that you allow to run -- otherwise you waste CPU with too much cache
thrashing, or consume too much RAM.  Similarly, in your implementation
threads only sleep when there isn't enough work to do, but in order to get
work for your threads you throw in two extra context switches.

Am I still missing something?

> Even if the thread tries to acquire the mutex before going to sleep, there
> is only a very small window during which it will be the one to acquire the
> mutex, so it is most likely to go to sleep for at least some period of
> time.  With multiple processes and multiple threads all grabbing for that
> mutex, the chance that the original thread sleeps increases, because there
> is less of a chance that the mutex will be available.

I don't understand what you're saying.  When you say "acquire the mutex
before going to sleep" are you referring to context switching due to using
up the time slice?

My suggestion looks like this:

    for (;;) {
	acquire mutex
	accept connection
	release mutex
	service connection
    }
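
In C with pthreads, for a single listening socket, that's roughly the
following (a sketch; serve_connection stands in for the real request
handling):

    #include <sys/socket.h>
    #include <pthread.h>

    extern int listen_fd;
    extern pthread_mutex_t accept_mutex;
    extern void serve_connection(int sd);

    void *worker(void *arg)
    {
        for (;;) {
            int sd;
            pthread_mutex_lock(&accept_mutex);    /* acquire mutex */
            sd = accept(listen_fd, NULL, NULL);   /* accept connection */
            pthread_mutex_unlock(&accept_mutex);  /* release mutex */
            if (sd >= 0)
                serve_connection(sd);             /* service connection */
        }
        return NULL;
    }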

Your code looks like this:

    accept thread:

	for(;;) {
	    accept connection
	    acquire mutex
	    push connection
	    awaken others
	    release mutex
	}
    
    worker thread:

	for(;;) {
	    acquire mutex
	    if nothing on queue
		go to sleep
	    pop connection 
	    release mutex
	    service connection
	}

You have the same window in which a worker thread can grab the mutex,
then be put to sleep due to exhausting its time slice, and then no other
threads can make progress on new connections until it's scheduled again.

Dean


Re: new acceptor logic.

Posted by Ryan Bloom <rb...@raleigh.ibm.com>.
This is also a performance hit.  What you are suggesting is a pool of
threads where each thread does an accept and then serves the request.
When it leaves the accept, another thread does the accept, and when the
initial thread finishes serving the request, it goes back to blocking on
the mutex.  Basically, the hit comes in that for every request, we are
waking up a thread and then having the thread sleep.  Our performance
people have found that having threads wake up and go to sleep is one of
the places where most of the time is wasted.

Even if the thread tries to acquire the mutex before going to sleep, there
is only a very small window during which it will be the one to acquire the
mutex, so it is most likely to go to sleep for at least some period of
time.  With multiple processes and multiple threads all grabbing for that
mutex, the chance that the original thread sleeps increases, because there
is less of a chance that the mutex will be available.

With the single acceptor thread per socket, we have X threads always
accepting connections, and we push each new socket onto a queue.  A
worker thread wakes up and grabs the socket.  If, when it comes back
after serving the request, there is another socket on the queue, it
doesn't sleep at all.  Instead, the original thread grabs the socket off
the queue and serves that request.

That is our current thinking: we want to optimize our performance and
reduce the number of times we have to put threads to sleep.

Ryan



On Mon, 22 Feb 1999, Dean Gaudet wrote:

> 
> 
> On Mon, 22 Feb 1999, Manoj Kasichainula wrote:
> 
> > > Or pay one worker per socket -- let multiple be in accept, just on
> > > different sockets.  If folks have less-than-busy sockets they'll be paying
> > > one less-than-busy thread... but little extra CPU otherwise.
> > 
> > If I understand this, we have to have a mechanism to make sure that
> > worker threads accept() on the busiest sockets. We've thought about
> > it, since it's what the mainframe web server developers are working
> > with, but we haven't come up with a bulletproof way of doing this on
> > Unix yet. Thoughts?
> 
> No, you need M+N threads where M is the number of sockets, and N is the
> number of threads you want to have in flight servicing requests.  KISS ... 
> don't do anything special for the busy sockets -- just keep a FIFO list of
> sockets that don't have anything accepting on them.
> 
> It's not too hard -- for a single socket you need one lock, acquire it
> when going into accept().  For multiple sockets you need a FIFO of sockets
> ready for accept, with a lock for that FIFO... although on first
> inspection it seems you need to acquire the FIFO twice per accept, which
> sucks... there's probably a trick you can play depending on which of the
> various pthread locking primitives are available. 
> 
> Dean
> 
> 
> 

_______________________________________________________________________
Ryan Bloom		rbb@raleigh.ibm.com
4205 S Miami Blvd	
RTP, NC 27709		It's a beautiful sight to see good dancers 
			doing simple steps.  It's a painful sight to
			see beginners doing complicated patterns.	


Re: new acceptor logic.

Posted by Dean Gaudet <dg...@arctic.org>.

On Mon, 22 Feb 1999, Manoj Kasichainula wrote:

> > Or pay one worker per socket -- let multiple be in accept, just on
> > different sockets.  If folks have less-than-busy sockets they'll be paying
> > one less-than-busy thread... but little extra CPU otherwise.
> 
> If I understand this, we have to have a mechanism to make sure that
> worker threads accept() on the busiest sockets. We've thought about
> it, since it's what the mainframe web server developers are working
> with, but we haven't come up with a bulletproof way of doing this on
> Unix yet. Thoughts?

No, you need M+N threads where M is the number of sockets, and N is the
number of threads you want to have in flight servicing requests.  KISS ... 
don't do anything special for the busy sockets -- just keep a FIFO list of
sockets that don't have anything accepting on them.

It's not too hard -- for a single socket you need one lock, acquire it
when going into accept().  For multiple sockets you need a FIFO of sockets
ready for accept, with a lock for that FIFO... although on first
inspection it seems you need to acquire the FIFO twice per accept, which
sucks... there's probably a trick you can play depending on which of the
various pthread locking primitives are available. 
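
Something like this, say, with the FIFO operations taking the FIFO's
lock internally (the names are made up):

    #include <sys/socket.h>

    /* FIFO of listening sockets that no thread is currently accepting
     * on.  Each operation acquires the FIFO's lock internally -- hence
     * the two acquisitions per accept mentioned above. */
    extern int  socket_fifo_pop(void);     /* waits until a socket is free */
    extern void socket_fifo_push(int lfd);
    extern void serve_connection(int sd);

    void *worker(void *arg)
    {
        for (;;) {
            int lfd = socket_fifo_pop();   /* lock acquisition #1 */
            int sd = accept(lfd, NULL, NULL);
            socket_fifo_push(lfd);         /* lock acquisition #2: let
                                              another thread accept on it */
            if (sd >= 0)
                serve_connection(sd);
        }
        return NULL;
    }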

Dean



Re: new acceptor logic.

Posted by Manoj Kasichainula <ma...@raleigh.ibm.com>.
On Mon, Feb 22, 1999 at 08:55:24AM -0800, Dean Gaudet wrote:
> I sent you this in private mail... but figure I'll repeat it here for
> others -- I don't see the point in having an acceptor thread.  What's
> wrong with doing the same thing 1.x does in processes?  Just let one
> worker into the select/accept loop.  If there are no workers available
> then no accept will be performed... you don't have to add any extra logic
> to handle that.

The performance guys at work say that select and poll are performance
killers. I'm starting to get the impression that we are going through
a lot of extra work (and a lot of extra pthreads calls) to save a
single select, and that it's not worth it.

> Or pay one worker per socket -- let multiple be in accept, just on
> different sockets.  If folks have less-than-busy sockets they'll be paying
> one less-than-busy thread... but little extra CPU otherwise.

If I understand this, we have to have a mechanism to make sure that
worker threads accept() on the busiest sockets. We've thought about
it, since it's what the mainframe web server developers are working
with, but we haven't come up with a bulletproof way of doing this on
Unix yet. Thoughts?

Manoj

Re: new acceptor logic.

Posted by Bill Stoddard <st...@raleigh.ibm.com>.
Dean Gaudet wrote:

> I sent you this in private mail... but figure I'll repeat it here for
> others -- I don't see the point in having an acceptor thread.  What's
> wrong with doing the same thing 1.x does in processes?  Just let one
> worker into the select/accept loop.  If there are no workers available
> then no accept will be performed... you don't have to add any extra logic
> to handle that.
> 
> Or pay one worker per socket -- let multiple be in accept, just on
> different sockets.  If folks have less-than-busy sockets they'll be paying
> one less-than-busy thread... but little extra CPU otherwise.
> 
> Either of these reduces context switches.

I like the idea of letting multiple workers into the accept() and,
presumably, eliminating the need for select().  I am not convinced of
the performance benefit, though.

What we are aiming at with the queue implementation is to keep worker
threads active (not let them block on a mutex or select()). After a
worker thread has finished processing a request, it first checks the
queue for more available work. If it finds work, it handles it without
blocking or sleeping on the mutex. The trick is to keep the number of
elements in the queue just a bit higher than the number of worker
threads available to do the work.

-- 
Bill Stoddard
stoddard@raleigh.ibm.com

Re: new acceptor logic.

Posted by Dean Gaudet <dg...@arctic.org>.

On Fri, 19 Feb 1999, Ryan Bloom wrote:

> A while back, I suggested we use a file descriptor queue of length N,
> where N is the number of worker threads, to handle serving requests.
> Basically, an acceptor thread would accept the connection and place the
> new socket descriptor in the fdqueue, where a worker thread would wake
> up and serve the request.
> 
> We implemented this, and found a nasty bug in the logic.  The acceptor
> queue is fast.  So fast, in fact, that the initial process never
> relinquishes the accept_mutex long enough for another process to grab
> it.  So, what happens is that if I connect to the server while all of my
> servers are busy serving long-lived requests, the parent process spawns
> a new server that never does any accepting.  Because my fdqueue is
> currently empty (all the sockets that were in there have been removed
> and are currently being served), my accept thread continues to accept
> new connections, putting the new sockets in the queue.  If none of the
> original N requests finishes for five minutes, the new request isn't
> handled for five minutes.  And we don't move on to the next process
> until the fdqueue has been filled for the second time.
> 
> This is bad.
> 
> So, here is the new solution, for everybody's critique.  Have the
> fdqueue be dynamic in length.  The fdqueue is always exactly as large as
> the number of idle worker threads.  If the acceptor has no place to put
> new sockets, it goes to sleep, and another process will take over.
> 
> Thoughts?

I sent you this in private mail... but figure I'll repeat it here for
others -- I don't see the point in having an acceptor thread.  What's
wrong with doing the same thing 1.x does in processes?  Just let one
worker into the select/accept loop.  If there are no workers available
then no accept will be performed... you don't have to add any extra logic
to handle that.

Or pay one worker per socket -- let multiple be in accept, just on
different sockets.  If folks have less-than-busy sockets they'll be paying
one less-than-busy thread... but little extra CPU otherwise.

Either of these reduces context switches.

Dean