Posted to dev@httpd.apache.org by Justin Erenkrantz <je...@ebuilt.com> on 2001/12/05 21:27:27 UTC

Async I/O question?

How would an async I/O MPM handle a flush bucket?

What I'm missing is that you may not always migrate the thread
when doing an I/O because the non-I/O thread may still have
stuff to write.

In Dean's descriptions of his ASH MPM (again, I may be missing 
something), he talked about how the thread desiring I/O would 
hand-off the request to the I/O thread and return to the idle 
state - the assumption is that the request is complete.  But,
making an I/O call doesn't mean that it is completed.

Yet, that's the only clean way I can see how to handle this 
migration - because I don't see a way to resume the exact thread 
context in a clean way.  (What I think we would need, if we don't
have a single thread per request, is to supply a post-I/O
function that the I/O layer can tell a non-I/O thread to call
when the I/O completes - pure event-based programming.)

I guess I'm thinking of something like a proxy server application
where we don't necessarily have all of the data up front - we may
only have snippets of the data (because the origin server hasn't
finished writing everything, but we have a partial response) - so 
write them out and flush as we get data.  This seems valid under 
the 2.0 filter semantics.  And, I'll argue that we must allow 
modules to call for a flush.  -- justin
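
To make the flush scenario concrete: a minimal sketch, using the bucket
brigade API as it ships in httpd 2.x (the exact signatures were still
shifting at the time of this thread), of a hypothetical proxy-side helper
that passes whatever partial data has arrived so far down the output
filter chain, followed by a FLUSH bucket.  The function name is made up
for illustration; the apr_bucket_* and ap_pass_brigade calls are the
real API.

/* Sketch only, not a real module.  An async I/O MPM would have to
 * honor the FLUSH bucket without blocking the processing thread. */
#include "httpd.h"
#include "util_filter.h"
#include "apr_buckets.h"

static apr_status_t proxy_push_partial(ap_filter_t *f,
                                       const char *buf, apr_size_t len)
{
    apr_bucket_alloc_t *ba = f->c->bucket_alloc;
    apr_bucket_brigade *bb = apr_brigade_create(f->r->pool, ba);
    apr_bucket *b;

    /* The data is only valid for the duration of this call, so it goes
     * in as a TRANSIENT bucket; anything downstream that keeps it past
     * the call (an async writer thread, say) must set it aside or copy
     * it. */
    b = apr_bucket_transient_create(buf, len, ba);
    APR_BRIGADE_INSERT_TAIL(bb, b);

    /* Ask everything downstream to push this to the client now. */
    b = apr_bucket_flush_create(ba);
    APR_BRIGADE_INSERT_TAIL(bb, b);

    return ap_pass_brigade(f->next, bb);
}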


Re: Async I/O question?

Posted by Ryan Bloom <rb...@covalent.net>.
On Wednesday 05 December 2001 01:28 pm, Justin Erenkrantz wrote:
> On Wed, Dec 05, 2001 at 01:10:42PM -0800, Ryan Bloom wrote:
> > 2)  In a partial Async model, like what Dean is suggesting, the
> > I/O thread needs to be able to accept multiple chunks of data
> > to be written to the client.  This would allow you to handle a flush
> > bucket, and the processing thread wouldn't stop processing the
> > request, it just wouldn't wait to continue processing if the data
> > couldn't be written immediately.  The point is to get the processing
> > threads done with the request ASAP, so that they can handle the
> > next request.  The I/O for the threads can wait an extra 1/2 second
> > or two.
> >
> > Think of it as three threads all working together.
>
> <light bulb goes on>
>
> > Now, this can be improved even more by moving to a four
> > thread model, where one thread is dedicated to reading from
> > the network, and one is dedicated to writing to the network.
>
> Thanks for the clarification.  It helps tremendously.  So, we
> aren't talking about a pure async-model - just where we attempt
> to hand-off.  And, moving to a four thread model may be hindered
> by the specific OS - like /dev/poll can indicate reading and
> writing on the same socket.  Would we want two threads sharing
> ownership of the socket?  Perhaps.  However, something like SSL
> would complicate things (think renegotiations).

It can indicate reading/writing on the same socket, but we may not
want it to.  As for SSL, because we are doing encryption in memory
instead of directly to the socket, this shouldn't be a big problem.
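
A rough illustration of what "encryption in memory" means here, in plain
OpenSSL terms rather than mod_ssl's actual filter code: the SSL engine
reads and writes through memory BIOs, so the ciphertext lands in a buffer
the server can hand to whichever thread owns the socket.  The function
name is made up, and handshake/error handling is omitted; the BIO and
SSL calls themselves are standard OpenSSL.

/* Sketch only: the memory-BIO pattern that keeps TLS encryption in
 * memory instead of writing straight to the socket. */
#include <openssl/ssl.h>
#include <openssl/bio.h>

static int encrypt_in_memory(SSL *ssl, const char *plain, int plen,
                             char *cipher_out, int cipher_max)
{
    /* rbio feeds network bytes *into* the SSL engine; wbio collects
     * the bytes the engine wants sent *to* the network. */
    BIO *rbio = BIO_new(BIO_s_mem());
    BIO *wbio = BIO_new(BIO_s_mem());
    SSL_set_bio(ssl, rbio, wbio);            /* ssl now owns both BIOs */

    /* Assumes the handshake has already completed on this SSL. */
    if (SSL_write(ssl, plain, plen) <= 0)
        return -1;

    /* Drain the ciphertext; it can now be queued for the socket owner. */
    return BIO_read(wbio, cipher_out, cipher_max);
}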

> 1) Any transient buckets will have to be set aside in this MPM.
>    Is this a concern?  It seems that you also can't reuse the
>    same memory space within the output loop.  Once I pass it
>    down the chain, I must say good-bye to any memory or data
>    pointed to within the bucket.  (We couldn't even reuse heap
>    data.)  Is this even a change from current semantics?

We'll have to set aside transient data.  We already say that filters
have to forget about any data once it has been passed down the
stack, so that isn't a change in semantics.
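
A minimal sketch of what "set aside transient data" looks like at the
hand-off point.  The function name and the choice of pool are made up
for illustration; apr_bucket_setaside() is the real call that copies
transient data somewhere with a long enough lifetime.

/* Sketch: before handing a brigade to an I/O thread, set aside every
 * bucket so its data survives after the worker moves on. */
#include "apr_buckets.h"

static apr_status_t setaside_for_writer(apr_bucket_brigade *bb,
                                        apr_pool_t *writer_pool)
{
    apr_bucket *e;

    for (e = APR_BRIGADE_FIRST(bb);
         e != APR_BRIGADE_SENTINEL(bb);
         e = APR_BUCKET_NEXT(e)) {
        apr_status_t rv = apr_bucket_setaside(e, writer_pool);
        if (rv != APR_SUCCESS && rv != APR_ENOTIMPL) {
            return rv;           /* the copy itself failed */
        }
    }
    return APR_SUCCESS;          /* brigade is now safe to queue */
}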

> 2) We could implement this solely by manipulating the socket
>    hooks you added, right?  Would there be any change external to
>    the MPM?  (I guess we wouldn't know until we tried perhaps...)

There shouldn't be.  A lot of the work I did a few weeks ago was to
help make this possible with the 2.0 architecture.  I have a few more
things that can be done with those changes, but those are more for
me to play with than useful projects.

> 3) In the read case, the I/O is directed to a specific worker
>    thread, right?  So, a worker thread makes a request for some
>    amount of I/O and it is delivered to that same thread (so we
>    can still use thread-local storage)?  The wait for data from
>    I/O thread in worker thread will be synchronous.

Presumably yes, but if this is designed correctly, we could move
to an async model for input too, where the thread that requested
the data may not be the thread that receives it.

> 4) What happens when all of the I/O threads are full (and their
>    buffers are full too)?  Do we just force the worker to
>    wait?  In fact, I'd imagine this would be a common case.  The
>    worker threads should be fairly fast - the I/O threads would be
>    the slow ones.

I don't think that has been fully designed yet.  I mean, minimally
it will have to wait, but the answer may also be to create a
second I/O thread to pick up some of the leftovers.
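
One way the "wait, or add a second I/O thread" idea could be sketched.
Everything about the queue (the struct, its fields, io_writer_loop) is
hypothetical; apr_thread_create and the mutex/condition calls are
standard APR.

/* Sketch: the worker blocks when the writer's queue is full, and past
 * a high-water mark another writer thread is spawned to pick up the
 * leftovers. */
#include "apr_thread_proc.h"
#include "apr_thread_mutex.h"
#include "apr_thread_cond.h"

typedef struct io_queue {
    apr_thread_mutex_t *lock;
    apr_thread_cond_t  *not_full;
    int depth;              /* items currently queued                */
    int max_depth;          /* hard limit: workers block above this  */
    int high_water;         /* soft limit: spawn another writer here */
    int writers, max_writers;
} io_queue;

extern void * APR_THREAD_FUNC io_writer_loop(apr_thread_t *t, void *arg);

static apr_status_t queue_wait_or_grow(io_queue *q, apr_pool_t *p)
{
    apr_thread_mutex_lock(q->lock);

    if (q->depth >= q->high_water && q->writers < q->max_writers) {
        apr_thread_t *tid;
        apr_thread_create(&tid, NULL, io_writer_loop, q, p);
        q->writers++;
    }
    while (q->depth >= q->max_depth) {
        /* Minimally, the worker has to wait. */
        apr_thread_cond_wait(q->not_full, q->lock);
    }

    apr_thread_mutex_unlock(q->lock);
    return APR_SUCCESS;
}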

Ryan
______________________________________________________________
Ryan Bloom				rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------

Re: Async I/O question?

Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Wed, Dec 05, 2001 at 01:10:42PM -0800, Ryan Bloom wrote:
> 2)  In a partial Async model, like what Dean is suggesting, the
> I/O thread needs to be able to accept multiple chunks of data
> to be written to the client.  This would allow you to handle a flush
> bucket, and the processing thread wouldn't stop processing the
> request, it just wouldn't wait to continue processing if the data
> couldn't be written immediately.  The point is to get the processing
> threads done with the request ASAP, so that they can handle the
> next request.  The I/O for the threads can wait an extra 1/2 second
> or two.
> 
> Think of it as three threads all working together.

<light bulb goes on>

> Now, this can be improved even more by moving to a four
> thread model, where one thread is dedicated to reading from
> the network, and one is dedicated to writing to the network.

Thanks for the clarification.  It helps tremendously.  So, we
aren't talking about a pure async-model - just where we attempt
to hand-off.  And, moving to a four thread model may be hindered
by the specific OS - like /dev/poll can indicate reading and
writing on the same socket.  Would we want two threads sharing
ownership of the socket?  Perhaps.  However, something like SSL
would complicate things (think renegotiations).

Now, some questions:

1) Any transient buckets will have to be set aside in this MPM.
   Is this a concern?  It seems that you also can't reuse the 
   same memory space within the output loop.  Once I pass it
   down the chain, I must say good-bye to any memory or data
   pointed to within the bucket.  (We couldn't even reuse heap
   data.)  Is this even a change from current semantics?

2) We could implement this solely by manipulating the socket
   hooks you added, right?  Would there be any change external to
   the MPM?  (I guess we wouldn't know until we tried perhaps...)

3) In the read case, the I/O is directed to a specific worker 
   thread, right?  So, a worker thread makes a request for some 
   amount of I/O and it is delivered to that same thread (so we 
   can still use thread-local storage)?  The wait for data from 
   I/O thread in worker thread will be synchronous.

4) What happens when all of the I/O threads are full (and their
   buffers are full too)?  Do we just force the worker to
   wait?  In fact, I'd imagine this would be a common case.  The
   worker threads should be fairly fast - the I/O threads would be
   the slow ones.

Thanks again.  I now see the big picture...  I need to study for 
my exams now...  -- justin


Re: Async I/O question?

Posted by Ryan Bloom <rb...@covalent.net>.
On Wednesday 05 December 2001 01:22 pm, Ian Holsman wrote:
> isn't this VERY similar to what SGI's state machine thingy was
> going to do?

It most likely is; this is not an uncommon design.

Ryan
______________________________________________________________
Ryan Bloom				rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------

Re: Async I/O question?

Posted by Ian Holsman <ia...@cnet.com>.
isn't this VERY similar to what SGI's state machine thingy was
going to do?
On Wed, 2001-12-05 at 13:10, Ryan Bloom wrote:
> On Wednesday 05 December 2001 12:27 pm, Justin Erenkrantz wrote:
> 
> two things.
> 
> 1)  In a full Async model, you would need to be able to recover
> the thread's context.  That isn't possible without major
> re-working of Apache, and that would be a good reason to move
> on to Apache 3.0.
> 
> 2)  In a partial Async model, like what Dean is suggesting, the
> I/O thread needs to be able to accept multiple chunks of data
> to be written to the client.  This would allow you to handle a flush
> bucket, and the processing thread wouldn't stop processing the
> request, it just wouldn't wait to continue processing if the data
> couldn't be written immediately.  The point is to get the processing
> threads done with the request ASAP, so that they can handle the
> next request.  The I/O for the threads can wait an extra 1/2 second
> or two.
> 
> Think of it as three threads all working together.
> 
> accept thread:
> 	Loop
> 		select
> 		accept
> 		hand-off request to worker
> 	end of loop
> 
> worker thread:
> 	Loop
> 		receive request
> 		wait for data from I/O thread
> 		process request
> 		Loop
> 			Generate data			
> 			hand-off data to I/O thread
> 		End of Loop
> 	End of Loop
> 
> I/O thread:
> 	Loop
> 		if somebody's waiting for data
> 			Read from network
> 			hand-off data to worker
> 		end if
> 		if data to write
> 			receive data from worker
> 			write to network
> 		end if
> 	End of Loop
> 
> Now, this can be improved even more by moving to a four
> thread model, where one thread is dedicated to reading from
> the network, and one is dedicated to writing to the network.
> 
> The big payoff with this async model is the sheer scalability.  If the
> threads aren't waiting to actually read and write data, they are just
> processing requests, and for a small request, this can be done insanely
> quickly.
> 
> Ryan
> 
> > How would an async I/O MPM handle a flush bucket?
> >
> > What I'm missing is that you may not always migrate the thread
> > when doing an I/O because the non-I/O thread may still have
> > stuff to write.
> >
> > In Dean's descriptions of his ASH MPM (again, I may be missing
> > something), he talked about how the thread desiring I/O would
> > hand-off the request to the I/O thread and return to the idle
> > state - the assumption is that the request is complete.  But,
> > making an I/O call doesn't mean that it is completed.
> >
> > Yet, that's the only clean way I can see how to handle this
> > migration - because I don't see a way to resume the exact thread
> > context in a clean way.  (What I think we would need, if we don't
> > have a single thread per request, is to supply a post-I/O
> > function that the I/O layer can tell a non-I/O thread to call
> > when the I/O completes - pure event-based programming.)
> >
> > I guess I'm thinking of something like a proxy server application
> > where we don't necessarily have all of the data up front - we may
> > only have snippets of the data (because the origin server hasn't
> > finished writing everything, but we have a partial response) - so
> > write them out and flush as we get data.  This seems valid under
> > the 2.0 filter semantics.  And, I'll argue that we must allow
> > modules to call for a flush.  -- justin
> 
> -- 
> 
> ______________________________________________________________
> Ryan Bloom				rbb@apache.org
> Covalent Technologies			rbb@covalent.net
> --------------------------------------------------------------
-- 
Ian Holsman          IanH@cnet.com
Performance Measurement & Analysis
CNET Networks   -   (415) 344-2608


Re: Async I/O question?

Posted by Ryan Bloom <rb...@covalent.net>.
On Wednesday 05 December 2001 12:27 pm, Justin Erenkrantz wrote:

two things.

1)  In a full Async model, you would need to be able to recover
the thread's context.  That isn't possible without major
re-working of Apache, and that would be a good reason to move
on to Apache 3.0.

2)  In a partial Async model, like what Dean is suggesting, the
I/O thread needs to be able to accept multiple chunks of data
to be written to the client.  This would allow you to handle a flush
bucket, and the processing thread wouldn't stop processing the
request, it just wouldn't wait to continue processing if the data
couldn't be written immediately.  The point is to get the processing
threads done with the request ASAP, so that they can handle the
next request.  The I/O for the threads can wait an extra 1/2 second
or two.

Think of it as three threads all working together.

accept thread:
	Loop
		select
		accept
		hand-off request to worker
	end of loop

worker thread:
	Loop
		receive request
		wait for data from I/O thread
		process request
		Loop
			Generate data			
			hand-off data to I/O thread
		End of Loop
	End of Loop

I/O thread:
	Loop
		if somebody's waiting for data
			Read from network
			hand-off data to worker
		end if
		if data to write
			receive data from worker
			write to network
		end if
	End of Loop
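
For the write half of the I/O thread loop above, a minimal sketch: the
work_item struct and pop_work() are hypothetical stand-ins for the
worker-to-I/O hand-off, while apr_socket_send is standard APR.  The
read side, non-blocking handling, and error reporting are all omitted.

/* Sketch of the writer loop: pop (socket, data) items queued by the
 * worker threads and push them out to the network. */
#include "apr_network_io.h"
#include "apr_thread_proc.h"

typedef struct work_item {
    apr_socket_t *sock;      /* client socket the worker handed off  */
    const char   *buf;       /* data already set aside by the worker */
    apr_size_t    len;
} work_item;

extern work_item *pop_work(void);    /* hypothetical blocking dequeue */

static void * APR_THREAD_FUNC io_writer_loop(apr_thread_t *thd, void *arg)
{
    work_item *w;

    while ((w = pop_work()) != NULL) {
        const char *p = w->buf;
        apr_size_t remaining = w->len;

        while (remaining > 0) {
            apr_size_t n = remaining;
            if (apr_socket_send(w->sock, p, &n) != APR_SUCCESS) {
                break;           /* client went away; drop the item */
            }
            p += n;
            remaining -= n;
        }
    }
    return NULL;
}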

Now, this can be improved even more by moving to a four
thread model, where one thread is dedicated to reading from
the network, and one is dedicated to writing to the network.

The big payoff with this async model is the sheer scalability.  If the
threads aren't waiting to actually read and write data, they are just
processing requests, and for a small request, this can be done insanely
quickly.

Ryan

> How would an async I/O MPM handle a flush bucket?
>
> What I'm missing is that you may not always migrate the thread
> when doing an I/O because the non-I/O thread may still have
> stuff to write.
>
> In Dean's descriptions of his ASH MPM (again, I may be missing
> something), he talked about how the thread desiring I/O would
> hand-off the request to the I/O thread and return to the idle
> state - the assumption is that the request is complete.  But,
> making an I/O call doesn't mean that it is completed.
>
> Yet, that's the only clean way I can see how to handle this
> migration - because I don't see a way to resume the exact thread
> context in a clean way.  (What I think we would need, if we don't
> have a single thread per request, is to supply a post-I/O
> function that the I/O layer can tell a non-I/O thread to call
> when the I/O completes - pure event-based programming.)
>
> I guess I'm thinking of something like a proxy server application
> where we don't necessarily have all of the data up front - we may
> only have snippets of the data (because the origin server hasn't
> finished writing everything, but we have a partial response) - so
> write them out and flush as we get data.  This seems valid under
> the 2.0 filter semantics.  And, I'll argue that we must allow
> modules to call for a flush.  -- justin

-- 

______________________________________________________________
Ryan Bloom				rbb@apache.org
Covalent Technologies			rbb@covalent.net
--------------------------------------------------------------