Posted to dev@httpd.apache.org by Justin Erenkrantz <je...@ebuilt.com> on 2001/12/05 21:27:27 UTC
Async I/O question?
How would an async I/O MPM handle a flush bucket?
What I'm missing is that you may not always migrate the thread
when doing an I/O because the non-I/O thread may still have
stuff to write.
In Dean's descriptions of his ASH MPM (again, I may be missing
something), he talked about how the thread desiring I/O would
hand-off the request to the I/O thread and return to the idle
state - the assumption is that the request is complete. But,
making an I/O call doesn't mean that it is completed.
Yet, that's the only clean way I can see how to handle this
migration - because I don't see a way to resume the exact thread
context in a clean way. (What I think we would need, if we don't
have a single thread per request, is to supply a post-I/O
function that the I/O layer can tell a non-I/O thread to call
when I/O is complete - pure event-based programming.)
I guess I'm thinking of something like a proxy server application
where we don't necessarily have all of the data up front - we may
only have snippets of the data (because the origin server hasn't
finished writing everything, but we have a partial response) - so
write them out and flush as we get data. This seems valid under
the 2.0 filter semantics. And, I'll argue that we must allow
modules to call for a flush. -- justin
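The post-I/O callback idea above can be sketched as pure event-based code. This is only an illustration - the names `AsyncWriter`, `write_async`, and `on_complete` are invented, not Apache APIs: a non-I/O thread queues a buffer plus a completion callback, and a dedicated I/O thread invokes the callback once the write finishes.

```python
import queue
import threading

# Hypothetical sketch of the "post-I/O function" idea - not Apache code.
# A non-I/O thread hands the I/O layer a buffer plus a completion callback;
# the I/O thread invokes the callback once the write is done.

class AsyncWriter:
    def __init__(self):
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def write_async(self, data, on_complete):
        """Queue data; on_complete fires on the I/O thread when written."""
        self._jobs.put((data, on_complete))

    def _run(self):
        while True:
            data, on_complete = self._jobs.get()
            if data is None:
                break
            # A real server would write to the client socket here.
            on_complete(len(data))

    def close(self):
        self._jobs.put((None, None))
        self._thread.join()

# A flush from a module becomes just another queued write; the caller never
# blocks waiting for the network.
done = threading.Event()
sizes = []
writer = AsyncWriter()
writer.write_async(b"partial response", lambda n: (sizes.append(n), done.set()))
done.wait()
writer.close()
print(sizes)  # [16]
```

Note that a flush in this scheme is just a marker the I/O thread honors in order, which matches the proxy case: partial data can be queued and pushed out as it arrives.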
Re: Async I/O question?
Posted by Ryan Bloom <rb...@covalent.net>.
On Wednesday 05 December 2001 01:28 pm, Justin Erenkrantz wrote:
> On Wed, Dec 05, 2001 at 01:10:42PM -0800, Ryan Bloom wrote:
> > 2) In a partial Async model, like what Dean is suggesting, the
> > I/O thread needs to be able to accept multiple chunks of data
> > to be written to the client. This would allow you to handle a flush
> > bucket, and the processing thread wouldn't stop processing the
> > request, it just wouldn't wait to continue processing if the data
> > couldn't be written immediately. The point is to get the processing
> > threads done with the request ASAP, so that they can handle the
> > next request. The I/O for the threads can wait an extra 1/2 second
> > or two.
> >
> > Think of it as three threads all working together.
>
> <light bulb goes on>
>
> > Now, this can be improved even more by moving to a four
> > thread model, where one thread is dedicated to reading from
> > the network, and one is dedicated to writing to the network.
>
> Thanks for the clarification. It helps tremendously. So, we
> aren't talking about a pure async model - just one where we
> attempt to hand off. And, moving to a four thread model may be hindered
> by the specific OS - like /dev/poll can indicate reading and
> writing on the same socket. Would we want two threads sharing
> ownership of the socket? Perhaps. However, something like SSL
> would complicate things (think renegotiations).
It can indicate reading/writing on the same socket, but we may not
want it to. As for SSL, because we are doing encryption in memory
instead of directly to the socket, this shouldn't be a big problem.
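One way for a dedicated reader and a dedicated writer to share a socket without a combined readiness event is to give each its own poll set. A small sketch, using Python's `selectors` module as a stand-in for /dev/poll (illustrative only; the MPM itself would be C):

```python
import selectors
import socket

# Sketch: split read and write interest on one socket across two poll sets,
# as a four-thread model with a dedicated reader and writer might. Python's
# selectors module stands in for /dev/poll; this is not httpd code.

a, b = socket.socketpair()
read_sel = selectors.DefaultSelector()
write_sel = selectors.DefaultSelector()
read_sel.register(a, selectors.EVENT_READ)    # reader thread's interest only
write_sel.register(a, selectors.EVENT_WRITE)  # writer thread's interest only

writable = write_sel.select(timeout=0)        # writable immediately
readable_before = read_sel.select(timeout=0)  # nothing to read yet
b.send(b"ping")
readable_after = read_sel.select(timeout=1)   # now the reader wakes up

print(len(writable), len(readable_before), len(readable_after))  # 1 0 1

for s in (read_sel, write_sel):
    s.close()
a.close()
b.close()
```

With separate poll sets, neither thread is woken for events it doesn't care about, at the cost of having the socket registered twice.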
> 1) Any transient buckets will have to be set aside in this MPM.
> Is this a concern? It seems that you also can't reuse the
> same memory space within the output loop. Once I pass it
> down the chain, I must say good-bye to any memory or data
> pointed within the bucket. (We couldn't even reuse heap
> data.) Is this even a change from current semantics?
We'll have to set aside transient data. We already say that filters
have to forget about any data once it has been passed down the
stack, so that isn't a change in semantics.
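The set-aside requirement can be seen in miniature: once a producer hands off a transient buffer, the I/O side must copy it, because the producer is free to reuse that memory immediately. A toy Python illustration (the names are invented; this is not the APR bucket API):

```python
import queue

# Toy illustration of setting aside transient data: the hand-off must take a
# private copy, because the producer reuses its scratch buffer right away.

io_queue = queue.Queue()

def hand_off_transient(buf):
    io_queue.put(bytes(buf))   # "set aside": copy into memory we own

scratch = bytearray(b"chunk one")
hand_off_transient(scratch)
scratch[:] = b"chunk two"      # producer reuses the buffer immediately
hand_off_transient(scratch)

first = io_queue.get()
second = io_queue.get()
print(first, second)           # b'chunk one' b'chunk two'
```

Had `hand_off_transient` queued the bytearray itself rather than a copy, both reads would have seen `chunk two` - exactly the corruption the set-aside rule prevents.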
> 2) We could implement this solely by manipulating the socket
> hooks you added, right? Would there be any change external to
> the MPM? (I guess we wouldn't know until we tried perhaps...)
There shouldn't be. A lot of the work I did a few weeks ago was to
help make this possible with the 2.0 architecture. I have a few more
things that can be done with those changes, but those are more for
me to play with than useful projects.
> 3) In the read case, the I/O is directed to a specific worker
> thread, right? So, a worker thread makes a request for some
> amount of I/O and it is delivered to that same thread (so we
> can still use thread-local storage)? The worker thread's wait
> for data from the I/O thread will be synchronous.
Presumably yes, but if this is designed correctly, we could move
to an async model for input too, where the thread that requested
the data may not be the thread that receives it.
> 4) What happens when all of the I/O threads are full (and their
> ensuing buffers are full too)? Do we just force the worker to
> wait? In fact, I'd imagine this would be a common case. The
> worker threads should be fairly fast - the I/O threads would be
> the slow ones.
I don't think that has been fully designed yet. I mean, minimally
it will have to wait, but the answer may also be to create a
second I/O thread to pick up some of the leftovers.
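That "wait, or add another I/O thread" choice can be sketched with a bounded hand-off queue: a full queue gives natural backpressure, and a second consumer can be started to drain the backlog. Purely illustrative - as noted, this policy has not been designed yet.

```python
import queue
import threading

# Sketch of the "I/O thread is full" case: a bounded queue makes the worker
# block (backpressure), or the server could start another I/O thread to
# pick up the leftovers.

jobs = queue.Queue(maxsize=2)
written = []

def io_thread():
    while (item := jobs.get()) is not None:
        written.append(item)   # stand-in for writing to the network

jobs.put("a")
jobs.put("b")                  # queue now full; no I/O thread running yet
try:
    jobs.put_nowait("c")       # a worker that refuses to wait sees Full
    overflowed = False
except queue.Full:
    overflowed = True

t = threading.Thread(target=io_thread)
t.start()                      # option: spin up an I/O thread to drain
jobs.put("c")                  # blocks briefly until there is room
jobs.put(None)
t.join()

print(overflowed, written)     # True ['a', 'b', 'c']
```

Whether the server should block, spawn, or drop is exactly the tuning question raised above; the queue bound is what makes any of those policies enforceable.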
Ryan
______________________________________________________________
Ryan Bloom rbb@apache.org
Covalent Technologies rbb@covalent.net
--------------------------------------------------------------
Re: Async I/O question?
Posted by Justin Erenkrantz <je...@ebuilt.com>.
On Wed, Dec 05, 2001 at 01:10:42PM -0800, Ryan Bloom wrote:
> 2) In a partial Async model, like what Dean is suggesting, the
> I/O thread needs to be able to accept multiple chunks of data
> to be written to the client. This would allow you to handle a flush
> bucket, and the processing thread wouldn't stop processing the
> request, it just wouldn't wait to continue processing if the data
> couldn't be written immediately. The point is to get the processing
> threads done with the request ASAP, so that they can handle the
> next request. The I/O for the threads can wait an extra 1/2 second
> or two.
>
> Think of it as three threads all working together.
<light bulb goes on>
> Now, this can be improved even more by moving to a four
> thread model, where one thread is dedicated to reading from
> the network, and one is dedicated to writing to the network.
Thanks for the clarification. It helps tremendously. So, we
aren't talking about a pure async model - just one where we
attempt to hand off. And, moving to a four thread model may be hindered
by the specific OS - like /dev/poll can indicate reading and
writing on the same socket. Would we want two threads sharing
ownership of the socket? Perhaps. However, something like SSL
would complicate things (think renegotiations).
Now, some questions:
1) Any transient buckets will have to be set aside in this MPM.
Is this a concern? It seems that you also can't reuse the
same memory space within the output loop. Once I pass it
down the chain, I must say good-bye to any memory or data
pointed within the bucket. (We couldn't even reuse heap
data.) Is this even a change from current semantics?
2) We could implement this solely by manipulating the socket
hooks you added, right? Would there be any change external to
the MPM? (I guess we wouldn't know until we tried perhaps...)
3) In the read case, the I/O is directed to a specific worker
thread, right? So, a worker thread makes a request for some
amount of I/O and it is delivered to that same thread (so we
can still use thread-local storage)? The worker thread's wait
for data from the I/O thread will be synchronous.
4) What happens when all of the I/O threads are full (and their
ensuing buffers are full too)? Do we just force the worker to
wait? In fact, I'd imagine this would be a common case. The
worker threads should be fairly fast - the I/O threads would be
the slow ones.
Thanks again. I now see the big picture... I need to study for
my exams now... -- justin
Re: Async I/O question?
Posted by Ryan Bloom <rb...@covalent.net>.
On Wednesday 05 December 2001 01:22 pm, Ian Holsman wrote:
> isn't this VERY similar to what SGI's state machine thingy was
> going to do?
It most likely is; this is not an uncommon design.
Ryan
______________________________________________________________
Ryan Bloom rbb@apache.org
Covalent Technologies rbb@covalent.net
--------------------------------------------------------------
Re: Async I/O question?
Posted by Ian Holsman <ia...@cnet.com>.
isn't this VERY similar to what SGI's state machine thingy was
going to do?
On Wed, 2001-12-05 at 13:10, Ryan Bloom wrote:
> On Wednesday 05 December 2001 12:27 pm, Justin Erenkrantz wrote:
>
> two things.
>
> 1) In a full Async model, you would need to be able to recover
> the thread's context. That isn't possible without major
> re-working of Apache, and that would be a good reason to move
> on to Apache 3.0.
>
> 2) In a partial Async model, like what Dean is suggesting, the
> I/O thread needs to be able to accept multiple chunks of data
> to be written to the client. This would allow you to handle a flush
> bucket, and the processing thread wouldn't stop processing the
> request, it just wouldn't wait to continue processing if the data
> couldn't be written immediately. The point is to get the processing
> threads done with the request ASAP, so that they can handle the
> next request. The I/O for the threads can wait an extra 1/2 second
> or two.
>
> Think of it as three threads all working together.
>
> accept thread:
> Loop
> select
> accept
> hand-off request to worker
> end of loop
>
> worker thread:
> Loop
> receive request
> wait for data from I/O thread
> process request
> Loop
> Generate data
> hand-off data to I/O thread
> End of Loop
> End of Loop
>
> I/O thread:
> Loop
> if somebody's waiting for data
> Read from network
> hand-off data to worker
> end if
> if data to write
> receive data from worker
> write to network
> end if
> End of Loop
>
> Now, this can be improved even more by moving to a four
> thread model, where one thread is dedicated to reading from
> the network, and one is dedicated to writing to the network.
>
> The big win with this async model is sheer scalability. If the
> threads aren't waiting to actually read and write data, they are just
> processing requests, and for a small request, this can be done insanely
> quickly.
>
> Ryan
>
> > How would an async I/O MPM handle a flush bucket?
> >
> > What I'm missing is that you may not always migrate the thread
> > when doing an I/O because the non-I/O thread may still have
> > stuff to write.
> >
> > In Dean's descriptions of his ASH MPM (again, I may be missing
> > something), he talked about how the thread desiring I/O would
> > hand-off the request to the I/O thread and return to the idle
> > state - the assumption is that the request is complete. But,
> > making an I/O call doesn't mean that it is completed.
> >
> > Yet, that's the only clean way I can see how to handle this
> > migration - because I don't see a way to resume the exact thread
> > context in a clean way. (What I think we would need, if we don't
> > have a single thread per request, is to supply a post-I/O
> > function that the I/O layer can tell a non-I/O thread to call
> > when I/O is complete - pure event-based programming.)
> >
> > I guess I'm thinking of something like a proxy server application
> > where we don't necessarily have all of the data up front - we may
> > only have snippets of the data (because the origin server hasn't
> > finished writing everything, but we have a partial response) - so
> > write them out and flush as we get data. This seems valid under
> > the 2.0 filter semantics. And, I'll argue that we must allow
> > modules to call for a flush. -- justin
>
> --
>
> ______________________________________________________________
> Ryan Bloom rbb@apache.org
> Covalent Technologies rbb@covalent.net
> --------------------------------------------------------------
--
Ian Holsman IanH@cnet.com
Performance Measurement & Analysis
CNET Networks - (415) 344-2608
Re: Async I/O question?
Posted by Ryan Bloom <rb...@covalent.net>.
On Wednesday 05 December 2001 12:27 pm, Justin Erenkrantz wrote:
two things.
1) In a full Async model, you would need to be able to recover
the thread's context. That isn't possible without major
re-working of Apache, and that would be a good reason to move
on to Apache 3.0.
2) In a partial Async model, like what Dean is suggesting, the
I/O thread needs to be able to accept multiple chunks of data
to be written to the client. This would allow you to handle a flush
bucket, and the processing thread wouldn't stop processing the
request, it just wouldn't wait to continue processing if the data
couldn't be written immediately. The point is to get the processing
threads done with the request ASAP, so that they can handle the
next request. The I/O for the threads can wait an extra 1/2 second
or two.
Think of it as three threads all working together.
accept thread:
Loop
select
accept
hand-off request to worker
end of loop
worker thread:
Loop
receive request
wait for data from I/O thread
process request
Loop
Generate data
hand-off data to I/O thread
End of Loop
End of Loop
I/O thread:
Loop
if somebody's waiting for data
Read from network
hand-off data to worker
end if
if data to write
receive data from worker
write to network
end if
End of Loop
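The three loops above can be exercised as a toy Python program, with queues standing in for the listening socket and the hand-offs (all names are illustrative, not the MPM's actual interfaces):

```python
import queue
import threading

# Toy version of the accept/worker/I/O pipeline sketched above. Queues stand
# in for the hand-offs; "wire" stands in for the client connection.

requests = queue.Queue()   # accept thread -> worker thread
out = queue.Queue()        # worker thread -> I/O thread
wire = []                  # what the I/O thread "wrote to the network"

def accept_thread(conns):
    for conn in conns:               # stands in for select + accept
        requests.put(conn)           # hand-off request to worker
    requests.put(None)               # no more connections

def worker_thread():
    while (conn := requests.get()) is not None:
        for i in range(2):           # process request, generate data
            out.put(f"{conn}:chunk{i}")  # hand-off data to I/O thread
    out.put(None)

def io_thread():
    while (data := out.get()) is not None:
        wire.append(data)            # write to network

threads = [
    threading.Thread(target=accept_thread, args=(["req1", "req2"],)),
    threading.Thread(target=worker_thread),
    threading.Thread(target=io_thread),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(wire)  # ['req1:chunk0', 'req1:chunk1', 'req2:chunk0', 'req2:chunk1']
```

The worker never blocks on the client: it finishes generating data and returns to `requests.get()` while the I/O thread drains `out` at the client's pace.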
Now, this can be improved even more by moving to a four
thread model, where one thread is dedicated to reading from
the network, and one is dedicated to writing to the network.
The big win with this async model is sheer scalability. If the
threads aren't waiting to actually read and write data, they are just
processing requests, and for a small request, this can be done insanely
quickly.
Ryan
> How would an async I/O MPM handle a flush bucket?
>
> What I'm missing is that you may not always migrate the thread
> when doing an I/O because the non-I/O thread may still have
> stuff to write.
>
> In Dean's descriptions of his ASH MPM (again, I may be missing
> something), he talked about how the thread desiring I/O would
> hand-off the request to the I/O thread and return to the idle
> state - the assumption is that the request is complete. But,
> making an I/O call doesn't mean that it is completed.
>
> Yet, that's the only clean way I can see how to handle this
> migration - because I don't see a way to resume the exact thread
> context in a clean way. (What I think we would need, if we don't
> have a single thread per request, is to supply a post-I/O
> function that the I/O layer can tell a non-I/O thread to call
> when I/O is complete - pure event-based programming.)
>
> I guess I'm thinking of something like a proxy server application
> where we don't necessarily have all of the data up front - we may
> only have snippets of the data (because the origin server hasn't
> finished writing everything, but we have a partial response) - so
> write them out and flush as we get data. This seems valid under
> the 2.0 filter semantics. And, I'll argue that we must allow
> modules to call for a flush. -- justin
--
______________________________________________________________
Ryan Bloom rbb@apache.org
Covalent Technologies rbb@covalent.net
--------------------------------------------------------------