Posted to dev@httpd.apache.org by Dean Gaudet <dg...@arctic.org> on 1999/06/28 17:31:14 UTC

Re: async routines

[hope you don't mind me cc'ing new-httpd zach, I think others will be
interested.]

On Mon, 28 Jun 1999, Zach Brown wrote:

> so dean, I was wading through the mpm code to see if I could munge the
> sigwait stuff into it.
> 
> as far as I could tell, the http protocol routines are still blocking.
> what does the future hold in the way for async routines? :)  I basically
> need a way to do something like..

You're still waiting for me to get the async stuff in there... I've done
part of the work -- the BUFF layer now supports non-blocking sockets. 
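
(For the curious, all "non-blocking" means at the fd level is roughly the
following -- an illustration only, not the actual BUFF code; the
interesting part is making the BUFF code cope with read()/write()
returning EAGAIN rather than this one-liner:)

    #include <fcntl.h>

    /* sketch: put a socket into non-blocking mode */
    static void set_nonblocking(int sock)
    {
	int flags = fcntl(sock, F_GETFL, 0);
	fcntl(sock, F_SETFL, flags | O_NONBLOCK);
    }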

However, the HTTP code will always remain blocking.  There's no way I'm
going to try to educate the world in how to write async code... and since
our HTTP code has arbitrary call-outs to third-party modules, it'd
have a drastic effect on everyone to make this change.

But I honestly don't think this is a problem.  Here are my observations:

All the popular HTTP clients send their requests in one packet (or two,
in the case of a POST from Netscape).  So the HTTP code would almost
never have to block while processing the request.  It may block while
processing a POST -- something which someone else can worry about later;
my code won't be any worse than what we already have in Apache.  So
any effort we put into making the HTTP parsing code async-safe would
be wasted on the 99.9% case.

Most responses fit in the socket's send buffer, and again don't require
async support.  But we currently do the lingering_close() routine which
could easily use async support.  Large responses also could use async
support.
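
(For reference, lingering_close() is roughly the following shape today --
a sketch only, the real thing also has timeouts via select() -- which is
exactly why it's such an obvious candidate for the async treatment:)

    #include <unistd.h>
    #include <sys/socket.h>

    /* sketch: stop sending, then drain the client until it closes.
     * Today this wait eats a whole thread/process; an async MPM could
     * multiplex it instead. */
    static void lingering_close_sketch(int sd)
    {
	char junk[512];

	shutdown(sd, SHUT_WR);			/* no more data from us */
	while (read(sd, junk, sizeof(junk)) > 0)
	    continue;				/* discard whatever arrives */
	close(sd);
    }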

The goal of HTTP parsing is to figure out which response object to
send.  In most cases we can reduce that to a bunch of common response
types:

- copying a file to the socket
- copying a pipe/socket to the socket  (IPC, CGIs)
- copying a mem region to the socket (mmap, some dynamic responses)
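
(To make "async" concrete for, say, the mem region case: instead of a
write() loop that can block, the state lives in a little structure that
gets resumed whenever the socket is writable.  A sketch, with made-up
names:)

    #include <unistd.h>
    #include <errno.h>

    /* state for an in-progress mem-region response */
    struct mem_response {
	const char *base;	/* start of the region (e.g. an mmap) */
	size_t len;		/* total length */
	size_t off;		/* how much has been written so far */
    };

    /* called whenever poll/select says the socket is writable; returns
     * 1 when the whole region has been sent, 0 if we'd block and need
     * to be called again later, -1 on a real error */
    static int mem_response_resume(int sd, struct mem_response *r)
    {
	while (r->off < r->len) {
	    ssize_t n = write(sd, r->base + r->off, r->len - r->off);
	    if (n > 0)
		r->off += n;
	    else if (n < 0 && errno == EINTR)
		continue;		/* interrupted, just retry */
	    else if (n < 0 && errno == EAGAIN)
		return 0;		/* socket buffer full, come back later */
	    else
		return -1;
	}
	return 1;
    }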

So what we do is we modify the response handlers only.  We teach them
about how to send async responses.

There will be a few new primitives which will tell the core "the response
fits one of these categories, please handle it".  The core will do the
rest -- and for MPMs which support async handling, the core will return
to the MPM and let the MPM do the work async...  the MPM will call a
completion function supplied by the core.  (Note that this will simplify
things for lots of folks... for example, it'll let us move range request
handling to a common spot so that more than just default_handler
can support it.)

I expect this to be a simple message passing protocol (pass by reference).
Well rather, that's how I expect to implement it in ASH -- where I'll
have a single thread per-process doing the select/poll stuff; and the
other threads are in a pool that handles the protocol stuff.  For your
stuff you may want to do it another way -- but we'll be using a common
structure that the core knows about... and that structure will look like
a message:

    struct msg {
	enum {
	    MSG_SEND_FILE,
	    MSG_SEND_PIPE,
	    MSG_SEND_MEM,
	    MSG_LINGERING_CLOSE,
	    MSG_WAIT_FOR_READ,	/* for handling keep-alives */
	    ...
	} type;
	BUFF *client;					/* the client connection */
	void (*completion)(struct msg *, int status);	/* called when the op is done */
	union {
	    ... extra data here for whichever types need it ...;
	} x;
    };

The nice thing about this is that these operations are protocol
independent... at this level there's no knowledge of HTTP, so the same
MPM core could be used to implement other protocols.
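
For example, an async MPM's "this socket is ready" handler might end up
looking roughly like the following.  The resume_*() helpers are made-up
names; the only real contract is that the MPM eventually calls the
completion function the core supplied:

    /* made-up helpers: each pushes as much as it can without blocking
     * and returns 1 when finished, 0 if it would block, -1 on error */
    extern int resume_send_file(struct msg *m);
    extern int resume_send_pipe(struct msg *m);
    extern int resume_send_mem(struct msg *m);	    /* cf. the region sketch above */
    extern int resume_lingering_close(struct msg *m);

    static void mpm_handle_ready(struct msg *m)
    {
	int rc;

	switch (m->type) {
	case MSG_SEND_FILE:	  rc = resume_send_file(m);	    break;
	case MSG_SEND_PIPE:	  rc = resume_send_pipe(m);	    break;
	case MSG_SEND_MEM:	  rc = resume_send_mem(m);	    break;
	case MSG_LINGERING_CLOSE: rc = resume_lingering_close(m);   break;
	case MSG_WAIT_FOR_READ:	  rc = 1;  /* readable again: next keep-alive request */
				  break;
	default:		  rc = -1; break;
	}

	if (rc != 0)
	    m->completion(m, rc > 0 ? 0 : -1);	/* hand back to the core */
	/* rc == 0: would block -- leave the msg registered with the
	 * poller and we'll get called again when the socket is ready */
    }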

> so as I was thinking about this stuff, I realized it might be neat to have
> 'classes' of non blocking pending work and have different threads with
> different priorities hacking on it.  Say we have a very high priority
> thread that accepts connections, does initial header parsing, and
> sendfile()s data out.  We could have lower priority threads that are
> spinning doing 'harder' BUFF work like an encryption layer or gzipping
> content, whatever.

You should be able to implement this in your MPM easily I think... because
you'll see the different message types and can distribute them as needed.
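
e.g. something as dumb as the following in your MPM would get the effect
you describe -- queue_put(), fast_queue and slow_queue are made-up names
for whatever priority mechanism you end up with:

    /* route cheap fast-path work to a high-priority pool and anything
     * potentially expensive to a lower-priority one */
    static void dispatch_msg(struct msg *m)
    {
	switch (m->type) {
	case MSG_SEND_FILE:
	case MSG_SEND_MEM:
	    queue_put(&fast_queue, m);
	    break;
	default:
	    queue_put(&slow_queue, m);
	    break;
	}
    }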

Dean


Re: async routines

Posted by Zach Brown <za...@zabbo.net>.
> [hope you don't mind me cc'ing new-httpd zach, I think others will be
> interested.]

oh, no problem at all.  the more the merrier :)

my apologies if some of this sounds dumb; I'm not terribly familiar with
the Apache source yet :(

> You're still waiting for me to get the async stuff in there... I've done
> part of the work -- the BUFF layer now supports non-blocking sockets. 

could I abuse things such that I could read in a BUFF asynchronously and
then attach it to a connection and pass it down the protocol layers?  If they
could return "hey dork, that buff isn't a complete request" I could then
hand it off to threads to deal with it.. 

say I set a bit in the connection struct that says "return an 'incomplete'
error rather than aborting or completing.." so that I can then clear this
bit and hand it off to a thread that will follow the usual blocking paths.

> However, the HTTP code will always remain blocking.  There's no way I'm
> going to try to educate the world in how to write async code... and since
> our HTTP code has arbitrary call-outs to third-party modules, it'd
> have a drastic effect on everyone to make this change.

oh, I understand that completely.  All I need is an async ability to parse
a request.. even if I have to jump through hoops to get at it.

I _certainly_ don't expect all module/cgi writers to now adapt their code
paths to understand async checkpointing and all that snot ;)

> All the popular HTTP clients send their requests in one packet (or two,
> in the case of a POST from Netscape).  So the HTTP code would almost

Agreed, certainly.

but my argument here is that supporting the off chance that we will block
while parsing the requests renders useless the nice optimizations we
could have done if we knew the initial header parsing wasn't going to
block on us.

with the sigio/siginfo/sigwaitinfo() model I can do the initial parsing of
an insane number of incoming connections with very, very little overhead.
I even get accept() balancing of an arbitrary number of listening sockets
across N of these sigio/siginfo spinning threads basically for free.
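
roughly, each fd gets armed once and then the "spinning" threads just
dequeue events.  This is Linux-specific F_SETSIG/O_ASYNC stuff, it
assumes the signal is blocked in every thread before they're spawned,
and handle_ready() is hand-waving for "parse/push a bit more without
blocking":

    #define _GNU_SOURCE		/* for F_SETSIG */
    #include <fcntl.h>
    #include <signal.h>
    #include <pthread.h>
    #include <unistd.h>

    extern void handle_ready(int fd);	/* must never block */

    /* queue a realtime signal (carrying the fd) whenever the socket
     * becomes ready */
    static void arm_fd(int fd)
    {
	fcntl(fd, F_SETOWN, getpid());
	fcntl(fd, F_SETSIG, SIGRTMIN + 1);
	fcntl(fd, F_SETFL, O_NONBLOCK | O_ASYNC);
    }

    /* each sigio thread: no select() scan, no thundering herd, just
     * pull the next event off the signal queue.  (Ignoring the SIGIO
     * fallback you get when the rt queue overflows.) */
    static void *sigio_thread(void *unused)
    {
	sigset_t set;
	siginfo_t info;

	sigemptyset(&set);
	sigaddset(&set, SIGRTMIN + 1);
	pthread_sigmask(SIG_BLOCK, &set, NULL);

	for (;;) {
	    if (sigwaitinfo(&set, &info) < 0)
		continue;
	    handle_ready(info.si_fd);
	}
	return NULL;	/* not reached */
    }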

but all of that goes out the window if the request parsing could block.
all someone needs to do is hold a connection open without sending data and the
whole thread is dead to the world ;)  one could put timeouts and such in,
but the idea of trying to break the protocol code while stuck in read()..
you get the idea :)

And as you quite rightly point out, the vast majority of responses fit in
the tcp send buffers anyway.  If I'm forced to have a thread that can
block on the initial request parsing, it would be silly to have its i/o
fall back to the sigio/blah i/o engine if most of them fit into the send
buffer.

the beauty of the sigio worker model is that you can handle all those
'fast path' request->file connections without the overhead of thread
management, scheduling, etc, etc.

If we can work out a way to read in an initial BUFF asynchronously and be
able to pass it to a 'verification' mode of the protocol parsing that will
return one of the request 'types' you describe or throw back an incomplete
error, I think that's all we need to see static serving go through the roof.
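
something with roughly this shape is what I'm picturing -- the names are
completely made up, it's just the contract I care about:

    /* never blocks; only says whether the bytes read so far are enough
     * to classify the request */
    enum req_check {
	REQ_INCOMPLETE,	    /* need more bytes: re-arm the fd and wait */
	REQ_FAST_STATIC,    /* fast path: hand straight to the sigio engine */
	REQ_NEEDS_THREAD,   /* POST/CGI/etc that might block: give it a thread */
	REQ_BOGUS	    /* garbage: just close it */
    };

    enum req_check check_request(BUFF *client);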

> So what we do is we modify the response handlers only.  We teach them
> about how to send async responses.

I'm all for this bit as well :)

> I expect this to be a simple message passing protocol (pass by reference).
> Well rather, that's how I expect to implement it in ASH -- where I'll
> have a single thread per-process doing the select/poll stuff; and the
> other threads are in a pool that handles the protocol stuff.  For your
> stuff you may want to do it another way -- but we'll be using a common

the sigio/siginfo stuff is only really a low-overhead non-blocking fd event
engine, same as select()/poll() for most design concerns.  so yeah, I'd
have the initial accepting poll handle the request and pass the async
finisher functions to a bunch of sigio/siginfo threads that deal with
muxing the non-blocking responses..

>     struct msg {

might a 

	void *mpm_priv;

be handy?

> The nice thing about this is that these operations are protocol
> independent... at this level there's no knowledge of HTTP, so the same
> MPM core could be used to implement other protocols.

I definitely like that bit.

> You should be able to implement this in your MPM easily I think... because
> you'll see the different message types and can distribute them as needed.

yup, it's a simple matter of F_SETOWN in my case :)

-- zach

- - - - - -
007 373 5963