Posted to dev@httpd.apache.org by Ben Hyde <bh...@pobox.com> on 1999/01/14 18:05:41 UTC

Pools and Threads and Errors

In Apache 2.0 we have discussed having a layer, NSPR-like, that will
provide threads and other OS abstractions.  Call that APR for the
purposes of this note.  I assume this will be a giant of sorts, evolved
out of the os/ directory.

I assume that we will move the pools abstraction into and low down in
this facility.  That would resolve a number of irritating uses of
malloc found in the os directory and free up a lot of other chaff
that ought to have moved out of the main/ directory as general
utilities a long while ago.

We have discussed before (a year or more) that every thread ought
to have its own pool.  I assume that the pool of a thread will be
our principal tool for cleaning up when a thread is destroyed.
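
A minimal sketch of the per-thread pool idea, assuming POSIX threads.
Every name here (pool, pool_register_cleanup, thread_main and so on) is
hypothetical and is not the Apache alloc.c API.  The pool only tracks
cleanup callbacks, and pthread_cleanup_push covers thread exit and
cancellation, not other forms of destruction.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the Apache pool API. */
typedef struct cleanup {
    void (*fn)(void *);
    void *data;
    struct cleanup *next;
} cleanup;

typedef struct pool {
    cleanup *cleanups;   /* run in reverse order of registration */
} pool;

static pool *pool_create(void)
{
    pool *p = malloc(sizeof *p);
    if (p != NULL)
        p->cleanups = NULL;
    return p;
}

/* Register fn(data) to run when the pool is destroyed.  0 on success. */
static int pool_register_cleanup(pool *p, void (*fn)(void *), void *data)
{
    cleanup *c = malloc(sizeof *c);
    if (c == NULL)
        return -1;
    c->fn = fn;
    c->data = data;
    c->next = p->cleanups;
    p->cleanups = c;
    return 0;
}

/* Run every registered cleanup, then release the pool itself. */
static void pool_destroy(pool *p)
{
    while (p->cleanups != NULL) {
        cleanup *c = p->cleanups;
        p->cleanups = c->next;
        c->fn(c->data);
        free(c);
    }
    free(p);
}

/* pthread cleanup handlers take void *, so wrap pool_destroy. */
static void destroy_pool_cb(void *p) { pool_destroy(p); }

/* Example cleanup: bump a counter so the caller can see that it ran. */
static void count_cleanup(void *counter) { ++*(int *)counter; }

/* Thread body: owns a pool that is destroyed when the thread finishes,
 * calls pthread_exit, or is cancelled inside the push/pop region. */
static void *thread_main(void *arg)
{
    pool *p = pool_create();
    assert(p != NULL);
    pthread_cleanup_push(destroy_pool_cb, p);
    pool_register_cleanup(p, count_cleanup, arg);
    /* ... per-thread work allocating out of p would go here ... */
    pthread_cleanup_pop(1);   /* 1 = run destroy_pool_cb now */
    return NULL;
}
```

Starting thread_main with a pointer to an int and joining it should
leave that int incremented once, since the pool runs its one cleanup
on the way out.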

We have discussed before that the operation set of a thread can not
include an interrupt (i.e. the analogous operation to Unix signals).
The only way to get a thread's attention is to destroy it or hope
that it meets up with you somehow.  I assume this implies that timeout
will be done by destroying the threads.

Am I delusional yet?

 - ben

Re: Pools and Threads and Errors

Posted by Ben Hyde <bh...@pobox.com>.

Bill Stoddard writes:
>Ben Hyde wrote:
>> 
>
>> We have discussed before (a year or more) that every thread ought
>> to have its own pool.  I assume that the pool of a thread will be
>> our principal tool for cleaning up when a thread is destroyed.
>In a threaded Apache, should the number of threads per process be static
>(configurable) or should it vary dynamically based on server load? I
>would envision starting a process with n threads. In the pthreads port,
>each thread has its own thread local storage (allocated out of a pool)
>that is cleared/reinitialized at the beginning of each new request. The
>pool is cleaned up when the process dies. The number of threads per
>process remains static.

All multithreaded applications I've built end up with far more
threads than anybody expected them to have.  I'd assume modules will
create threads of their own that the request-handling threads
sometimes serialize against.

The A1 model is one process manager overseeing N worker processes.
That has, among other things, very high reliability, since it can
limp along when individual worker processes lock up or commit suicide.
That feature alone makes it worth preserving.

Meanwhile the A2 model needs something analogous at the thread
level, e.g. a single thread manager overseeing N worker threads.
That's necessary to ensure that the (pool) cleanup of threads you are
forced to destroy gets executed, as well as to do the other things
the A1 process manager does: blackboard, thread pool sizing, etc.
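
A rough sketch of such a manager, assuming POSIX threads and C11
atomics.  The slot structure, manager_run and the fixed N_WORKERS are
all hypothetical, and the workers here exit cooperatively rather than
being destroyed.  The manager starts the workers, reaps each one, and
marks the point where a per-thread pool would be cleaned up.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define N_WORKERS 4

/* Hypothetical blackboard slot, one per worker thread. */
typedef struct slot {
    pthread_t tid;
    atomic_int requests_done;   /* written by worker, read by manager */
    int cleaned_up;             /* set by manager once the worker is reaped */
} slot;

static atomic_int shutting_down;   /* zero until the manager gives up */

static void *worker(void *arg)
{
    slot *s = arg;
    /* Stand-in for request handling: 3 units of work unless told to stop. */
    for (int i = 0; i < 3 && !atomic_load(&shutting_down); i++)
        atomic_fetch_add(&s->requests_done, 1);
    return NULL;
}

/* Manager (runs in the calling thread): start the workers, reap each
 * one, then do its cleanup.  Returns the number of workers reaped, or
 * -1 if not every worker could be started. */
static int manager_run(slot board[N_WORKERS])
{
    int started = 0, reaped = 0;

    for (; started < N_WORKERS; started++) {
        atomic_init(&board[started].requests_done, 0);
        board[started].cleaned_up = 0;
        if (pthread_create(&board[started].tid, NULL, worker,
                           &board[started]) != 0)
            break;
    }
    if (started < N_WORKERS)
        atomic_store(&shutting_down, 1);   /* ask running workers to stop */

    for (int i = 0; i < started; i++) {
        if (pthread_join(board[i].tid, NULL) == 0) {
            board[i].cleaned_up = 1;   /* a per-thread pool would be destroyed here */
            reaped++;
        }
    }
    assert(reaped <= started);
    return started == N_WORKERS ? reaped : -1;
}
```

Thread pool sizing would amount to the manager deciding how many slots
to fill.  It is fixed at compile time here only to keep the sketch
short.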

 - ben

Re: Pools and Threads and Errors

Posted by Dean Gaudet <dg...@arctic.org>.

On Thu, 14 Jan 1999, Bill Stoddard wrote:

> Here is a related (in my mind at least :-) question... In the NSPR port
> and the pthreads port, all the threads block on an accept_loop mutex.
> One thread is allowed to call the accept(). When a request comes in, the
> thread blocking in the accept processes the request. Apache 1.3 for NT
> has a single thread doing an accept(). When a request comes in, a queue
> element is built and queued to a pool of worker threads. Furthermore, each
> request thread checks (via calls to WaitForMultipleObjects) to see if
> any of the other threads has signaled that it is exiting. Seems the NSPR
> and pthread port method is a much more efficient model. Why is it
> necessary, in the Win32 port, to check for threads exiting?

Not sure what the win32 port is doing... but the "best" method of mapping
incoming connections to threads depends a lot on how the threads are
implemented (userland vs. kernel vs. hybrid), how many processors there
are, how many net cards, yadda.  In the NSPR port I did just what worked
well on the linux version of NSPR -- which used userland threads at the
time... now it uses kernel threads I think, which would make it somewhat
more efficient to drop multiple threads into accept() at the same time.

Dean

Re: Pools and Threads and Errors

Posted by Bill Stoddard <st...@raleigh.ibm.com>.

Ben Hyde wrote:
> 

> We have discussed before (a year or more) that every thread ought
> to have its own pool.  I assume that the pool of a thread will be
> our principal tool for cleaning up when a thread is destroyed.
In a threaded Apache, should the number of threads per process be static
(configurable) or should it vary dynamically based on server load? I
would envision starting a process with n threads. In the pthreads port,
each thread has its own thread local storage (allocated out of a pool)
that is cleared/reinitialized at the beginning of each new request. The
pool is cleaned up when the process dies. The number of threads per
process remains static.

Here is a related (in my mind at least :-) question... In the NSPR port
and the pthreads port, all the threads block on an accept_loop mutex.
One thread is allowed to call the accept(). When a request comes in, the
thread blocking in the accept processes the request. Apache 1.3 for NT
has a single thread doing an accept(). When a request comes in, a queue
element is built and queued to a pool of worker threads. Furthermore, each
request thread checks (via calls to WaitForMultipleObjects) to see if
any of the other threads has signaled that it is exiting. Seems the NSPR
and pthread port method is a much more efficient model. Why is it
necessary, in the Win32 port, to check for threads exiting?
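
For comparison, the second model above (one acceptor queueing work to a
pool of worker threads) might be sketched like this with POSIX threads.
All names are hypothetical and this is not the Win32 port's code.
Connections are plain ints standing in for descriptors that would come
from accept(), and workers stop on a sentinel rather than by signaling
each other.

```c
#include <assert.h>
#include <pthread.h>

#define QUEUE_CAP 16
#define N_THREADS 3
#define STOP (-1)          /* sentinel telling a worker to exit */

typedef struct conn_queue {
    int items[QUEUE_CAP];  /* ring buffer of pending connections */
    int head, count;
    pthread_mutex_t lock;
    pthread_cond_t not_empty, not_full;
    int handled;           /* protected by lock */
} conn_queue;

static void queue_init(conn_queue *q)
{
    q->head = q->count = q->handled = 0;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
}

/* Append one connection, blocking while the queue is full. */
static void queue_push(conn_queue *q, int conn)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == QUEUE_CAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->items[(q->head + q->count) % QUEUE_CAP] = conn;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Remove the oldest connection, blocking while the queue is empty. */
static int queue_pop(conn_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->not_empty, &q->lock);
    int conn = q->items[q->head];
    q->head = (q->head + 1) % QUEUE_CAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return conn;
}

static void *queue_worker(void *arg)
{
    conn_queue *q = arg;
    while (queue_pop(q) != STOP) {
        /* request processing for the connection would happen here */
        pthread_mutex_lock(&q->lock);
        q->handled++;
        pthread_mutex_unlock(&q->lock);
    }
    return NULL;
}

/* Acceptor: queue n_conns stand-in connections, then one STOP per
 * worker.  Returns the number handled, or -1 if a worker failed to start. */
static int serve(conn_queue *q, int n_conns)
{
    pthread_t tids[N_THREADS];
    int started = 0;

    while (started < N_THREADS &&
           pthread_create(&tids[started], NULL, queue_worker, q) == 0)
        started++;
    if (started == N_THREADS)
        for (int conn = 0; conn < n_conns; conn++)
            queue_push(q, conn);
    for (int i = 0; i < started; i++)
        queue_push(q, STOP);
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    assert(q->count == 0);
    return started == N_THREADS ? q->handled : -1;
}
```

The accept-mutex model needs none of this queue machinery: each worker
would lock a shared mutex, call accept() itself, unlock, and handle the
connection directly.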

-- 
Bill Stoddard
stoddard@raleigh.ibm.com

Re: I/O timeouts in A2 (was Re: Pools and Threads and Errors)

Posted by Ben Hyde <bh...@pobox.com>.

I yearn for a general "error occurred callback" scheme along these
lines.  Consider for example how ap_log_error is used in
src/os/unix - it ought to stand on something low level rather than
something in main/.  I lean toward moving alloc.c, some of http_log.c,
and the stack/thread management abstractions into some place
lower level than main/.  They are more like language extensions.

If we are not going to allow the destruction of threads then a cleaner
exception/logging/unwinding scheme would be nice.

 - ben

Dean Gaudet writes:
>On Thu, 14 Jan 1999, Ben Hyde wrote:
>> Dean Gaudet writes:
>> >It calls the previously unused error handler for the BUFF, which gives the
>> >user a chance to do something special.  I forget exactly what I did. 
>> 
>> Unused in the core code possibly.
>
>Yeah it doesn't really affect others I don't think.  I think I just
>cleared up the semantics.  Maybe.
>
>> Couldn't timing out the I/O be handled much like losing the connection,
>> except for possibly the log message?
>
>/* XXX: this should really take a request_rec as the data, so that info
> * about the pathname causing the error can be displayed.  But to do that
> * the child_main loop needs to be careful about calling ap_bonerror()
> * after it destroys the request.  Deal with that later.
> */
>static void client_bonerror_handler(BUFF *fb, int direction, void *vb)
>{
>    conn_rec *conn;
>    const char *errstr;
>
>    switch (PR_GetError()) {
>    case PR_IO_TIMEOUT_ERROR:
>        errstr = "client %s timed out during %s (%s)";
>        break;
>
>    case PR_CONNECT_RESET_ERROR:
>        errstr = "client %s stopped connection before %s (%s) completed";
>        break;
>
>    default:
>        errstr = "client %s had an i/o error during %s (%s)";
>        break;
>    }
>    conn = vb;
>    ap_log_error(APLOG_MARK, APLOG_INFO, conn->server, errstr,
>                conn->remote_ip,
>                conn->timeout_name ? conn->timeout_name : "request",
>                (direction == B_RD) ? "read" : "write");
>}
>
>... and in ap_read_request we have:
>
>    ap_bonerror(conn->client, client_bonerror_handler, conn);
>    conn->timeout_name = "read request line";
>    ap_bsetopt(conn->client, BO_TIMEOUT,
>        conn->keptalive
>            ? &r->server->keep_alive_timeout_interval
>            : &r->server->timeout_interval);
>
>folks can play with conn->timeout_name to change what's logged.
>
>Dean
>
>

Re: I/O timeouts in A2 (was Re: Pools and Threads and Errors)

Posted by Dean Gaudet <dg...@arctic.org>.


On Thu, 14 Jan 1999, Ben Hyde wrote:

> Dean Gaudet writes:
> >It calls the previously unused error handler for the BUFF, which gives the
> >user a chance to do something special.  I forget exactly what I did. 
> 
> Unused in the core code possibly.

Yeah it doesn't really affect others I don't think.  I think I just
cleared up the semantics.  Maybe.

> Couldn't timing out the I/O be handled much like losing the connection,
> except for possibly the log message?

/* XXX: this should really take a request_rec as the data, so that info
 * about the pathname causing the error can be displayed.  But to do that
 * the child_main loop needs to be careful about calling ap_bonerror()
 * after it destroys the request.  Deal with that later.
 */
static void client_bonerror_handler(BUFF *fb, int direction, void *vb)
{
    conn_rec *conn;
    const char *errstr;

    switch (PR_GetError()) {
    case PR_IO_TIMEOUT_ERROR:
        errstr = "client %s timed out during %s (%s)";
        break;

    case PR_CONNECT_RESET_ERROR:
        errstr = "client %s stopped connection before %s (%s) completed";
        break;

    default:
        errstr = "client %s had an i/o error during %s (%s)";
        break;
    }
    conn = vb;
    ap_log_error(APLOG_MARK, APLOG_INFO, conn->server, errstr,
                conn->remote_ip,
                conn->timeout_name ? conn->timeout_name : "request",
                (direction == B_RD) ? "read" : "write");
}

... and in ap_read_request we have:

    ap_bonerror(conn->client, client_bonerror_handler, conn);
    conn->timeout_name = "read request line";
    ap_bsetopt(conn->client, BO_TIMEOUT,
        conn->keptalive
            ? &r->server->keep_alive_timeout_interval
            : &r->server->timeout_interval);

folks can play with conn->timeout_name to change what's logged.
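
Stripped of the Apache and NSPR specifics, the callback pattern above
could be sketched as follows.  The buff type and the buff_on_error and
buff_read functions are hypothetical stand-ins, not the real BUFF API.
The handler is invoked once whenever a read on the buffer fails, and
the caller still sees the failure as a return value.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

enum buff_dir { BUFF_READ, BUFF_WRITE };

typedef struct buff buff;
typedef void (*buff_error_fn)(buff *b, enum buff_dir dir, void *data);

/* Hypothetical stand-in for BUFF: a descriptor plus an error hook. */
struct buff {
    int fd;
    buff_error_fn on_error;   /* may be NULL */
    void *error_data;         /* passed back to on_error untouched */
};

/* Install (or clear, with fn == NULL) the error handler for b. */
static void buff_on_error(buff *b, buff_error_fn fn, void *data)
{
    assert(b != NULL);
    b->on_error = fn;
    b->error_data = data;
}

/* Read through the buffer; on failure, give the handler a chance to
 * log or react before the error is returned to the caller. */
static ssize_t buff_read(buff *b, void *dst, size_t len)
{
    ssize_t n = read(b->fd, dst, len);
    if (n < 0 && b->on_error != NULL)
        b->on_error(b, BUFF_READ, b->error_data);
    return n;
}

/* Example handler: count failures instead of logging them. */
static void count_errors(buff *b, enum buff_dir dir, void *data)
{
    (void)b;
    (void)dir;
    ++*(int *)data;
}
```

A handler like count_errors could just as well inspect errno and pick a
log message, which is roughly what the switch in the handler above does
with PR_GetError().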

Dean

I/O timeouts in A2 (was Re: Pools and Threads and Errors)

Posted by Ben Hyde <bh...@pobox.com>.

Dean Gaudet writes:
>> >If you look at how we currently use timeouts, we use them to get back from
>> >network/pipe reads and writes.  That's pretty easy to implement via select
>> >under unix (and I know it can be done under NT, I just don't know the
>> >mechanism -- NSPR implements it).  The NSPR port adds a timeout parameter
>> >to BUFF, and all reads or writes on that BUFF respect the timeout.
>> 
>> Yes, all I/O operations should time out.  Presumably when BUFF gets
>> a timeout it marks the BUFF as dead in some sense?
>
>It calls the previously unused error handler for the BUFF, which gives the
>user a chance to do something special.  I forget exactly what I did. 

Unused in the core code possibly.

Couldn't timing out the I/O be handled much like losing the connection,
except for possibly the log message?

 - ben

Re: Pools and Threads and Errors

Posted by Ben Hyde <bh...@pobox.com>.

Dean Gaudet writes:
>On Thu, 14 Jan 1999, Ben Hyde wrote:
>> TerminateThread is this operation on NT.  The manual does warn that it
>> is dangerous (duh).  It reports you have to do the cleanups yourself,
>> including those for code you didn't write (i.e. DLLs including kernel32).
>
>i.e. not worth considering :)

If an OS is built of shared DLLs that cannot handle the murder/suicide
of a few processes and threads then they are in pretty deep, ain't
they.

>> The timeout I'm concerned about is the one where a module callback goes
>> out to lunch, because they will.  The lack of designing for this case
>> in NT programs seems to me to explain a lot about why they are forever
>> going out to lunches from which they never return.
>
>Yeah but there's no portable solution to this problem that I'm aware of. 

OK, I guess I'm drifting back toward the never EVER destroy a thread
model.  I sure have a hell of a time staying in that mind set.

>Even on unix in single threaded apache we can't solve this problem.  We
>can't take a SIGALRM in third-party code, because it's frequently not
>ready to see EINTR or anything else like that.  It's certainly not ready
>for a longjmp().  Even though we do this, we just get lucky because we
>only lose one process when it messes up.
>
>If a site has such a braindead module then they can build their apache
>without threads (unless they happen to be so unfortunate as to be using
>NT).  

Blame the victim, eh?  Modules lock up because they hang on some
downstream resource as much as because they get to contemplating their
navel.

>I've always been advocating that we continue to support the
>multiprocess model.

We are allowed to destroy processes right? :-).

If so we still have to design enough stuff to allow things to unwind
during that destruction - happy day, all the work and none of the noble
consequences (like a real error handling scheme).

>> I'm shopping for a consensus that we do or don't EVER destroy threads,
>> either outcome leads to consequences.
>
>We don't, we can't do it if we want to remain portable. 

I'm not convinced it has anything to do with "portable" but rather
it has to do with how hopeless interrupting/unwinding a 3rd
party piece of code is.

>I'd like to be corrected if I'm wrong though. 

Well there is a split-the-difference design that says if you
want to leverage 3rd party code you have to warrant that you've
got the unwinding under control, otherwise we nuke the entire
process for your own "good" if you run over budget.
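
The "nuke the entire process" half of that design could look roughly
like this on Unix.  This is a sketch only: run_with_budget and the two
example jobs are hypothetical names, and the parent polls every 10 ms,
which a real process manager would likely replace with something less
crude.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Run job() in a child process and kill the child if it runs longer
 * than budget_ms.  Returns the child's exit status, -1 on fork/wait
 * failure or abnormal exit, -2 if it was killed for going over budget. */
static int run_with_budget(int (*job)(void), int budget_ms)
{
    assert(job != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(job() & 0xff);          /* child: run the job and leave */

    int status;
    for (int waited = 0; ; waited += 10) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            break;                    /* child finished on its own */
        if (r < 0)
            return -1;
        if (waited >= budget_ms) {
            kill(pid, SIGKILL);       /* over budget: destroy the process */
            waitpid(pid, &status, 0); /* reap it so no zombie is left */
            return -2;
        }
        struct timespec ts = { 0, 10 * 1000 * 1000 };   /* 10 ms */
        nanosleep(&ts, NULL);
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Example jobs: one well behaved, one that goes out to lunch. */
static int quick_job(void) { return 7; }
static int hung_job(void) { for (;;) pause(); }
```

Nothing inside the killed child gets to unwind, of course, which is the
cost being argued over: whatever cleanup matters has to live in the
parent.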

 - ben

Re: Pools and Threads and Errors

Posted by Dean Gaudet <dg...@arctic.org>.

On Thu, 14 Jan 1999, Ben Hyde wrote:

> TerminateThread is this operation on NT.  The manual does warn that it
> is dangerous (duh).  It reports you have to do the cleanups yourself,
> including those for code you didn't write (i.e. DLLs including kernel32).

i.e. not worth considering :)

> >If you look at how we currently use timeouts, we use them to get back from
> >network/pipe reads and writes.  That's pretty easy to implement via select
> >under unix (and I know it can be done under NT, I just don't know the
> >mechanism -- NSPR implements it).  The NSPR port adds a timeout parameter
> >to BUFF, and all reads or writes on that BUFF respect the timeout.
> 
> Yes, all I/O operations should time out.  Presumably when BUFF gets
> a timeout it marks the BUFF as dead in some sense?

It calls the previously unused error handler for the BUFF, which gives the
user a chance to do something special.  I forget exactly what I did. 

> The timeout I'm concerned about is the one where a module callback goes
> out to lunch, because they will.  The lack of designing for this case
> in NT programs seems to me to explain a lot about why they are forever
> going out to lunches from which they never return.

Yeah but there's no portable solution to this problem that I'm aware of. 

Even on unix in single threaded apache we can't solve this problem.  We
can't take a SIGALRM in third-party code, because it's frequently not
ready to see EINTR or anything else like that.  It's certainly not ready
for a longjmp().  Even though we do this, we just get lucky because we
only lose one process when it messes up.

If a site has such a braindead module then they can build their apache
without threads (unless they happen to be so unfortunate as to be using
NT).  I've always been advocating that we continue to support the
multiprocess model. 

> I'm shopping for a consensus that we do or don't EVER destroy threads,
> either outcome leads to consequences.

We don't, we can't do it if we want to remain portable. 

I'd like to be corrected if I'm wrong though. 

Dean

Re: Pools and Threads and Errors

Posted by Ben Hyde <bh...@pobox.com>.

Dean Gaudet writes:
>> We have discussed before that the operation set of a thread can not
>> include an interrupt (i.e. the analogous operation to Unix signals).
>> The only way to get a thread's attention is to destroy it or hope
>> that it meets up with you somehow.  I assume this implies that timeout
>> will be done by destroying the threads.
>
>There's no asynchronous anything, not even destruction.  AFAIK NT doesn't
>support that... and even if it did, it's a mess.

TerminateThread is this operation on NT.  The manual does warn that it
is dangerous (duh).  It reports you have to do the cleanups yourself,
including those for code you didn't write (i.e. DLLs including kernel32).

>If you look at how we currently use timeouts, we use them to get back from
>network/pipe reads and writes.  That's pretty easy to implement via select
>under unix (and I know it can be done under NT, I just don't know the
>mechanism -- NSPR implements it).  The NSPR port adds a timeout parameter
>to BUFF, and all reads or writes on that BUFF respect the timeout.

Yes, all I/O operations should time out.  Presumably when BUFF gets
a timeout it marks the BUFF as dead in some sense?

The timeout I'm concerned about is the one where a module callback goes
out to lunch, because they will.  The lack of designing for this case
in NT programs seems to me to explain a lot about why they are forever
going out to lunches from which they never return.

I'm shopping for a consensus that we do or don't EVER destroy threads,
either outcome leads to consequences.

 - ben

Re: Pools and Threads and Errors

Posted by Dean Gaudet <dg...@arctic.org>.

On 14 Jan 1999, Ben Hyde wrote:

> I assume that we will move the pools abstraction into and low down in
> this facility.  That would resolve a number of irritating uses of
> malloc found in the os directory and free up a lot of other chaff
> that ought to have moved out of the main/ directory as general
> utilities a long while ago.

Yup.

> We have discussed before (a year or more) that every thread ought
> to have its own pool.  I assume that the pool of a thread will be
> our principal tool for cleaning up when a thread is destroyed.

Yup.

> We have discussed before that the operation set of a thread can not
> include an interrupt (i.e. the analogous operation to Unix signals).
> The only way to get a thread's attention is to destroy it or hope
> that it meets up with you somehow.  I assume this implies that timeout
> will be done by destroying the threads.

There's no asynchronous anything, not even destruction.  AFAIK NT doesn't
support that... and even if it did, it's a mess.

If you look at how we currently use timeouts, we use them to get back from
network/pipe reads and writes.  That's pretty easy to implement via select
under unix (and I know it can be done under NT, I just don't know the
mechanism -- NSPR implements it).  The NSPR port adds a timeout parameter
to BUFF, and all reads or writes on that BUFF respect the timeout.
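
A sketch of the select-based approach on Unix.  read_with_timeout is a
hypothetical helper, not the BUFF code, and reporting the timeout as
ETIMEDOUT is a choice made for this sketch.  The descriptor must be
below FD_SETSIZE for select() to be usable.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to len bytes from fd, waiting at most timeout_ms for data to
 * arrive.  Returns the number of bytes read, 0 on EOF, or -1 with errno
 * set (ETIMEDOUT if nothing became readable in time). */
static ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    fd_set readable;
    struct timeval tv;
    int ready;

    assert(fd >= 0 && fd < FD_SETSIZE);
    do {
        /* select() may modify both arguments, so rebuild them each try.
         * Note that a retry after EINTR restarts the full timeout. */
        FD_ZERO(&readable);
        FD_SET(fd, &readable);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        ready = select(fd + 1, &readable, NULL, NULL, &tv);
    } while (ready < 0 && errno == EINTR);

    if (ready < 0)
        return -1;            /* select itself failed */
    if (ready == 0) {
        errno = ETIMEDOUT;    /* nothing arrived within the timeout */
        return -1;
    }
    return read(fd, buf, len);
}
```

On an empty pipe this returns -1 with errno set to ETIMEDOUT once the
timeout passes, and after a byte is written to the pipe the same call
returns 1.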

Dean