Posted to dev@httpd.apache.org by Graham Leggett <mi...@sharp.fm> on 1999/05/28 15:28:54 UTC

More bug tracking - httpd clients refuse to die

Hi all,

I am trying to investigate why the httpd clients I am running get stuck
and refuse to gracefully restart.

Connecting gdb to such a process results in the following backtrace:

(gdb) bt
#0  0xef5b8598 in _read ()
#1  0xef1d5aa4 in _ti_read ()
#2  0x1e840 in buff_read (fb=0x84650, buf=0x84690, nbyte=4096) at buff.c:280
#3  0x1e7b0 in saferead_guts (fb=0x84650, buf=0x84690, nbyte=4096) at buff.c:623
#4  0x1c818 in read_with_errors (fb=0x84650, buf=0x84690, nbyte=4096) at buff.c:674
#5  0x1cce4 in ap_bgets (buff=0xefffda58 "", n=8192, fb=0x84650) at buff.c:826
#6  0x32780 in getline (s=0xefffda58 "", n=8192, in=0x84650, fold=0) at http_protocol.c:671
#7  0x32c20 in read_request_line (r=0xc3568) at http_protocol.c:791
#8  0x335d8 in ap_read_request (conn=0x2a6e28) at http_protocol.c:949
#9  0x2f1d0 in child_main (child_num_arg=10) at http_main.c:3968
#10 0x2f57c in make_child (s=0x6e728, slot=10, now=927890857) at http_main.c:4138
#11 0x2fa90 in perform_idle_server_maintenance () at http_main.c:4302
#12 0x301f0 in standalone_main (argc=1, argv=0xeffffed4) at http_main.c:4533
#13 0x30acc in main (argc=1, argv=0xeffffed4) at http_main.c:4769

Inside read_request_line() there is a small piece of code that ignores
the USR1 signal like so:

    /* we've probably got something to do, ignore graceful restart requests */
#ifdef SIGUSR1
    signal(SIGUSR1, SIG_IGN);
#endif

Two questions:

Why would Apache want to ignore the USR1 signal? Surely it would want to
defer it until later?

From the above backtrace it looks like the httpd process is stuck trying
to read data from a socket, but there is no client on the other side (as
far as I know) and this connection never seems to time out. Is this how
httpd is supposed to behave?

Regards,
Graham
-- 
-----------------------------------------
minfrin@sharp.fm		"There's a moon
					over Bourbon Street
						tonight...

Re: More bug tracking - httpd clients refuse to die

Posted by Graham Leggett <mi...@sharp.fm>.
Dean Gaudet wrote:

> > Why would Apache want to ignore the USR1 signal? Surely it would want to
> > defer it until later?
> 
> It is deferred, it's graceful -- there's a note in the scoreboard
> indicating the child should gracefully stop.  We ignore the signal because
> we can't stop there -- we need to serve at least one request on every
> connection.

Ok, then it would seem that these "stuck" processes are sitting around
waiting to be given "something to do", but then never actually get given
anything to do. So they sit around forever ignoring graceful restart
requests because they want to serve one last request (which never
comes).

These processes *also* ignore the restart signal, and the parent
eventually terminates them with a SIGKILL, but not always. After a
number of restarts (once an hour to rotate logs in our case) the
processes are no longer terminated and sit around until MaxClients is
reached; then Apache starts leaking memory everywhere and kills the box.

> My wild guess is that you've got a third party module in there which has
> futzed with SIGALRM and not restored the signal handler.

This behavior only started with v1.3.7, and happens with or without
external modules present. I am using the mod_proxy module a lot for
reverse proxy requests. That may have something to do with it, as this
module isn't that mainstream, but I'm not sure.

Regards,
Graham

Re: More bug tracking - httpd clients refuse to die

Posted by Dean Gaudet <dg...@arctic.org>.
On Fri, 28 May 1999, Graham Leggett wrote:

> #0  0xef5b8598 in _read ()

blocked reading on the client

> #1  0xef1d5aa4 in _ti_read ()
> #2  0x1e840 in buff_read (fb=0x84650, buf=0x84690, nbyte=4096) at buff.c:280
> #3  0x1e7b0 in saferead_guts (fb=0x84650, buf=0x84690, nbyte=4096) at buff.c:623
> #4  0x1c818 in read_with_errors (fb=0x84650, buf=0x84690, nbyte=4096) at buff.c:674
> #5  0x1cce4 in ap_bgets (buff=0xefffda58 "", n=8192, fb=0x84650) at buff.c:826
> #6  0x32780 in getline (s=0xefffda58 "", n=8192, in=0x84650, fold=0) at http_protocol.c:671
> #7  0x32c20 in read_request_line (r=0xc3568) at http_protocol.c:791

reading the "GET /foo HTTP/x.y" line

> #8  0x335d8 in ap_read_request (conn=0x2a6e28) at http_protocol.c:949
> #9  0x2f1d0 in child_main (child_num_arg=10) at http_main.c:3968
> #10 0x2f57c in make_child (s=0x6e728, slot=10, now=927890857) at http_main.c:4138
> #11 0x2fa90 in perform_idle_server_maintenance () at http_main.c:4302
> #12 0x301f0 in standalone_main (argc=1, argv=0xeffffed4) at http_main.c:4533
> #13 0x30acc in main (argc=1, argv=0xeffffed4) at http_main.c:4769
> 
> Inside read_request_line() there is a small piece of code that ignores
> the USR1 signal like so:
> 
>     /* we've probably got something to do, ignore graceful restart requests */
> #ifdef SIGUSR1
>     signal(SIGUSR1, SIG_IGN);
> #endif
> 
> Two questions:
> 
> Why would Apache want to ignore the USR1 signal? Surely it would want to
> defer it until later?

It is deferred, it's graceful -- there's a note in the scoreboard
indicating the child should gracefully stop.  We ignore the signal because
we can't stop there -- we need to serve at least one request on every
connection. 
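
A rough sketch of that deferral, with a plain struct standing in for the
scoreboard. The names fake_scoreboard, restart_generation and
serve_connection are made up for the illustration and the loop is heavily
simplified; it is not the actual child_main() code.

```c
#include <signal.h>

/* Hypothetical stand-in for the shared scoreboard; not Apache's layout. */
struct fake_scoreboard {
    volatile int restart_generation;   /* bumped by the parent on restart */
};

/* Serve requests on one connection and return how many were served.
 * pending is the number of requests the imaginary client will send. */
static int serve_connection(const struct fake_scoreboard *sb,
                            int my_generation, int pending)
{
    int served = 0;

    while (served < pending) {
        /* A request has arrived: the signal can be ignored from here on,
         * because the note in the scoreboard already records the restart. */
        signal(SIGUSR1, SIG_IGN);

        served++;   /* placeholder for reading and answering the request */

        /* Deferred check: stop gracefully once the request is complete. */
        if (sb->restart_generation != my_generation)
            break;
    }
    return served;
}
```

A child whose generation still matches serves everything the client sends;
one that has been asked to restart serves exactly one request and then
leaves the loop.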

My wild guess is that you've got a third party module in there which has
futzed with SIGALRM and not restored the signal handler. 

BTW, there's another bug in the bug database, similar problem... with the
parent doing all the alarm stuff these days (on any sane unix) we can
totally protect ourselves from broken 3rd party modules that screw with
alarm() or sleep() by using SIGUSR2 instead of SIGALRM... the patch is
trivial.  But it's not portable to broken/old unixes, so I haven't done
it. 
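
For illustration only, a sketch of a parent-driven timeout that uses
SIGUSR2, so that nothing a module does with alarm() or sleep() can get in
the way. This is not the patch mentioned above. The pipe standing in for
the client socket, the use of pselect(), and the names
usr2_timeout_handler, wait_for_request and demo_parent_driven_timeout are
all assumptions made for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/select.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t usr2_timed_out = 0;

static void usr2_timeout_handler(int sig)
{
    (void)sig;
    usr2_timed_out = 1;
}

/* Child side: wait for request data on fd. SIGUSR2 can only be delivered
 * inside pselect(), so a timeout sent early is not lost.
 * Returns 0 if the parent timed us out, 1 otherwise. */
static int wait_for_request(int fd, const sigset_t *wait_mask)
{
    fd_set rfds;
    int rc;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    rc = pselect(fd + 1, &rfds, NULL, NULL, NULL, wait_mask);
    return (rc == -1 && errno == EINTR && usr2_timed_out) ? 0 : 1;
}

/* Returns 1 if the child reported a SIGUSR2 timeout, 0 if not, -1 on error. */
static int demo_parent_driven_timeout(void)
{
    struct sigaction sa, old_sa;
    sigset_t block, orig, wait_mask;
    int fds[2], status = 0;
    pid_t pid;

    if (pipe(fds) != 0)
        return -1;
    sa.sa_handler = usr2_timeout_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGUSR2, &sa, &old_sa) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Block SIGUSR2 before fork() so the child starts with it blocked. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR2);
    sigprocmask(SIG_BLOCK, &block, &orig);
    wait_mask = orig;
    sigdelset(&wait_mask, SIGUSR2);

    pid = fork();
    if (pid == 0) {
        close(fds[1]);
        _exit(wait_for_request(fds[0], &wait_mask));
    }
    if (pid > 0) {
        /* A real parent would first wait for the child's timeout to expire. */
        kill(pid, SIGUSR2);
        if (waitpid(pid, &status, 0) != pid)
            pid = -1;
    }

    close(fds[0]);
    close(fds[1]);
    sigaction(SIGUSR2, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &orig, NULL);
    if (pid < 0)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : 0;
}
```

The parent never writes to the pipe, so the child only wakes up because of
the signal, which is the "client never sends anything" case from the
backtrace.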

Another feature someone could add is a check to the child main loop that
the SIGALRM handler is set up properly... just for debugging -- it
shouldn't be on by default. 
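
Such a check could look roughly like the sketch below, which queries the
current SIGALRM disposition with sigaction() and compares it against the
handler the server expects. The names server_alrm_handler,
install_server_alrm_handler and alrm_handler_intact are hypothetical;
nothing like this is in the tree.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Stand-in for the server's own SIGALRM handler (hypothetical name). */
static void server_alrm_handler(int sig)
{
    (void)sig;
}

/* Install the stand-in handler; returns 0 on success, -1 on failure. */
static int install_server_alrm_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = server_alrm_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGALRM, &sa, NULL);
}

/* Debug check: 1 if SIGALRM still points at the expected handler,
 * 0 if something replaced it, -1 if the query itself failed. */
static int alrm_handler_intact(void (*expected)(int))
{
    struct sigaction cur;

    if (sigaction(SIGALRM, NULL, &cur) != 0)
        return -1;
    if (cur.sa_flags & SA_SIGINFO)
        return 0;   /* replaced by a three-argument handler */
    return cur.sa_handler == expected ? 1 : 0;
}
```

Calling alrm_handler_intact() at the top of each pass through the child
loop, and logging when it returns 0, would point at the request after which
a module left the handler changed.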

Dean