You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Jim Jagielski <ji...@jaguNET.com> on 2001/09/26 21:29:52 UTC

clean_child_exit, just_die and exit

PR 8001 reports that we are incorrectly using exit() in clean_child_exit
when it's called via just_die, which is a signal handler... I'm not
aware of exit() being a non-recommended option :)

Anyone else have more info on this...
-- 
===========================================================================
   Jim Jagielski   [|]   jim@jaguNET.com   [|]   http://www.jaguNET.com/
      "A society that will trade a little liberty for a little order
                   will lose both and deserve neither"

Re: clean_child_exit, just_die and exit

Posted by Bill Stoddard <bi...@wstoddard.com>.
We (and third party modules) call lots of stuff in clean_child_exit that is not safe to
call from a signal handler. Dean has pointed this out many times. We need to eliminate
this particular architectural flaw in 2.0.

BTW, Jeff is working on a patch to 1.3 that will help out some though I do not know all
the details.

Bill

----- Original Message -----
From: "Jim Jagielski" <ji...@jaguNET.com>
To: <de...@httpd.apache.org>
Sent: Wednesday, September 26, 2001 3:29 PM
Subject: clean_child_exit, just_die and exit


> PR 8001 reports that we are incorrectly using exit() in clean_child_exit
> when it's called via just_die, which is a signal handler... I'm not
> aware of exit() being a non-recommended option :)
>
> Anyone else have more info on this...
> --
> ===========================================================================
>    Jim Jagielski   [|]   jim@jaguNET.com   [|]   http://www.jaguNET.com/
>       "A society that will trade a little liberty for a little order
>                    will lose both and deserve neither"
>


Re: clean_child_exit, just_die and exit

Posted by Jeff Trawick <tr...@attglobal.net>.
dean gaudet <de...@arctic.org> writes:

> On 26 Sep 2001, Jeff Trawick wrote:
> 
> > @@ -2785,7 +2791,11 @@
> >  static void usr1_handler(int sig)
> >  {
> >      if (usr1_just_die) {
> > -	just_die(sig);
> > +        if (alarms_blocked) {
> > +            exit_after_unblock = 1;
> > +            return;
> > +        }
> > +        ap_longjmp(die_jmpbuffer, sig);
> >      }
> >      deferred_die = 1;
> >  }
> 
> longjmp() isn't safe either...
> 
> basically there's nothing in C that you can do safely in a signal handler
> except set a global variable.

I agree in principle, but reacting to the global variable in a
reasonable period of time (e.g., before grabbing another connection in
the case of graceful restart) may be impossible without changing our
expectations a bit.  Other than allowing the possibility that we serve
yet another request after SIGUSR1 is received, I don't see how to
handle the problem.

> longjmp() isn't safe for the same reasons atexit() and cleanups aren't
> safe in the presence of arbitrary 3rd party code:  the 3rd party code
> could be in a state that would require rollback or cleanup before it's
> safe to re-enter that 3rd party code.

There is a decent chance that the 3rd party code won't be re-entered
anyway due to our desire to exit the process as a part of handling the
signal.  This is just lowering the risk slightly :(

> when i last played with the 1.3 signal handlers i just chose performance
> over correctness in this case, i got tired of having to pay so many
> signal() syscalls to make the thing safe.  i was hoping signals would just
> go away in 2.0... but i've never helped make that happen.

The second best time to plant an oak tree is ... :)

> when it comes right down to it, you have to say fuck it at some point.
> either you die gracelessly, or the admin shoots you with a kill -9 at some
> point.  if graceless death is going to happen you might as well make it
> happen early rather than later.  (you've seen me argue recently that
> there's no sense trying to recover from out of memory errors, this is all
> part of that same argument.)

We can longjmp out of the signal handler and have to say "fuck it" in
a smaller set of situations.

I'm not sure it is possible to take the pure set-a-global approach
without revisiting the requirements for how certain signals are
received.

for sigusr1: 

  we could handle yet another connection

for sighup/sigterm:

  we could sprinkle the code with checks for the global variable to
  react fairly quickly or we could end up finishing the current
  request (if we don't get SIGKILLed by the parent first).

I dunno :(
-- 
Jeff Trawick | trawick@attglobal.net | PGP public key at web site:
       http://www.geocities.com/SiliconValley/Park/9289/
             Born in Roswell... married an alien...

Re: clean_child_exit, just_die and exit

Posted by dean gaudet <de...@arctic.org>.
On 26 Sep 2001, Jeff Trawick wrote:

> @@ -2785,7 +2791,11 @@
>  static void usr1_handler(int sig)
>  {
>      if (usr1_just_die) {
> -	just_die(sig);
> +        if (alarms_blocked) {
> +            exit_after_unblock = 1;
> +            return;
> +        }
> +        ap_longjmp(die_jmpbuffer, sig);
>      }
>      deferred_die = 1;
>  }

longjmp() isn't safe either...

basically there's nothing in C that you can do safely in a signal handler
except set a global variable.

longjmp() isn't safe for the same reasons atexit() and cleanups aren't
safe in the presence of arbitrary 3rd party code:  the 3rd party code
could be in a state that would require rollback or cleanup before it's
safe to re-enter that 3rd party code.

longjmp() unrolls the stack, but in C there's no destructors or anything
that are called to do the unrolling so the 3rd party code has no idea it
has occured and can't clean up.

atexit() and cleanups can call back into the 3rd party code while its data
structures are in an unsafe state.

when i last played with the 1.3 signal handlers i just chose performance
over correctness in this case, i got tired of having to pay so many
signal() syscalls to make the thing safe.  i was hoping signals would just
go away in 2.0... but i've never helped make that happen.

when it comes right down to it, you have to say fuck it at some point.
either you die gracelessly, or the admin shoots you with a kill -9 at some
point.  if graceless death is going to happen you might as well make it
happen early rather than later.  (you've seen me argue recently that
there's no sense trying to recover from out of memory errors, this is all
part of that same argument.)

in languages with exception mechanisms you can at least do things which
cause properly written 3rd party code to not run into troubles here --
because they have the chance to catch exceptions and unroll/correct their
state.  but i still doubt anyone would get this right.

-dean


Re: clean_child_exit, just_die and exit

Posted by Jeff Trawick <tr...@attglobal.net>.
Jim Jagielski <ji...@jaguNET.com> writes:

> PR 8001 reports that we are incorrectly using exit() in clean_child_exit
> when it's called via just_die, which is a signal handler... I'm not
> aware of exit() being a non-recommended option :)

Hmmm, never heard of exit() as a problem.  Maybe the point is that it
calls atexit()-registered routines, which can themselves do stuff that
doesn't work in a signal handler?  _exit() would avoid that, but then
maybe there is module-related stuff which really needs to be done.

What I have heard of lately (!!!uggh!!!), and what Bill alluded to in
his follow-up to your post, is that clean_child_exit() from a signal
handler, and the calling of child-exit, can cause problems with
certain third-party modules which do stuff which, at least on certain
platforms, doesn't work in a signal handler.  (In other words, I don't
know what the &^%^$ the module is doing but when I gave them a patch
which no longer did clean_child_exit() from the SIGUSR1 handler,
segfaults during idle server maintenance-initiated SIGUSR1s went
away.)

This patch is a 1st pass at avoiding those issues.  There are a couple
of bogus log entries for testing purposes :)

I'll try to have a look-see at PR 8001 and see what that is.

Index: src/main/http_main.c
===================================================================
RCS file: /home/cvs/apache-1.3/src/main/http_main.c,v
retrieving revision 1.535
diff -u -r1.535 http_main.c
--- src/main/http_main.c	2001/04/12 17:49:26	1.535
+++ src/main/http_main.c	2001/09/25 15:37:46
@@ -301,6 +301,7 @@
 static server_rec *server_conf;
 #ifndef NETWARE
 static JMP_BUF APACHE_TLS jmpbuffer;
+static JMP_BUF APACHE_TLS die_jmpbuffer;
 #endif
 static int sd;
 static fd_set listenfds;
@@ -486,6 +487,8 @@
 static void clean_child_exit(int code) __attribute__ ((noreturn));
 static void clean_child_exit(int code)
 {
+    ap_log_error(APLOG_MARK, APLOG_NOERRNO|APLOG_ERR, NULL,
+                 "clean_child_exit(%d)", code);
     if (pchild) {
 	ap_child_exit_modules(pchild, server_conf);
 	ap_destroy_pool(pchild);
@@ -2775,7 +2778,10 @@
 	exit_after_unblock = 1;
     }
     else {
-	clean_child_exit(0);
+        if (one_process) {
+            clean_child_exit(0);
+        }
+        ap_longjmp(die_jmpbuffer, sig);
     }
 }
 
@@ -2785,7 +2791,11 @@
 static void usr1_handler(int sig)
 {
     if (usr1_just_die) {
-	just_die(sig);
+        if (alarms_blocked) {
+            exit_after_unblock = 1;
+            return;
+        }
+        ap_longjmp(die_jmpbuffer, sig);
     }
     deferred_die = 1;
 }
@@ -3996,6 +4006,8 @@
 		    break;
 		if (deferred_die) {
 		    /* we didn't get a socket, and we were told to die */
+                    ap_log_error(APLOG_MARK, APLOG_NOERRNO|APLOG_ERR, server_conf,
+                                 "deferred_die (%d)",__LINE__);
 		    clean_child_exit(0);
 		}
 	    }
@@ -4099,6 +4111,8 @@
 	    usr1_just_die = 1;
 	    if (deferred_die) {
 		/* ok maybe not, see ya later */
+                ap_log_error(APLOG_MARK, APLOG_NOERRNO|APLOG_ERR, server_conf,
+                             "deferred_die");
 		clean_child_exit(0);
 	    }
 	    /* or maybe we missed a signal, you never know on systems
@@ -4297,7 +4311,7 @@
 
 static int make_child(server_rec *s, int slot, time_t now)
 {
-    int pid;
+    int pid, sig;
 
     if (slot + 1 > max_daemons_limit) {
 	max_daemons_limit = slot + 1;
@@ -4362,6 +4376,13 @@
 	 * Note that since restart() just notes that a restart has been
 	 * requested there's no race condition here.
 	 */
+
+        if ((sig = ap_setjmp(die_jmpbuffer)) != 0) {
+            ap_log_error(APLOG_MARK, APLOG_ERR|APLOG_NOERRNO, server_conf,
+                         "just jumped");
+            clean_child_exit(0);
+        }
+
 	signal(SIGHUP, just_die);
 	signal(SIGUSR1, just_die);
 	signal(SIGTERM, just_die);

-- 
Jeff Trawick | trawick@attglobal.net | PGP public key at web site:
       http://www.geocities.com/SiliconValley/Park/9289/
             Born in Roswell... married an alien...