You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@httpd.apache.org by Dean Gaudet <dg...@arctic.org> on 1998/03/10 11:13:16 UTC

OPTIMIZE_TIMEOUTS

While it's really nice, it's also completely broken.  There are race
conditions surrounding kill_timeouts() -- the parent might send a SIGALRM
after the kill_timeouts() returns. 

The fix for OPTIMIZE_TIMEOUTS is to use sigprocmask() to block SIGALRM in
kill_timeouts() and reinstate it in (hard|soft)_timeouts, but still use
the parent for the SIGARLM source.  This is still fewer syscalls than
before... but annoying nonetheless.  (Oh I could fix this without the
syscall if I had any sort of atomic primitives...) 

This is also only the beginning of the problems we can have with
SIGALRM... particularly with respect to third party libraries.  We have to
assume that all library code is async signal unsafe -- except for the
small subset of libc which is defined to be async signal safe.  So, for
example, if you take a SIGALRM in the middle of doing a database lookup,
and then longjmp and call your cleanup routine which mucks with the
database... you're screwed.

I've only looked at linux libc (libc5 and glibc2), and berkeley db-1.85 so
far... and they both have problems with EINTRs at the wrong time.  I have
no reason to believe that this is a problem specific to these three
libraries -- in fact because signals are tough to get right I suspect that
this is a general problem in all libraries. 

I've no solution to this.  But this would explain random failures that
occur on high traffic sites. 

Oh and the comment I just stuffed into mod_include... that explains core
dumps on sites using mod_include extensively.  It's bogus to set a
hard_timeout and then subject essentially arbitrary code to it --
hard_timeout should be used locally in places where all the code is
controlled.

Dean

[PATCH] OPTIMIZE_TIMEOUTS

Posted by Dean Gaudet <dg...@arctic.org>.

I babbled this a while back:

On Tue, 10 Mar 1998, Dean Gaudet wrote:

> While it's really nice, it's also completely broken.  There are race
> conditions surrounding kill_timeouts() -- the parent might send a SIGALRM
> after the kill_timeouts() returns. 
> 
> The fix for OPTIMIZE_TIMEOUTS is to use sigprocmask() to block SIGALRM in
> kill_timeouts() and reinstate it in (hard|soft)_timeouts, but still use
> the parent for the SIGARLM source.  This is still fewer syscalls than
> before... but annoying nonetheless.  (Oh I could fix this without the
> syscall if I had any sort of atomic primitives...) 
> 
> This is also only the beginning of the problems we can have with
> SIGALRM... particularly with respect to third party libraries.  We have to
> assume that all library code is async signal unsafe -- except for the
> small subset of libc which is defined to be async signal safe.  So, for
> example, if you take a SIGALRM in the middle of doing a database lookup,
> and then longjmp and call your cleanup routine which mucks with the
> database... you're screwed.
> 
> I've only looked at linux libc (libc5 and glibc2), and berkeley db-1.85 so
> far... and they both have problems with EINTRs at the wrong time.  I have
> no reason to believe that this is a problem specific to these three
> libraries -- in fact because signals are tough to get right I suspect that
> this is a general problem in all libraries. 
> 
> I've no solution to this.  But this would explain random failures that
> occur on high traffic sites. 
> 
> Oh and the comment I just stuffed into mod_include... that explains core
> dumps on sites using mod_include extensively.  It's bogus to set a
> hard_timeout and then subject essentially arbitrary code to it --
> hard_timeout should be used locally in places where all the code is
> controlled.

And here is a patch which works around the problem by doing a fairly
strict ordering on the reads/writes of the parent and children.  Don't
balk at the apparent spinning loop.  This race condition is so rare that I
can't even figure out how to make a reliable test case to even test if my
workaround works.  The spinning loop only "spins" if somehow the machine
is so locked up that one child doesn't get a time slice in
SCOREBOARD_MAINTENANCE_INTERVAL (which is 1s by default). 

And as I babbled above, even with this fix the code is still broken in
general.  It is not valid to longjmp() out of third party libraries, and
it's not valid to take an EINTR in third party libraries.  So we're
screwed either way. 

I don't particularly care if the patch goes in or not.  I suppose it could
use a null select() instead of sched_yield(), it'd be more portable. 

The bug in mod_include still exists.  I'm about to open a PR on it. 

Dean

Index: include/conf.h
===================================================================
RCS file: /export/home/cvs/apache-1.3/src/include/conf.h,v
retrieving revision 1.201
diff -u -r1.201 conf.h
--- conf.h	1998/04/15 17:09:27	1.201
+++ conf.h	1998/04/18 07:17:08
@@ -378,6 +378,9 @@
 /* flock is faster ... but hasn't been tested on 1.x systems */
 #define USE_FLOCK_SERIALIZED_ACCEPT
 
+/* Some broken versions of sched.h use the wrong _ macro.  Workaround. */
+#define _P __P
+
 #else
 #define USE_FCNTL_SERIALIZED_ACCEPT
 #endif
Index: include/httpd.h
===================================================================
RCS file: /export/home/cvs/apache-1.3/src/include/httpd.h,v
retrieving revision 1.205
diff -u -r1.205 httpd.h
--- httpd.h	1998/04/16 00:19:39	1.205
+++ httpd.h	1998/04/18 07:17:11
@@ -965,8 +965,10 @@
 /* The optimized timeout code only works if we're not MULTITHREAD and we're
  * also not using a scoreboard file
  */
-#if !defined (MULTITHREAD) && \
-    (defined (USE_MMAP_SCOREBOARD) || defined (USE_SHMGET_SCOREBOARD))
+#if !defined (MULTITHREAD) \
+    && (defined (USE_MMAP_SCOREBOARD) || defined (USE_SHMGET_SCOREBOARD)) \
+    && defined (_POSIX_PRIORITY_SCHEDULING)
+#include <sched.h>
 #define OPTIMIZE_TIMEOUTS
 #endif
 
Index: include/scoreboard.h
===================================================================
RCS file: /export/home/cvs/apache-1.3/src/include/scoreboard.h,v
retrieving revision 1.39
diff -u -r1.39 scoreboard.h
--- scoreboard.h	1998/04/11 12:00:23	1.39
+++ scoreboard.h	1998/04/18 07:17:12
@@ -134,6 +134,10 @@
 typedef struct {
     int exit_generation;	/* Set by the main process if a graceful
 				   restart is required */
+#ifdef OPTIMIZE_TIMEOUTS
+    time_t maintenance_time;	/* used to interlock between parent 
+                                   and ap_kill_timeout() */
+#endif
 } global_score;
 
 /* stuff which the parent generally writes and the children rarely read */
Index: main/http_main.c
===================================================================
RCS file: /export/home/cvs/apache-1.3/src/main/http_main.c,v
retrieving revision 1.324
diff -u -r1.324 http_main.c
--- http_main.c	1998/04/11 12:00:29	1.324
+++ http_main.c	1998/04/18 07:17:22
@@ -981,9 +981,33 @@
 	/* Just note the timeout in our scoreboard, no need to call the system.
 	 * We also note that the virtual time has gone forward.
 	 */
-	old = ap_scoreboard_image->servers[my_child_num].timeout_len;
-	ap_scoreboard_image->servers[my_child_num].timeout_len = x;
-	++ap_scoreboard_image->servers[my_child_num].cur_vtime;
+	short_score *ss;
+
+	ss = &ap_scoreboard_image->servers[my_child_num];
+	++ss->cur_vtime;
+	old = ss->timeout_len;
+	ss->timeout_len = x;
+	if (old && !x) {
+	    /* We have to be clever to avoid a race condition here between the
+	     * child and parent.  The parent could be in the middle of
+	     * perform_idle_server_maintenance(), in which case it could be
+	     * ready to send a SIGALRM because our timer wore out.  So what
+	     * we'll do here is redo the calculation the parent is doing,
+	     * and wait for it to signal us or skip us before we continue.
+	     *
+	     * The ordering that various fields are updated ensure that
+	     * the calculation below will use exactly the same values as
+	     * the parent is using.
+	     */
+	    time_t maint = ap_scoreboard_image->global.maintenance_time;
+	    if (maint
+		&& ap_scoreboard_image->parent[my_child_num].last_rtime
+		    + old < maint) {
+		do {
+		    sched_yield();
+		} while (ap_scoreboard_image->global.maintenance_time);
+	    }
+	}
     }
 #endif
 #endif
@@ -3610,6 +3634,9 @@
     int free_slots[MAX_SPAWN_RATE];
     int last_non_dead;
     int total_non_dead;
+#ifdef OPTIMIZE_TIMEOUTS
+    unsigned timeout_len;
+#endif
 
     /* initialize the free_list */
     free_length = 0;
@@ -3620,6 +3647,9 @@
     total_non_dead = 0;
 
     ap_sync_scoreboard_image();
+#ifdef OPTIMIZE_TIMEOUTS
+    ap_scoreboard_image->global.maintenance_time = now;
+#endif
     for (i = 0; i < ap_daemons_limit; ++i) {
 	int status;
 
@@ -3655,7 +3685,10 @@
 	    ++total_non_dead;
 	    last_non_dead = i;
 #ifdef OPTIMIZE_TIMEOUTS
-	    if (ss->timeout_len) {
+	    /* see comments in ap_set_callback_and_alarm() for more information
+	     * on why things are ordered in the way they are
+	     */
+	    if ((timeout_len = ss->timeout_len)) {
 		/* if it's a live server, with a live timeout then
 		 * start checking its timeout */
 		parent_score *ps = &ap_scoreboard_image->parent[i];
@@ -3665,7 +3698,7 @@
 		    ps->last_rtime = now;
 		    ps->last_vtime = ss->cur_vtime;
 		}
-		else if (ps->last_rtime + ss->timeout_len < now) {
+		else if (ps->last_rtime + timeout_len < now) {
 		    /* no progress, and the timeout length has been exceeded */
 		    ss->timeout_len = 0;
 		    kill(ps->pid, SIGALRM);
@@ -3674,6 +3707,9 @@
 #endif
 	}
     }
+#ifdef OPTIMIZE_TIMEOUTS
+    ap_scoreboard_image->global.maintenance_time = 0;
+#endif
     max_daemons_limit = last_non_dead + 1;
     if (idle_count > ap_daemons_max_free) {
 	/* kill off one child... we use SIGUSR1 because that'll cause it to