Posted to dev@httpd.apache.org by "Roy T. Fielding" <fi...@kiwi.ICS.UCI.EDU> on 2000/01/11 04:13:16 UTC

Re: [PATCH] reclaim_child_processes

Could someone commit this please (+1 from me)?  I've been trying to
get my dev system up to speed but am still suffering from having
my mail on one site and machines at another.

....Roy (almost back)

In message <Pi...@taz.hyperreal.org>,
Ed Korthof writes:
>Date: Tue, 4 Jan 2000 13:45:41 -0800 (PST)
>From: Ed Korthof <ed...@apache.org>
>X-Sender: ed@taz.hyperreal.org
>To: new-httpd@apache.org
>Subject: [PATCH] reclaim_child_processes
>Message-ID: <Pi...@taz.hyperreal.org>
>MIME-Version: 1.0
>Content-Type: TEXT/PLAIN; charset=US-ASCII
>Sender: new-httpd-owner@apache.org
>Precedence: bulk
>Reply-To: new-httpd@apache.org
>
>hi --
>
>this patch is intended to avoid a problem which i've witnessed in apache
>installations with certain third party libraries: if there are many
>children, and they take sufficiently long to shut down, then apache's
>reclaim_child_processes sends the remaining children SIGKILL.  that's
>alright with me -- shutdown shouldn't take so long -- but the current code
>doesn't wait around for them to die; if they haven't all finished
>terminating right away, it sleeps for approximately 16 seconds before
>noticing that they're dead. thus, the total time required to shut down is
>generally a bit more than twenty seconds ... the last 16 seconds of which
>is quite unnecessary.
>
>anyway, what this patch does is change things so that after SIGKILL has
>been sent, the parent checks a couple of times (approx: 16ms, 84ms, 350ms,
>1.4sec) and then decides that the SIGKILL failed.  i could easily adjust
>this so that the total time spent waiting for the SIGKILL to fail is what
>it was -- 16 seconds, give or take -- but in writing this, i figured that
>if they haven't died after 1.4 seconds, then the SIGKILL wasn't
>sufficient; it's not instant, but it should never take that long.
>
>i sent out a couple of patches about this earlier, but i've laid down the
>crack pipe i was smoking when i sent the first one; and this makes no
>changes to the comments as per Roy's comments.
>
>i realize this doesn't happen except in a fairly pathological situation,
>but we really should change the code in reclaim_child_processes to give
>the children at least a little bit of time to die from the SIGKILL.  in
>the pathological situation in which it occurs, it's quite annoying.
>
>thanks --
>
>ed

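The post-SIGKILL check times quoted above (approx: 16ms, 84ms, 350ms, 1.4sec) can be reproduced with a small sketch. It assumes the delay starts at the 1024 * 16 microseconds that the patch assigns to waittime and is multiplied by 4 after every poll; the factor of 4 is inferred from the quoted figures and is not visible in the diff context. backoff_elapsed_us is a hypothetical helper written for illustration and is not part of http_main.c.

```c
#include <assert.h>

/* Hypothetical helper, not part of http_main.c.
 * Returns the total microseconds slept after `polls` waits, where the
 * first wait is `first_us` and each later wait is 4x the previous one.
 * The factor of 4 is an assumption inferred from the timings in the mail. */
static long backoff_elapsed_us(long first_us, int polls)
{
    long wait = first_us;   /* length of the next sleep */
    long total = 0;         /* cumulative time slept so far */
    int i;

    assert(first_us >= 0 && polls >= 0);

    for (i = 0; i < polls; ++i) {
        total += wait;
        wait *= 4;          /* exponential backoff step */
    }
    return total;
}
```

Called with first_us = 1024 * 16 and polls = 1 through 4, this returns 16384, 81920, 344064 and 1392640 microseconds, i.e. roughly 16ms, 82ms, 344ms and 1.4sec after the SIGKILL, in line with the approximate figures in the mail.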
*****
Index: http_main.c
===================================================================
RCS file: /home/cvs/apache-1.3/src/main/http_main.c,v
retrieving revision 1.486
diff -b -c -u -r1.486 http_main.c
--- http_main.c	2000/01/01 17:07:34	1.486
+++ http_main.c	2000/01/04 21:34:27
@@ -2378,7 +2378,7 @@
 
     ap_sync_scoreboard_image();
 
-    for (tries = terminate ? 4 : 1; tries <= 9; ++tries) {
+    for (tries = terminate ? 4 : 1; tries <= 12; ++tries) {
 	/* don't want to hold up progress any more than 
 	 * necessary, but we need to allow children a few moments to exit.
 	 * Set delay with an exponential backoff.
@@ -2433,8 +2433,13 @@
 		   "child process %d still did not exit, sending a SIGKILL",
 			    pid);
 		kill(pid, SIGKILL);
+		waittime = 1024 * 16; /* give them some time to die */
 		break;
-	    case 9:     /* 14 sec */
+	    case 9:     /*   6 sec */
+	    case 10:    /* 6.1 sec */
+	    case 11:    /* 6.4 sec */
+		break;
+	    case 12:    /* 7.4 sec */
 		/* gave it our best shot, but alas...  If this really 
 		 * is a child we are trying to kill and it really hasn't
 		 * exited, we will likely fail to bind to the port