You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@httpd.apache.org by Cliff Woolley <jw...@virginia.edu> on 2003/07/21 04:26:27 UTC

suexec+CGI = zombies in 1.3.28

For those not watching the bugzilla notices, I direct your attention to
bug 21737:

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=21737

Apparently there has been a regression in 1.3.28 from 1.3.27 whereby
CGI scripts are getting left around as zombies when suexec is in use,
apparently because of a change in src/main/alloc.c that altered the
behavior when sending SIGTERM to a child process.  With suexec, the
SIGTERM at line 2862 will fail not because the subprocess is dead already
but because the httpd uid has no permission to term the cgi process, which
is running as some other user.

A suggested patch posted by one of the users is below.  It doesn't look
portable to me, but the idea seems sound.

Thoughts?
Cliff



--- apache_1.3.28/src/main/alloc.c.orig	Sun Jul 20 14:30:30 2003
+++ apache_1.3.28/src/main/alloc.c	Sun Jul 20 14:33:50 2003
@@ -2860,7 +2860,14 @@
 	    || (p->kill_how == kill_only_once)) {
 	    /* Subprocess may be dead already.  Only need the timeout if not. */
 	    if (ap_os_kill(p->pid, SIGTERM) == -1) {
-                p->kill_how = kill_never;
+		/* If the kill failed, find out why.  If the process does
+		   not exist then we do not need to kill it.  */
+		if (errno == ESRCH) {
+	                p->kill_how = kill_never;
+		}
+		else {
+			need_timeout = 1;
+		}
             }
             else {
 		need_timeout = 1;


Re: suexec+CGI = zombies in 1.3.28

Posted by Cliff Woolley <jw...@virginia.edu>.
On Sun, 20 Jul 2003, Cliff Woolley wrote:

> For those not watching the bugzilla notices, I direct your attention to
> bug 21737:

Oops, Joshua already posted about this.  Sorry about the dupe.  :)

Re: suexec+CGI = zombies in 1.3.28

Posted by Jeff Trawick <tr...@attglobal.net>.
Ralf S. Engelschall wrote:
> Index: alloc.c
> ===================================================================
> RCS file: /e/apache/cvs/apache-1.3/src/main/alloc.c,v
> retrieving revision 1.145
> diff -u -d -r1.145 alloc.c
> --- alloc.c	20 Jun 2003 15:05:40 -0000	1.145
> +++ alloc.c	29 Jul 2003 19:07:46 -0000
> @@ -2859,12 +2859,8 @@
>  	if ((p->kill_how == kill_after_timeout)
>  	    || (p->kill_how == kill_only_once)) {
>  	    /* Subprocess may be dead already.  Only need the timeout if not. */
> -	    if (ap_os_kill(p->pid, SIGTERM) == -1) {
> -                p->kill_how = kill_never;
> -            }
> -            else {
> -		need_timeout = 1;
> -            }
> +	    ap_os_kill(p->pid, SIGTERM);
> +	    need_timeout = 1;

The drawback of this is that often* when suexec is not in use we end up 
burning extra syscalls when it is not necessary.

*"often" is speculation and not something I've tested

If it were instead:

             if (ap_os_kill(p->pid, SIGTERM) == -1 &&
                 errno == ESRCH) {
                 p->kill_how = skip_timeout; /* some new value */
             }
             else {
                 need_timeout = 1;
             }

we'd still do the waitpid() at the end but we wouldn't insert a needless 
timeout loop when we know this child is a goner.


Re: suexec+CGI = zombies in 1.3.28

Posted by Cliff Woolley <jw...@virginia.edu>.
On Sat, 2 Aug 2003, Bill Stoddard wrote:

> >>-	    if (ap_os_kill(p->pid, SIGTERM) == -1) {
> >>-                p->kill_how = kill_never;
> >>-            }
> >>-            else {
> >>-		need_timeout = 1;
> >>-            }
> >>+	    ap_os_kill(p->pid, SIGTERM);
> >>+	    need_timeout = 1;
>
> So you sucessfully kill the process, then you set need_timeout. You swap
> out and another process is started (by an httpd process) on the system
> with the same pid. Your swaped back in, detect the process (thinking it
> is the old process still hanging around) and kill it after a timeout.
> Is this possible or not?

Seems like the set of circumstances that would have to occur is fairly
unlikely but possible.  Though afaict those same circumstances would have
have been possible and would have had the same result even without this
patch.  No?

--Cliff

Re: suexec+CGI = zombies in 1.3.28

Posted by Bill Stoddard <bi...@wstoddard.com>.
Cliff Woolley wrote:
> On Tue, 29 Jul 2003, Ralf S. Engelschall wrote:
> 
> 
>>Index: alloc.c
>>===================================================================
>>RCS file: /e/apache/cvs/apache-1.3/src/main/alloc.c,v
>>retrieving revision 1.145
>>diff -u -d -r1.145 alloc.c
>>--- alloc.c	20 Jun 2003 15:05:40 -0000	1.145
>>+++ alloc.c	29 Jul 2003 19:07:46 -0000
>>@@ -2859,12 +2859,8 @@
>> 	if ((p->kill_how == kill_after_timeout)
>> 	    || (p->kill_how == kill_only_once)) {
>> 	    /* Subprocess may be dead already.  Only need the timeout if not. */
>>-	    if (ap_os_kill(p->pid, SIGTERM) == -1) {
>>-                p->kill_how = kill_never;
>>-            }
>>-            else {
>>-		need_timeout = 1;
>>-            }
>>+	    ap_os_kill(p->pid, SIGTERM);
>>+	    need_timeout = 1;
>> 	}
>> 	else if (p->kill_how == kill_always) {
>> 	    kill(p->pid, SIGKILL);
> 
> 
> +1, looks good.
> 
> --Cliff
> 

So you sucessfully kill the process, then you set need_timeout. You swap 
out and another process is started (by an httpd process) on the system 
with the same pid. Your swaped back in, detect the process (thinking it 
is the old process still hanging around) and kill it after a timeout. 
Is this possible or not?

Bill


Re: suexec+CGI = zombies in 1.3.28

Posted by Cliff Woolley <jw...@virginia.edu>.
On Tue, 29 Jul 2003, Ralf S. Engelschall wrote:

> Index: alloc.c
> ===================================================================
> RCS file: /e/apache/cvs/apache-1.3/src/main/alloc.c,v
> retrieving revision 1.145
> diff -u -d -r1.145 alloc.c
> --- alloc.c	20 Jun 2003 15:05:40 -0000	1.145
> +++ alloc.c	29 Jul 2003 19:07:46 -0000
> @@ -2859,12 +2859,8 @@
>  	if ((p->kill_how == kill_after_timeout)
>  	    || (p->kill_how == kill_only_once)) {
>  	    /* Subprocess may be dead already.  Only need the timeout if not. */
> -	    if (ap_os_kill(p->pid, SIGTERM) == -1) {
> -                p->kill_how = kill_never;
> -            }
> -            else {
> -		need_timeout = 1;
> -            }
> +	    ap_os_kill(p->pid, SIGTERM);
> +	    need_timeout = 1;
>  	}
>  	else if (p->kill_how == kill_always) {
>  	    kill(p->pid, SIGKILL);

+1, looks good.

--Cliff

Re: suexec+CGI = zombies in 1.3.28

Posted by "Ralf S. Engelschall" <rs...@engelschall.com>.
On Sun, Jul 20, 2003, Cliff Woolley wrote:

> For those not watching the bugzilla notices, I direct your attention to
> bug 21737:
>
> http://nagoya.apache.org/bugzilla/show_bug.cgi?id=21737
>
> Apparently there has been a regression in 1.3.28 from 1.3.27 whereby
> CGI scripts are getting left around as zombies when suexec is in use,
> apparently because of a change in src/main/alloc.c that altered the
> behavior when sending SIGTERM to a child process.  With suexec, the
> SIGTERM at line 2862 will fail not because the subprocess is dead already
> but because the httpd uid has no permission to term the cgi process, which
> is running as some other user.
>
> A suggested patch posted by one of the users is below.  It doesn't look
> portable to me, but the idea seems sound.
>
> Thoughts?
> Cliff
>
>
> --- apache_1.3.28/src/main/alloc.c.orig	Sun Jul 20 14:30:30 2003
> +++ apache_1.3.28/src/main/alloc.c	Sun Jul 20 14:33:50 2003
> @@ -2860,7 +2860,14 @@
>  	    || (p->kill_how == kill_only_once)) {
>  	    /* Subprocess may be dead already.  Only need the timeout if not. */
>  	    if (ap_os_kill(p->pid, SIGTERM) == -1) {
> -                p->kill_how = kill_never;
> +		/* If the kill failed, find out why.  If the process does
> +		   not exist then we do not need to kill it.  */
> +		if (errno == ESRCH) {
> +	                p->kill_how = kill_never;
> +		}
> +		else {
> +			need_timeout = 1;
> +		}
>              }
>              else {
>  		need_timeout = 1;

Because we're currently releasing OpenPKG 1.3 with Apache 1.3.28
included, we had to investigate today on this issue and finally think
although the above patch works and could be comitted this way, more
correct and straight-foreward would be the simple:

Index: alloc.c
===================================================================
RCS file: /e/apache/cvs/apache-1.3/src/main/alloc.c,v
retrieving revision 1.145
diff -u -d -r1.145 alloc.c
--- alloc.c	20 Jun 2003 15:05:40 -0000	1.145
+++ alloc.c	29 Jul 2003 19:07:46 -0000
@@ -2859,12 +2859,8 @@
 	if ((p->kill_how == kill_after_timeout)
 	    || (p->kill_how == kill_only_once)) {
 	    /* Subprocess may be dead already.  Only need the timeout if not. */
-	    if (ap_os_kill(p->pid, SIGTERM) == -1) {
-                p->kill_how = kill_never;
-            }
-            else {
-		need_timeout = 1;
-            }
+	    ap_os_kill(p->pid, SIGTERM);
+	    need_timeout = 1;
 	}
 	else if (p->kill_how == kill_always) {
 	    kill(p->pid, SIGKILL);

That is, we don't have to check for the return value of ap_os_kill()
and especially not check for ESRCH, because we _HAVE_ to waitpid() for
it anyway (because it's our child and it either is already terminated
and is waiting as a zombie for our waitpid() or it is still running).

Under Unix it cannot be that a (non-detached in the sense of BSD's
daemon(3)) child of a process just does no longer exists as long as
the parent still exists and as long as the parent still has not done
waitpid() for the child. So ESRCH cannot happen in our situation and the
patch we currently use is fully sufficient. Both are at least portable
enough for Unix, of course...
                                       Ralf S. Engelschall
                                       rse@engelschall.com
                                       www.engelschall.com


Re: suexec+CGI = zombies in 1.3.28

Posted by Cliff Woolley <jw...@virginia.edu>.
On Tue, 22 Jul 2003, Adam Warner wrote:

> Just a long shot but is the regression in 2.0.47 as well? I've been
> getting defunct processes with nothing being served and the clincher is
> that all my HTML pages are served via CGI/suexec.

While it's certainly possible it's broken in 2.0.47, it's unlikely to be
exactly the same thing, since the related code is completely different in
2.0.  I perused the changes between 2.0.46 and 2.0.47 just now and didn't
see too many things that were overly suspicious.

 - server/core.c r1.225.2.7 (not likely)
 - modules/generators/mod_cgid.c r1.145.2.6 (possible but unlikely)
 - srclib/apr/file_io/unix/readwrite.c r1.86 (possible)

There are obviously a bunch of other things that changed, but none of them
really jump off the diff at me.  You might try reverting the above changes
one at a time and see if that helps.

Sorry this is so vague... I just don't have a good way to reproduce the
problem so all I can do is guess.

--Cliff

Re: suexec+CGI = zombies in 1.3.28

Posted by Adam Warner <li...@consulting.net.nz>.
Hi Cliff Woolley,

> For those not watching the bugzilla notices, I direct your attention to
> bug 21737:
> 
> http://nagoya.apache.org/bugzilla/show_bug.cgi?id=21737
> 
> Apparently there has been a regression in 1.3.28 from 1.3.27 whereby CGI
> scripts are getting left around as zombies when suexec is in use,
> apparently because of a change in src/main/alloc.c that altered the
> behavior when sending SIGTERM to a child process.  With suexec, the
> SIGTERM at line 2862 will fail not because the subprocess is dead
> already but because the httpd uid has no permission to term the cgi
> process, which is running as some other user.

Just a long shot but is the regression in 2.0.47 as well? I've been
getting defunct processes with nothing being served and the clincher is
that all my HTML pages are served via CGI/suexec. I wrote this to Apache
User a couple of days ago:
http://lists.debian.org/debian-apache/2003/debian-apache-200307/msg00065.html
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=202124

Thom noted that it may be related to this bug report:
http://bugs.debian.org/cgi-bin/bugreport.cgi?archive=no&bug=202094

Regards,
Adam