Posted to dev@httpd.apache.org by "Paul J. Reder" <re...@raleigh.ibm.com> on 2000/06/05 21:51:02 UTC

Possible zombie processes from running cgis.

I believe I may have an answer for why mod_cgid seems to do so much
better than mod_cgi. When running the tests repeatedly I noticed that
processes seemed to be left behind. A simple test reproduces
it for me. I built a default Apache (buildconf, configure, make), then
started it and ran the following commands one after another, with
the results shown.

[rederpj@saturn support]$ ab -n 1000 -c 10 -w http://saturn.Reders:8080/cgi-bin/printenv > junk.html
[rederpj@saturn support]$ ab -n 1000 -c 10 -w http://saturn.Reders:8080/cgi-bin/printenv > junk.html
[rederpj@saturn support]$ ab -n 1000 -c 10 -w http://saturn.Reders:8080/cgi-bin/printenv > junk.html
bash: fork: Resource temporarily unavailable
     or sometimes:
Socket:: Too many open files
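Both messages are resource limits running out. The first fits a pile-up of unreaped children: on Linux a zombie keeps its process-table entry and still counts against its owner's process limit (RLIMIT_NPROC) until the parent reaps it, so eventually fork() fails with EAGAIN. The second is EMFILE, which a process gets when it reaches its own descriptor limit (RLIMIT_NOFILE). That reading of these particular failures is an assumption. The minimal sketch below only reads the two limits with getrlimit(); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: store the soft limit for RESOURCE in *OUT.
 * Returns 0 on success, -1 if getrlimit() fails. */
static int soft_limit(int resource, rlim_t *out)
{
    struct rlimit rl;

    assert(out != NULL);
    if (getrlimit(resource, &rl) != 0)
        return -1;
    *out = rl.rlim_cur;
    return 0;
}

/* Print one soft limit, spelling out RLIM_INFINITY. */
static void print_limit(const char *name, int resource)
{
    rlim_t v;

    if (soft_limit(resource, &v) != 0)
        perror(name);
    else if (v == RLIM_INFINITY)
        printf("%s: unlimited\n", name);
    else
        printf("%s: %llu\n", name, (unsigned long long)v);
}

/* RLIMIT_NPROC:  fork() fails with EAGAIN once the user's process count,
 *                zombies included, reaches it.
 * RLIMIT_NOFILE: open()/socket() fail with EMFILE once this process's
 *                descriptor count reaches it. */
static void print_relevant_limits(void)
{
    print_limit("RLIMIT_NPROC", RLIMIT_NPROC);
    print_limit("RLIMIT_NOFILE", RLIMIT_NOFILE);
}
```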


Checking ps -ef showed hundreds of what appear to be zombie processes
of the following form...

rederpj  14127 12150  0 14:01 ?        00:00:00 [printenv <defunct>]
rederpj  14128 12184  0 14:01 ?        00:00:00 [printenv <defunct>]
rederpj  14129 12174  0 14:01 ?        00:00:00 [printenv <defunct>]

I even waited an hour to see whether they would time out and be cleaned
up, but saw no change.

Netstat -a also showed that there were hundreds of sockets in TIME_WAIT
status, but I don't think that is a problem. I believe Apache is configured
to reuse sockets, so these should be ok.
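For context on the TIME_WAIT point: a connection in TIME_WAIT no longer holds a file descriptor in any process, so by itself it does not count toward a "Too many open files" limit. "Reuse" on a listener usually refers to the SO_REUSEADDR socket option, which lets a server bind() its port while earlier connections on it are still in TIME_WAIT. A sketch of setting and reading back that option; whether and where Apache sets it is not shown in this thread, and the helper names are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a TCP listener on PORT (0 = any free port)
 * with SO_REUSEADDR set before bind(), so the bind succeeds even while
 * earlier connections on the port are still in TIME_WAIT.
 * Returns the listening fd, or -1 on error. */
static int listen_reuseaddr(uint16_t port)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on) != 0 ||
        bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(fd, 16) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Read the option back: 1 if SO_REUSEADDR is set, 0 if not, -1 on error. */
static int reuseaddr_is_set(int fd)
{
    int val = 0;
    socklen_t len = sizeof val;

    assert(fd >= 0);
    if (getsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &val, &len) != 0)
        return -1;
    return val != 0;
}
```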

mod_cgid does not exhibit the same behavior. I can run the tests over and over
without any zombie processes showing up.

Ryan, do you know of any case where CGI processes aren't being cleaned up
properly? Is anyone else noticing this? Am I doing something stupid, or
looking at this wrong?

My code base is about a week old. It is built and running on Linux.

-- 

Paul J. Reder

---------------------------------------------------------------------
Noise proves nothing.  Often a hen who has merely laid an egg cackles
as if she laid an asteroid.
		-- Mark Twain
(An apt quote in this political season.)

Re: Possible zombie processes from running cgis.

Posted by Jeff Trawick <tr...@bellsouth.net>.
mod_cgi was broken during the last alpha cycle.  See my comments in
the commit of mod_cgi.c from a minute or so ago.

Comparing system traces of 1.3 and 2.0 was very instructive for this
problem. With 2.0 it was easy to see that garbage was being passed to
waitpid(). It didn't take long to figure out what was going on once I
thought to look at the system trace.

strace -p pid:

...
wait4(-1073759584, NULL, WNOHANG|WUNTRACED, NULL) = -1 ECHILD (No child processes)
kill(-1073759584, SIGTERM)              = -1 ESRCH (No such process)
kill(-1073759584, SIGKILL)              = -1 ESRCH (No such process)
...
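The trace shows the shape of the cleanup: a non-blocking wait (waitpid() appears here as wait4() with WNOHANG), then SIGTERM, then SIGKILL, all aimed at -1073759584. For both waitpid() and kill(), a pid below -1 names the process group |pid|, so a garbage negative value fails with ECHILD/ESRCH and never reaches the real child, which stays a zombie. Below is a sketch of the same escalation with a valid pid. cleanup_child() and reap_within() are hypothetical helpers, not mod_cgi's or APR's code, and the 100 ms grace period is an arbitrary assumption.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: poll waitpid(PID, WNOHANG) every 10 ms for up to
 * TIMEOUT_MS. Returns 1 if the child was reaped, 0 if it is still running,
 * -1 on error (ECHILD if PID is not a child of this process). */
static int reap_within(pid_t pid, int timeout_ms)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 };
    int waited = 0;

    for (;;) {
        pid_t r = waitpid(pid, NULL, WNOHANG);
        if (r == pid)
            return 1;
        if (r < 0)
            return -1;
        if (waited >= timeout_ms)
            return 0;
        nanosleep(&tick, NULL);
        waited += 10;
    }
}

/* Hypothetical helper mirroring the traced sequence, but with a real pid:
 * try to reap, send SIGTERM and allow a short grace period, then SIGKILL
 * and a blocking wait, so the child cannot be left behind as a zombie.
 * Returns 0 once PID has been reaped, -1 if it is not our child. */
static int cleanup_child(pid_t pid)
{
    pid_t r;
    int rc;

    /* A garbage negative value such as -1073759584 would name a process
     * group for both waitpid() and kill(), never this child. */
    assert(pid > 0);

    rc = reap_within(pid, 0);
    if (rc != 0)
        return rc > 0 ? 0 : -1;

    kill(pid, SIGTERM);
    rc = reap_within(pid, 100);           /* assumed 100 ms grace period */
    if (rc != 0)
        return rc > 0 ? 0 : -1;

    kill(pid, SIGKILL);
    do
        r = waitpid(pid, NULL, 0);        /* blocking: always reaps */
    while (r < 0 && errno == EINTR);
    return r == pid ? 0 : -1;
}
```

The final blocking waitpid() is the step that guarantees no zombie remains once a real child pid is passed in.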

-- 
Jeff Trawick | trawick@ibm.net | PGP public key at web site:
     http://www.geocities.com/SiliconValley/Park/9289/
          Born in Roswell... married an alien...

Re: Possible zombie processes from running cgis.

Posted by rb...@covalent.net.
> > rederpj  14127 12150  0 14:01 ?        00:00:00 [printenv <defunct>]
> > rederpj  14128 12184  0 14:01 ?        00:00:00 [printenv <defunct>]
> > rederpj  14129 12174  0 14:01 ?        00:00:00 [printenv <defunct>]

> I see zombies from mod_cgi with more recent code.  SIGCHLD processing
> (or lack thereof) seems to be the same as with 1.3.  I guess the MPMs
> are screwing up and not calling wait() or waitpid(-1,,)?
> 
> I would think that the call to ap_wait_all_procs() would take care of
> zombies.  Is something going wrong in that area?

I can think of a couple of things that could cause this.  Let's try a few
tests.

1)  Set up a server with reliable piped logs.  After getting a few zombie
processes, try killing off your logger process.  If it comes back, we are
waiting in ap_wait_all_procs correctly.

2)  Is this with a threaded or non-threaded MPM?  If with a threaded MPM,
try with Prefork and see if we can get zombies that way.

3)  Let's see if this is even an Apache problem (it probably is, but it
doesn't hurt to check).  Record the pid of every PROCESS that is started
(note: process, not thread).  Check to make sure the zombies are actually
Apache processes.  Let me explain my thinking.  If Linux is actually
forking the entire process with all of its threads, and is then killing
the threads off, then each thread must have a pid (on Linux each thread
gets its own pid IIRC), and it is possible that these zombies are threads
that are created when we forked, and aren't actually Apache threads.

After we get those answers, I may have more questions or some reasonable
thoughts on what is happening.

Ryan

_______________________________________________________________________________
Ryan Bloom                        	rbb@apache.org
406 29th St.
San Francisco, CA 94131
-------------------------------------------------------------------------------


Re: Possible zombie processes from running cgis.

Posted by Jeff Trawick <tr...@bellsouth.net>.
> Date: Mon, 05 Jun 2000 15:51:02 -0400
> From: "Paul J. Reder" <re...@raleigh.ibm.com>
> 
> Checking ps -ef showed hundreds of what appear to be zombied processes
> of the following form...
> 
> rederpj  14127 12150  0 14:01 ?        00:00:00 [printenv <defunct>]
> rederpj  14128 12184  0 14:01 ?        00:00:00 [printenv <defunct>]
> rederpj  14129 12174  0 14:01 ?        00:00:00 [printenv <defunct>]

I see zombies from mod_cgi with more recent code.  SIGCHLD processing
(or lack thereof) seems to be the same as with 1.3.  I guess the MPMs
are screwing up and not calling wait() or waitpid(-1,,)?

I would think that the call to ap_wait_all_procs() would take care of
zombies.  Is something going wrong in that area?