Posted to dev@httpd.apache.org by Brian Behlendorf <br...@organic.com> on 1995/09/15 06:08:33 UTC

Re: things to look for in runaway server? (fwd)

(Tony, hope you don't mind me forwarding this)

This is a followup to the all-children-were-zombies problem I had this
morning.  I'm positive the scoreboard wasn't getting nuked.  This would seem
to be a fatal error for heavily-used systems.  Thoughts?

	Brian

--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
brian@organic.com  brian@hyperreal.com  http://www.[hyperreal,organic].com/

---------- Forwarded message ----------
Date: Thu, 14 Sep 1995 22:59:37 -0500
From: Tony Sanders <sa...@bsdi.com>
To: Brian Behlendorf <br...@organic.com>
Subject: Re: things to look for in runaway server? 

Brian Behlendorf writes:
> The machine just went south again - this time every process became a 
> zombie.  Not sure how sudden it was, since I didn't watch it get to this 
> state, but it was not answering queries even though other services worked 
> fine.  Included here is the output of that script - see anything that 
> could help?  Should I take this to the list?

In this case, it's either an httpd bug or corruption of the scoreboard
file (or something of that nature).  The parent, 1431, is hung in
a wait system call, so it's probably waiting on the wrong thing or
something.

httpd should probably be using WNOHANG so that it cannot get stuck.
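For illustration, here is a minimal sketch of a non-blocking reaper, assuming a POSIX `waitpid()`. The function name `reap_exited_children()` is hypothetical, and this is not the httpd code under discussion. Passing `WNOHANG` makes `waitpid()` return 0 immediately when no child has exited yet, so the parent can never sit in `wait` the way process 1431 does in the listing.

```c
#include <errno.h>      /* errno, EINTR */
#include <sys/types.h>  /* pid_t */
#include <sys/wait.h>   /* waitpid(), WNOHANG */
#include <unistd.h>     /* fork(), _exit() for callers */

/* Hypothetical helper, not taken from the httpd source.
 * Reaps every child that has already exited, without ever blocking.
 * Returns the number of children reaped on this call. */
int reap_exited_children(void)
{
    int reaped = 0;
    int status;
    pid_t pid;

    for (;;) {
        /* WNOHANG: return 0 at once if no child has exited yet,
         * instead of sleeping inside wait(). */
        pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            reaped++;           /* one zombie cleared; look for more */
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;           /* interrupted by a signal; retry */
        break;                  /* 0: children still running; -1 (ECHILD): none left */
    }
    return reaped;
}
```

A parent would typically call something like this from a SIGCHLD handler or once per pass through its main loop, then reuse whatever scoreboard slots the reaped children had occupied.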

I'm not really familiar with the code yet, but from what I've just
seen, the process handling and scoreboard code looks pretty scary.
The process handling should probably be abstracted out into an API
and the code moved into its own file.  This should allow for the
scoreboard to be moved into shared memory (e.g., with mmap() on BSD/OS)
for performance reasons.

> ------------------ ps axlww
>   UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT  TT       TIME COMMAND
>   103   167     1   0   2  0   664  532 select S     ??    2:33.06 /usr/local/sbar/cbd -n
>     0  1431     1   0  10  0   436  216 wait   Is    ??    1:20.85 /usr/local/web/bin/httpd -f /usr/local/web/conf/cyborganic.conf
> 32767 26676  1431   0  28  0     0    0 -      Z     ??    0:00.00 (httpd)
... bunch of these ...
> 32767 27156  1431   3  28  0     0    0 -      Z     ??    0:00.00 (httpd)
> 32767 27166  1431   1   2  0  1024  464 netio  I     ??    0:02.19 /usr/local/web/bin/httpd -f /usr/local/web/conf/cyborganic.conf
> 32767 27168  1431   0  28  0     0    0 -      Z     ??    0:00.00 (httpd)
... bunch more ...