Posted to modperl@perl.apache.org by Bill Moseley <mo...@hank.org> on 2000/04/12 09:18:35 UTC

[OT] Killing off children

Hello,

I noticed on the Apache server-status report a child that is stuck in "G"
(Gracefully finishing) after a SIGUSR1 today.  Twelve hours ago.

I don't have root access on this machine, but I thought I'd run a CGI
script to run as "nobody" and kill off the child.  (I've done this once
before when working late from home and sending a mod_perl script into a
memory hungry loop.)

But in this case the child sitting there in "G" just won't die.  I've tried
sending every signal I can think of: HUP INT QUIT ABRT KILL TERM USR1
STOP CONT and THAW.  I did note that the process was serving its very
first request when SIGUSR1 was sent.  And I just sent a SIGHUP to another
Apache child from a CGI script and that child died.

We had a similar situation a month or so back where an Apache restart left
a bunch of children behind (and holding onto port 80), and thus the server
wouldn't restart.  That time we were running as root and still couldn't kill
the processes.  The machine was finally rebooted.

Anyone know how to kill off these children (without having to reboot)?

% uname -a
SunOS 5.6 Generic_105181-17 sun4u sparc SUNW,Ultra-Enterprise

Apache/1.3.9 (Unix) mod_perl/1.21

Thanks,

Bill Moseley
mailto:moseley@hank.org

Re: [OT] Killing off children

Posted by Doug MacEachern <do...@covalent.net>.
> Two followup questions:
> 
> 1) Would truss show this -- or is there a way to test this?
> 
> 2) Would you expect the child to survive a kill -9 while hanging?

if the process is in a state where you can't kill -9 or attach with strace
or gdb, etc., i have no idea exactly what triggered that.  all i'm saying
is, it's possible that something during Perl cleanup did.  then again, the
process may have already been in that state before you ran kill -HUP,
which would make my suggestion worthless.   next time, before you kill
-HUP, check the process states with ps or top or something.  if a process
is already in this hosed state, maybe you can dig something useful out of
/proc.  i don't know of any tools to debug a process you can't kill -9 or
attach to.
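
One way to act on the "check the process states first" advice is to take a
ps snapshot before sending the signal and pick out anything that is not
running or sleeping normally. This is only a rough sketch: the function name
suspicious_states is made up for illustration, and treating R, S and O as the
"normal" state letters is an assumption that differs between systems.

```shell
#!/bin/sh
# Hypothetical helper: read "ps -o s,pid,ppid,comm" style output on stdin
# and print the rows whose state letter is not R, S or O.
suspicious_states() {
    # NR > 1 skips the header row; $1 is the state column.
    awk 'NR > 1 && $1 !~ /^[RSO]/ { print }'
}

# Example with canned input, so the filter can be tried without a live server:
printf '%s\n' \
    'S   PID  PPID COMMAND' \
    'S  3732  7815 httpd' \
    'Z  3740  7815 httpd' | suspicious_states
```

Against a live system it could be fed with something like
"ps -e -o s,pid,ppid,comm | suspicious_states" just before the kill -HUP.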


Re: [OT] Killing off children

Posted by Bill Moseley <mo...@hank.org>.
At 10:49 PM 04/12/00 -0700, Doug MacEachern wrote:
>On Wed, 12 Apr 2000, Bill Moseley wrote:
>
>> Hello,
>> 
>> I noticed on the Apache server-status report a child that is stuck in "G"
>> (Gracefully finishing) after a SIGUSR1 today.  Twelve hours ago.
>
>this could be perl_destruct() hanging while trying to clean up.  that
>cleanup normally isn't required; you can disable it by setting the
>PERL_DESTRUCT_LEVEL environment variable to -1

Two followup questions:

1) Would truss show this -- or is there a way to test this?

2) Would you expect the child to survive a kill -9 while hanging?

Thanks,


Bill Moseley
mailto:moseley@hank.org

Re: Killing off children

Posted by Bill Moseley <mo...@hank.org>.
At 10:00 PM 04/12/00 -0400, Greg Stark wrote:
>
>What state was the process in?

'S' sleeping, waiting for some event.

% ps -p3732
S     USER   PID  PPID %CPU %MEM  VSZ  RSS    STIME        TIME COMMAND
S  nobody4  3732  7815  0.0  0.2 7232 3656 16:06:59        0:00 httpd.mod

>Disk wait (D in ps), which is actually any uninterruptible sleep, usually
>disk i/o, indicates either something is broken in your kernel or that you're
>using NFS. 

Nope, not running over NFS, for logs or anything else.



Bill Moseley
mailto:moseley@hank.org

Re: Killing off children

Posted by Greg Stark <gs...@mit.edu>.
What state was the process in?

There are only two states in which a process won't respond to kill -9:

Zombie (Z in ps), in which case the process is already dead and Apache didn't
wait on it properly. This isn't a problem; just ignore it unless you can
reproduce it, in which case report it as an Apache bug.

Disk wait (D in ps), which is actually any uninterruptible sleep, usually disk
i/o, indicates either something is broken in your kernel or that you're using
NFS. 

In the case of disk wait you can actually get the wait channel from ps and
look it up in your kernel symbol table to find out what resource it was
waiting on. What you would do with that information isn't clear though. I
suppose it might point the way to what component of the system was misbehaving
if the problem occurred frequently.
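
As a rough illustration of the above, the state letter and wait channel can
be pulled out of ps and the state described in these terms. This is a sketch
under assumptions: explain_state is a made-up name, the "s" and "wchan" output
fields are assumed to be supported by the local ps, and state letters are not
the same on every system (a D state may not exist on yours).

```shell
#!/bin/sh
# Hypothetical helper: describe a ps state letter in the terms used above.
explain_state() {
    case "$1" in
        Z*) echo "zombie: already dead, waiting to be reaped by its parent" ;;
        D*) echo "uninterruptible sleep: kill -9 has no effect until the wait ends" ;;
        "") echo "no state reported: the process may not exist" ;;
        *)  echo "other: kill -9 should normally work" ;;
    esac
}

# Usage: sh explain_state.sh PID
if [ -n "$1" ]; then
    # "s=" and "wchan=" request one field each with the header suppressed.
    state=$(ps -o s= -p "$1" | tr -d ' ')
    wchan=$(ps -o wchan= -p "$1" | tr -d ' ')
    echo "pid $1: state=$state wchan=$wchan"
    explain_state "$state"
fi
```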

Bill Moseley <mo...@hank.org> writes:

> Yes, a system problem, I guess.  If kill -9 doesn't work I'm not sure what
> will work.  I tried a truss, but it reported "truss: cannot control process
> 3732", which I assume is because its parent is root and my CGI was running
> as nobody.

root should be able to truss any process other than init. I suspect this
message meant the process was actually a zombie and not really alive at all.

-- 
greg


Re: [OT] Killing off children

Posted by Bill Moseley <mo...@hank.org>.
At 09:25 AM 04/12/00 +0200, Stas Bekman wrote:
>On Wed, 12 Apr 2000, Bill Moseley wrote:
>> I noticed on the Apache server-status report a child that is stuck in "G"
>> (Gracefully finishing) after a SIGUSR1 today.  Twelve hours ago.
>> 

>Looks like a system problem. What do you see when you attach to the process
>with 'strace -p PID'? For example, it might have entered an uninterruptible
>sleep, waiting for an event that will never happen. Generally,
>using strace (or truss) reveals some info.

Yes, a system problem, I guess.  If kill -9 doesn't work I'm not sure what
will work.  I tried a truss, but it reported "truss: cannot control process
3732", which I assume is because its parent is root and my CGI was running
as nobody.

I'll try it again when root wakes up and comes into work....

Thanks,




Bill Moseley
mailto:moseley@hank.org

Re: [OT] Killing off children

Posted by Doug MacEachern <do...@covalent.net>.
On Wed, 12 Apr 2000, Bill Moseley wrote:

> Hello,
> 
> I noticed on the Apache server-status report a child that is stuck in "G"
> (Gracefully finishing) after a SIGUSR1 today.  Twelve hours ago.

this could be perl_destruct() hanging while trying to clean up.  that
cleanup normally isn't required; you can disable it by setting the
PERL_DESTRUCT_LEVEL environment variable to -1
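
For anyone unsure how to apply this, a minimal sketch follows. It assumes the
server is started from a shell or init script whose environment you control;
the apachectl path in the comment is hypothetical and will differ per install.

```shell
#!/bin/sh
# Set the variable in the environment that httpd will inherit.
PERL_DESTRUCT_LEVEL=-1
export PERL_DESTRUCT_LEVEL

# Confirm that child processes of this shell actually see the setting.
sh -c 'echo "PERL_DESTRUCT_LEVEL=$PERL_DESTRUCT_LEVEL"'

# Then start the server from this same shell, for example (hypothetical path):
# /usr/local/apache/bin/apachectl start
```

The variable has to be in place before httpd starts; exporting it in an
unrelated login shell afterwards won't affect children that are already
running.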



Re: [OT] Killing off children

Posted by Stas Bekman <sb...@stason.org>.
On Wed, 12 Apr 2000, Bill Moseley wrote:

> Hello,
> 
> I noticed on the Apache server-status report a child that is stuck in "G"
> (Gracefully finishing) after a SIGUSR1 today.  Twelve hours ago.
> 
> I don't have root access on this machine, but I thought I'd run a CGI
> script to run as "nobody" and kill off the child.  (I've done this once
> before when working late from home and sending a mod_perl script into a
> memory hungry loop.)
> 
> But in this case the child sitting there in "G" just won't die.  I've tried
> sending every signal I can think of: HUP INT QUIT ABRT KILL TERM USR1
> STOP CONT and THAW.  I did note that the process was serving its very
> first request when SIGUSR1 was sent.  And I just sent a SIGHUP to another
> Apache child from a CGI script and that child died.
> 
> We had a similar situation a month or so back where an Apache restart left
> a bunch of children behind (and holding onto port 80), and thus the server
> wouldn't restart.  That time we were running as root and still couldn't kill
> the processes.  The machine was finally rebooted.
> 
> Anyone know how to kill off these children (without having to reboot)?

Looks like a system problem. What do you see when you attach to the process
with 'strace -p PID'? For example, it might have entered an uninterruptible
sleep, waiting for an event that will never happen. Generally,
using strace (or truss) reveals some info.
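
Since the tool name depends on the platform, something along these lines can
pick whichever tracer is installed and attach to the process. It is only a
sketch: pick_tracer is a made-up name, and attaching generally requires being
the owner of the process or root.

```shell
#!/bin/sh
# Hypothetical helper: print the name of whichever tracer is installed.
pick_tracer() {
    if command -v truss >/dev/null 2>&1; then
        echo truss
    elif command -v strace >/dev/null 2>&1; then
        echo strace
    else
        return 1
    fi
}

# Usage: sh trace.sh PID
if [ -n "$1" ]; then
    tracer=$(pick_tracer) || { echo "no truss or strace found" >&2; exit 1; }
    # Both tools accept -p PID to attach to a running process.
    exec "$tracer" -p "$1"
fi
```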

> % uname -a
> SunOS 5.6 Generic_105181-17 sun4u sparc SUNW,Ultra-Enterprise
> 
> Apache/1.3.9 (Unix) mod_perl/1.21
> 
> Thanks,
> 
> Bill Moseley
> mailto:moseley@hank.org
> 



______________________________________________________________________
Stas Bekman             | JAm_pH    --    Just Another mod_perl Hacker
http://stason.org/      | mod_perl Guide http://perl.apache.org/guide/ 
mailto:stas@stason.org  | http://perl.org    http://stason.org/TULARC/
http://singlesheaven.com| http://perlmonth.com http://sourcegarden.org
----------------------------------------------------------------------