Posted to dev@uima.apache.org by "Jerry Cwiklik (JIRA)" <de...@uima.apache.org> on 2014/02/12 18:22:19 UTC
[jira] [Created] (UIMA-3612) DUCC Agent should detect defunct
processes
Jerry Cwiklik created UIMA-3612:
-----------------------------------
Summary: DUCC Agent should detect defunct processes
Key: UIMA-3612
URL: https://issues.apache.org/jira/browse/UIMA-3612
Project: UIMA
Issue Type: Bug
Components: DUCC
Affects Versions: 1.0-Ducc
Reporter: Jerry Cwiklik
Assignee: Jerry Cwiklik
Agent's rogue process detector should change to detect a process that is defunct. It's been observed that a process drops core and remains running as defunct. Since it is up, the agent is happy and keeps reporting the process as Running.
Trying to kill it via kill -9 doesn't help. It looks like the defunct process must be cleaned up by root.
Modify code to change the state of such process from Running to Defunct (?).
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
Re: [jira] [Created] (UIMA-3612) DUCC Agent should detect defunct
processes
Posted by Jim Challenger <ch...@gmail.com>.
On this problem - the underlying issue is that the process became a
zombie and wasn't cleaned up. As best as can be seen, the waitpid() call
in Java's Process class never did wake up. This looks like a possible
kernel bug in SLES SP2 as we see bug warnings in the system log that
exactly correlate with the process being zombified.
There's really nothing we can do to clean up the zombie. I killed and
restarted the parent Agent and almost 18 hours later the zombie is still
an undead child of init.
Probably the only thing to do is for Agent to detect the situation and
report the process is gone (it really is).
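A minimal sketch of how such detection might look, assuming the Agent runs on Linux and can read /proc. The kernel reports a zombie with state 'Z' in the third field of /proc/<pid>/stat. The class and method names here (DefunctCheck, parseState, isDefunct) are hypothetical and are not taken from the DUCC code base.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Paths;

// Hypothetical helper, not part of DUCC: checks whether a pid is a zombie.
final class DefunctCheck {

    /** Extracts the one-letter state field from a /proc/<pid>/stat line. */
    static char parseState(String statLine) {
        // The command name is wrapped in parentheses and may itself contain
        // spaces or ')', so locate the LAST ')' before reading the state.
        int close = statLine.lastIndexOf(')');
        if (close < 0 || close + 2 >= statLine.length()) {
            throw new IllegalArgumentException("unexpected stat format: " + statLine);
        }
        return statLine.charAt(close + 2);
    }

    /** Returns true if the kernel reports the process as a zombie (state 'Z'). */
    static boolean isDefunct(long pid) throws IOException {
        try {
            byte[] raw = Files.readAllBytes(Paths.get("/proc", Long.toString(pid), "stat"));
            return parseState(new String(raw, StandardCharsets.UTF_8)) == 'Z';
        } catch (NoSuchFileException e) {
            return false; // no /proc entry: the process is gone, not a zombie
        }
    }
}
```

A periodic check along these lines would let the Agent stop reporting Running for a pid whose state is 'Z', whatever state name it then chooses to report.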
The Process.waitFor() call is just a Java wait() on a monitor (that is
normally notified when the waitpid() system call returns). It should be
possible to interrupt it so the wait thread doesn't leak. Hard to know
what might and might not work because the situation can't, in general,
be replicated in test.
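A rough sketch of the interrupt idea, under the assumption that the thread is parked in Process.waitFor(), which is declared to throw InterruptedException. The names InterruptibleWait and waitOrAbandon are made up for illustration, and whether this behaves well in the zombie case seen here is untested.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper, not part of DUCC: bounds the time spent in waitFor().
final class InterruptibleWait {

    /**
     * Waits for the process on a helper thread for at most timeoutMillis.
     * Returns the exit code, or null if the wait was abandoned.
     */
    static Integer waitOrAbandon(final Process process, long timeoutMillis)
            throws InterruptedException {
        final AtomicReference<Integer> exit = new AtomicReference<Integer>();
        Thread waiter = new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    exit.set(process.waitFor());
                } catch (InterruptedException e) {
                    // Interrupted by the caller below: return so the thread ends.
                }
            }
        }, "process-waiter");
        waiter.setDaemon(true);
        waiter.start();

        waiter.join(timeoutMillis);   // give the normal exit path a chance
        if (waiter.isAlive()) {
            waiter.interrupt();       // wake the thread blocked in waitFor()
            waiter.join(1000);        // bounded, in case the interrupt has no effect
        }
        return exit.get();            // null means no exit code was ever delivered
    }
}
```

A null result would be the Agent's cue to check the process state by other means and report it as gone rather than Running.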
Jim