Posted to dev@oozie.apache.org by "Purshotam Shah (JIRA)" <ji...@apache.org> on 2014/05/14 20:53:17 UTC

[jira] [Updated] (OOZIE-1813) Add service to report/kill rogue bundles and coordinator jobs

     [ https://issues.apache.org/jira/browse/OOZIE-1813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Purshotam Shah updated OOZIE-1813:
----------------------------------

    Attachment: OOZIE-1813-V2.patch

> Add service to report/kill rogue bundles and coordinator jobs
> -------------------------------------------------------------
>
>                 Key: OOZIE-1813
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1813
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Purshotam Shah
>            Assignee: Purshotam Shah
>         Attachments: OOZIE-1813-V2.patch
>
>
> People leave their test coordinator and bundle jobs running without ever killing them,
> and these jobs consume significant resources. We should have a service that periodically checks for abandoned coords and reports/kills them.
> We can add multiple criteria for this (e.g., number of consecutive failed/timedout actions, total number of failed/timedout actions).
> To start with, if the number of coord actions with failed/timedout status > defined value, then the coord is considered to be rogue.
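
The criteria described above could be sketched roughly as follows. This is only an illustration and is not taken from the attached patch: the class name, the Status enum, the method names, and the threshold parameter are all hypothetical and are not Oozie APIs. It assumes the statuses of a coordinator's actions are already available as an ordered list.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from Oozie or the patch.
class RogueCoordCheck {

    // Simplified stand-in for coordinator action statuses (assumption).
    enum Status { SUCCEEDED, FAILED, TIMEDOUT, KILLED, RUNNING }

    // An action counts against the coordinator if it failed or timed out.
    static boolean isBad(Status s) {
        return s == Status.FAILED || s == Status.TIMEDOUT;
    }

    // Total number of FAILED/TIMEDOUT actions.
    static int totalBad(List<Status> actions) {
        int count = 0;
        for (Status s : actions) {
            if (isBad(s)) {
                count++;
            }
        }
        return count;
    }

    // Longest run of consecutive FAILED/TIMEDOUT actions.
    static int longestBadRun(List<Status> actions) {
        int best = 0;
        int run = 0;
        for (Status s : actions) {
            run = isBad(s) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }

    // Initial rule from the description: rogue if the number of
    // FAILED/TIMEDOUT actions is greater than the defined value.
    static boolean isRogue(List<Status> actions, int maxBad) {
        return totalBad(actions) > maxBad;
    }

    public static void main(String[] args) {
        List<Status> actions = Arrays.asList(
                Status.SUCCEEDED, Status.FAILED, Status.TIMEDOUT,
                Status.FAILED, Status.SUCCEEDED);
        System.out.println(totalBad(actions));      // prints 3
        System.out.println(longestBadRun(actions)); // prints 3
        System.out.println(isRogue(actions, 2));    // prints true
        System.out.println(isRogue(actions, 3));    // prints false
    }
}
```

A real service would presumably read the action statuses from the Oozie database on a schedule and take the threshold from configuration; both are left out of this sketch.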



--
This message was sent by Atlassian JIRA
(v6.2#6252)