Posted to dev@oozie.apache.org by "Virag Kothari (JIRA)" <ji...@apache.org> on 2012/11/29 23:18:59 UTC

[jira] [Updated] (OOZIE-1065) bundle status does not transit after rerun

     [ https://issues.apache.org/jira/browse/OOZIE-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Virag Kothari updated OOZIE-1065:
---------------------------------

    Attachment: OOZIE-1065.patch

When the rerun command is issued on a killed coordinator, the pending flag of the bundle action is not reset, so the bundle's status never transitions. The patch makes the rerun command call the parent bundle of the killed coordinator to reset the pending flag.
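To illustrate why a stale pending flag freezes the bundle, here is a minimal Java sketch under stated assumptions. It models a bundle whose status is recomputed from its coordinators only when no bundle action is pending. The class, the `BundleAction` type, `nextBundleStatus`, and the aggregation rule are all hypothetical simplifications chosen to match the outcomes described in this issue. They are not Oozie's actual code. The scenario replays case 1 from the report.

```java
import java.util.List;

/** Hypothetical, simplified model of bundle status recomputation (not Oozie's code). */
public class BundleStatusSketch {

    // Only the status names that appear in OOZIE-1065.
    enum Status { RUNNING, RUNNINGWITHERROR, SUCCEEDED, DONEWITHERROR, KILLED }

    // Hypothetical: one bundle action per coordinator; 'pending' marks an in-flight command.
    static final class BundleAction {
        final String coordName;
        Status coordStatus;
        boolean pending;

        BundleAction(String coordName, Status coordStatus, boolean pending) {
            this.coordName = coordName;
            this.coordStatus = coordStatus;
            this.pending = pending;
        }
    }

    static boolean isTerminal(Status s) {
        return s == Status.SUCCEEDED || s == Status.DONEWITHERROR || s == Status.KILLED;
    }

    // Assumed rule: while any action is pending, the bundle keeps its current status.
    static Status nextBundleStatus(Status current, List<BundleAction> actions) {
        boolean anyError = false;
        boolean allTerminal = true;
        for (BundleAction a : actions) {
            if (a.pending) {
                return current; // transition skipped: the bundle stays where it is
            }
            if (!isTerminal(a.coordStatus)) {
                allTerminal = false;
            }
            if (a.coordStatus == Status.KILLED || a.coordStatus == Status.DONEWITHERROR) {
                anyError = true;
            }
        }
        if (allTerminal) {
            return anyError ? Status.DONEWITHERROR : Status.SUCCEEDED;
        }
        return anyError ? Status.RUNNINGWITHERROR : Status.RUNNING;
    }

    public static void main(String[] args) {
        // Case 1: coord-job-1 KILLED, the other two SUCCEEDED.
        List<BundleAction> actions = List.of(
                new BundleAction("coord-job-1", Status.KILLED, false),
                new BundleAction("coord-job-2", Status.SUCCEEDED, false),
                new BundleAction("coord-job-3", Status.SUCCEEDED, false));

        // Rerun is issued: the bundle shows RUNNINGWITHERROR and the action is marked
        // pending, but the KILLED coordinator cannot actually be rerun.
        Status bundle = Status.RUNNINGWITHERROR;
        actions.get(0).pending = true;
        Status stuck = nextBundleStatus(bundle, actions);
        System.out.println("without reset: " + stuck);

        // The fix described above: reset the pending flag for the killed coordinator.
        actions.get(0).pending = false;
        System.out.println("after reset: " + nextBundleStatus(stuck, actions));
    }
}
```

In this model, the first call returns RUNNINGWITHERROR unchanged because the action is still pending. Once the flag is cleared, the same inputs yield DONEWITHERROR, which matches the bundle status the report shows before the rerun.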

Patch for review at
https://reviews.apache.org/r/8282/
                
> bundle status does not transit after rerun
> ------------------------------------------
>
>                 Key: OOZIE-1065
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1065
>             Project: Oozie
>          Issue Type: Bug
>          Components: bundle
>    Affects Versions: 3.3.0
>            Reporter: michelle chiang
>            Priority: Minor
>         Attachments: OOZIE-1065.patch
>
>
> Two similar cases:
> 1. Submit a bundle with 3 coord jobs. Kill coord-job-1.
>    coord-job-1 becomes KILLED with both actions KILLED.
>    The other 2 coord jobs finished SUCCEEDED, and the bundle job is DONEWITHERROR.
>    Rerun the bundle job with -coordinator=coord-job-1. As soon as the rerun command is issued, the bundle job status is RUNNINGWITHERROR.
>    Because coord-job-1 is KILLED, it cannot be rerun.
>    But the bundle job stays in RUNNINGWITHERROR even though all 3 coord jobs are in terminal states (KILLED, SUCCEEDED, SUCCEEDED).
>    Kill the bundle job: the bundle transitions to KILLED for a second, then goes back to RUNNINGWITHERROR.
> 2. Submit a bundle with 3 coord jobs. Kill coord-job-1.
>    coord-job-1 becomes DONEWITHERROR with 1 action SUCCEEDED and 1 action KILLED.
>    The other 2 coord jobs finished SUCCEEDED, and the bundle job is DONEWITHERROR.
>    Rerun the bundle job with -coordinator=coord-job-1. As soon as the rerun command is issued, the bundle job status is RUNNINGWITHERROR.
>    coord-job-1 is RUNNING after the rerun.
>    But the bundle job stays in RUNNINGWITHERROR, and does not transition to RUNNING, when 1 coord job is RUNNING and the other 2 coord jobs are SUCCEEDED.
>    The bundle job also stays in RUNNINGWITHERROR when all 3 coord jobs are in terminal states (DONEWITHERROR, SUCCEEDED, SUCCEEDED).
>    Kill the bundle job: the bundle transitions to KILLED for a second, then goes back to RUNNINGWITHERROR.

--
This message is automatically generated by JIRA.