Posted to dev@oozie.apache.org by "Julia Kinga Marton (JIRA)" <ji...@apache.org> on 2019/04/10 07:28:00 UTC

[jira] [Comment Edited] (OOZIE-2882) Rerun workflow fails Error: E0404

    [ https://issues.apache.org/jira/browse/OOZIE-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16803884#comment-16803884 ] 

Julia Kinga Marton edited comment on OOZIE-2882 at 4/10/19 7:27 AM:
--------------------------------------------------------------------

I was able to reproduce the issue as well with the following steps:
 * create a workflow with a fork node and 3 shell actions, one of which is expected to fail every time
 * submit the wf and wait for it to fail
 * rerun the wf with oozie.wf.rerun.failnodes set to true -> it will fail as expected
 * rerun the wf with oozie.wf.rerun.skip.nodes set to true -> the error from the description is thrown

The problem is that these values are set as properties of the wf. In the code we are comparing these values to [*null* instead of checking their values|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/DagEngine.java#L300-L312]
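
To illustrate, here is a minimal sketch of a presence-based check. It is not the actual DagEngine code: the class and method names are hypothetical, and a plain Map stands in for the job configuration. It only shows why a property carried over from an earlier rerun triggers the conflict regardless of its value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the validation logic; not the Oozie source.
final class RerunNullCheckSketch {
    static final String SKIP_NODES = "oozie.wf.rerun.skip.nodes";
    static final String FAIL_NODES = "oozie.wf.rerun.failnodes";

    // Reports a conflict as soon as both keys are present, whatever their values.
    static boolean conflictsByPresence(Map<String, String> conf) {
        return conf.get(SKIP_NODES) != null && conf.get(FAIL_NODES) != null;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(FAIL_NODES, "true"); // kept as a wf property from the first rerun
        conf.put(SKIP_NODES, "true"); // supplied on the second rerun
        System.out.println(conflictsByPresence(conf)); // prints "true"

        // Overriding the old property with "false" does not help,
        // because only presence is tested.
        conf.put(FAIL_NODES, "false");
        System.out.println(conflictsByPresence(conf)); // prints "true"
    }
}
```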

The suggestion from the description to reset these values once the wf is finished is one approach.

There is another problem: if we rerun the wf from the command line and set oozie.wf.rerun.failnodes=false and oozie.wf.rerun.skip.nodes=true, the same error will be thrown.

Instead of resetting the values, I would fix the code to check whether both values are true instead of checking whether they have a value or not. This way, it will be possible to overwrite the values of these properties.
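
A rough sketch of what such a value-based check could look like. This is not a patch: the class and method names are hypothetical, a plain Map stands in for the job configuration, and both properties are assumed to be boolean flags as in the steps above. Since the issue description also passes a node list to the skip property, a real fix would probably need to handle that case as well.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the proposed validation; not the Oozie source.
final class RerunValueCheckSketch {
    static final String SKIP_NODES = "oozie.wf.rerun.skip.nodes";
    static final String FAIL_NODES = "oozie.wf.rerun.failnodes";

    // Reports a conflict only when both properties are explicitly "true".
    // Boolean.parseBoolean returns false for null, so a missing key counts as false.
    static boolean conflictsByValue(Map<String, String> conf) {
        return Boolean.parseBoolean(conf.get(SKIP_NODES))
                && Boolean.parseBoolean(conf.get(FAIL_NODES));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(FAIL_NODES, "true");
        conf.put(SKIP_NODES, "true");
        System.out.println(conflictsByValue(conf)); // prints "true"

        // The command line case: the old property is overridden with "false".
        conf.put(FAIL_NODES, "false");
        System.out.println(conflictsByValue(conf)); // prints "false"
    }
}
```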


> Rerun workflow fails Error: E0404
> ---------------------------------
>
>                 Key: OOZIE-2882
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2882
>             Project: Oozie
>          Issue Type: Improvement
>            Reporter: Attila Sasvari
>            Assignee: Julia Kinga Marton
>            Priority: Major
>
> Only one of the properties are allowed [oozie.wf.rerun.skip.nodes OR oozie.wf.rerun.failnodes]
> Reproduction:
> 1. Create a workflow with more than 1 node, e.g. a fork with three parallel shell actions. Make sure one of them fails.
> 2. Rerun with 'oozie.wf.rerun.failnodes' set.
> 3. Rerun again with 'oozie.wf.rerun.skip.nodes' and check 'Skip all successful nodes'.
> You will get the following error.
> Error: E0404 : E0404: Only one of the properties are allowed [oozie.wf.rerun.skip.nodes OR oozie.wf.rerun.failnodes]
> When a user reruns a workflow job with oozie.wf.rerun.failnodes=true and the job fails in subsequent steps, there is no option to resubmit the workflow using oozie.wf.rerun.skip.nodes=action1,action2 to allow submission from predecessor steps.
> Currently, once the workflow fails and one of the rerun options is used for a rerun, that option gets merged and there is no way to override it, unlike regular Oozie configurations or variables.
> We have a few options:
> 1. If fail.nodes and skip.nodes are specified at the same time (or one of them was carried over from a previous wf run), we can add {generate skip.nodes by discovering nodes that did not fail} union {skip.nodes}
> 2. Add a way to remove properties (this is also potentially helpful for other use cases)
> 3. The "newest" property (oozie.wf.rerun.skip.nodes or oozie.wf.rerun.failnodes) takes priority and the previous is ignored
> 4. Make oozie.wf.rerun.skip.nodes or oozie.wf.rerun.failnodes somehow not persist in the DB
> Part of this JIRA would be to figure out which is the best option.


