Posted to dev@oozie.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2014/06/12 05:45:02 UTC

[jira] [Commented] (OOZIE-1879) Workflow Rerun causes error depending on the order of forked nodes

    [ https://issues.apache.org/jira/browse/OOZIE-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028794#comment-14028794 ] 

Hadoop QA commented on OOZIE-1879:
----------------------------------

Testing JIRA OOZIE-1879

Cleaning local git workspace

----------------------------

{color:green}+1 PATCH_APPLIES{color}
{color:green}+1 CLEAN{color}
{color:red}-1 RAW_PATCH_ANALYSIS{color}
.    {color:green}+1{color} the patch does not introduce any @author tags
.    {color:green}+1{color} the patch does not introduce any tabs
.    {color:green}+1{color} the patch does not introduce any trailing spaces
.    {color:red}-1{color} the patch contains 1 line(s) longer than 132 characters
.    {color:green}+1{color} the patch adds/modifies 2 testcase(s)
{color:green}+1 RAT{color}
.    {color:green}+1{color} the patch does not seem to introduce new RAT warnings
{color:green}+1 JAVADOC{color}
.    {color:green}+1{color} the patch does not seem to introduce new Javadoc warnings
{color:green}+1 COMPILE{color}
.    {color:green}+1{color} HEAD compiles
.    {color:green}+1{color} patch compiles
.    {color:green}+1{color} the patch does not seem to introduce new javac warnings
{color:green}+1 BACKWARDS_COMPATIBILITY{color}
.    {color:green}+1{color} the patch does not change any JPA Entity/Column/Basic/Lob/Transient annotations
.    {color:green}+1{color} the patch does not modify JPA files
{color:red}-1 TESTS{color} - patch does not compile, cannot run testcases
{color:green}+1 DISTRO{color}
.    {color:green}+1{color} distro tarball builds with the patch 

----------------------------
{color:red}*-1 Overall result, please check the reported -1(s)*{color}


The full output of the test-patch run is available at

.   https://builds.apache.org/job/oozie-trunk-precommit-build/1306/

> Workflow Rerun causes error depending on the order of forked nodes
> ------------------------------------------------------------------
>
>                 Key: OOZIE-1879
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1879
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: trunk
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>            Priority: Blocker
>         Attachments: OOZIE-1879.patch
>
>
> Suppose you have a workflow like this:
> {noformat}
> start --> fork
> fork --> shell1, shell2
> shell1 --> join
> shell2 --> join
> join --> shell3
> shell3 --> end
> {noformat}
> And all but shell3 are successful.
> If you fix the problem with shell3 and then do a rerun, one of the following two outcomes can happen:
> # If shell1 finished before shell2, then the rerun succeeds
> # If shell2 finished before shell1, then the rerun fails
> The error in the second outcome is simply this log message:
> {noformat}
> 2014-05-29 17:17:03,735 ERROR org.apache.oozie.workflow.lite.LiteWorkflowInstance: SERVER[cdh5-1.cloudera.local] USER[pdvorak] GROUP[-] TOKEN[] APP[test-rerun-wf] JOB[0000004-140521220856264-oozie-oozi-W] ACTION[0000004-140521220856264-oozie-oozi-W@join] invalid execution path [/shell1/]
> {noformat}
> After a bunch of digging, I discovered that during a rerun of the above workflow (or similar workflows), LiteWorkflowInstance#signal gets called for each action in the fork node in the order in which the actions are listed in the fork node's XML. During the original run, however, LiteWorkflowInstance#signal gets called for each action in the order in which the actions complete (i.e. by endTime).  When these two orders don't match, you get the above error.  The general fix is therefore to ensure that during a rerun, LiteWorkflowInstance#signal gets called for each action in the fork node in the order in which the actions originally completed.  And if you think about it, that is more correct than the current behavior anyway.
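
The ordering mismatch described above can be illustrated with a small sketch. This is not Oozie code and not the attached patch: `CompletedAction`, `xml_order`, and `completion_order` are hypothetical names, and the timestamps are made up. It only shows how the order listed in the fork node can differ from the order of completion, and that sorting by end time recovers the original order.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CompletedAction:
    """Hypothetical stand-in for a finished workflow action (not an Oozie class)."""
    name: str
    end_time: datetime


def xml_order(actions, fork_targets):
    """Order actions as they are listed in the fork node's XML."""
    by_name = {a.name: a for a in actions}
    return [by_name[n].name for n in fork_targets if n in by_name]


def completion_order(actions):
    """Order actions by when they finished in the original run."""
    return [a.name for a in sorted(actions, key=lambda a: a.end_time)]


# Assumed scenario: shell2 finished before shell1 in the original run.
actions = [
    CompletedAction("shell1", datetime(2014, 5, 29, 17, 16, 40)),
    CompletedAction("shell2", datetime(2014, 5, 29, 17, 16, 25)),
]

print(xml_order(actions, ["shell1", "shell2"]))  # ['shell1', 'shell2']
print(completion_order(actions))                 # ['shell2', 'shell1']
```

In this scenario the two orders disagree, which corresponds to the failing rerun case; replaying signals in completion order would match what the original run did.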



--
This message was sent by Atlassian JIRA
(v6.2#6252)