Posted to dev@oozie.apache.org by "Virag Kothari (JIRA)" <ji...@apache.org> on 2012/09/25 00:14:08 UTC

[jira] [Comment Edited] (OOZIE-994) ActionCheckXCommand does not handle failures properly

    [ https://issues.apache.org/jira/browse/OOZIE-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462188#comment-13462188 ] 

Virag Kothari edited comment on OOZIE-994 at 9/25/12 9:13 AM:
--------------------------------------------------------------

I think it's okay if ResumeXCommand has no way to distinguish between a workflow suspended by ActionCheckX and one suspended for some other reason. If it sees the action status as START_MANUAL, it can queue the ActionStartX. (The actionDir can be cleaned up before starting.)
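A rough sketch of that resume behaviour, not the actual ResumeXCommand code: the ResumeSketch class, its Status enum, the Action type and the two lists are all hypothetical stand-ins, and only the START_MANUAL status and the "clean up, then queue a start" order come from the comment above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ResumeSketch {
    // Hypothetical subset of action statuses; only START_MANUAL is taken from the comment.
    enum Status { START_MANUAL, RUNNING, OK }

    // Hypothetical, simplified view of a workflow action.
    static final class Action {
        final String id;
        final Status status;

        Action(String id, Status status) {
            this.id = id;
            this.status = status;
        }
    }

    // Stand-ins for the real side effects: cleaning the actionDir and queuing a start command.
    final List<String> cleanedDirs = new ArrayList<>();
    final List<String> queuedStarts = new ArrayList<>();

    // Resume does not ask why the workflow was suspended; it only looks at the action status.
    void resume(List<Action> actions) {
        for (Action action : actions) {
            if (action.status == Status.START_MANUAL) {
                cleanedDirs.add(action.id);   // clean up the actionDir first
                queuedStarts.add(action.id);  // then queue the start for this action
            }
        }
    }

    public static void main(String[] args) {
        ResumeSketch sketch = new ResumeSketch();
        sketch.resume(Arrays.asList(
                new Action("a1", Status.START_MANUAL),
                new Action("a2", Status.RUNNING)));
        System.out.println("queued starts: " + sketch.queuedStarts);
    }
}
```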

If you want all of the checks to be done against the wrapped exceptions first, can we have

{code}
Exception e = null;
for (...) {
    if (match(exception.getCause())) {
        return new AEException("..");   // cause matched: return immediately
    }
    if (match(exception)) {
        e = new AEException("..");      // wrapper matched: don't return immediately
    }
}

if (e != null) {
    return e;
}
{code}
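To make the idea concrete, here is a minimal runnable sketch of the same loop. It is not Oozie's convertException: SketchActionException, ConvertExceptionSketch, the rule map and the "ERROR"/"TRANSIENT" labels are hypothetical names chosen for illustration. As in the pseudocode, a match on the cause returns at once, while a match on the wrapping exception is only remembered and used if no rule matches the cause.

```java
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the "AEException" in the pseudocode; not the real Oozie class.
class SketchActionException extends Exception {
    final String errorType; // illustrative label such as "TRANSIENT" or "ERROR"

    SketchActionException(String errorType, String message, Throwable cause) {
        super(message, cause);
        this.errorType = errorType;
    }
}

class ConvertExceptionSketch {
    // rules: exception class -> error type, checked in insertion order (assumed structure).
    static SketchActionException convert(Throwable ex,
                                         Map<Class<? extends Throwable>, String> rules) {
        SketchActionException fallback = null;
        Throwable cause = ex.getCause();
        for (Map.Entry<Class<? extends Throwable>, String> rule : rules.entrySet()) {
            if (cause != null && rule.getKey().isInstance(cause)) {
                // The wrapped (root cause) exception matched: return immediately.
                return new SketchActionException(rule.getValue(), cause.getMessage(), ex);
            }
            if (rule.getKey().isInstance(ex)) {
                // The wrapper matched: remember it, but keep checking causes first.
                fallback = new SketchActionException(rule.getValue(), ex.getMessage(), ex);
            }
        }
        return fallback; // null when no rule matched at all
    }

    public static void main(String[] args) {
        Map<Class<? extends Throwable>, String> rules = new LinkedHashMap<>();
        rules.put(IOException.class, "ERROR");
        rules.put(IllegalStateException.class, "TRANSIENT");

        Exception wrapped = new IOException("wrapper", new IllegalStateException("root cause"));
        SketchActionException converted = convert(wrapped, rules);
        System.out.println(converted.errorType + ": " + converted.getMessage());
    }
}
```

In the example in main, the first rule matches only the wrapper, so it is held back; the second rule matches the cause and wins, which is the "wrapped exceptions first" ordering asked for.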

 

  
 


                
> ActionCheckXCommand does not handle failures properly
> -----------------------------------------------------
>
>                 Key: OOZIE-994
>                 URL: https://issues.apache.org/jira/browse/OOZIE-994
>             Project: Oozie
>          Issue Type: Bug
>          Components: workflow
>    Affects Versions: 3.2.0
>            Reporter: Alejandro Abdelnur
>            Assignee: Robert Kanter
>            Priority: Critical
>             Fix For: trunk
>
>         Attachments: OOZIE-994.patch, OOZIE-994.patch, OOZIE-994.patch
>
>
> If the JT restarts or dies and running jobs are lost, or the JT is not reachable, Oozie's ActionCheckXCommand will never fail the workflow job.
> There seem to be 2 issues here:
> * convertException is not receiving the root cause exception anymore, but always a HadoopAccessorException wrapping the root cause exception. We should modify convertException to inspect the cause exception as well.
> * ActionCheckXCommand does not perform the retry-handling logic that ActionStartXCommand does.
