Posted to dev@oozie.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2016/04/01 12:51:25 UTC

[jira] [Commented] (OOZIE-2495) change action status from ErrorType.NON_TRANSIENT to TRANSIENT when SSH action occurs AUTH_FAILED occasionally

    [ https://issues.apache.org/jira/browse/OOZIE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15221529#comment-15221529 ] 

Hadoop QA commented on OOZIE-2495:
----------------------------------

Testing JIRA OOZIE-2495

Cleaning local git workspace

----------------------------

{color:red}-1{color} Patch failed to apply to head of branch

----------------------------

> change action status from  ErrorType.NON_TRANSIENT to TRANSIENT when SSH action occurs AUTH_FAILED occasionally
> ---------------------------------------------------------------------------------------------------------------
>
>                 Key: OOZIE-2495
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2495
>             Project: Oozie
>          Issue Type: Improvement
>          Components: action
>    Affects Versions: 4.2.0
>            Reporter: WangMeng
>         Attachments: OOZIE-2495.01.patch
>
>
>     For the SSH action, execution sometimes fails with the following exception:
>    
>     AUTH_FAILED: Not able to perform operation [ssh -o   PasswordAuthentication=no -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20 user@XXX.XX.XX.XXX mkdir -p oozie-oozi/0000067-130808155814753-oozie-oozi-W/sshjob--ssh/ ] | EErrorStream: Warning: Permanently added (RSA) to the list of known hosts. 
>     
>     However, when I ran the same ssh command by hand on the Oozie server host, it worked.
>     
>     Besides incorrect SSH settings, the exception may also be caused by high load on the SSH client at connect time, network jitter, or other intermittent conditions.
>     Once the connection fails, Oozie marks the action as ErrorType.NON_TRANSIENT regardless of the configured retry count and suspends it immediately.
>     In this case, I think changing the action status from ErrorType.NON_TRANSIENT to TRANSIENT would be better: the action would then be retried automatically before being suspended, which handles occasional connection errors.
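
The retry behaviour requested in the description can be illustrated with a minimal, self-contained Java sketch. It is not Oozie's code and not the attached patch: the class SshRetrySketch, the nested ErrorType enum and ActionException class, and the runWithRetry helper are all hypothetical names chosen for this illustration. The only point is that a failure classified as TRANSIENT gets further attempts, while a NON_TRANSIENT failure stops at once.

```java
import java.util.concurrent.Callable;

final class SshRetrySketch {

    /** Hypothetical stand-in for the two error types discussed above (not Oozie's class). */
    enum ErrorType { TRANSIENT, NON_TRANSIENT }

    /** Hypothetical exception carrying an error type and a code such as AUTH_FAILED. */
    static final class ActionException extends Exception {
        final ErrorType type;
        final String code;

        ActionException(ErrorType type, String code, String message) {
            super(message);
            this.type = type;
            this.code = code;
        }
    }

    /**
     * Runs op up to maxAttempts times. A TRANSIENT failure is retried after
     * waitMillis; a NON_TRANSIENT failure is rethrown immediately.
     */
    static <T> T runWithRetry(Callable<T> op, int maxAttempts, long waitMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ActionException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (ActionException e) {
                if (e.type == ErrorType.NON_TRANSIENT) {
                    throw e; // no retry: the caller would suspend the action here
                }
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(waitMillis); // wait before the next attempt
                }
            }
        }
        throw last; // every attempt failed with a TRANSIENT error
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated SSH connect that fails twice, then succeeds.
        String result = runWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new ActionException(ErrorType.TRANSIENT, "AUTH_FAILED",
                        "simulated intermittent connect failure");
            }
            return "ok";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts"); // prints "ok after 3 attempts"
    }
}
```

If the simulated failure were thrown as NON_TRANSIENT instead, runWithRetry would give up on the first attempt, which mirrors the immediate suspension described in the report.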



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)