Posted to dev@oozie.apache.org by "Hadoop QA (JIRA)" <ji...@apache.org> on 2016/04/01 12:51:25 UTC
[jira] [Commented] (OOZIE-2495) change action status from
ErrorType.NON_TRANSIENT to TRANSIENT when SSH action occurs AUTH_FAILED
occasionally
[ https://issues.apache.org/jira/browse/OOZIE-2495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15221529#comment-15221529 ]
Hadoop QA commented on OOZIE-2495:
----------------------------------
Testing JIRA OOZIE-2495
Cleaning local git workspace
----------------------------
{color:red}-1{color} Patch failed to apply to head of branch
----------------------------
> change action status from ErrorType.NON_TRANSIENT to TRANSIENT when SSH action occurs AUTH_FAILED occasionally
> ---------------------------------------------------------------------------------------------------------------
>
> Key: OOZIE-2495
> URL: https://issues.apache.org/jira/browse/OOZIE-2495
> Project: Oozie
> Issue Type: Improvement
> Components: action
> Affects Versions: 4.2.0
> Reporter: WangMeng
> Attachments: OOZIE-2495.01.patch
>
>
> The SSH action sometimes fails with the following exception:
>
> AUTH_FAILED: Not able to perform operation [ssh -o PasswordAuthentication=no -o KbdInteractiveDevices=no -o StrictHostKeyChecking=no -o ConnectTimeout=20 user@XXX.XX.XX.XXX mkdir -p oozie-oozi/0000067-130808155814753-oozie-oozi-W/sshjob--ssh/ ] | ErrorStream: Warning: Permanently added (RSA) to the list of known hosts.
>
> However, when I ran the same ssh command by hand on the Oozie server host, it worked.
>
> Besides incorrect SSH settings, the exception may also be caused by high load on the SSH client at connection time, network jitter, or other intermittent conditions.
> Once the connection fails, Oozie sets the error type to ErrorType.NON_TRANSIENT and suspends the action immediately, regardless of the configured retry count.
> When this happens, I think it would be better to change the error type from ErrorType.NON_TRANSIENT to TRANSIENT. This lets the action retry automatically before it is suspended, which handles occasional connection errors.
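
The difference between the two error types described above can be illustrated with a minimal, self-contained Java sketch. It is not Oozie's code and not the attached patch: the class SshRetrySketch, the nested ActionException, Attempt and Outcome types, and the runWithRetries method are all hypothetical names invented for this illustration. It only assumes the behaviour stated in the issue, namely that a NON_TRANSIENT error suspends the action at once while a TRANSIENT error is retried until a retry limit is reached.

```java
public class SshRetrySketch {
    // Hypothetical stand-ins for the two error classes named in the issue.
    enum ErrorType { TRANSIENT, NON_TRANSIENT }

    // Hypothetical result of running an action to completion or suspension.
    enum Outcome { OK, SUSPENDED }

    // Hypothetical exception that carries the error classification.
    static class ActionException extends Exception {
        final ErrorType type;

        ActionException(ErrorType type, String message) {
            super(message);
            this.type = type;
        }
    }

    // Hypothetical interface for a single attempt of the action (e.g. one ssh call).
    interface Attempt {
        void run() throws ActionException;
    }

    // Runs the attempt, retrying only TRANSIENT failures, at most maxRetries times.
    static Outcome runWithRetries(Attempt attempt, int maxRetries, long retryIntervalMillis)
            throws InterruptedException {
        int retries = 0;
        while (true) {
            try {
                attempt.run();
                return Outcome.OK;
            } catch (ActionException e) {
                // NON_TRANSIENT: suspend right away, the retry limit is never consulted.
                // TRANSIENT: suspend only once the retry limit is exhausted.
                if (e.type == ErrorType.NON_TRANSIENT || retries >= maxRetries) {
                    return Outcome.SUSPENDED;
                }
                retries++;
                Thread.sleep(retryIntervalMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated flaky connection: fails twice with AUTH_FAILED, then succeeds.
        int[] calls = {0};
        Attempt flaky = () -> {
            if (++calls[0] < 3) {
                throw new ActionException(ErrorType.TRANSIENT, "AUTH_FAILED");
            }
        };
        System.out.println(runWithRetries(flaky, 3, 10L));
    }
}
```

In this sketch, the same intermittent failure classified as NON_TRANSIENT would suspend the action on the first attempt, which is the behaviour the issue proposes to change.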
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)