Posted to dev@oozie.apache.org by "Robert Kanter (JIRA)" <ji...@apache.org> on 2015/02/02 20:30:35 UTC

[jira] [Commented] (OOZIE-2126) SSH action can be too fast for Oozie sometimes

    [ https://issues.apache.org/jira/browse/OOZIE-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14301750#comment-14301750 ] 

Robert Kanter commented on OOZIE-2126:
--------------------------------------

We ran into a similar problem with the shell action in the Oozie on YARN prototype I posted in OOZIE-1770: the shell action running in a container would return much faster than the same action running in a launcher job.

> SSH action can be too fast for Oozie sometimes
> ----------------------------------------------
>
>                 Key: OOZIE-2126
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2126
>             Project: Oozie
>          Issue Type: Bug
>          Components: action
>            Reporter: Robert Kanter
>            Assignee: Robert Kanter
>
> We've seen a timing problem with the SSH action where the callback arrives too fast, while the action is still in PREP and has not yet transitioned to RUNNING. This causes Oozie to ignore the callback, which means it won't find out that the action completed until it manually checks (default=10min). This happened in an HA setup, but I think it could happen even without HA. Adding a 30-second delay to the ssh scripts fixed the problem, but ideally we should come up with a better solution.
> Here are the relevant logs:
> {noformat}
> 2015-01-16 18:00:12,916 INFO org.apache.oozie.action.ssh.SshActionExecutor: SERVER[FOO] USER[foo] GROUP[-] TOKEN[] APP[${job_name}] JOB[0000027-150113223634420-oozie-oozi-W] ACTION[0000027-150113223634420-oozie-oozi-W@action-1] start() begins
> 2015-01-16 18:00:12,917 INFO org.apache.oozie.action.ssh.SshActionExecutor: SERVER[FOO] USER[foo] GROUP[-] TOKEN[] APP[${job_name}] JOB[0000027-150113223634420-oozie-oozi-W] ACTION[0000027-150113223634420-oozie-oozi-W@action-1] Attempting to copy ssh base scripts to remote host [foo@bar.com]
> 2015-01-16 18:00:15,769 INFO org.apache.oozie.servlet.CallbackServlet: SERVER[FOO] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000027-150113223634420-oozie-oozi-W] ACTION[0000027-150113223634420-oozie-oozi-W@action-1] callback for action [0000027-150113223634420-oozie-oozi-W@action-1]
> 2015-01-16 18:00:15,774 ERROR org.apache.oozie.command.wf.CompletedActionXCommand: SERVER[FOO] USER[-] GROUP[-] TOKEN[] APP[-] JOB[0000027-150113223634420-oozie-oozi-W] ACTION[0000027-150113223634420-oozie-oozi-W@action-1] XException,
> org.apache.oozie.command.CommandException: E0800: Action it is not running its in [PREP] state, action [0000027-150113223634420-oozie-oozi-W@action-1]
>         at org.apache.oozie.command.wf.CompletedActionXCommand.eagerVerifyPrecondition(CompletedActionXCommand.java:77)
>         at org.apache.oozie.command.XCommand.call(XCommand.java:251)
>         at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:745)
> {noformat}
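
To make the race in the description concrete, here is a minimal, self-contained sketch. It is not Oozie code: CallbackRaceSketch, Action, onCallbackStrict, onCallbackDeferring and markRunning are all hypothetical names made up for illustration. The "strict" path mimics the behaviour in the logs (a callback that arrives in PREP is dropped). The "deferring" path is only one assumed way the callback could be kept and replayed, not a proposed or actual fix.

```java
public class CallbackRaceSketch {
    enum Status { PREP, RUNNING, DONE }

    /** Hypothetical stand-in for a workflow action; not an Oozie class. */
    static class Action {
        private Status status = Status.PREP;
        // Set when a callback arrived while the action was still in PREP.
        private boolean pendingCallback = false;

        /** Behaviour described in the report: a callback outside RUNNING is ignored. */
        synchronized boolean onCallbackStrict() {
            if (status != Status.RUNNING) {
                return false; // dropped; completion is only noticed by a later check
            }
            status = Status.DONE;
            return true;
        }

        /** Assumed mitigation: remember an early callback instead of dropping it. */
        synchronized boolean onCallbackDeferring() {
            if (status == Status.PREP) {
                pendingCallback = true;
                return false;
            }
            if (status == Status.RUNNING) {
                status = Status.DONE;
                return true;
            }
            return false;
        }

        /** Transition PREP -> RUNNING, then replay a callback that arrived early. */
        synchronized void markRunning() {
            if (status != Status.PREP) {
                return;
            }
            status = Status.RUNNING;
            if (pendingCallback) {
                pendingCallback = false;
                status = Status.DONE;
            }
        }

        synchronized Status status() {
            return status;
        }
    }

    public static void main(String[] args) {
        // Callback beats the PREP -> RUNNING transition and is lost.
        Action strict = new Action();
        strict.onCallbackStrict();
        strict.markRunning();
        System.out.println("strict:    " + strict.status());    // RUNNING

        // Same ordering, but the early callback is remembered and replayed.
        Action deferring = new Action();
        deferring.onCallbackDeferring();
        deferring.markRunning();
        System.out.println("deferring: " + deferring.status()); // DONE
    }
}
```

In the strict case the action stays in RUNNING even though the remote command has already finished, which corresponds to waiting for the manual check mentioned in the description.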



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)