Posted to dev@oozie.apache.org by "Denes Bodo (JIRA)" <ji...@apache.org> on 2017/10/25 13:34:00 UTC

[jira] [Commented] (OOZIE-2985) If LauncherAM fails, Oozie is not notified in a timely manner

    [ https://issues.apache.org/jira/browse/OOZIE-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16218630#comment-16218630 ] 

Denes Bodo commented on OOZIE-2985:
-----------------------------------

I experienced the same issue when I tried to run the example distcp action: both the ClassNotFoundException for LauncherAM and the long wait.

> If LauncherAM fails, Oozie is not notified in a timely manner
> -------------------------------------------------------------
>
>                 Key: OOZIE-2985
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2985
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Attila Sasvari
>
> I've noticed that if LauncherAM fails, Oozie is notified of the launcher's failure only after a long delay. This gives the impression that the workflow is still running.
> {{oozie job -oozie http://localhost:11000/oozie -config examples/apps/datelist-java-main/job.properties -info 0000000-170712153835057-oozie-asas-W}}
> {code}
> 0000000-170712153835057-oozie-asas-W@java1                                    RUNNING   application_1499866588585_0001 RUNNING    -
> {code}
> I've looked at the YARN logs for the launcher and seen that the launcher failed. For example, in my case, during development, the oozie-sharelib launcher was not found:
> {code}
> Error: Could not find or load main class org.apache.oozie.action.hadoop.LauncherAM
> {code}
> The problem is that only after the specified timeout (10 minutes by default) do we see that the workflow has actually failed or errored.
> {code}
> Created       : 2017-07-12 13:38 GMT
> Started       : 2017-07-12 13:38 GMT
> Last Modified : 2017-07-12 13:49 GMT
> ...
> 0000000-170712153835057-oozie-asas-W@java1                                    ERROR     application_1499866588585_0001 FAILED/KILLED -
> {code} 
> The problem might be that in the {{start()}} method of {{JavaActionExecutor}}, the check runs too soon after the launcher is submitted, before the launcher's failure is visible.
> {code}
> LOG.debug("Starting action " + action.getId() + " getting Action File System");
> FileSystem actionFs = context.getAppFileSystem();
> LOG.debug("Preparing action Dir through copying " + context.getActionDir());
> prepareActionDir(actionFs, context);
> LOG.debug("Action Dir is ready. Submitting the action ");
> submitLauncher(actionFs, context, action);
> LOG.debug("Action submit completed. Performing check ");
> check(context, action);
> LOG.debug("Action check is done after submission");
> {code}
> There should be some delay after {{submitLauncher()}} before {{check()}}.
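
To illustrate the suggestion in the quoted description, here is a minimal, self-contained sketch of a delayed check after submission. It does not use Oozie's classes: {{DelayedCheckSketch}}, {{Status}}, {{checkAfterDelay()}} and all of its parameters are hypothetical names invented for this example, and the supplier only stands in for whatever {{check()}} would report. It shows the general idea (wait briefly, then re-check a few times while the launcher still looks RUNNING), not how the issue should actually be fixed in {{JavaActionExecutor}}.

```java
import java.util.function.Supplier;

public class DelayedCheckSketch {

    /** Hypothetical, simplified launcher states; not Oozie's own status type. */
    public enum Status { RUNNING, FAILED, SUCCEEDED }

    /**
     * Waits initialDelayMs before the first check, then re-checks every
     * pollIntervalMs while the status is still RUNNING, making at most
     * maxAttempts checks in total. Returns the last observed status.
     */
    public static Status checkAfterDelay(Supplier<Status> check,
                                         long initialDelayMs,
                                         long pollIntervalMs,
                                         int maxAttempts) throws InterruptedException {
        // Give the launcher a moment to start (or fail) before the first check.
        Thread.sleep(initialDelayMs);
        Status status = check.get();
        for (int attempt = 1; attempt < maxAttempts && status == Status.RUNNING; attempt++) {
            Thread.sleep(pollIntervalMs);
            status = check.get();
        }
        return status;
    }

    public static void main(String[] args) throws InterruptedException {
        // Fake launcher: looks RUNNING on the first check, FAILED afterwards.
        int[] calls = {0};
        Supplier<Status> fakeCheck =
                () -> calls[0]++ < 1 ? Status.RUNNING : Status.FAILED;

        Status result = checkAfterDelay(fakeCheck, 50, 50, 5);
        System.out.println("Observed status: " + result);
    }
}
```

With a single immediate check, the fake launcher above would be seen as RUNNING; the short delay plus a bounded number of re-checks lets the failure surface within a few polling intervals. Whether blocking inside {{start()}} is acceptable in Oozie, and what delay values make sense, are open questions this sketch does not answer.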


