Posted to dev@oozie.apache.org by Srivastava Rachna - rasriv <Ra...@acxiom.com> on 2014/10/09 15:32:44 UTC

What are the causes of Oozie failure at the Prep State

Hi,

I am trying to test a simple MapReduce sample workflow, but the action is stuck in the PREP state. When I run the same MapReduce program outside Oozie, it works fine. No dashboard logs are generated, and I do not see any errors.

Excerpt from oozie-cmf-oozie1-OOZIE_SERVER-localhost.localdomain.log.out.  
2014-10-09 06:16:26,678 WARN org.apache.hadoop.security.authentication.server.AuthenticationFilter: AuthenticationToken ignored: AuthenticationToken expired
2014-10-09 06:16:58,136 INFO org.apache.oozie.service.CoordMaterializeTriggerService$CoordMaterializeTriggerRunnable: USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] CoordMaterializeTriggerService - Curr Date= Thu Oct 09 06:21:58 PDT 2014, Num jobs to materialize = 0
2014-10-09 06:16:59,876 INFO org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: USER[-] GROUP[-] Acquired lock for [org.apache.oozie.service.StatusTransitService]
2014-10-09 06:16:59,876 INFO org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: USER[-] GROUP[-] Running coordinator status service from last instance time =  2014-10-08T21:30Z
2014-10-09 06:16:59,879 INFO org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: USER[-] GROUP[-] Running bundle status service from last instance time =  2014-10-08T21:30Z
2014-10-09 06:16:59,880 INFO org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: USER[-] GROUP[-] Released lock for [org.apache.oozie.service.StatusTransitService]
2014-10-09 06:17:02,577 INFO org.apache.oozie.service.PauseTransitService: USER[-] GROUP[-] Acquired lock for [org.apache.oozie.service.PauseTransitService]
2014-10-09 06:17:02,584 INFO org.apache.oozie.service.PauseTransitService: USER[-] GROUP[-] Released lock for [org.apache.oozie.service.PauseTransitService]
2014-10-09 06:17:59,880 INFO org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: USER[-] GROUP[-] Acquired lock for [org.apache.oozie.service.StatusTransitService]
2014-10-09 06:17:59,881 INFO org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: USER[-] GROUP[-] Running coordinator status service from last instance time =  2014-10-09T13:16Z
2014-10-09 06:17:59,883 INFO org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: USER[-] GROUP[-] Running bundle status service from last instance time =  2014-10-09T13:16Z
2014-10-09 06:17:59,885 INFO org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: USER[-] GROUP[-] Released lock for [org.apache.oozie.service.StatusTransitService]
2014-10-09 06:18:02,584 INFO org.apache.oozie.service.PauseTransitService: USER[-] GROUP[-] Acquired lock for [org.apache.oozie.service.PauseTransitService]
2014-10-09 06:18:02,594 INFO org.apache.oozie.service.PauseTransitService: USER[-] GROUP[-] Released lock for [org.apache.oozie.service.PauseTransitService]
2014-10-09 06:18:59,885 INFO org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: USER[-] GROUP[-] Acquired lock for [org.apache.oozie.service.StatusTransitService]
2014-10-09 06:18:59,885 INFO org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: USER[-] GROUP[-] Running coordinator status service from last instance time =  2014-10-09T13:17Z
2014-10-09 06:18:59,887 INFO org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: USER[-] GROUP[-] Running bundle status service from last instance time =  2014-10-09T13:17Z
2014-10-09 06:18:59,889 INFO org.apache.oozie.service.StatusTransitService$StatusTransitRunnable: USER[-] GROUP[-] Released lock for [org.apache.oozie.service.StatusTransitService]

Catalina server has started
Oct 8, 2014 8:01:17 AM org.apache.coyote.http11.Http11Protocol start
INFO: Starting Coyote HTTP/1.1 on http-11000
Oct 8, 2014 8:01:17 AM org.apache.catalina.startup.Catalina start
INFO: Server startup in 12244 ms

Job.properties
[cloudera@localhost oozieProject]$ cat job.properties
nameNode=hdfs\://localhost.localdomain\:8020
jobTracker=localhost.localdomain\:8021
queueName=default
oozie.use.system.libpath=true
oozieProjectRoot=${nameNode}/user/${user.name}/oozieProject
oozie.wf.application.path=${oozieProjectRoot}/
outputDir=oozieProject

workflow.xml
[cloudera@localhost oozieProject]$ cat workflow.xml 
<workflow-app xmlns="uri:oozie:workflow:0.2" name="java-main-wf">
    <start to="java-node-one"/>
    <action name="java-node-one">
        <java>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <main-class>com.acxiom.oozieproject.ChangeCase</main-class>
        </java>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Java failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>

Command used to run the job directly with hadoop (outside Oozie)

hadoop jar oozieProject/lib/LogEventCount.jar  sample.LogEventCount oozieProject/input/testdata oozieProject/output
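
For comparison, a workflow defined by the job.properties and workflow.xml above would normally be submitted and inspected through the Oozie command-line client rather than through hadoop jar. The following is only a sketch under stated assumptions: the server URL is assumed (port 11000 appears in the Catalina log above, and the /oozie path is assumed to be the default), and the function names submit_workflow and inspect_workflow are hypothetical wrappers, not part of Oozie.

```shell
# Assumed server URL: port 11000 is taken from the Catalina log above;
# the /oozie path is an assumption.
OOZIE_URL="http://localhost:11000/oozie"

# Hypothetical wrapper: submit and start the workflow described by a
# properties file (for example job.properties).
submit_workflow() {
    oozie job -oozie "$OOZIE_URL" -config "$1" -run
}

# Hypothetical wrapper: print the status of a job and its actions,
# then the server-side log for that job.
inspect_workflow() {
    oozie job -oozie "$OOZIE_URL" -info "$1"
    oozie job -oozie "$OOZIE_URL" -log "$1"
}

# Usage:
#   submit_workflow job.properties
#   inspect_workflow <job id printed by the submit step>
```

The -info output lists each action with its state, which is where an action stuck in PREP would show up.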

The only errors I could find under /var/log/oozie are these:
[cloudera@localhost oozie]$ grep ERROR *
oozie-audit.log:2014-10-08 06:54:13,005  INFO oozieaudit:539 - USER [cloudera], GROUP [null], APP [java-main-wf], JOBID [0000000-141007184507168-oozie-oozi-W], OPERATION [kill], PARAMETER [0000000-141007184507168-oozie-oozi-W], STATUS [SUCCESS], HTTPCODE [200], ERRORCODE [null], ERRORMESSAGE [null]
oozie-audit.log:2014-10-08 07:04:32,688  INFO oozieaudit:539 - USER [cloudera], GROUP [null], APP [pig-app-hue-script], JOBID [0000002-141007184507168-oozie-oozi-W], OPERATION [start], PARAMETER [0000002-141007184507168-oozie-oozi-W], STATUS [SUCCESS], HTTPCODE [200], ERRORCODE [null], ERRORMESSAGE [null]
oozie-audit.log:2014-10-08 07:09:51,035  INFO oozieaudit:539 - USER [cloudera], GROUP [null], APP [java-main-wf], JOBID [0000001-141007184507168-oozie-oozi-W], OPERATION [kill], PARAMETER [0000001-141007184507168-oozie-oozi-W], STATUS [SUCCESS], HTTPCODE [200], ERRORCODE [null], ERRORMESSAGE [null]
oozie-audit.log:2014-10-08 07:42:21,419  INFO oozieaudit:539 - USER [cloudera], GROUP [null], APP [WorkflowJavaMainAction], JOBID [0000003-141007184507168-oozie-oozi-W], OPERATION [kill], PARAMETER [0000003-141007184507168-oozie-oozi-W], STATUS [SUCCESS], HTTPCODE [200], ERRORCODE [null], ERRORMESSAGE [null]
oozie-audit.log:2014-10-08 07:58:56,599  INFO oozieaudit:539 - USER [cloudera], GROUP [null], APP [WorkflowJavaMainAction], JOBID [0000004-141007184507168-oozie-oozi-W], OPERATION [kill], PARAMETER [0000004-141007184507168-oozie-oozi-W], STATUS [SUCCESS], HTTPCODE [200], ERRORCODE [null], ERRORMESSAGE [null]
oozie-audit.log:2014-10-08 08:58:40,822  INFO oozieaudit:539 - USER [cloudera], GROUP [null], APP [WorkflowJavaMainAction], JOBID [0000000-141008080106562-oozie-oozi-W], OPERATION [kill], PARAMETER [0000000-141008080106562-oozie-oozi-W], STATUS [SUCCESS], HTTPCODE [200], ERRORCODE [null], ERRORMESSAGE [null]
oozie-audit.log:2014-10-08 13:14:53,245  INFO oozieaudit:539 - USER [cloudera], GROUP [null], APP [WorkFlowJavaMapReduceAction], JOBID [0000001-141008080106562-oozie-oozi-W], OPERATION [kill], PARAMETER [0000001-141008080106562-oozie-oozi-W], STATUS [SUCCESS], HTTPCODE [200], ERRORCODE [null], ERRORMESSAGE [null]
oozie-audit.log:2014-10-08 13:29:41,458  INFO oozieaudit:539 - USER [cloudera], GROUP [null], APP [WorkFlowJavaMapReduceAction], JOBID [0000002-141008080106562-oozie-oozi-W], OPERATION [kill], PARAMETER [0000002-141008080106562-oozie-oozi-W], STATUS [SUCCESS], HTTPCODE [200], ERRORCODE [null], ERRORMESSAGE [null]
oozie-audit.log:2014-10-08 14:08:39,993  INFO oozieaudit:539 - USER [cloudera], GROUP [null], APP [null], JOBID [null], OPERATION [start], PARAMETER [null], STATUS [FAILED], HTTPCODE [401], ERRORCODE [E0901], ERRORMESSAGE [E0901: Namenode [debian:8020] not allowed, not in Oozies whitelist]
oozie-audit.log:2014-10-08 14:11:00,746  INFO oozieaudit:539 - USER [cloudera], GROUP [null], APP [null], JOBID [null], OPERATION [start], PARAMETER [null], STATUS [FAILED], HTTPCODE [401], ERRORCODE [E0504], ERRORMESSAGE [E0504: App directory [hdfs://localhost.localdomain:8020/workflows/oozie-examples] does not exist]
oozie-cmf-oozie1-OOZIE_SERVER-localhost.localdomain.log.out.2014-10-08-07:2014-10-08 07:05:01,549 WARN org.apache.oozie.action.hadoop.PigActionExecutor: USER[cloudera] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000002-141007184507168-oozie-oozi-W] ACTION[0000002-141007184507168-oozie-oozi-W@pig] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.PigMain], exit code [2]
oozie-cmf-oozie1-OOZIE_SERVER-localhost.localdomain.log.out.2014-10-08-07:2014-10-08 07:05:01,627 INFO org.apache.oozie.command.wf.ActionEndXCommand: USER[cloudera] GROUP[-] TOKEN[] APP[pig-app-hue-script] JOB[0000002-141007184507168-oozie-oozi-W] ACTION[0000002-141007184507168-oozie-oozi-W@pig] ERROR is considered as FAILED for SLA
[cloudera@localhost oozie]$
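
Note that grep ERROR matches every audit entry, because each line contains the field names ERRORCODE and ERRORMESSAGE even when the operation succeeded. A narrower filter is sketched below, assuming the audit format is exactly as shown in the lines above. The function name failed_audit_entries is hypothetical.

```shell
# Hypothetical helper: print only audit entries whose STATUS field is FAILED.
# Assumes the literal "STATUS [FAILED]" layout seen in the oozie-audit.log
# lines above. Reads the named files, or standard input if none are given.
failed_audit_entries() {
    grep -F 'STATUS [FAILED]' "$@"
}

# Usage (path taken from the post):
#   failed_audit_entries /var/log/oozie/oozie-audit.log
```

Applied to the listing above, this would keep only the two entries with error codes E0901 and E0504.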

Thanks for your input.

Rachana
