Posted to user@oozie.apache.org by Shangzhong zhu <sh...@gmail.com> on 2016/07/07 06:57:31 UTC

Oozie workflow action reports JA009 EOFException on sequence file

I am running an Oozie coordinator job (frequency: 15 minutes) that
occasionally misses its SLA because of the following error:


   2016-07-03 05:49:35,377 WARN ActionCheckXCommand:544 - SERVER[node75-144.prod-aws.xx.yy.com] USER[hadoop] GROUP[-] TOKEN[] APP[PIN-Translation] JOB[0096164-160627222917756-oozie-oozi-W] ACTION[0096164-160627222917756-oozie-oozi-W@pin-translation_wf] Exception while executing check(). Error Code [JA009], Message[JA009: null]

   org.apache.oozie.action.ActionExecutorException: JA009: null
       at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:418)
       at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:396)
       at org.apache.oozie.action.hadoop.JavaActionExecutor.check(JavaActionExecutor.java:1296)
       at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:181)
       at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:55)
       at org.apache.oozie.command.XCommand.call(XCommand.java:281)
       at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       at java.lang.Thread.run(Thread.java:745)
   Caused by: java.io.EOFException
       at java.io.DataInputStream.readFully(DataInputStream.java:197)
       at java.io.DataInputStream.readFully(DataInputStream.java:169)
       at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1848)
       at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1813)
       at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1762)
       at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1776)
       at org.apache.oozie.action.hadoop.LauncherMapperHelper$1.run(LauncherMapperHelper.java:270)
       at org.apache.oozie.action.hadoop.LauncherMapperHelper$1.run(LauncherMapperHelper.java:264)
       at java.security.AccessController.doPrivileged(Native Method)
       at javax.security.auth.Subject.doAs(Subject.java:415)
       at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
       at org.apache.oozie.action.hadoop.LauncherMapperHelper.getActionData(LauncherMapperHelper.java:264)
       at org.apache.oozie.action.hadoop.JavaActionExecutor.check(JavaActionExecutor.java:1207)
       ... 7 more


The workflow action is a Java action that internally launches MR jobs.
Oozie version: 4.1.0. Hadoop version: 2.7.0.

My understanding is that JavaActionExecutor failed to open the sequence
file action-data.seq (EOFException), but I don't know why this error occurs.
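
For reference, the EOFException at the bottom of the trace is thrown by
DataInputStream.readFully, which fails when the stream ends before the
requested number of bytes has been read. The sketch below uses only JDK
classes (no Hadoop or Oozie code) to show that behaviour on an empty or
truncated header. The class name HeaderReadSketch, the method
headerReadable, and the 4-byte header length ("SEQ" plus a version byte)
are assumptions made for illustration. It shows one way such an exception
can arise; it does not establish that a short action-data.seq is what
happened in my case.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class HeaderReadSketch {
    // Assumption: the header is 3 magic bytes ("SEQ") plus 1 version byte.
    static final int HEADER_LENGTH = 4;

    /** Returns true if a full header can be read, false if the stream ends first. */
    static boolean headerReadable(byte[] fileContents) throws IOException {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(fileContents))) {
            byte[] header = new byte[HEADER_LENGTH];
            // readFully throws EOFException if fewer than HEADER_LENGTH bytes remain.
            in.readFully(header);
            return true;
        } catch (EOFException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        // An empty file, a partially written header, and a complete header.
        System.out.println(headerReadable(new byte[0]));
        System.out.println(headerReadable(new byte[] {'S', 'E'}));
        System.out.println(headerReadable(new byte[] {'S', 'E', 'Q', 6}));
    }
}
```

If this is relevant, checking the size of action-data.seq in the action
directory at the time of the failure would show whether the file was empty
or incomplete when the check ran.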

The error causes Oozie to retry the check a few times, and eventually the
workflow job is suspended:



   2016-07-03 05:49:35,377 INFO ActionCheckXCommand:541 - SERVER[node75-144.prod-aws.xx.yy.com] USER[hadoop] GROUP[-] TOKEN[] APP[PIN-Translation] JOB[0096164-160627222917756-oozie-oozi-W] ACTION[0096164-160627222917756-oozie-oozi-W@pin-translation_wf] Next Retry, Attempt Number [1] in [60,000] milliseconds

   2016-07-03 05:50:35,496 WARN JavaActionExecutor:544 - SERVER[node75-144.prod-aws.xx.yy.com] USER[hadoop] GROUP[-] TOKEN[] APP[PIN-Translation] JOB[0096164-160627222917756-oozie-oozi-W] ACTION[0096164-160627222917756-oozie-oozi-W@pin-translation_wf] Exception in check(). Message[null]
   ....

   2016-07-03 05:52:35,785 WARN ActionCheckXCommand:544 - SERVER[node75-144.prod-aws.xx.yy.com] USER[hadoop] GROUP[-] TOKEN[] APP[PIN-Translation] JOB[0096164-160627222917756-oozie-oozi-W] ACTION[0096164-160627222917756-oozie-oozi-W@pin-translation_wf] Suspending Workflow Job id=0096164-160627222917756-oozie-oozi-W


However, Oozie then automatically re-executes the workflow, and when it
does, the action succeeds.

Can someone share any insight into this issue?

Thanks,

Shanzhong