Posted to user@oozie.apache.org by Shangzhong zhu <sh...@gmail.com> on 2016/07/07 06:57:31 UTC
Oozie workflow action report JA009 EOFException on Sequence file
I am running an Oozie coordinator job (frequency: 15 mins) that
occasionally misses its SLA because of the following error:
2016-07-03 05:49:35,377 WARN ActionCheckXCommand:544 - SERVER[node75-144.prod-aws.xx.yy.com] USER[hadoop] GROUP[-] TOKEN[] APP[PIN-Translation] JOB[0096164-160627222917756-oozie-oozi-W] ACTION[0096164-160627222917756-oozie-oozi-W@pin-translation_wf] Exception while executing check(). Error Code [JA009], Message[JA009: null]

org.apache.oozie.action.ActionExecutorException: JA009: null
    at org.apache.oozie.action.ActionExecutor.convertExceptionHelper(ActionExecutor.java:418)
    at org.apache.oozie.action.ActionExecutor.convertException(ActionExecutor.java:396)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.check(JavaActionExecutor.java:1296)
    at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:181)
    at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:55)
    at org.apache.oozie.command.XCommand.call(XCommand.java:281)
    at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readFully(DataInputStream.java:197)
    at java.io.DataInputStream.readFully(DataInputStream.java:169)
    at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1848)
    at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1813)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1762)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1776)
    at org.apache.oozie.action.hadoop.LauncherMapperHelper$1.run(LauncherMapperHelper.java:270)
    at org.apache.oozie.action.hadoop.LauncherMapperHelper$1.run(LauncherMapperHelper.java:264)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.oozie.action.hadoop.LauncherMapperHelper.getActionData(LauncherMapperHelper.java:264)
    at org.apache.oozie.action.hadoop.JavaActionExecutor.check(JavaActionExecutor.java:1207)
    ... 7 more
The workflow action is a Java action that internally launches MR jobs.
Oozie version: 4.1.0. Hadoop version: 2.7.0.
My understanding is that JavaActionExecutor failed to open the sequence
file action-data.seq (hence the EOFException), but I don't know why this error occurs.
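To illustrate why an empty or truncated action-data.seq would produce exactly this trace: SequenceFile$Reader.init starts by reading a fixed-length header (the "SEQ" magic bytes plus a version byte) with DataInputStream.readFully, and readFully throws EOFException when the stream ends before the requested number of bytes has been read. The sketch below is a minimal, hypothetical stand-in using only java.io, assuming a 4-byte header; the class SeqHeaderCheck and its readHeader method are illustrative names and are not part of Hadoop or Oozie.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

final class SeqHeaderCheck {
    // Assumption: the header is "SEQ" followed by one version byte (4 bytes total).
    static final int HEADER_LEN = 4;

    /** Returns the version byte; throws EOFException if fewer than 4 bytes are present. */
    static int readHeader(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            byte[] header = new byte[HEADER_LEN];
            in.readFully(header); // EOFException here mirrors the "Caused by" in the trace
            if (header[0] != 'S' || header[1] != 'E' || header[2] != 'Q') {
                throw new IOException("not a sequence file");
            }
            return header[3];
        }
    }

    public static void main(String[] args) throws IOException {
        // A well-formed header (the version value is illustrative only).
        System.out.println("version = " + readHeader(new byte[] {'S', 'E', 'Q', 6}));
        try {
            // A file that exists but has no bytes yet, e.g. not fully written.
            readHeader(new byte[0]);
        } catch (EOFException e) {
            System.out.println("EOFException on empty input");
        }
    }
}
```

This only reproduces the failing step; it does not show why the file would be empty or truncated at the moment Oozie reads it.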
The error causes Oozie to retry the check a few times; eventually the
workflow job is suspended:
2016-07-03 05:49:35,377 INFO ActionCheckXCommand:541 - SERVER[node75-144.prod-aws.xx.yy.com] USER[hadoop] GROUP[-] TOKEN[] APP[PIN-Translation] JOB[0096164-160627222917756-oozie-oozi-W] ACTION[0096164-160627222917756-oozie-oozi-W@pin-translation_wf] Next Retry, Attempt Number [1] in [60,000] milliseconds

2016-07-03 05:50:35,496 WARN JavaActionExecutor:544 - SERVER[node75-144.prod-aws.xx.yy.com] USER[hadoop] GROUP[-] TOKEN[] APP[PIN-Translation] JOB[0096164-160627222917756-oozie-oozi-W] ACTION[0096164-160627222917756-oozie-oozi-W@pin-translation_wf] Exception in check(). Message[null]
....

2016-07-03 05:52:35,785 WARN ActionCheckXCommand:544 - SERVER[node75-144.prod-aws.xx.yy.com] USER[hadoop] GROUP[-] TOKEN[] APP[PIN-Translation] JOB[0096164-160627222917756-oozie-oozi-W] ACTION[0096164-160627222917756-oozie-oozi-W@pin-translation_wf] Suspending Workflow Job id=0096164-160627222917756-oozie-oozi-W
However, Oozie then automatically re-executes the workflow, and on
re-execution the action succeeds.
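The retry-then-suspend sequence described above can be modeled roughly as follows. This is a simplified, hypothetical model, not Oozie's actual ActionCheckXCommand logic: the class TransientRetryModel, the maxRetries count and the run method are assumptions for illustration, and only the 60,000 ms interval comes from the log. A check that returns false stands for a transient JA009 failure; the interval is recorded rather than slept so the sketch runs instantly.

```java
import java.util.function.BooleanSupplier;

final class TransientRetryModel {
    enum Outcome { SUCCEEDED, SUSPENDED }

    /** Runs check up to 1 + maxRetries times; suspends if every attempt fails. */
    static Outcome run(BooleanSupplier check, int maxRetries, long intervalMs, StringBuilder log) {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            if (check.getAsBoolean()) {
                return Outcome.SUCCEEDED;
            }
            if (attempt < maxRetries) {
                // Messages loosely mirror the log lines; no actual waiting happens here.
                log.append("Next Retry, Attempt Number [").append(attempt + 1)
                   .append("] in [").append(intervalMs).append("] milliseconds\n");
            }
        }
        log.append("Suspending Workflow Job\n");
        return Outcome.SUSPENDED;
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        // Every check fails, e.g. the action data is unreadable on each attempt.
        Outcome outcome = run(() -> false, 3, 60_000L, log);
        System.out.print(log);
        System.out.println(outcome);
    }
}
```

In this model a later run succeeds only if the underlying condition has cleared by then, which would be consistent with the re-executed workflow succeeding, though the logs alone do not confirm the cause.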
Can someone share any insight into this issue?
Thanks,
Shanzhong