Posted to issues@tez.apache.org by "Siddharth Seth (JIRA)" <ji...@apache.org> on 2013/11/20 03:39:21 UTC
[jira] [Resolved] (TEZ-628) GridMix job failed with finalStatus='Killed' due to NullPointerException when one of the NMs went bad
[ https://issues.apache.org/jira/browse/TEZ-628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siddharth Seth resolved TEZ-628.
--------------------------------
Resolution: Fixed
Fix Version/s: 0.2.0
Committed to master.
> GridMix job failed with finalStatus='Killed' due to NullPointerException when one of the NMs went bad
> -----------------------------------------------------------------------------------------------------
>
> Key: TEZ-628
> URL: https://issues.apache.org/jira/browse/TEZ-628
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Yesha Vora
> Assignee: Siddharth Seth
> Fix For: 0.2.0
>
> Attachments: TEZ-628.txt
>
>
> GRIDMIX000162 failed with final status = "Killed".
> Note: The RM reuse feature was disabled. While the job was running, one of the NMs went bad.
> The AM log shows a NullPointerException reported by org.apache.hadoop.yarn.event.AsyncDispatcher: "Error in dispatcher thread".
> -------------------------
> 724 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.rm.container.AMContainerImpl: AMContainer container_1384820885084_0165_01_000004 transitioned from STOPPING to COMPLETED via event C_COMPLETED
> 2013-11-20 00:10:59,724 INFO [TaskSchedulerEventHandlerThread] org.apache.tez.dag.app.rm.TaskSchedulerEventHandler: Processing the event EventType: S_CONTAINER_COMPLETED
> 2013-11-20 00:11:26,976 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
> java.lang.NullPointerException
> at org.apache.tez.dag.app.rm.node.AMNodeMap.handle(AMNodeMap.java:145)
> at org.apache.tez.dag.app.rm.node.AMNodeMap.handle(AMNodeMap.java:39)
> at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
> at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
> at java.lang.Thread.run(Thread.java:662)
> 2013-11-20 00:11:26,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
> 2013-11-20 00:11:26,978 INFO [Thread-2] org.apache.tez.dag.app.DAGAppMaster: DAGAppMaster received a signal. Signaling TaskScheduler
> 2013-11-20 00:11:26,978 INFO [Thread-2] org.apache.tez.dag.app.rm.TaskSchedulerEventHandler: TaskScheduler notified that iSignalled was : true
> 2013-11-20 00:11:26,979 INFO [Thread-2] org.apache.tez.dag.history.HistoryEventHandler: Stopping HistoryEventHandler
> ------------------------------
> At the same time, one of the NMs died due to "java.io.IOException: No space left on device"
> ------------------------------
> 2013-11-20 00:00:54,353 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:createAppLogDirs(548)) - Unable to create the app-log directory : /tmp/yarn/log/application_1384820885084_0097
> java.io.IOException: mkdir of /tmp/yarn/log/application_1384820885084_0097 failed
> at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1061)
> at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:150)
> at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:187)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
> at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:716)
> at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
> at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:716)
> at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:425)
> at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppLogDirs(DefaultContainerExecutor.java:546)
> at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:95)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:977)
> 2013-11-20 00:00:54,522 FATAL yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(51)) - Thread Thread[LocalizerRunner for container_1384820885084_0097_01_000003,5,main] threw an Error. Shutting down now...
> org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
> at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:238)
> at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
> at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104)
> at org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.close(ChecksumFs.java:364)
> at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
> at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104)
> at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
> at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:254)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
> at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
> at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2168)
> at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2109)
> at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:977)
> Caused by: java.io.IOException: No space left on device
> at java.io.FileOutputStream.writeBytes(Native Method)
> at java.io.FileOutputStream.write(FileOutputStream.java:282)
> at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:236)
> ... 16 more
> 2013-11-20 00:00:54,524 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status -1
> 2013-11-20 00:00:54,530 INFO mortbay.log
> --------------------------
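For readers following the AM trace: the NullPointerException is thrown inside an event handler running on the dispatcher thread, and the dispatcher then exits, which takes the whole AM down. The committed fix is in the TEZ-628.txt attachment and is not reproduced here. The Java sketch below only illustrates the general defensive pattern of dropping an event whose node is not tracked instead of dereferencing a null lookup result. Every name in it (NodeEventRouter, NodeState, nodeSeen, handleContainerCompleted) is hypothetical and is not the Tez AMNodeMap API; it also assumes, as in the trace, that all events arrive on a single dispatcher thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; this is NOT the Tez AMNodeMap code.
// Assumes all calls come from one dispatcher thread, so no locking is used.
public class NodeEventRouter {

    /** Minimal stand-in for per-node bookkeeping. */
    public static final class NodeState {
        private int completedContainers;

        void onContainerCompleted() {
            completedContainers++;
        }

        int completedContainers() {
            return completedContainers;
        }
    }

    private final Map<String, NodeState> nodes = new HashMap<>();
    private int droppedEvents;

    /** Start tracking a node the first time it is seen. */
    public void nodeSeen(String nodeId) {
        nodes.putIfAbsent(nodeId, new NodeState());
    }

    /**
     * Route a container-completed event to its node.
     * Returns false (and counts the event as dropped) when the node is
     * unknown, rather than letting a null lookup result propagate.
     */
    public boolean handleContainerCompleted(String nodeId) {
        NodeState state = nodes.get(nodeId); // null for unknown or null ids
        if (state == null) {
            droppedEvents++;
            return false;
        }
        state.onContainerCompleted();
        return true;
    }

    public int droppedEvents() {
        return droppedEvents;
    }

    /** Completed-container count for a node, or 0 if the node is unknown. */
    public int completedOn(String nodeId) {
        NodeState state = nodes.get(nodeId);
        return state == null ? 0 : state.completedContainers();
    }
}
```

In this sketch an event for an untracked node is counted and skipped, so one bad event does not stop the thread that delivers all the others. Whether dropping, logging, or lazily registering the node is the right response depends on the scheduler's invariants, which this report does not describe.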
--
This message was sent by Atlassian JIRA
(v6.1#6144)