Posted to issues@tez.apache.org by "Yesha Vora (JIRA)" <ji...@apache.org> on 2013/11/20 03:25:23 UTC

[jira] [Updated] (TEZ-628) GridMix job failed with finalStatus='Killed' due to NullPointerException when one of the NMs went bad

     [ https://issues.apache.org/jira/browse/TEZ-628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yesha Vora updated TEZ-628:
---------------------------

    Description: 
GRIDMIX000162 failed with final status = "Killed".

Note: The RM reuse feature was disabled. While the job was running, one of the NMs went bad.

The AM log shows a NullPointerException reported by org.apache.hadoop.yarn.event.AsyncDispatcher ("Error in dispatcher thread"):
-------------------------
724 INFO [AsyncDispatcher event handler] org.apache.tez.dag.app.rm.container.AMContainerImpl: AMContainer container_1384820885084_0165_01_000004 transitioned from STOPPING to COMPLETED via event C_COMPLETED
2013-11-20 00:10:59,724 INFO [TaskSchedulerEventHandlerThread] org.apache.tez.dag.app.rm.TaskSchedulerEventHandler: Processing the event EventType: S_CONTAINER_COMPLETED
2013-11-20 00:11:26,976 FATAL [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.NullPointerException
	at org.apache.tez.dag.app.rm.node.AMNodeMap.handle(AMNodeMap.java:145)
	at org.apache.tez.dag.app.rm.node.AMNodeMap.handle(AMNodeMap.java:39)
	at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:134)
	at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:81)
	at java.lang.Thread.run(Thread.java:662)
2013-11-20 00:11:26,977 INFO [AsyncDispatcher event handler] org.apache.hadoop.yarn.event.AsyncDispatcher: Exiting, bbye..
2013-11-20 00:11:26,978 INFO [Thread-2] org.apache.tez.dag.app.DAGAppMaster: DAGAppMaster received a signal. Signaling TaskScheduler
2013-11-20 00:11:26,978 INFO [Thread-2] org.apache.tez.dag.app.rm.TaskSchedulerEventHandler: TaskScheduler notified that iSignalled was : true
2013-11-20 00:11:26,979 INFO [Thread-2] org.apache.tez.dag.history.HistoryEventHandler: Stopping HistoryEventHandler
------------------------------
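The trace above shows an NPE thrown inside an event handler and taking down the dispatcher thread. As an illustration only, and assuming (this report does not confirm it) that the handler looks up a node by ID and dereferences the result, the failure pattern and a defensive guard might look like the sketch below. NodeTrackerSketch, its State enum, and all method names are hypothetical; this is not the actual AMNodeMap code.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a node-ID-keyed tracker; NOT the real AMNodeMap. */
final class NodeTrackerSketch {
    /** Illustrative node states (assumed, not taken from Tez). */
    enum State { ACTIVE, UNHEALTHY }

    private final Map<String, State> nodes = new HashMap<>();
    private int droppedEvents = 0;

    /** Registers a node as ACTIVE if it is not already tracked. */
    void nodeSeen(String nodeId) {
        nodes.putIfAbsent(nodeId, State.ACTIVE);
    }

    /** Unguarded handler: throws NullPointerException for an unknown node. */
    void handleUnhealthyUnguarded(String nodeId) {
        State current = nodes.get(nodeId); // null if the node was never registered
        if (current.equals(State.ACTIVE)) { // NPE here when current is null
            nodes.put(nodeId, State.UNHEALTHY);
        }
    }

    /** Guarded handler: counts and drops events for unknown nodes. */
    boolean handleUnhealthy(String nodeId) {
        State current = nodes.get(nodeId);
        if (current == null) {
            droppedEvents++;
            return false;
        }
        nodes.put(nodeId, State.UNHEALTHY);
        return true;
    }

    int droppedEvents() {
        return droppedEvents;
    }

    State stateOf(String nodeId) {
        return nodes.get(nodeId);
    }

    public static void main(String[] args) {
        NodeTrackerSketch tracker = new NodeTrackerSketch();
        tracker.nodeSeen("host1:45454");
        System.out.println(tracker.handleUnhealthy("host1:45454"));
        System.out.println(tracker.handleUnhealthy("host2:45454"));
        try {
            tracker.handleUnhealthyUnguarded("host2:45454");
        } catch (NullPointerException e) {
            System.out.println("unguarded handler threw NPE");
        }
    }
}
```

Whether dropping the event is the right recovery is a decision for the actual fix; the sketch only shows that a null check keeps an exception from escaping the handler into the dispatcher thread.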
Around the same time, one of the NMs died due to "java.io.IOException: No space left on device" (the NM log timestamps below are about ten minutes earlier than the AM failure above):
------------------------------
2013-11-20 00:00:54,353 WARN  nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:createAppLogDirs(548)) - Unable to create the app-log directory : /tmp/yarn/log/application_1384820885084_0097
java.io.IOException: mkdir of /tmp/yarn/log/application_1384820885084_0097 failed
	at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:1061)
	at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:150)
	at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:187)
	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:720)
	at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:716)
	at org.apache.hadoop.fs.FSLinkResolver.resolve(FSLinkResolver.java:90)
	at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:716)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createDir(DefaultContainerExecutor.java:425)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.createAppLogDirs(DefaultContainerExecutor.java:546)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:95)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:977)
2013-11-20 00:00:54,522 FATAL yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(51)) - Thread Thread[LocalizerRunner for container_1384820885084_0097_01_000003,5,main] threw an Error.  Shutting down now...
org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:238)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
	at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
	at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104)
	at org.apache.hadoop.fs.ChecksumFs$ChecksumFSOutputSummer.close(ChecksumFs.java:364)
	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:71)
	at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:104)
	at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
	at org.apache.hadoop.io.IOUtils.closeStream(IOUtils.java:254)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:61)
	at org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:112)
	at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2168)
	at org.apache.hadoop.fs.FileContext$Util.copy(FileContext.java:2109)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:102)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:977)
Caused by: java.io.IOException: No space left on device
	at java.io.FileOutputStream.writeBytes(Native Method)
	at java.io.FileOutputStream.write(FileOutputStream.java:282)
	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:236)
	... 16 more
2013-11-20 00:00:54,524 INFO  util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status -1
2013-11-20 00:00:54,530 INFO  mortbay.log 
--------------------------


> GridMix job failed with finalStatus='Killed' due to NullPointerException when one of the NMs went bad
> -----------------------------------------------------------------------------------------------------
>
>                 Key: TEZ-628
>                 URL: https://issues.apache.org/jira/browse/TEZ-628
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Yesha Vora
>



--
This message was sent by Atlassian JIRA
(v6.1#6144)