You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Sandy Ryza (JIRA)" <ji...@apache.org> on 2013/02/22 00:24:12 UTC

[jira] [Commented] (YARN-413) With log aggregation on, nodemanager dies on startup if it can't connect to HDFS

    [ https://issues.apache.org/jira/browse/YARN-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583660#comment-13583660 ] 

Sandy Ryza commented on YARN-413:
---------------------------------

2013-02-21 13:27:24,307 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.NodeManager
	at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:199)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:322)
	at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:359)
Caused by: org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
	at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.start(ContainerManagerImpl.java:248)
	at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
	... 3 more
Caused by: org.apache.hadoop.yarn.YarnException: Failed to create remoteLogDir [/tmp/logs]
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.verifyAndCreateRemoteLogDir(LogAggregationService.java:207)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.start(LogAggregationService.java:132)
	at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
	... 5 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/logs. Name node is in safe mode.
The reported blocks 7 has reached the threshold 0.9990 of total blocks 7. Safe mode will be turned off automatically in 25 seconds.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3067)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3045)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3024)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:667)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:468)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40995)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:482)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1018)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1778)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1774)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1488)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1772)

	at org.apache.hadoop.ipc.Client.call(Client.java:1237)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
	at $Proxy9.mkdirs(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:163)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:82)
	at $Proxy9.mkdirs(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:450)
	at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2115)
	at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2086)
	at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:540)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.verifyAndCreateRemoteLogDir(LogAggregationService.java:204)
	... 7 more
2013-02-21 13:27:24,308 INFO org.apache.hadoop.ipc.Server: Stopping server on 47223
2013-02-21 13:27:24,308 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService waiting for pending aggregation during exit
2013-02-21 13:27:24,309 INFO org.apache.hadoop.ipc.Server: Stopping server on 8040
                
> With log aggregation on, nodemanager dies on startup if it can't connect to HDFS
> --------------------------------------------------------------------------------
>
>                 Key: YARN-413
>                 URL: https://issues.apache.org/jira/browse/YARN-413
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>
> If log aggregation is on, when the nodemanager starts up, it tries to create the remote log directory.  If this fails, it kills itself.  It doesn't seem like turning log aggregation on should ever cause the nodemanager to die.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira