Posted to yarn-issues@hadoop.apache.org by "Jonathan Eagles (JIRA)" <ji...@apache.org> on 2014/06/20 17:25:25 UTC

[jira] [Commented] (YARN-2184) ResourceManager may fail due to name node in safe mode

    [ https://issues.apache.org/jira/browse/YARN-2184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14038912#comment-14038912 ] 

Jonathan Eagles commented on YARN-2184:
---------------------------------------

Jeff, I already reported this issue under YARN-2035, and a patch is available there. Let me know if it solves your issue so we can close this ticket out.

> ResourceManager may fail due to name node in safe mode
> ------------------------------------------------------
>
>                 Key: YARN-2184
>                 URL: https://issues.apache.org/jira/browse/YARN-2184
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.4.0
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>
> If the history service is enabled in the ResourceManager, it tries to mkdir when the service is initialized. At that time the name node may still be in safe mode, which can cause the history service to fail and, in turn, the ResourceManager to fail. This is quite likely when the cluster is restarted, because the name node can stay in safe mode for a long time.
> Here are the error logs:
> {code}
> Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /Users/jzhang/Java/lib/hadoop-2.4.0/logs/yarn/system/history/ApplicationHistoryDataRoot. Name node is in safe mode.
> The reported blocks 85 has reached the threshold 0.9990 of total blocks 85. The number of live datanodes 1 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 19 seconds.
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1195)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3564)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3540)
>     at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:754)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:558)
>     at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:415)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1363)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
>     at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:606)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
>     at com.sun.proxy.$Proxy14.mkdirs(Unknown Source)
>     at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:500)
>     at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2553)
>     at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2524)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:827)
>     at org.apache.hadoop.hdfs.DistributedFileSystem$16.doCall(DistributedFileSystem.java:823)
>     at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:823)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:816)
>     at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1815)
>     at org.apache.hadoop.yarn.server.applicationhistoryservice.FileSystemApplicationHistoryStore.serviceInit(FileSystemApplicationHistoryStore.java:120)
>     at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
>     ... 10 more
> 2014-06-20 11:06:25,220 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down ResourceManager at jzhangMBPr.local/192.168.100.152
> {code}
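
For illustration only, here is a minimal sketch of one way a caller could tolerate the failure described above: retry the directory creation a bounded number of times instead of failing service init on the first error. This is not the YARN-2035 patch and not Hadoop code. The class SafeModeRetry, the interface FsAction, and the parameters maxAttempts and sleepMillis are hypothetical names introduced here, and the sketch retries on any IOException rather than detecting SafeModeException specifically.

```java
import java.io.IOException;

// Hypothetical helper, not part of Hadoop or of any YARN patch.
class SafeModeRetry {

    /** Hypothetical stand-in for a filesystem call such as mkdirs. */
    interface FsAction {
        boolean run() throws IOException;
    }

    /**
     * Runs the action, retrying after an IOException until it succeeds
     * or maxAttempts is reached. Rethrows the last IOException on failure.
     */
    static boolean retry(FsAction action, int maxAttempts, long sleepMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.run();
            } catch (IOException e) {
                // Assumption: the failure may be transient (e.g. safe mode).
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(sleepMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated filesystem: rejects the first two calls, then succeeds.
        boolean created = retry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("Name node is in safe mode.");
            }
            return true;
        }, 5, 10L);
        System.out.println("created=" + created + " after " + calls[0] + " calls");
    }
}
```

A bounded retry keeps startup from hanging forever if the name node never leaves safe mode, at the cost of delaying ResourceManager init while it waits. Whether that trade-off is acceptable here is a design question for the ticket, not something this sketch settles.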



--
This message was sent by Atlassian JIRA
(v6.2#6252)