Posted to common-dev@hadoop.apache.org by "Neill Lima (JIRA)" <ji...@apache.org> on 2015/02/06 14:18:34 UTC
[jira] [Created] (HADOOP-11555) Missing hadoop exclude file fails RMs in HA
Neill Lima created HADOOP-11555:
-----------------------------------
Summary: Missing hadoop exclude file fails RMs in HA
Key: HADOOP-11555
URL: https://issues.apache.org/jira/browse/HADOOP-11555
Project: Hadoop Common
Issue Type: Bug
Components: ha
Affects Versions: 2.6.0
Environment: Debian 7
Reporter: Neill Lima
I have two NNs in HA, and they do not fail when the exclude file (hadoop-2.6.0/etc/hadoop/exclude) is not present. I had one RM and wanted to run two in HA. I had not created the exclude file at this point either. I applied the HA RM settings and, when I started both RMs, I started getting this exception:
2015-02-06 12:25:25,326 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=root OPERATION=transitionToActive TARGET=RMHAProtocolService RESULT=FAILURE DESCRIPTION=Exception transitioning to active PERMISSIONS=All users are allowed
2015-02-06 12:25:25,326 WARN org.apache.hadoop.ha.ActiveStandbyElector: Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:128)
at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:805)
at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:416)
at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:304)
at org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
... 4 more
Caused by: org.apache.hadoop.ha.ServiceFailedException: java.io.FileNotFoundException: /hadoop-2.6.0/etc/hadoop/exclude (No such file or directory)
at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:626)
at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:297)
... 5 more
2015-02-06 12:25:25,327 INFO org.apache.hadoop.ha.ActiveStandbyElector: Trying to re-establish ZK session
2015-02-06 12:25:25,339 INFO org.apache.zookeeper.ZooKeeper: Session: 0x44af32566180094 closed
2015-02-06 12:25:26,340 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=x.x.x.x:2181,x.x.x.x:2181 sessionTimeout=10000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@307587c
2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server x.x.x.x/x.x.x.x:2181. Will not attempt to authenticate using SASL (unknown error)
2015-02-06 12:25:26,341 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to x.x.x.x/x.x.x.x:2181, initiating session
The exception is descriptive enough to resolve the problem, and I fixed it by creating the exclude file.
I am raising this as a possible improvement:
- Should RMs ignore the missing file, as the NNs do?
- Should a single RM fail when the file is not present?
I am suggesting this improvement to keep the behavior consistent when working in HA (both NNs and RMs).
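To illustrate the lenient behavior suggested above, here is a minimal sketch of a reader that treats a missing exclude file as an empty host list instead of throwing FileNotFoundException. It is not Hadoop's actual code: the class LenientExcludeFile and the method readHosts are hypothetical names, and the one-host-per-line format with "#" comments is an assumption made for the example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of Hadoop: shows one way to tolerate
// a missing exclude file by treating it as "no hosts excluded".
public class LenientExcludeFile {

    // Returns the hosts listed in the file, or an empty set if the file
    // does not exist. Assumes one host per line; blank lines and lines
    // starting with '#' are skipped (an assumption for this sketch).
    public static Set<String> readHosts(Path file) throws IOException {
        if (!Files.exists(file)) {
            return Collections.emptySet();
        }
        Set<String> hosts = new LinkedHashSet<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            String host = line.trim();
            if (!host.isEmpty() && !host.startsWith("#")) {
                hosts.add(host);
            }
        }
        return hosts;
    }

    public static void main(String[] args) throws IOException {
        // The default path is only an example; pass the real path as an argument.
        Path exclude = Paths.get(args.length > 0 ? args[0] : "etc/hadoop/exclude");
        System.out.println("Excluded hosts: " + readHosts(exclude));
    }
}
```

With this approach a refresh on transition to Active would see an empty exclude list rather than an exception when the file is absent. Whether silently ignoring a configured but missing file is desirable is exactly the design question raised above.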
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)