Posted to yarn-dev@hadoop.apache.org by Dinesh Kumar Prabakaran <di...@gmail.com> on 2016/09/15 11:38:22 UTC

After rolling upgrade Resource Manager does not turn to active state.

I have a running 3-node HA cluster with the following configuration:
Machine 1 - Name Node 1 - Resource Manager 1
Machine 2 - Name Node 2 - Resource Manager 2
Machine 3 - Data Node 1 - Node Manager 1

I performed a rolling upgrade of the Hadoop cluster from version 2.5.2 to
2.7.2 and finalized it. All other services started properly; only the
Resource Manager never transitions to the active state. This may seem
strange, but the issue occurs only if I submitted jobs (say, MR) on the old
2.5.2 version. The old-version Hadoop cluster itself had no issues.

I disabled automatic failover and tried manual failover too, with the same
error.

Here are sample Resource Manager logs:
2016-09-12 01:03:50,888 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Trying to re-establish ZK session
2016-09-12 01:03:50,907 INFO org.apache.zookeeper.ZooKeeper: Session:
0x1571d6758d500e7 closed
2016-09-12 01:03:51,908 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection,
connectString=rootserver25.root.dinesh.lan:2181,rootserver33.root.dinesh.lan:2181,rootserver36.root.dinesh.lan:2181
sessionTimeout=10000
watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@45d89439
2016-09-12 01:03:51,909 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server
rootserver25.root.dinesh.lan/fe80:0:0:0:f9a7:4fa1:76de:c8eb%6:2181. Will
not attempt to authenticate using SASL (unknown error)
2016-09-12 01:03:51,909 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to
rootserver25.root.dinesh.lan/fe80:0:0:0:f9a7:4fa1:76de:c8eb%6:2181,
initiating session
2016-09-12 01:03:51,940 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server
rootserver25.root.dinesh.lan/fe80:0:0:0:f9a7:4fa1:76de:c8eb%6:2181,
sessionid = 0x1571d6758d500e8, negotiated timeout = 10000
2016-09-12 01:03:51,941 INFO org.apache.zookeeper.ClientCnxn: EventThread
shut down
2016-09-12 01:03:51,941 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Session connected.
2016-09-12 01:03:52,014 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Checking for any old active which needs to be fenced...
2016-09-12 01:03:52,015 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old
node exists: 0a0b73796e63636c757374657212067265736d7232
2016-09-12 01:03:52,015 INFO org.apache.hadoop.ha.ActiveStandbyElector:
Writing znode /yarn-leader-election/synccluster/ActiveBreadCrumb to
indicate that the local node is the most recent active...
2016-09-12 01:03:52,032 INFO org.apache.hadoop.conf.Configuration: found
resource yarn-site.xml at file:.../Hadoop/etc/hadoop/yarn-site.xml
2016-09-12 01:03:52,033 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=SYSTEM
OPERATION=refreshAdminAcls TARGET=AdminService RESULT=SUCCESS
2016-09-12 01:03:52,033 INFO
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager:
Transitioning to active state
2016-09-12 01:03:52,034 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
Registering class
org.apache.hadoop.yarn.server.resourcemanager.RMFatalEventType for class
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMFatalEventDispatcher
2016-09-12 01:03:52,034 INFO
org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM:
NMTokenKeyRollingInterval: 86400000ms and NMTokenKeyActivationDelay:
900000ms
2016-09-12 01:03:52,034 INFO
org.apache.hadoop.yarn.server.resourcemanager.security.RMContainerTokenSecretManager:
ContainerTokenKeyRollingInterval: 86400000ms and
ContainerTokenKeyActivationDelay: 900000ms
2016-09-12 01:03:52,035 INFO
org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager:
AMRMTokenKeyRollingInterval: 86400000ms and AMRMTokenKeyActivationDelay:
900000 ms
2016-09-12 01:03:52,035 INFO
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreFactory:
Using RMStateStore implementation - class
org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
2016-09-12 01:03:52,035 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
Registering class
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStoreEventType
for class
org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$ForwardingEventHandler
2016-09-12 01:03:52,035 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
Registering class
org.apache.hadoop.yarn.server.resourcemanager.NodesListManagerEventType for
class org.apache.hadoop.yarn.server.resourcemanager.NodesListManager
2016-09-12 01:03:52,035 INFO
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Using
Scheduler:
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
2016-09-12 01:03:52,035 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
Registering class
org.apache.hadoop.yarn.server.resourcemanager.scheduler.event.SchedulerEventType
for class
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$SchedulerEventDispatcher
2016-09-12 01:03:52,035 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
Registering class
org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppEventType for
class
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher
2016-09-12 01:03:52,036 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
Registering class
org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptEventType
for class
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationAttemptEventDispatcher
2016-09-12 01:03:52,036 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
Registering class
org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeEventType for
class
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$NodeEventDispatcher
2016-09-12 01:03:52,036 WARN
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics
system already initialized!
2016-09-12 01:03:52,036 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
Registering class
org.apache.hadoop.yarn.server.resourcemanager.RMAppManagerEventType for
class org.apache.hadoop.yarn.server.resourcemanager.RMAppManager
2016-09-12 01:03:52,036 INFO org.apache.hadoop.yarn.event.AsyncDispatcher:
Registering class
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncherEventType
for class
org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher
2016-09-12 01:03:52,036 WARN org.apache.hadoop.metrics2.util.MBeans: Error
creating MBean object name: Hadoop:service=ResourceManager,name=RMNMInfo
org.apache.hadoop.metrics2.MetricsException:
org.apache.hadoop.metrics2.MetricsException:
Hadoop:service=ResourceManager,name=RMNMInfo already exists!
at
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newObjectName(DefaultMetricsSystem.java:122)
at
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newMBeanName(DefaultMetricsSystem.java:102)
at org.apache.hadoop.metrics2.util.MBeans.getMBeanName(MBeans.java:92)
at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:55)
at
org.apache.hadoop.yarn.server.resourcemanager.RMNMInfo.<init>(RMNMInfo.java:59)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:549)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:954)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(ResourceManager.java:984)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1008)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
at
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
at
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
at
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:813)
at
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: org.apache.hadoop.metrics2.MetricsException:
Hadoop:service=ResourceManager,name=RMNMInfo already exists!
at
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newObjectName(DefaultMetricsSystem.java:118)
... 20 more
2016-09-12 01:03:52,037 WARN org.apache.hadoop.metrics2.util.MBeans: Failed
to register MBean "null"
javax.management.RuntimeOperationsException: Exception occurred trying to
register the MBean
at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:951)
at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)
at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
at
com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
at org.apache.hadoop.metrics2.util.MBeans.register(MBeans.java:57)
at
org.apache.hadoop.yarn.server.resourcemanager.RMNMInfo.<init>(RMNMInfo.java:59)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:549)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:954)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(ResourceManager.java:984)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1008)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
at
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
at
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
at
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:813)
at
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.lang.IllegalArgumentException: No object name specified
at
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:949)
... 21 more
2016-09-12 01:03:52,037 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMNMInfo: Registered RMNMInfo
MBean
2016-09-12 01:03:52,037 INFO org.apache.hadoop.util.HostsFileReader:
Refreshing hosts (include/exclude) list
2016-09-12 01:03:52,038 INFO org.apache.hadoop.service.AbstractService:
Service org.apache.hadoop.yarn.server.resourcemanager.NodesListManager
failed in state INITED; cause: org.apache.hadoop.metrics2.MetricsException:
Metrics source ClusterMetrics already exists!
org.apache.hadoop.metrics2.MetricsException: Metrics source ClusterMetrics
already exists!
at
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135)
at
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112)
at
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
at
org.apache.hadoop.yarn.server.resourcemanager.ClusterMetrics.registerMetrics(ClusterMetrics.java:74)
at
org.apache.hadoop.yarn.server.resourcemanager.ClusterMetrics.getMetrics(ClusterMetrics.java:61)
at
org.apache.hadoop.yarn.server.resourcemanager.NodesListManager.setDecomissionedNMsMetrics(NodesListManager.java:140)
at
org.apache.hadoop.yarn.server.resourcemanager.NodesListManager.serviceInit(NodesListManager.java:81)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:551)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:954)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(ResourceManager.java:984)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1008)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
at
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
at
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
at
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:813)
at
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2016-09-12 01:03:52,039 INFO org.apache.hadoop.service.AbstractService:
Service RMActiveServices failed in state INITED; cause:
org.apache.hadoop.metrics2.MetricsException: Metrics source ClusterMetrics
already exists!
org.apache.hadoop.metrics2.MetricsException: Metrics source ClusterMetrics
already exists!
at
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.newSourceName(DefaultMetricsSystem.java:135)
at
org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.sourceName(DefaultMetricsSystem.java:112)
at
org.apache.hadoop.metrics2.impl.MetricsSystemImpl.register(MetricsSystemImpl.java:229)
at
org.apache.hadoop.yarn.server.resourcemanager.ClusterMetrics.registerMetrics(ClusterMetrics.java:74)
at
org.apache.hadoop.yarn.server.resourcemanager.ClusterMetrics.getMetrics(ClusterMetrics.java:61)
at
org.apache.hadoop.yarn.server.resourcemanager.NodesListManager.setDecomissionedNMsMetrics(NodesListManager.java:140)
at
org.apache.hadoop.yarn.server.resourcemanager.NodesListManager.serviceInit(NodesListManager.java:81)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at
org.apache.hadoop.service.CompositeService.serviceInit(CompositeService.java:107)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceInit(ResourceManager.java:551)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:954)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(ResourceManager.java:984)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1008)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
at
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
at
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
at
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:813)
at
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
2016-09-12 01:03:52,039 INFO org.apache.hadoop.service.AbstractService:
Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager failed in
state STOPPED; cause: java.lang.NullPointerException
java.lang.NullPointerException
at
org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.stopDispatcher(CommonNodeLabelsManager.java:251)
at
org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager.serviceStop(CommonNodeLabelsManager.java:257)
at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at
org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
at
org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
at
org.apache.hadoop.service.CompositeService.stop(CompositeService.java:157)
at
org.apache.hadoop.service.CompositeService.serviceStop(CompositeService.java:131)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStop(ResourceManager.java:585)
at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
at
org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
at
org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.createAndInitActiveServices(ResourceManager.java:954)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.reinitialize(ResourceManager.java:984)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1008)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1001)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1656)
at
org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1001)
at
org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:303)
at
org.apache.hadoop.yarn.server.resourcemanager.EmbeddedElectorService.becomeActive(EmbeddedElectorService.java:126)
at
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:813)
at
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:418)
at
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
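
A side note on the "Old node exists" value in the logs above: it looks like a
small protobuf message made of two length-delimited string fields. The sketch
below decodes it by hand, which shows which RM the breadcrumb points to. The
helper name decode_fields is hypothetical, and reading field 1 as the cluster
ID and field 2 as the RM ID is an assumption based on the znode path
/yarn-leader-election/synccluster/ActiveBreadCrumb, not something confirmed
from the Hadoop source.

```python
def decode_fields(hex_value):
    """Decode a protobuf message that contains only short string fields.

    Assumption: every field is length-delimited (wire type 2) and shorter
    than 128 bytes, so both the tag and the length fit in a single byte.
    """
    data = bytes.fromhex(hex_value)
    fields = {}
    pos = 0
    while pos < len(data):
        tag = data[pos]
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 2:
            raise ValueError("unexpected wire type %d at offset %d" % (wire_type, pos))
        length = data[pos + 1]
        if length >= 0x80:
            raise ValueError("multi-byte lengths are not handled in this sketch")
        start = pos + 2
        fields[field_number] = data[start:start + length].decode("utf-8")
        pos = start + length
    return fields


# Value copied from the "Old node exists" log line.
breadcrumb = "0a0b73796e63636c757374657212067265736d7232"
print(decode_fields(breadcrumb))  # {1: 'synccluster', 2: 'resmr2'}
```

So the breadcrumb seems to name cluster "synccluster" and an RM with the ID
"resmr2" as the previous active.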