Posted to mapreduce-user@hadoop.apache.org by Sanjeev Tripurari <sa...@inmobi.com> on 2015/07/07 08:04:11 UTC

Re: ResourceManager fails to start

Hi Alexandru

Can you share what value is set in your capacity scheduler configuration for

yarn.scheduler.capacity.am.failure.scheduling.delay.ms
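
If it helps, here is a minimal sketch for looking that value up, assuming the usual Hadoop <configuration>/<property>/<name>/<value> XML layout. The helper name read_hadoop_property and the inline sample are hypothetical, for illustration only; on a real cluster you would read capacity-scheduler.xml from your Hadoop configuration directory instead. It returns None when the property is not set explicitly.

```python
import xml.etree.ElementTree as ET


def read_hadoop_property(xml_text, name):
    """Return the <value> of the named <property>, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Hypothetical sample content, standing in for a real capacity-scheduler.xml.
SAMPLE = """<configuration>
  <property>
    <name>yarn.scheduler.capacity.maximum-applications</name>
    <value>10000</value>
  </property>
</configuration>"""

KEY = "yarn.scheduler.capacity.am.failure.scheduling.delay.ms"

# The sample does not define KEY, so this lookup yields None.
print(read_hadoop_property(SAMPLE, KEY))
```

A None result only means the property is absent from that file; it says nothing about what the scheduler falls back to.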


Regards
-Sanjeev


On Fri, Jun 26, 2015 at 6:40 PM, Alexandru Pacurar <
Alexandru.Pacurar@propertyshark.com> wrote:

>  Hello,
>
>
>
> I’m running Hadoop 2.6 and I have encountered a problem with the
> resourcemanager. After a restart the resourcemanager refuses to start with
> the following error:
>
>
>
> 2015-06-26 08:54:10,342 INFO  attempt.RMAppAttemptImpl
> (RMAppAttemptImpl.java:recover(796)) - Recovering attempt:
> appattempt_1435159945366_0792_000001 with final state: null
>
> 2015-06-26 08:54:10,342 INFO  security.AMRMTokenSecretManager
> (AMRMTokenSecretManager.java:createAndGetAMRMToken(195)) - Create AMRMToken
> for ApplicationAttempt: appattempt_1435159945366_0792_000001
>
> 2015-06-26 08:54:10,342 INFO  security.AMRMTokenSecretManager
> (AMRMTokenSecretManager.java:createPassword(307)) - Creating password for
> appattempt_1435159945366_0792_000001
>
> 2015-06-26 08:54:10,343 INFO  resourcemanager.ApplicationMasterService
> (ApplicationMasterService.java:registerAppAttempt(670)) - Registering app
> attempt : appattempt_1435159945366_0792_000001
>
> 2015-06-26 08:54:10,344 ERROR resourcemanager.ResourceManager
> (ResourceManager.java:serviceStart(594)) - Failed to load/recover state
>
> java.lang.NullPointerException
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:734)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1089)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1038)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1002)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:831)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:101)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:846)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:836)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:711)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:312)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:413)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1207)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:590)
>
>                 at
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1014)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1051)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1047)
>
>                 at java.security.AccessController.doPrivileged(Native
> Method)
>
>                 at javax.security.auth.Subject.doAs(Subject.java:415)
>
>                 at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1047)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1091)
>
>                 at
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
>
> 2015-06-26 08:54:10,348 INFO  service.AbstractService
> (AbstractService.java:noteFailure(272)) - Service RMActiveServices failed
> in state STARTED; cause: java.lang.NullPointerException
>
> java.lang.NullPointerException
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:734)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1089)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1038)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1002)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:831)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:101)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:846)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:836)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:711)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:312)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:413)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1207)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:590)
>
>                 at
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1014)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1051)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1047)
>
>                 at java.security.AccessController.doPrivileged(Native
> Method)
>
>                 at javax.security.auth.Subject.doAs(Subject.java:415)
>
>                 at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1047)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1091)
>
>                 at
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
>
> 2015-06-26 08:54:10,350 INFO  impl.MetricsSystemImpl
> (MetricsSystemImpl.java:stop(210)) - Stopping ResourceManager metrics
> system...
>
> 2015-06-26 08:54:10,417 INFO  impl.MetricsSinkAdapter
> (MetricsSinkAdapter.java:publishMetricsFromQueue(135)) - timeline thread
> interrupted.
>
> 2015-06-26 08:54:10,419 INFO  impl.MetricsSystemImpl
> (MetricsSystemImpl.java:stop(216)) - ResourceManager metrics system stopped.
>
> 2015-06-26 08:54:10,420 INFO  impl.MetricsSystemImpl
> (MetricsSystemImpl.java:shutdown(605)) - ResourceManager metrics system
> shutdown complete.
>
> 2015-06-26 08:54:10,437 INFO  zookeeper.ZooKeeper
> (ZooKeeper.java:close(684)) - Session: 0x34e2fddab0e0001 closed
>
> 2015-06-26 08:54:10,437 INFO  event.AsyncDispatcher
> (AsyncDispatcher.java:serviceStop(138)) - AsyncDispatcher is draining to
> stop, igonring any new events.
>
> 2015-06-26 08:54:10,437 INFO  zookeeper.ClientCnxn
> (ClientCnxn.java:run(512)) - EventThread shut down
>
> 2015-06-26 08:54:10,439 INFO  event.AsyncDispatcher
> (AsyncDispatcher.java:serviceStop(138)) - AsyncDispatcher is draining to
> stop, igonring any new events.
>
> 2015-06-26 08:54:10,439 INFO  service.AbstractService
> (AbstractService.java:noteFailure(272)) - Service Dispatcher failed in
> state STOPPED; cause: java.lang.NullPointerException
>
> java.lang.NullPointerException
>
>
>
> After some searching I’ve discovered that the *yarn.resourcemanager.store.class*
> property controls where the ResourceManager keeps its state, and my value is
> *org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore*,
> so I have the state in ZooKeeper.
>
> My question is: should I just remove *appattempt_1435159945366_0792_000001*
> (and any other attempts) from ZooKeeper in order to bring my
> ResourceManager up, or is there a way to make it skip specific attempts? Or
> could I just recreate the state store from scratch, since I don’t care
> about the running applications and would just like to have the
> ResourceManager service up?
>
>
>
> Thank you,
>
> Alex
>
