Posted to common-user@hadoop.apache.org by Alexandru Pacurar <Al...@PropertyShark.com> on 2015/06/26 15:10:55 UTC

ResourceManager fails to start

Hello,

I'm running Hadoop 2.6 and I have encountered a problem with the resourcemanager. After a restart the resourcemanager refuses to start with the following error:

2015-06-26 08:54:10,342 INFO  attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:recover(796)) - Recovering attempt: appattempt_1435159945366_0792_000001 with final state: null
2015-06-26 08:54:10,342 INFO  security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:createAndGetAMRMToken(195)) - Create AMRMToken for ApplicationAttempt: appattempt_1435159945366_0792_000001
2015-06-26 08:54:10,342 INFO  security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:createPassword(307)) - Creating password for appattempt_1435159945366_0792_000001
2015-06-26 08:54:10,343 INFO  resourcemanager.ApplicationMasterService (ApplicationMasterService.java:registerAppAttempt(670)) - Registering app attempt : appattempt_1435159945366_0792_000001
2015-06-26 08:54:10,344 ERROR resourcemanager.ResourceManager (ResourceManager.java:serviceStart(594)) - Failed to load/recover state
java.lang.NullPointerException
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:734)
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1089)
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1038)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1002)
                at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
                at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
                at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
                at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:831)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:101)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:846)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:836)
                at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
                at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
                at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
                at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:711)
                at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:312)
                at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:413)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1207)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:590)
                at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1014)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1051)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1047)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:415)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1047)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1091)
                at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
2015-06-26 08:54:10,348 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service RMActiveServices failed in state STARTED; cause: java.lang.NullPointerException
java.lang.NullPointerException
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:734)
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1089)
                at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1038)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1002)
                at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
                at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
                at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
                at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:831)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:101)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:846)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:836)
                at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
                at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
                at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
                at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
                at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:711)
                at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:312)
                at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:413)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1207)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:590)
                at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1014)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1051)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1047)
                at java.security.AccessController.doPrivileged(Native Method)
                at javax.security.auth.Subject.doAs(Subject.java:415)
                at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1047)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1091)
                at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
                at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
2015-06-26 08:54:10,350 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(210)) - Stopping ResourceManager metrics system...
2015-06-26 08:54:10,417 INFO  impl.MetricsSinkAdapter (MetricsSinkAdapter.java:publishMetricsFromQueue(135)) - timeline thread interrupted.
2015-06-26 08:54:10,419 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(216)) - ResourceManager metrics system stopped.
2015-06-26 08:54:10,420 INFO  impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(605)) - ResourceManager metrics system shutdown complete.
2015-06-26 08:54:10,437 INFO  zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x34e2fddab0e0001 closed
2015-06-26 08:54:10,437 INFO  event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(138)) - AsyncDispatcher is draining to stop, igonring any new events.
2015-06-26 08:54:10,437 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(512)) - EventThread shut down
2015-06-26 08:54:10,439 INFO  event.AsyncDispatcher (AsyncDispatcher.java:serviceStop(138)) - AsyncDispatcher is draining to stop, igonring any new events.
2015-06-26 08:54:10,439 INFO  service.AbstractService (AbstractService.java:noteFailure(272)) - Service Dispatcher failed in state STOPPED; cause: java.lang.NullPointerException
java.lang.NullPointerException

After some searching I've discovered that the yarn.resourcemanager.store.class property controls where the ResourceManager state is stored, and my value is org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore, so the state is in ZooKeeper.
My question is: should I just remove appattempt_1435159945366_0792_000001 (and any other attempts) from ZooKeeper in order to bring my resourcemanager up, or is there a way to make it skip specific attempts? Or could I just recreate the state store from scratch? I don't care about the running applications; I would just like to have the ResourceManager service up.

Thank you,
Alex
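
For readers weighing the first option in the question above (removing a single attempt from ZooKeeper), the following is a minimal sketch of how the znode for an attempt might be located. It only builds a path string; it does not connect to ZooKeeper or delete anything. The layout it assumes (a parent path of /rmstore, followed by ZKRMStateRoot and RMAppRoot, then the application ID and the attempt ID) is an assumption about how ZKRMStateStore arranges its znodes and is not confirmed anywhere in this thread, so check it against the actual tree with a ZooKeeper client before acting on it. The helper names application_id_for and znode_for_attempt are hypothetical.

```python
import re

# ASSUMPTION: layout of the ZKRMStateStore znodes. None of these values come
# from the thread; verify them against your own ZooKeeper tree.
ASSUMED_PARENT_PATH = "/rmstore"   # assumed yarn.resourcemanager.zk-state-store.parent-path
ASSUMED_ROOT = "ZKRMStateRoot"     # assumed root znode name
ASSUMED_APP_ROOT = "RMAppRoot"     # assumed znode that holds applications

# Attempt IDs in the log look like appattempt_<clusterTimestamp>_<appSeq>_<attemptSeq>.
_ATTEMPT_RE = re.compile(r"^appattempt_(\d+)_(\d+)_(\d+)$")


def application_id_for(attempt_id):
    """Derive the application ID that an attempt ID belongs to (hypothetical helper)."""
    match = _ATTEMPT_RE.match(attempt_id)
    if match is None:
        raise ValueError("not an application attempt ID: %r" % attempt_id)
    cluster_ts, app_seq, _attempt_seq = match.groups()
    return "application_%s_%s" % (cluster_ts, app_seq)


def znode_for_attempt(attempt_id, parent_path=ASSUMED_PARENT_PATH):
    """Build the znode path where the attempt is assumed to be stored (hypothetical helper)."""
    app_id = application_id_for(attempt_id)
    return "/".join(
        [parent_path.rstrip("/"), ASSUMED_ROOT, ASSUMED_APP_ROOT, app_id, attempt_id]
    )


if __name__ == "__main__":
    # The attempt named in the log above.
    print(znode_for_attempt("appattempt_1435159945366_0792_000001"))
```

For the third option (recreating the state store), Hadoop's YARN command documentation describes a `yarn resourcemanager -format-state-store` option; whether it is available and appropriate for the installed version should be checked against that documentation, since nothing in this thread confirms it.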

Re: ResourceManager fails to start

Posted by Sanjeev Tripurari <sa...@inmobi.com>.
Hi Alexandru

Can you share the value set in your capacity scheduler configuration for

yarn.scheduler.capacity.am.failure.scheduling.delay.ms


Regards
-Sanjeev



> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:711)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:312)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:413)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1207)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:590)
>
>                 at
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1014)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1051)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1047)
>
>                 at java.security.AccessController.doPrivileged(Native
> Method)
>
>                 at javax.security.auth.Subject.doAs(Subject.java:415)
>
>                 at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1047)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1091)
>
>                 at
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
>
> 2015-06-26 08:54:10,350 INFO  impl.MetricsSystemImpl
> (MetricsSystemImpl.java:stop(210)) - Stopping ResourceManager metrics
> system...
>
> 2015-06-26 08:54:10,417 INFO  impl.MetricsSinkAdapter
> (MetricsSinkAdapter.java:publishMetricsFromQueue(135)) - timeline thread
> interrupted.
>
> 2015-06-26 08:54:10,419 INFO  impl.MetricsSystemImpl
> (MetricsSystemImpl.java:stop(216)) - ResourceManager metrics system stopped.
>
> 2015-06-26 08:54:10,420 INFO  impl.MetricsSystemImpl
> (MetricsSystemImpl.java:shutdown(605)) - ResourceManager metrics system
> shutdown complete.
>
> 2015-06-26 08:54:10,437 INFO  zookeeper.ZooKeeper
> (ZooKeeper.java:close(684)) - Session: 0x34e2fddab0e0001 closed
>
> 2015-06-26 08:54:10,437 INFO  event.AsyncDispatcher
> (AsyncDispatcher.java:serviceStop(138)) - AsyncDispatcher is draining to
> stop, igonring any new events.
>
> 2015-06-26 08:54:10,437 INFO  zookeeper.ClientCnxn
> (ClientCnxn.java:run(512)) - EventThread shut down
>
> 2015-06-26 08:54:10,439 INFO  event.AsyncDispatcher
> (AsyncDispatcher.java:serviceStop(138)) - AsyncDispatcher is draining to
> stop, igonring any new events.
>
> 2015-06-26 08:54:10,439 INFO  service.AbstractService
> (AbstractService.java:noteFailure(272)) - Service Dispatcher failed in
> state STOPPED; cause: java.lang.NullPointerException
>
> java.lang.NullPointerException
>
>
>
> After some searching I’ve discovered that the
> *yarn.resourcemanager.store.class* property controls where the
> ResourceManager keeps its state. My value is
> *org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore*,
> so the state is in ZooKeeper.
>
> My question is: should I just remove
> *appattempt_1435159945366_0792_000001* (and any other attempts) from
> ZooKeeper to get my resourcemanager up, or is there a way to make it skip
> specific attempts? Or could I just recreate the state store from zero? I
> don’t care about the running application; I would just like to have the
> ResourceManager service up.
>
>
>
> Thank you,
>
> Alex
>
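
On the question quoted above about removing a single attempt from ZooKeeper: here is a minimal sketch of how the znode for that attempt could be located. It assumes the default parent path `/rmstore` and a `ZKRMStateRoot/RMAppRoot/<application>/<attempt>` layout, neither of which is confirmed in this thread, so list the znodes with a ZooKeeper client before deleting anything. The constants and function names are hypothetical helpers, not part of Hadoop.

```python
import re

# Assumed defaults, NOT taken from this thread; check
# yarn.resourcemanager.zk-state-store.parent-path on your cluster.
ZK_PARENT_PATH = "/rmstore"
ZK_ROOT_NODE = "ZKRMStateRoot"
APP_ROOT_NODE = "RMAppRoot"

# appattempt_<clusterTimestamp>_<appSequence>_<attemptNumber>
_ATTEMPT_RE = re.compile(r"^appattempt_(\d+)_(\d+)_(\d+)$")


def application_id(attempt_id):
    """Derive the application ID that owns the given attempt ID."""
    match = _ATTEMPT_RE.match(attempt_id)
    if match is None:
        raise ValueError("not an application attempt ID: %r" % attempt_id)
    cluster_ts, app_seq, _attempt_no = match.groups()
    return "application_%s_%s" % (cluster_ts, app_seq)


def attempt_znode(attempt_id, parent=ZK_PARENT_PATH):
    """Build the znode path for an attempt under the assumed layout."""
    app_id = application_id(attempt_id)
    parts = [parent.rstrip("/"), ZK_ROOT_NODE, APP_ROOT_NODE, app_id, attempt_id]
    return "/".join(parts)


if __name__ == "__main__":
    print(attempt_znode("appattempt_1435159945366_0792_000001"))
```

If none of the stored applications matter, wiping the whole store may be simpler than deleting individual znodes. Hadoop 2.6 should offer `yarn resourcemanager -format-state-store` for that, run while the ResourceManager is stopped, but verify that your build supports it first.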

-- 
_____________________________________________________________
The information contained in this communication is intended solely for the 
use of the individual or entity to whom it is addressed and others 
authorized to receive it. It may contain confidential or legally privileged 
information. If you are not the intended recipient you are hereby notified 
that any disclosure, copying, distribution or taking any action in reliance 
on the contents of this information is strictly prohibited and may be 
unlawful. If you have received this communication in error, please notify 
us immediately by responding to this email and then delete it from your 
system. The firm is neither liable for the proper and complete transmission 
of the information contained in this communication nor for any delay in its 
receipt.

Re: ResourceManager fails to start

Posted by Sanjeev Tripurari <sa...@inmobi.com>.
Hi Alexandru

Can you share what value is set in your capacity scheduler configuration for

yarn.scheduler.capacity.am.failure.scheduling.delay.ms
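
A quick way to look the value up is to read it straight out of the configuration file. This is a minimal sketch, assuming the setting lives in a standard Hadoop `<configuration>/<property>` XML file such as capacity-scheduler.xml; the function name `get_property` is a hypothetical helper, not a Hadoop API.

```python
import xml.etree.ElementTree as ET


def get_property(xml_text, name):
    """Return the <value> of the named <property>, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        prop_name = prop.findtext("name", default="").strip()
        if prop_name == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


if __name__ == "__main__":
    import sys

    # Usage: python get_property.py /path/to/capacity-scheduler.xml <property-name>
    with open(sys.argv[1], "rb") as handle:
        print(get_property(handle.read(), sys.argv[2]))
```

A result of `None` means the property is not set in that file, so whatever default the scheduler uses applies.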


Regards
-Sanjeev


On Fri, Jun 26, 2015 at 6:40 PM, Alexandru Pacurar <
Alexandru.Pacurar@propertyshark.com> wrote:

>  Hello,
>
>
>
> I’m running Hadoop 2.6 and I have encountered a problem with the
> resourcemanager. After a restart the resourcemanager refuses to start with
> the following error:
>
>
>
> 2015-06-26 08:54:10,342 INFO  attempt.RMAppAttemptImpl
> (RMAppAttemptImpl.java:recover(796)) - Recovering attempt:
> appattempt_1435159945366_0792_000001 with final state: null
>
> 2015-06-26 08:54:10,342 INFO  security.AMRMTokenSecretManager
> (AMRMTokenSecretManager.java:createAndGetAMRMToken(195)) - Create AMRMToken
> for ApplicationAttempt: appattempt_1435159945366_0792_000001
>
> 2015-06-26 08:54:10,342 INFO  security.AMRMTokenSecretManager
> (AMRMTokenSecretManager.java:createPassword(307)) - Creating password for
> appattempt_1435159945366_0792_000001
>
> 2015-06-26 08:54:10,343 INFO  resourcemanager.ApplicationMasterService
> (ApplicationMasterService.java:registerAppAttempt(670)) - Registering app
> attempt : appattempt_1435159945366_0792_000001
>
> 2015-06-26 08:54:10,344 ERROR resourcemanager.ResourceManager
> (ResourceManager.java:serviceStart(594)) - Failed to load/recover state
>
> java.lang.NullPointerException
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:734)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1089)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1038)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1002)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:831)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:101)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:846)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:836)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:711)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:312)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:413)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1207)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:590)
>
>                 at
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1014)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1051)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1047)
>
>                 at java.security.AccessController.doPrivileged(Native
> Method)
>
>                 at javax.security.auth.Subject.doAs(Subject.java:415)
>
>                 at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1047)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1091)
>
>                 at
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
>
> 2015-06-26 08:54:10,348 INFO  service.AbstractService
> (AbstractService.java:noteFailure(272)) - Service RMActiveServices failed
> in state STARTED; cause: java.lang.NullPointerException
>
> java.lang.NullPointerException
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplicationAttempt(CapacityScheduler.java:734)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1089)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:114)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1038)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl$AttemptRecoveredTransition.transition(RMAppAttemptImpl.java:1002)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:755)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl.handle(RMAppAttemptImpl.java:106)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.recoverAppAttempts(RMAppImpl.java:831)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$1900(RMAppImpl.java:101)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:846)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:836)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
>
>                 at
> org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:711)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:312)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:413)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1207)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:590)
>
>                 at
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1014)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1051)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1047)
>
>                 at java.security.AccessController.doPrivileged(Native
> Method)
>
>                 at javax.security.auth.Subject.doAs(Subject.java:415)
>
>                 at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1047)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1091)
>
>                 at
> org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
>
>                 at
> org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1226)
>
> 2015-06-26 08:54:10,350 INFO  impl.MetricsSystemImpl
> (MetricsSystemImpl.java:stop(210)) - Stopping ResourceManager metrics
> system...
>
> 2015-06-26 08:54:10,417 INFO  impl.MetricsSinkAdapter
> (MetricsSinkAdapter.java:publishMetricsFromQueue(135)) - timeline thread
> interrupted.
>
> 2015-06-26 08:54:10,419 INFO  impl.MetricsSystemImpl
> (MetricsSystemImpl.java:stop(216)) - ResourceManager metrics system stopped.
>
> 2015-06-26 08:54:10,420 INFO  impl.MetricsSystemImpl
> (MetricsSystemImpl.java:shutdown(605)) - ResourceManager metrics system
> shutdown complete.
>
> 2015-06-26 08:54:10,437 INFO  zookeeper.ZooKeeper
> (ZooKeeper.java:close(684)) - Session: 0x34e2fddab0e0001 closed
>
> 2015-06-26 08:54:10,437 INFO  event.AsyncDispatcher
> (AsyncDispatcher.java:serviceStop(138)) - AsyncDispatcher is draining to
> stop, igonring any new events.
>
> 2015-06-26 08:54:10,437 INFO  zookeeper.ClientCnxn
> (ClientCnxn.java:run(512)) - EventThread shut down
>
> 2015-06-26 08:54:10,439 INFO  event.AsyncDispatcher
> (AsyncDispatcher.java:serviceStop(138)) - AsyncDispatcher is draining to
> stop, igonring any new events.
>
> 2015-06-26 08:54:10,439 INFO  service.AbstractService
> (AbstractService.java:noteFailure(272)) - Service Dispatcher failed in
> state STOPPED; cause: java.lang.NullPointerException
>
> java.lang.NullPointerException
>
>
>
> After some searching I’ve discovered that the
> *yarn.resourcemanager.store.class* property controls where the
> ResourceManager keeps its state. My value is
> *org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore*,
> so the state is in ZooKeeper.
>
> My question is: should I just remove
> *appattempt_1435159945366_0792_000001* (and any other attempts) from
> ZooKeeper to get my resourcemanager up, or is there a way to make it skip
> specific attempts? Or could I just recreate the state store from zero? I
> don’t care about the running application; I would just like to have the
> ResourceManager service up.
>
>
>
> Thank you,
>
> Alex
>

