Posted to yarn-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2017/05/25 15:51:04 UTC

[jira] [Commented] (YARN-6647) ZKRMStateStore can crash during shutdown due to InterruptedException

    [ https://issues.apache.org/jira/browse/YARN-6647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16024901#comment-16024901 ] 

Jason Lowe commented on YARN-6647:
----------------------------------

Sample test output showing the mishandled InterruptedException and the resulting forced exit of the RM.  In this case the tests error out because the JVM exits without notifying the test framework.
{noformat}
2017-05-25 10:23:45,835 INFO  [Thread-50] zookeeper.JUnit4ZKTestRunner (JUnit4ZKTestRunner.java:evaluate(78)) - FINISHED TEST METHOD testKillAppWhenFailoverHappensAtNewState
2017-05-25 10:23:45,835 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: ResourceManager entered state STOPPED
2017-05-25 10:23:45,835 DEBUG [main] service.CompositeService (CompositeService.java:serviceStop(129)) - ResourceManager: stopping services, size=3
2017-05-25 10:23:45,835 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #2: Service Dispatcher in state Dispatcher: STARTED
2017-05-25 10:23:45,835 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED
2017-05-25 10:23:45,835 INFO  [org.apache.hadoop.util.JvmPauseMonitor$Monitor@233aac83] util.JvmPauseMonitor (JvmPauseMonitor.java:run(188)) - Starting JVM pause monitor
2017-05-25 10:23:45,836 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #1: Service org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter in state org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: STARTED
2017-05-25 10:23:45,836 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter entered state STOPPED
2017-05-25 10:23:45,836 DEBUG [main] service.CompositeService (CompositeService.java:serviceStop(129)) - org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: stopping services, size=0
2017-05-25 10:23:45,836 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #0: Service org.apache.hadoop.yarn.server.resourcemanager.AdminService in state org.apache.hadoop.yarn.server.resourcemanager.AdminService: STARTED
2017-05-25 10:23:45,836 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: org.apache.hadoop.yarn.server.resourcemanager.AdminService entered state STOPPED
2017-05-25 10:23:45,836 DEBUG [main] service.CompositeService (CompositeService.java:serviceStop(129)) - org.apache.hadoop.yarn.server.resourcemanager.AdminService: stopping services, size=0
2017-05-25 10:23:45,836 INFO  [main] resourcemanager.ResourceManager (ResourceManager.java:transitionToStandby(1191)) - Already in standby state
2017-05-25 10:23:45,836 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: ResourceManager entered state STOPPED
2017-05-25 10:23:45,836 DEBUG [main] service.CompositeService (CompositeService.java:serviceStop(129)) - ResourceManager: stopping services, size=3
2017-05-25 10:23:45,836 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #2: Service org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter in state org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: STARTED
2017-05-25 10:23:45,836 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter entered state STOPPED
2017-05-25 10:23:45,837 DEBUG [main] service.CompositeService (CompositeService.java:serviceStop(129)) - org.apache.hadoop.yarn.server.resourcemanager.ahs.RMApplicationHistoryWriter: stopping services, size=0
2017-05-25 10:23:45,837 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #1: Service org.apache.hadoop.yarn.server.resourcemanager.AdminService in state org.apache.hadoop.yarn.server.resourcemanager.AdminService: STARTED
2017-05-25 10:23:45,837 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: org.apache.hadoop.yarn.server.resourcemanager.AdminService entered state STOPPED
2017-05-25 10:23:45,837 DEBUG [main] service.CompositeService (CompositeService.java:serviceStop(129)) - org.apache.hadoop.yarn.server.resourcemanager.AdminService: stopping services, size=0
2017-05-25 10:23:45,837 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #0: Service Dispatcher in state Dispatcher: STARTED
2017-05-25 10:23:45,837 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: Dispatcher entered state STOPPED
2017-05-25 10:23:45,837 INFO  [main] resourcemanager.ResourceManager (ResourceManager.java:transitionToStandby(1195)) - Transitioning to standby state
2017-05-25 10:23:45,837 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: RMActiveServices entered state STOPPED
2017-05-25 10:23:45,837 DEBUG [main] service.CompositeService (CompositeService.java:serviceStop(129)) - RMActiveServices: stopping services, size=14
2017-05-25 10:23:45,837 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #13: Service org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher in state org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher: STARTED
2017-05-25 10:23:45,837 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher entered state STOPPED
2017-05-25 10:23:45,837 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #12: Service org.apache.hadoop.yarn.server.resourcemanager.ClientRMService in state org.apache.hadoop.yarn.server.resourcemanager.ClientRMService: STARTED
2017-05-25 10:23:45,837 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: org.apache.hadoop.yarn.server.resourcemanager.ClientRMService entered state STOPPED
2017-05-25 10:23:45,837 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #11: Service org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService in state org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: STARTED
2017-05-25 10:23:45,837 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService entered state STOPPED
2017-05-25 10:23:45,837 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #10: Service org.apache.hadoop.util.JvmPauseMonitor in state org.apache.hadoop.util.JvmPauseMonitor: STARTED
2017-05-25 10:23:45,837 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: org.apache.hadoop.util.JvmPauseMonitor entered state STOPPED
2017-05-25 10:23:45,837 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #9: Service org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService in state org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: STARTED
2017-05-25 10:23:45,837 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService entered state STOPPED
2017-05-25 10:23:45,837 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #8: Service NMLivelinessMonitor in state NMLivelinessMonitor: STARTED
2017-05-25 10:23:45,838 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: NMLivelinessMonitor entered state STOPPED
2017-05-25 10:23:45,838 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #7: Service org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler in state org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: STARTED
2017-05-25 10:23:45,838 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler entered state STOPPED
2017-05-25 10:23:45,838 INFO  [Ping Checker] util.AbstractLivelinessMonitor (AbstractLivelinessMonitor.java:run(156)) - NMLivelinessMonitor thread interrupted
2017-05-25 10:23:45,838 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #6: Service org.apache.hadoop.yarn.server.resourcemanager.NodesListManager in state org.apache.hadoop.yarn.server.resourcemanager.NodesListManager: STARTED
2017-05-25 10:23:45,838 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: org.apache.hadoop.yarn.server.resourcemanager.NodesListManager entered state STOPPED
2017-05-25 10:23:45,838 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #5: Service org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager in state org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager: STARTED
2017-05-25 10:23:45,838 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: org.apache.hadoop.yarn.nodelabels.CommonNodeLabelsManager entered state STOPPED
2017-05-25 10:23:45,838 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #4: Service org.apache.hadoop.yarn.server.resourcemanager.rmapp.monitor.RMAppLifetimeMonitor in state org.apache.hadoop.yarn.server.resourcemanager.rmapp.monitor.RMAppLifetimeMonitor: STARTED
2017-05-25 10:23:45,838 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: org.apache.hadoop.yarn.server.resourcemanager.rmapp.monitor.RMAppLifetimeMonitor entered state STOPPED
2017-05-25 10:23:45,838 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #3: Service AMLivelinessMonitor in state AMLivelinessMonitor: STARTED
2017-05-25 10:23:45,838 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: AMLivelinessMonitor entered state STOPPED
2017-05-25 10:23:45,838 INFO  [Ping Checker] util.AbstractLivelinessMonitor (AbstractLivelinessMonitor.java:run(156)) - org.apache.hadoop.yarn.server.resourcemanager.rmapp.monitor.RMAppLifetimeMonitor thread interrupted
2017-05-25 10:23:45,838 INFO  [Ping Checker] util.AbstractLivelinessMonitor (AbstractLivelinessMonitor.java:run(156)) - AMLivelinessMonitor thread interrupted
2017-05-25 10:23:45,838 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #2: Service AMLivelinessMonitor in state AMLivelinessMonitor: STARTED
2017-05-25 10:23:45,838 DEBUG [Thread-50-SendThread(127.0.0.1:24578)] zookeeper.ClientCnxn (ClientCnxn.java:readResponse(843)) - Reading reply sessionid:0x15c4034de420001, packet:: clientPath:null serverPath:null finished:false header:: 39,3  replyHeader:: 39,28,-101  request:: '/rmstore/ZKRMStateRoot/RMDTSecretManagerRoot/RMDTMasterKeysRoot/DelegationKey_4,F  response::  
2017-05-25 10:23:45,838 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: AMLivelinessMonitor entered state STOPPED
2017-05-25 10:23:45,838 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #1: Service org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.ContainerAllocationExpirer in state org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.ContainerAllocationExpirer: STARTED
2017-05-25 10:23:45,839 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.ContainerAllocationExpirer entered state STOPPED
2017-05-25 10:23:45,839 DEBUG [main] service.CompositeService (CompositeService.java:stop(151)) - Stopping service #0: Service org.apache.hadoop.yarn.server.resourcemanager.RMSecretManagerService in state org.apache.hadoop.yarn.server.resourcemanager.RMSecretManagerService: STARTED
2017-05-25 10:23:45,839 DEBUG [main] service.AbstractService (AbstractService.java:enterState(452)) - Service: org.apache.hadoop.yarn.server.resourcemanager.RMSecretManagerService entered state STOPPED
2017-05-25 10:23:45,839 INFO  [Ping Checker] util.AbstractLivelinessMonitor (AbstractLivelinessMonitor.java:run(156)) - AMLivelinessMonitor thread interrupted
2017-05-25 10:23:45,839 DEBUG [main] delegation.AbstractDelegationTokenSecretManager (AbstractDelegationTokenSecretManager.java:stopThreads(638)) - Stopping expired delegation token remover thread
2017-05-25 10:23:45,839 ERROR [Thread[Thread-85,5,main]] recovery.RMStateStore (RMStateStore.java:transition(456)) - Error While Storing RMDTMasterKey.
java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1406)
        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:990)
        at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910)
        at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
        at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
        at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
        at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
        at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1305)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeCreate(ZKRMStateStore.java:1261)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeRMDTMasterKeyState(ZKRMStateStore.java:1021)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreRMDTMasterKeyTransition.transition(RMStateStore.java:454)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreRMDTMasterKeyTransition.transition(RMStateStore.java:438)
        at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:1099)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.storeRMDTMasterKey(RMStateStore.java:931)
        at org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager.storeNewMasterKey(RMDelegationTokenSecretManager.java:88)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.storeDelegationKey(AbstractDelegationTokenSecretManager.java:261)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.updateCurrentKey(AbstractDelegationTokenSecretManager.java:355)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.rollMasterKey(AbstractDelegationTokenSecretManager.java:375)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:676)
        at java.lang.Thread.run(Thread.java:745)
2017-05-25 10:23:45,839 INFO  [Ping Checker] util.AbstractLivelinessMonitor (AbstractLivelinessMonitor.java:run(156)) - org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.ContainerAllocationExpirer thread interrupted
2017-05-25 10:23:45,839 ERROR [Thread[Thread-85,5,main]] recovery.RMStateStore (RMStateStore.java:notifyStoreOperationFailedInternal(1131)) - State store operation failed 
java.lang.InterruptedException
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:502)
        at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1406)
        at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:990)
        at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:910)
        at org.apache.curator.framework.imps.CuratorTransactionImpl.doOperation(CuratorTransactionImpl.java:159)
        at org.apache.curator.framework.imps.CuratorTransactionImpl.access$200(CuratorTransactionImpl.java:44)
        at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:129)
        at org.apache.curator.framework.imps.CuratorTransactionImpl$2.call(CuratorTransactionImpl.java:125)
        at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
        at org.apache.curator.framework.imps.CuratorTransactionImpl.commit(CuratorTransactionImpl.java:122)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$SafeTransaction.commit(ZKRMStateStore.java:1305)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.safeCreate(ZKRMStateStore.java:1261)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore.storeRMDTMasterKeyState(ZKRMStateStore.java:1021)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreRMDTMasterKeyTransition.transition(RMStateStore.java:454)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreRMDTMasterKeyTransition.transition(RMStateStore.java:438)
        at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:1099)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.storeRMDTMasterKey(RMStateStore.java:931)
        at org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager.storeNewMasterKey(RMDelegationTokenSecretManager.java:88)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.storeDelegationKey(AbstractDelegationTokenSecretManager.java:261)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.updateCurrentKey(AbstractDelegationTokenSecretManager.java:355)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.rollMasterKey(AbstractDelegationTokenSecretManager.java:375)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:676)
        at java.lang.Thread.run(Thread.java:745)
2017-05-25 10:23:45,839 DEBUG [SyncThread:0] server.FinalRequestProcessor (FinalRequestProcessor.java:processRequest(88)) - Processing request:: sessionid:0x15c4034de420001 type:multi cxid:0x28 zxid:0x1d txntype:14 reqpath:n/a
2017-05-25 10:23:45,840 DEBUG [SyncThread:0] server.FinalRequestProcessor (FinalRequestProcessor.java:processRequest(160)) - sessionid:0x15c4034de420001 type:multi cxid:0x28 zxid:0x1d txntype:14 reqpath:n/a
2017-05-25 10:23:45,840 DEBUG [Thread-50-SendThread(127.0.0.1:24578)] zookeeper.ClientCnxn (ClientCnxn.java:readResponse(843)) - Reading reply sessionid:0x15c4034de420001, packet:: clientPath:null serverPath:null finished:false header:: 40,14  replyHeader:: 40,29,0  request:: org.apache.zookeeper.MultiTransactionRecord@f92aa7c8 response:: org.apache.zookeeper.MultiResponse@fda6e9e
2017-05-25 10:23:45,840 ERROR [Thread[Thread-85,5,main]] security.RMDelegationTokenSecretManager (RMDelegationTokenSecretManager.java:storeNewMasterKey(90)) - Error in storing master key with KeyID: 4
2017-05-25 10:23:45,841 DEBUG [Thread[Thread-85,5,main]] util.ExitUtil (ExitUtil.java:terminate(209)) - Exiting with status 1: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException
1: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException
        at org.apache.hadoop.util.ExitUtil.terminate(ExitUtil.java:265)
        at org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager.storeNewMasterKey(RMDelegationTokenSecretManager.java:91)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.storeDelegationKey(AbstractDelegationTokenSecretManager.java:261)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.updateCurrentKey(AbstractDelegationTokenSecretManager.java:355)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.rollMasterKey(AbstractDelegationTokenSecretManager.java:375)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager$ExpiredTokenRemover.run(AbstractDelegationTokenSecretManager.java:676)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException
        at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:273)
        at org.apache.hadoop.yarn.event.DrainDispatcher$2.handle(DrainDispatcher.java:91)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.notifyStoreOperationFailedInternal(RMStateStore.java:1134)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.access$1500(RMStateStore.java:86)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreRMDTMasterKeyTransition.transition(RMStateStore.java:457)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore$StoreRMDTMasterKeyTransition.transition(RMStateStore.java:438)
        at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
        at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
        at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
        at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.handleStoreEvent(RMStateStore.java:1099)
        at org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore.storeRMDTMasterKey(RMStateStore.java:931)
        at org.apache.hadoop.yarn.server.resourcemanager.security.RMDelegationTokenSecretManager.storeNewMasterKey(RMDelegationTokenSecretManager.java:88)
        ... 5 more
Caused by: java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
        at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
        at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:265)
        ... 17 more
2017-05-25 10:23:45,841 INFO  [Thread[Thread-85,5,main]] util.ExitUtil (ExitUtil.java:terminate(210)) - Exiting with status 1: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException
{noformat}

It looks like the master key was being rolled just as we were shutting down, and the InterruptedException ended up bubbling all the way up through the dispatcher, which caused the JVM to exit.  When an InterruptedException occurs, the state store needs to check whether it is in the process of shutting down and, if so, not report it as an error.
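
To make the proposed behavior concrete, here is a minimal sketch of the idea, not a patch.  ShutdownAwareStore, StoreOp, beginStop() and reportedFailures() are hypothetical names invented for illustration; none of them are part of RMStateStore or ZKRMStateStore.  The sketch assumes the store sets a "stopping" flag before its worker threads get interrupted.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a state store; NOT the real RMStateStore API.
class ShutdownAwareStore {
    /** An operation that may block and be interrupted, e.g. a ZooKeeper write. */
    interface StoreOp {
        void run() throws Exception;
    }

    // Assumption: set at the start of shutdown, before threads are interrupted.
    private final AtomicBoolean stopping = new AtomicBoolean(false);
    // Counts failures that would be escalated (the fatal path in the log above).
    private final AtomicInteger reportedFailures = new AtomicInteger();

    void beginStop() {
        stopping.set(true);
    }

    int reportedFailures() {
        return reportedFailures.get();
    }

    /** Runs the operation; returns true on success, false on any failure. */
    boolean store(StoreOp op) {
        try {
            op.run();
            return true;
        } catch (InterruptedException e) {
            // Restore the interrupt flag so the calling thread can still see it.
            Thread.currentThread().interrupt();
            if (!stopping.get()) {
                // Unexpected interrupt while running normally: report it.
                reportedFailures.incrementAndGet();
            }
            // During shutdown the interrupt is expected, so nothing is reported.
            return false;
        } catch (Exception e) {
            reportedFailures.incrementAndGet();
            return false;
        }
    }

    public static void main(String[] args) {
        ShutdownAwareStore store = new ShutdownAwareStore();
        store.beginStop();
        // Simulate a write that is interrupted while the store is stopping.
        boolean ok = store.store(() -> { throw new InterruptedException(); });
        Thread.interrupted(); // clear the restored flag before the demo exits
        System.out.println("stored=" + ok + " reportedFailures=" + store.reportedFailures());
    }
}
```

In this sketch an interrupt that arrives after beginStop() still fails the write but is not counted as a reportable error, while the same interrupt during normal operation is.  Whether the real fix belongs in the state store, the dispatcher handler, or the secret manager is left open here.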


> ZKRMStateStore can crash during shutdown due to InterruptedException
> --------------------------------------------------------------------
>
>                 Key: YARN-6647
>                 URL: https://issues.apache.org/jira/browse/YARN-6647
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>            Reporter: Jason Lowe
>
> Noticed some tests were failing due to the JVM shutting down early.  I was able to reproduce this occasionally with TestKillApplicationWithRMHA.  Stacktrace to follow.


