You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Jonathan Hung (JIRA)" <ji...@apache.org> on 2017/10/10 17:40:00 UTC

[jira] [Updated] (YARN-7252) Removing queue then failing over results in exception

     [ https://issues.apache.org/jira/browse/YARN-7252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Hung updated YARN-7252:
--------------------------------
    Fix Version/s:     (was: YARN-5734)
                   3.1.0
                   3.0.0
                   2.9.0

> Removing queue then failing over results in exception
> -----------------------------------------------------
>
>                 Key: YARN-7252
>                 URL: https://issues.apache.org/jira/browse/YARN-7252
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Jonathan Hung
>            Assignee: Jonathan Hung
>            Priority: Critical
>             Fix For: 2.9.0, 3.0.0, 3.1.0
>
>         Attachments: YARN-7252-YARN-5734.001.patch, YARN-7252-YARN-5734.002.patch
>
>
> Scenario: rm1 and rm2, starting configuration with root.default, root.a. rm1 is active. First, put root.a into STOPPED state, then remove it. Then put rm1 in standby and rm2 in active. Here's the exception: {noformat}Operation failed: Error on refreshAll during transition to Active
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:315)
> 	at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
> 	at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
> 	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: RefreshAll operation failed
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:747)
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:307)
> 	... 10 more
> Caused by: java.io.IOException: Failed to re-init queues : root.a is deleted from the new capacity scheduler configuration, but the queue is not yet in stopped state. Current State : RUNNING
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:436)
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshQueues(AdminService.java:405)
> 	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.refreshAll(AdminService.java:736)
> 	... 11 more
> Caused by: java.io.IOException: root.a is deleted from the new capacity scheduler configuration, but the queue is not yet in stopped state. Current State : RUNNING
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.validateQueueHierarchy(CapacitySchedulerQueueManager.java:312)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.reinitializeQueues(CapacitySchedulerQueueManager.java:174)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitializeQueues(CapacityScheduler.java:648)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.reinitialize(CapacityScheduler.java:432)
> 	... 13 more{noformat}
> Seems rm2 does not think root.a was STOPPED, so when it can't find root.a and sees it is deleted, it throws exception.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org