Posted to yarn-issues@hadoop.apache.org by "Bibin A Chundatt (JIRA)" <ji...@apache.org> on 2018/07/23 06:25:00 UTC

[jira] [Comment Edited] (YARN-8541) RM startup failure on recovery after user deletion

    [ https://issues.apache.org/jira/browse/YARN-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552384#comment-16552384 ] 

Bibin A Chundatt edited comment on YARN-8541 at 7/23/18 6:24 AM:
-----------------------------------------------------------------

In the case where a queue is already provided, the app will be submitted to the specified queue, and no exception will be thrown.

But that is the expected (and old) behaviour too, right?


was (Author: bibinchundatt):
Incase when queue is already provides, the app will get submitted to the queue specified .. Exception will not be thrown. 

But that is expected and old behaviour too rt ??
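
To make the behaviour described in this comment concrete, here is a minimal Java sketch. It is a simplified model written for illustration only; it is not the YARN source and not the YARN-8541 patch. The class PlacementSketch, the methods resolveQueue and groupsFor, the "default" queue constant, and the in-memory user directory are all hypothetical names. It assumes that an explicitly requested queue is used as-is, and that the group lookup (the step that fails for a deleted user) only runs when no queue was requested.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model. NOT the real YARN placement rule.
class PlacementSketch {
    // Assumed name of the queue that means "no explicit queue requested".
    static final String DEFAULT_QUEUE = "default";

    // Stand-in for a group lookup; fails for users that no longer exist.
    static List<String> groupsFor(String user, Map<String, List<String>> directory) {
        List<String> groups = directory.get(user);
        if (groups == null || groups.isEmpty()) {
            throw new IllegalStateException("No groups found for user " + user);
        }
        return groups;
    }

    // Returns the queue an application should be placed in.
    static String resolveQueue(String user, String requestedQueue,
                               Map<String, List<String>> directory) {
        // Queue already provided: submit there, without consulting groups,
        // so a deleted user cannot trigger the lookup failure.
        if (requestedQueue != null && !DEFAULT_QUEUE.equals(requestedQueue)) {
            return requestedQueue;
        }
        // No queue provided: map the user to a queue named after the
        // primary group. This path throws if the user has no groups.
        return groupsFor(user, directory).get(0);
    }

    public static void main(String[] args) {
        Map<String, List<String>> directory =
            Collections.singletonMap("alice", Collections.singletonList("analytics"));

        // Existing user, no explicit queue: mapped via group.
        System.out.println(resolveQueue("alice", DEFAULT_QUEUE, directory));
        // Deleted user, explicit queue: placed without an exception.
        System.out.println(resolveQueue("user1", "queueA", directory));
        // Deleted user, no explicit queue: the lookup fails.
        try {
            resolveQueue("user1", DEFAULT_QUEUE, directory);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model only the last call fails, which corresponds to the case the stack trace in the issue description shows during recovery.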

> RM startup failure on recovery after user deletion
> --------------------------------------------------
>
>                 Key: YARN-8541
>                 URL: https://issues.apache.org/jira/browse/YARN-8541
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 3.1.0
>            Reporter: yimeng
>            Assignee: Bibin A Chundatt
>            Priority: Blocker
>         Attachments: YARN-8541.001.patch, YARN-8541.002.patch, YARN-8541.003.patch
>
>
> My Hadoop version is 3.1.0. I found that RM startup fails on recovery with the following test steps:
> 1. Create a user "user1" that has permission to submit apps.
> 2. Use user1 to submit a job, and wait for the job to finish.
> 3. Delete user "user1".
> 4. Restart YARN.
> 5. The RM restart fails.
> RM logs:
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized root queue root: numChildQueue= 3, capacity=1.0, absoluteCapacity=1.0, usedResources=<memory:0, vCores:0>usedCapacity=0.0, numApps=0, numContainers=0 | CapacitySchedulerQueueManager.java:163
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized queue mappings, override: false | UserGroupMappingPlacementRule.java:232
> 2018-07-16 16:24:59,708 | INFO | main-EventThread | Initialized CapacityScheduler with calculator=class org.apache.hadoop.yarn.util.resource.DominantResourceCalculator, minimumAllocation=<<memory:512, vCores:1>>, maximumAllocation=<<memory:65536, vCores:32>>, asynchronousScheduling=false, asyncScheduleInterval=5ms | CapacityScheduler.java:392
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | dynamic-resources.xml not found | Configuration.java:2767
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | Initializing AMS Processing chain. Root Processor=[org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor]. | AMSProcessingChain.java:62
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | disabled placement handler will be used, all scheduling requests will be rejected. | ApplicationMasterService.java:130
> 2018-07-16 16:24:59,709 | INFO | main-EventThread | Adding [org.apache.hadoop.yarn.server.resourcemanager.scheduler.constraint.processor.DisabledPlacementProcessor] tp top of AMS Processing chain. | AMSProcessingChain.java:75
> 2018-07-16 16:24:59,713 | WARN | main-EventThread | Exception handling the winning of election | ActiveStandbyElector.java:897
> org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
>  at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
>  at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:893)
>  at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:473)
>  at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:728)
>  at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:600)
> Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
>  at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
>  at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
>  ... 4 more
> Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application application_1531624956005_0001 submitted by user super reason: No groups found for user super
>  at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
>  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1204)
>  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1245)
>  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1241)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:422)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1686)
>  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1241)
>  at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
>  ... 5 more
> Caused by: org.apache.hadoop.yarn.exceptions.YarnException: Failed to submit application application_1531624956005_0001 submitted by user super reason: No groups found for user super
>  at org.apache.hadoop.yarn.server.resourcemanager.placement.UserGroupMappingPlacementRule.getPlacementForApp(UserGroupMappingPlacementRule.java:206)
>  at org.apache.hadoop.yarn.server.resourcemanager.placement.PlacementManager.placeApplication(PlacementManager.java:68)
>  at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.placeApplication(RMAppManager.java:798)
>  at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:369)
>  at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:357)
>  at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:568)
>  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1455)
>  at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:828)
>  at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
>  ... 13 more
> 2018-07-16 16:24:59,713 | INFO | main-EventThread | Trying to re-establish ZK session | ActiveStandbyElector.java:746
> 2018-07-16 16:24:59,715 | INFO | main-EventThread | Session: 0x1100001cdf8c2ea7 closed | ZooKeeper.java:1325
> 2018-07-16 16:25:00,716 | INFO | main-EventThread | Initiating client connection, connectString=187-4-64-187:24002,187-4-64-119:24002,187-4-64-248:24002 sessionTimeout=45000 watcher=org.apache.hadoop.ha.ActiveStandbyElector$WatcherWithClientRef@62f6291c | ZooKeeper.java:861
> 2018-07-16 16:25:00,716 | INFO | main-EventThread | zookeeper.request.timeout configured value is 120000. | ClientCnxn.java:141
> 2018-07-16 16:25:00,716 | INFO | main-EventThread | zookeeper.client.bind.port.range is not configured. | ClientCnxn.java:177


