Posted to yarn-issues@hadoop.apache.org by "jackwangcs (Jira)" <ji...@apache.org> on 2021/09/08 14:39:00 UTC
[jira] [Assigned] (YARN-10903) Too many "Failed to accept allocation proposal" because of wrong Headroom check for DRF

     [ https://issues.apache.org/jira/browse/YARN-10903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jackwangcs reassigned YARN-10903:
---------------------------------

    Assignee: jackwangcs

> Too many "Failed to accept allocation proposal" because of wrong Headroom check for DRF
> ---------------------------------------------------------------------------------------
>
>                 Key: YARN-10903
>                 URL: https://issues.apache.org/jira/browse/YARN-10903
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>            Reporter: jackwangcs
>            Assignee: jackwangcs
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> The headroom checks in `ParentQueue.canAssign` and `RegularContainerAllocator#checkHeadroom` do not handle the DRF (DominantResourceCalculator) case.
> This causes many "Failed to accept allocation proposal" messages when a queue is nearly fully used.
> In the log:
> Headroom: memory:256, vCores:729
> Request: memory:56320, vCores:5
> clusterResource: memory:673966080, vCores:110494
> With DRF, the check
> {code:java}
> Resources.greaterThanOrEqual(rc, clusterResource, Resources.add(
>     currentResourceLimits.getHeadroom(), resourceCouldBeUnReserved),
>     required); {code}
> will be true, but in fact the request cannot be allocated because of the queue's max limit (not enough memory).
> {code:java}
> 2021-07-21 23:49:39,012 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerApplicationAttempt: showRequests: application=application_1626747977559_95859 headRoom=<memory:256, vCores:729> currentConsumption=0
> 2021-07-21 23:49:39,012 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.placement.LocalityAppPlacementAllocator:  Request={AllocationRequestId: -1, Priority: 1, Capability: <memory:56320, vCores:5>, # Containers: 19, Location: *, Relax Locality: true, Execution Type Request: null, Node Label Expression: prod-best-effort-node}
> .....
> 2021-07-21 23:49:39,013 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Try to commit allocation proposal=New org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.ResourceCommitRequest:
>          ALLOCATED=[(Application=appattempt_1626747977559_95859_000001; Node=xxxx:8041; Resource=<memory:56320, vCores:5>)]
> 2021-07-21 23:49:39,013 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.UsersManager: userLimit is fetched. userLimit=<memory:7077376, vCores:1277>, userSpecificUserLimit=<memory:7077376, vCores:1277>, schedulingMode=RESPECT_PARTITION_EXCLUSIVITY, partition=prod-best-effort-node
> 2021-07-21 23:49:39,013 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue: Headroom calculation for user xxxxx:  userLimit=<memory:7077376, vCores:1277> queueMaxAvailRes=<memory:0, vCores:0> consumed=<memory:0, vCores:0> partition=prod-best-effort-node
> 2021-07-21 23:49:39,013 DEBUG org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue: Used resource=<memory:7077120, vCores:548> exceeded maxResourceLimit of the queue =<memory:7089920, vCores:1278>
> 2021-07-21 23:49:39,013 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Failed to accept allocation proposal
>  {code}
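
The numbers in the log can be replayed with a small Java sketch. It is not Hadoop code: the class `DrfHeadroomSketch` and its helpers are hypothetical, it models only two resource types (memory in MB, vCores), it assumes `resourceCouldBeUnReserved` is zero, and it simplifies `DominantResourceCalculator#compare` to comparing per-type cluster shares from the largest down. It also assumes the commit-time check adds the proposed allocation to the logged used resource and compares every resource type against `maxResourceLimit`.

```java
import java.util.Arrays;

/** Hypothetical sketch replaying the YARN-10903 log numbers; not Hadoop's API. */
public class DrfHeadroomSketch {

    // Fraction of the cluster taken by each resource type of r.
    static double[] shares(long[] cluster, long[] r) {
        double[] s = new double[r.length];
        for (int i = 0; i < r.length; i++) {
            s[i] = (double) r[i] / cluster[i];
        }
        return s;
    }

    // Simplified dominant-share comparison: compare the largest shares first,
    // then the next largest, and so on.
    static boolean drfGreaterThanOrEqual(long[] cluster, long[] lhs, long[] rhs) {
        double[] l = shares(cluster, lhs);
        double[] r = shares(cluster, rhs);
        Arrays.sort(l);
        Arrays.sort(r);
        for (int i = l.length - 1; i >= 0; i--) {
            if (l[i] != r[i]) {
                return l[i] > r[i];
            }
        }
        return true; // all shares equal
    }

    // Component-wise check: every resource type of `smaller` must fit in `bigger`.
    static boolean fitsIn(long[] smaller, long[] bigger) {
        for (int i = 0; i < smaller.length; i++) {
            if (smaller[i] > bigger[i]) {
                return false;
            }
        }
        return true;
    }

    static long[] add(long[] a, long[] b) {
        long[] sum = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            sum[i] = a[i] + b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] cluster  = {673966080L, 110494L}; // <memory, vCores>
        long[] headroom = {256L, 729L};
        long[] request  = {56320L, 5L};
        long[] used     = {7077120L, 548L};
        long[] maxLimit = {7089920L, 1278L};

        // Passes: headroom's dominant share (vCores, ~0.0066) exceeds the
        // request's dominant share (memory, ~0.000084).
        System.out.println("DRF headroom check: "
                + drfGreaterThanOrEqual(cluster, headroom, request)); // true
        // Fails component-wise: 256 MB of memory headroom < 56320 MB requested.
        System.out.println("Component-wise headroom check: "
                + fitsIn(request, headroom)); // false
        // Fails at commit: 7077120 + 56320 = 7133440 MB > 7089920 MB.
        System.out.println("Commit check: "
                + fitsIn(add(used, request), maxLimit)); // false
    }
}
```

The gap is between the first two lines of output: a dominant-share comparison can judge the headroom "large enough" on the strength of spare vCores alone, while the allocation is later rejected on memory, which is why each such proposal ends in "Failed to accept allocation proposal".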



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
