Posted to yarn-issues@hadoop.apache.org by "Jim Brennan (Jira)" <ji...@apache.org> on 2020/06/02 22:30:00 UTC

[jira] [Commented] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used

    [ https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124440#comment-17124440 ] 

Jim Brennan commented on YARN-10283:
------------------------------------

[~pbacsko] I downloaded your patch and your test case and verified that I see the same behavior as you.
reproWithoutNodeLabels fails unless I set minimum-allocation-mb=1024, and reproTestWithNodeLabels succeeds.

I have uploaded a patch for YARN-9903 based on an internal change that we've been running with for a long time.  I verified that with the YARN-9903 patch, I get the same results for your repro test cases as with the YARN-10283 patch.

Please feel free to pull in the additional changes from YARN-9903 into this patch.  I think it addresses the comment from [~tarunparimi].   The YARN-9903 patch does not include your changes to FiCaSchedulerApp.  I'm not certain that change is needed, but it might be.

[~epayne], can you take a look as well?



> Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used
> -------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10283
>                 URL: https://issues.apache.org/jira/browse/YARN-10283
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: YARN-10283-POC01.patch, YARN-10283-ReproTest.patch, YARN-10283-ReproTest2.patch
>
>
> Recently we've been investigating a scenario where applications submitted to a lower priority queue could not get scheduled because a higher priority queue in the same hierarchy could not satisfy the allocation request. Both queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a container for the request, even if it should not have.
> *Example:*
> * Cluster total resources: 3 nodes, 15GB, 24 vcores (5GB / 8 vcores per node)
> * Partition "shared" was created with 2 nodes
> * "root.lowprio" (priority = 20) and "root.highprio" (priority = 40) were added to the partition
> * Both queues have a limit of <memory:5120, vCores:8>
> * Using DominantResourceCalculator (a config sketch follows below)
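> A rough capacity-scheduler.xml sketch of this setup (the property names are the standard Capacity Scheduler ones; the 50/50 values are illustrative, and the per-queue capacities for the default partition plus the label maximum-capacities are omitted for brevity):
> {noformat}
> <property>
>   <name>yarn.scheduler.capacity.root.queues</name>
>   <value>lowprio,highprio</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.lowprio.priority</name>
>   <value>20</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.highprio.priority</name>
>   <value>40</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.lowprio.accessible-node-labels</name>
>   <value>shared</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.highprio.accessible-node-labels</name>
>   <value>shared</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.lowprio.accessible-node-labels.shared.capacity</name>
>   <value>50</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.root.highprio.accessible-node-labels.shared.capacity</name>
>   <value>50</value>
> </property>
> <property>
>   <name>yarn.scheduler.capacity.resource-calculator</name>
>   <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
> </property>
> {noformat}
> With the partition holding 2 of the 3 nodes (<memory:10240, vCores:16> in total), 50% per queue lines up with the <memory:5120, vCores:8> limit above.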
> Setup:
> Submit a distributed shell application to highprio with the switches "-num_containers 3 -container_vcores 4". The memory allocation is 512MB per container.
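> A sketch of the submission command (the jar path and shell command are placeholders, and the -queue / -node_label_expression switches are assumed here to target the partition; exact options can differ between Hadoop versions):
> {noformat}
> yarn org.apache.hadoop.yarn.applications.distributedshell.Client \
>   -jar share/hadoop/yarn/hadoop-yarn-applications-distributedshell-*.jar \
>   -shell_command "sleep 300" \
>   -num_containers 3 \
>   -container_vcores 4 \
>   -container_memory 512 \
>   -queue highprio \
>   -node_label_expression shared
> {noformat}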
> Chain of events:
> 1. Queue is filled with containers until it reaches usage <memory:2560, vCores:5>
> 2. A node update event is pushed to CS from a node which is part of the partition
> 3. {{AbstractCSQueue.canAssignToQueue()}} returns true because the current usage is smaller than the limit resource <memory:5120, vCores:8>
> 4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an allocated container for <memory:512, vCores:4>
> 5. But we can't commit the resource request because we would have 9 vcores in total, violating the limit.
> The problem is that on every node heartbeat we keep trying to assign a container to the same application from "highprio", so applications in "lowprio" cannot make progress.
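> The arithmetic behind the rejected commit, expressed with the existing YARN resource helpers (a standalone illustration, not code from the patch; the class name is made up):
> {noformat}
> import org.apache.hadoop.yarn.api.records.Resource;
> import org.apache.hadoop.yarn.util.resource.Resources;
>
> public class QueueLimitArithmetic {
>   public static void main(String[] args) {
>     // Values from the example above
>     Resource queueLimit = Resource.newInstance(5120, 8); // <memory:5120, vCores:8>
>     Resource queueUsed  = Resource.newInstance(2560, 5); // current usage: <memory:2560, vCores:5>
>     Resource request    = Resource.newInstance(512, 4);  // next distributed shell container
>
>     // Every dimension must fit: 2560 + 512 = 3072 MB <= 5120 MB is fine,
>     // but 5 + 4 = 9 vcores > 8 vcores, so the proposal is rejected at commit
>     // time and then retried on the next heartbeat, over and over.
>     boolean fits = Resources.fitsIn(Resources.add(queueUsed, request), queueLimit);
>     System.out.println("can allocate: " + fits); // prints "can allocate: false"
>   }
> }
> {noformat}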
> *Problem:*
> {{RegularContainerAllocator.assignContainer()}} does not handle this case well. We only reject allocation if this condition is satisfied:
> {noformat}
>  if (rmContainer == null && reservationsContinueLooking
>           && node.getLabels().isEmpty()) {
> {noformat}
> But if we have node labels, we enter a different code path and succeed with the allocation if there's room for a container.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org