Posted to yarn-issues@hadoop.apache.org by "Eric Payne (Jira)" <ji...@apache.org> on 2020/06/04 19:44:00 UTC

[jira] [Comment Edited] (YARN-10283) Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used

    [ https://issues.apache.org/jira/browse/YARN-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17126100#comment-17126100 ] 

Eric Payne edited comment on YARN-10283 at 6/4/20, 7:43 PM:
------------------------------------------------------------

-[~Jim_Brennan], it looks like the patch applies cleanly all the way back to 2.10.-
Sorry, this was placed in the wrong JIRA.


was (Author: eepayne):
[~Jim_Brennan], it looks like the patch applies cleanly all the way back to 2.10.

> Capacity Scheduler: starvation occurs if a higher priority queue is full and node labels are used
> -------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10283
>                 URL: https://issues.apache.org/jira/browse/YARN-10283
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: YARN-10283-POC01.patch, YARN-10283-ReproTest.patch, YARN-10283-ReproTest2.patch
>
>
> Recently we've been investigating a scenario where applications submitted to a lower priority queue could not get scheduled because a higher priority queue in the same hierarchy could not satisfy the allocation request. Both queues belonged to the same partition.
> If we disabled node labels, the problem disappeared.
> The problem is that {{RegularContainerAllocator}} always allocated a container for the request, even if it should not have.
> *Example:*
> * Cluster total resources: 3 nodes, 15GB, 24 vcores (5GB / 8 vcore per node)
> * Partition "shared" was created with 2 nodes
> * "root.lowprio" (priority = 20) and "root.highprio" (priorty = 40) were added to the partition
> * Both queues have a limit of <memory:5120, vCores:8>
> * Using DominantResourceCalculator
> Setup:
> Submit distributed shell application to highprio with switches "-num_containers 3 -container_vcores 4". The memory allocation is 512MB per container.
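> A full submit command along these lines reproduces the setup. This is a hedged sketch: only the -num_containers 3, -container_vcores 4 and 512MB memory values come from the description above; the jar path placeholder, shell command, queue and -node_label_expression switches are assumptions about a typical distributed shell invocation.
> {noformat}
> yarn jar <distributed-shell-jar> \
>     org.apache.hadoop.yarn.applications.distributedshell.Client \
>     -jar <distributed-shell-jar> \
>     -shell_command "sleep 300" \
>     -num_containers 3 \
>     -container_memory 512 \
>     -container_vcores 4 \
>     -queue highprio \
>     -node_label_expression shared
> {noformat}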
> Chain of events:
> 1. Queue is filled with containers until it reaches usage <memory:2560, vCores:5>
> 2. A node update event is pushed to CS from a node which is part of the partition
> 3. {{AbstractCSQueue.canAssignToQueue()}} returns true because the current usage is smaller than the limit resource <memory:5120, vCores:8>
> 4. Then {{LeafQueue.assignContainers()}} runs successfully and gets an allocated container for <memory:512, vcores:4>
> 5. But we can't commit the resource request because we would end up with 9 vcores in total, violating the limit.
> The problem is that on every node heartbeat we keep trying to assign a container to the same application in "highprio", so applications in "lowprio" cannot make progress.
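> To make the arithmetic in step 5 concrete, here is a minimal, self-contained sketch (illustrative only, not scheduler code) using the YARN resource helpers; the proposed usage fails on the vcores dimension even though memory still fits:
> {noformat}
> import org.apache.hadoop.yarn.api.records.Resource;
> import org.apache.hadoop.yarn.util.resource.Resources;
> 
> public class QueueLimitCheck {
>   public static void main(String[] args) {
>     Resource used    = Resource.newInstance(2560, 5); // current queue usage
>     Resource request = Resource.newInstance(512, 4);  // proposed container
>     Resource limit   = Resource.newInstance(5120, 8); // queue limit
> 
>     Resource proposed = Resources.add(used, request); // <memory:3072, vCores:9>
>     // Prints false: 9 vcores > 8 vcores, so the request cannot be committed
>     // even though 3072 MB <= 5120 MB.
>     System.out.println(Resources.fitsIn(proposed, limit));
>   }
> }
> {noformat}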
> *Problem:*
> {{RegularContainerAllocator.assignContainer()}} does not handle this case well. We only reject allocation if this condition is satisfied:
> {noformat}
>  if (rmContainer == null && reservationsContinueLooking
>           && node.getLabels().isEmpty()) {
> {noformat}
> But if we have node labels, we enter a different code path and succeed with the allocation if there's room for a container.
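> As a purely illustrative demo (not the POC patch), reducing the quoted guard to its boolean shape shows why the reject branch can never be taken on a labeled node, so the allocation goes through regardless of the queue limit; the method and parameter names below are hypothetical:
> {noformat}
> public class GuardDemo {
>   // Mirrors the quoted condition: the rejection path is only entered when
>   // there is no existing rmContainer, reservations-continue-looking is
>   // enabled, and the node has no labels.
>   static boolean rejectBranchTaken(boolean hasRmContainer,
>                                    boolean reservationsContinueLooking,
>                                    boolean nodeHasLabels) {
>     return !hasRmContainer && reservationsContinueLooking && !nodeHasLabels;
>   }
> 
>   public static void main(String[] args) {
>     System.out.println(rejectBranchTaken(false, true, false)); // true: default partition
>     System.out.println(rejectBranchTaken(false, true, true));  // false: labeled ("shared") node
>   }
> }
> {noformat}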



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org