Posted to yarn-issues@hadoop.apache.org by "Benjamin Teke (Jira)" <ji...@apache.org> on 2021/06/02 14:16:00 UTC

[jira] [Comment Edited] (YARN-10796) Capacity Scheduler: dynamic queue cannot scale out properly if its capacity is 0%

    [ https://issues.apache.org/jira/browse/YARN-10796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355758#comment-17355758 ] 

Benjamin Teke edited comment on YARN-10796 at 6/2/21, 2:15 PM:
---------------------------------------------------------------

[~pbacsko] thanks for the patch. One small thing: since the _originalCapacity.equals(Resources.none())_ case is (or should be) the same as if the userLimitFactor was disabled (set to -1), I think merging the two conditions under one if would be a bit cleaner. Or even turning the logic around, like:

{code:java}
if (getUserLimitFactor() == -1 || originalCapacity.equals(Resources.none())) {
    maxUserLimit = lQueue.getEffectiveMaxCapacityDown(nodePartition, lQueue.getMinimumAllocation());
} else {
...
}
{code}



> Capacity Scheduler: dynamic queue cannot scale out properly if its capacity is 0%
> ---------------------------------------------------------------------------------
>
>                 Key: YARN-10796
>                 URL: https://issues.apache.org/jira/browse/YARN-10796
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler, capacityscheduler
>            Reporter: Peter Bacsko
>            Assignee: Peter Bacsko
>            Priority: Major
>         Attachments: YARN-10796-001.patch, YARN-10796-002.patch
>
>
> If we have a dynamic queue (AutoCreatedLeafQueue) with capacity = 0%, then it cannot scale out properly even if its max-capacity and the parent's max-capacity would allow it.
> Example:
> {noformat}
> Cluster Capacity:  16 GB / 16 cpu (2 nodes, each with 8 GB / 8 cpu)
> Container allocation size: 1G / 1 vcore
> root.dynamic 
>     Effective Capacity:      <memory: 8192, vCores: 8> ( 50.0%)
>     Effective Max Capacity:  <memory:16384, vCores:16> (100.0%) 
>     Template:
>         Capacity:               40%
>         Max Capacity:           100%
>         User Limit Factor:      4
>  {noformat}
> leaf-queue-template.capacity = 40%
> leaf-queue-template.maximum-capacity = 100%
> leaf-queue-template.maximum-am-resource-percent = 50%
> leaf-queue-template.minimum-user-limit-percent = 100%
> leaf-queue-template.user-limit-factor = 4
> "root.dynamic" has a maximum capacity of 100% and a capacity of 50%.
> Let's assume there are running containers in these dynamic queues (MR sleep jobs):
> root.dynamic.user1 = 1 AM + 3 containers (capacity = 40%)
> root.dynamic.user2 = 1 AM + 3 containers (capacity = 40%)
> root.dynamic.user3 = 1 AM + 15 containers (capacity = 0%)
> This scenario results in an underutilized cluster: approximately 18% of the capacity remains unused. On the other hand, it is still possible to submit a new application to root.dynamic.user1 or root.dynamic.user2 and reach 100% utilization.
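
To make the scenario above concrete, here is a minimal, self-contained sketch of the per-user limit computation. The class and method names are made up for illustration, and the arithmetic is deliberately simplified compared to the real UsersManager logic (which also factors in active users, minimum-user-limit-percent and multiple resource types); it only assumes that the per-user cap is roughly the queue's guaranteed capacity scaled by user-limit-factor and clamped to the effective max capacity, with the zero-capacity fallback suggested in the comment above.

{code:java}
// Hypothetical, simplified model (not the actual CapacityScheduler code)
// of why a dynamic queue with capacity = 0% cannot scale out: without a
// fallback, the per-user limit derived from the guaranteed capacity stays
// at 0 no matter how large the max-capacity is.
public class UserLimitSketch {

  static long maxUserLimitMb(long effectiveCapacityMb,
                             long effectiveMaxCapacityMb,
                             float userLimitFactor) {
    if (userLimitFactor == -1 || effectiveCapacityMb == 0) {
      // Fallback discussed above: a zero capacity behaves like a
      // disabled user-limit-factor and uses the effective max capacity.
      return effectiveMaxCapacityMb;
    }
    // Guaranteed capacity scaled by user-limit-factor, capped at max.
    return Math.min((long) (effectiveCapacityMb * userLimitFactor),
        effectiveMaxCapacityMb);
  }

  public static void main(String[] args) {
    // root.dynamic.user1: 40% of the 8192 MB parent => ~3277 MB guaranteed
    System.out.println(maxUserLimitMb(3277, 16384, 4));  // 13108
    // root.dynamic.user3: capacity = 0%; without the fallback the limit
    // would be 0, with it the queue can grow up to the max capacity.
    System.out.println(maxUserLimitMb(0, 16384, 4));     // 16384
  }
}
{code}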



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
