Posted to yarn-issues@hadoop.apache.org by "Szilard Nemeth (Jira)" <ji...@apache.org> on 2021/07/29 15:58:00 UTC

[jira] [Commented] (YARN-10869) CS considers only the default maximum-allocation-mb/vcore property as a maximum when it creates dynamic queues

    [ https://issues.apache.org/jira/browse/YARN-10869?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17389996#comment-17389996 ] 

Szilard Nemeth commented on YARN-10869:
---------------------------------------

Thanks [~bteke] for working on this. Latest patch LGTM, committed to trunk.

> CS considers only the default maximum-allocation-mb/vcore property as a maximum when it creates dynamic queues
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10869
>                 URL: https://issues.apache.org/jira/browse/YARN-10869
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 3.3.1
>            Reporter: Benjamin Teke
>            Assignee: Benjamin Teke
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> When using auto-created queues, CS throws the following exception if a dynamic queue has its maximum allocation set via templates (yarn.scheduler.capacity.root.users.leaf-queue-template.maximum-allocation-mb) above the default 8 GB memory / 4 vCores, even though the cluster-wide default maximum allocation was overridden in yarn-site.xml:
> {code:java}
> java.lang.IllegalArgumentException: Queue maximum allocation cannot be larger than the cluster setting for queue root.users.root max allocation per queue: <memory:10000, vCores:4> cluster setting: <memory:8192, vCores:4>
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupMaximumAllocation(AbstractCSQueue.java:550)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java:413)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.setupQueueConfigs(LeafQueue.java:186)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:175)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:156)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractAutoCreatedLeafQueue.<init>(AbstractAutoCreatedLeafQueue.java:54)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedLeafQueue.<init>(AutoCreatedLeafQueue.java:45)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.createLegacyAutoQueue(CapacitySchedulerQueueManager.java:669)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.createQueue(CapacitySchedulerQueueManager.java:541)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getOrCreateQueueFromPlacementContext(CapacityScheduler.java:969)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplication(CapacityScheduler.java:1029)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1989)
> 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1139)
> 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1090)
> {code}
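> For reference, a minimal configuration along these lines reproduces the failure (the 16384 value is just an illustrative override; the template property and queue path are the ones from the description and stack trace above):
> {code:xml}
> <!-- yarn-site.xml: cluster-wide maximum raised above the 8 GB default -->
> <property>
>   <name>yarn.scheduler.maximum-allocation-mb</name>
>   <value>16384</value>
> </property>
>
> <!-- capacity-scheduler.xml: template maximum for dynamic queues under root.users -->
> <property>
>   <name>yarn.scheduler.capacity.root.users.leaf-queue-template.maximum-allocation-mb</name>
>   <value>10000</value>
> </property>
> {code}
> Even though 10000 MB is well below the 16384 MB cluster maximum, queue creation fails with the exception above.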
> The reason is the following:
> In ManagedParentQueue#getLeafQueueConfigs a completely new CapacitySchedulerConfiguration is created:
> {code:java}
> public CapacitySchedulerConfiguration getLeafQueueConfigs(
>       CapacitySchedulerConfiguration templateConfig, String leafQueueName) {
>     CapacitySchedulerConfiguration leafQueueConfigTemplate = new
>         CapacitySchedulerConfiguration(new Configuration(false), false);
>     for (final Iterator<Map.Entry<String, String>> iterator =
>          templateConfig.iterator(); iterator.hasNext(); ) {
>       Map.Entry<String, String> confKeyValuePair = iterator.next();
>       final String name = confKeyValuePair.getKey().replaceFirst(
>           CapacitySchedulerConfiguration
>               .AUTO_CREATED_LEAF_QUEUE_TEMPLATE_PREFIX,
>           leafQueueName);
>       leafQueueConfigTemplate.set(name, confKeyValuePair.getValue());
>     }
>     return leafQueueConfigTemplate;
>   }
> {code}
> This only contains the template configs related to the auto created queue, copied from the original Configuration object (and loaded from capacity-scheduler.xml). The maximum-allocation calculation was refactored in YARN-9116:
> {code:java}
> private void setupMaximumAllocation(CapacitySchedulerConfiguration csConf) {
>     String myQueuePath = getQueuePath();
>     Resource clusterMax = ResourceUtils
>         .fetchMaximumAllocationFromConfig(csConf);
>     Resource queueMax = csConf.getQueueMaximumAllocation(myQueuePath);
>     maximumAllocation = Resources.clone(
>         parent == null ? clusterMax : parent.getMaximumAllocation());
>     String errMsg =
>         "Queue maximum allocation cannot be larger than the cluster setting"
>             + " for queue " + myQueuePath
>             + " max allocation per queue: %s"
>             + " cluster setting: " + clusterMax;
>     if (queueMax == Resources.none()) {
>       // Handle backward compatibility
>       long queueMemory = csConf.getQueueMaximumAllocationMb(myQueuePath);
>       int queueVcores = csConf.getQueueMaximumAllocationVcores(myQueuePath);
>       if (queueMemory != UNDEFINED) {
>         maximumAllocation.setMemorySize(queueMemory);
>       }
>       if (queueVcores != UNDEFINED) {
>         maximumAllocation.setVirtualCores(queueVcores);
>       }
>       if ((queueMemory != UNDEFINED && queueMemory > clusterMax.getMemorySize()
>           || (queueVcores != UNDEFINED
>           && queueVcores > clusterMax.getVirtualCores()))) {
>         throw new IllegalArgumentException(
>             String.format(errMsg, maximumAllocation));
>       }
>     } else {
>       // Queue level maximum-allocation can't be larger than cluster setting
>       for (ResourceInformation ri : queueMax.getResources()) {
>         if (ri.compareTo(clusterMax.getResourceInformation(ri.getName())) > 0) {
>           throw new IllegalArgumentException(String.format(errMsg, queueMax));
>         }
>         maximumAllocation.setResourceInformation(ri.getName(), ri);
>       }
>     }
>   }
> {code}
> Let's consider the following scenarios:
> # No maximum-allocation is set, either through templates or through the old maximum-allocation-mb/vcore properties: _queueMax_ gets the value Resources.none(), so the first if condition evaluates to true, but both _queueMemory_ and _queueVcores_ are UNDEFINED. _maximumAllocation_ is simply inherited from the parent, and no comparison against _clusterMax_ is made (the second if is skipped).
> # One of the maximum-allocation-mb/vcore properties is set: a check is performed to verify that the value is indeed lower than the cluster-wide maximum. This is where the CapacitySchedulerConfiguration duplication in getLeafQueueConfigs comes into the picture: the cluster-wide maximum is a property read from the YarnConfiguration object, but the copied config is built on a newly created Configuration object, so it contains only the default properties.
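> The effect can be demonstrated outside of Hadoop with a toy model (the class and method names below are made up for illustration; the real lookup happens via ResourceUtils#fetchMaximumAllocationFromConfig): a config built on new Configuration(false) never sees the yarn-site.xml override, so the comparison falls back to the hard-coded 8192 MB default.
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
>
> // Toy model of the lookup chain; 8192 mirrors
> // YarnConfiguration.DEFAULT_RM_SCHEDULER_MAXIMUM_ALLOCATION_MB.
> public class MaxAllocationSketch {
>   static final long DEFAULT_MAX_MB = 8192;
>
>   // Effective cluster-wide maximum for a given set of properties.
>   static long clusterMaxMb(Map<String, String> conf) {
>     String v = conf.get("yarn.scheduler.maximum-allocation-mb");
>     return v == null ? DEFAULT_MAX_MB : Long.parseLong(v);
>   }
>
>   public static void main(String[] args) {
>     // The original configuration, with the yarn-site.xml override visible:
>     Map<String, String> original = new HashMap<>();
>     original.put("yarn.scheduler.maximum-allocation-mb", "16384");
>
>     // The clone built by getLeafQueueConfigs: only template-derived keys survive.
>     Map<String, String> cloned = new HashMap<>();
>     cloned.put("yarn.scheduler.capacity.root.users.queue.maximum-allocation-mb", "10000");
>
>     long templateMaxMb = 10000;
>     System.out.println(templateMaxMb > clusterMaxMb(original)); // false: 10000 <= 16384
>     System.out.println(templateMaxMb > clusterMaxMb(cloned));   // true: falls back to 8192
>   }
> }
> {code}
> This mirrors scenario 2 above: the check itself is correct, it is just run against a config that no longer carries the override.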
> There are multiple possible solutions: either the cluster-wide maximum allocation settings should be migrated to the cloned Configuration object, or the original Configuration object should be used when checking the maximum allocation.
> YARN-9569 solved this issue partially, but the old yarn.scheduler.maximum-allocation-mb/vcore properties are still not migrated.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
