Posted to yarn-issues@hadoop.apache.org by "Benjamin Teke (Jira)" <ji...@apache.org> on 2021/07/21 16:52:00 UTC

[jira] [Updated] (YARN-10869) CS considers only the default maximum-allocation property as a maximum when it creates dynamic queues

     [ https://issues.apache.org/jira/browse/YARN-10869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Teke updated YARN-10869:
---------------------------------
    Description: 
When using auto-created queues, CS throws the following exception if a dynamic queue has its maximum allocation set via templates (yarn.scheduler.capacity.root.users.leaf-queue-template.maximum-allocation-mb) above the default 8 GB memory/4 cores, even though the default maximum allocation was overridden in yarn-site.xml:

{code:java}
java.lang.IllegalArgumentException: Queue maximum allocation cannot be larger than the cluster setting for queue root.users.root max allocation per queue: <memory:10000, vCores:4> cluster setting: <memory:8192, vCores:4>
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupMaximumAllocation(AbstractCSQueue.java:550)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java:413)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.setupQueueConfigs(LeafQueue.java:186)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:175)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:156)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractAutoCreatedLeafQueue.<init>(AbstractAutoCreatedLeafQueue.java:54)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedLeafQueue.<init>(AutoCreatedLeafQueue.java:45)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.createLegacyAutoQueue(CapacitySchedulerQueueManager.java:669)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.createQueue(CapacitySchedulerQueueManager.java:541)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getOrCreateQueueFromPlacementContext(CapacityScheduler.java:969)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplication(CapacityScheduler.java:1029)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1989)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1139)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1090)
{code}

The reason for this is the following:

In ManagedParent#getLeafQueueConfigs a completely new CapacitySchedulerConfiguration gets created:

{code:java}
public CapacitySchedulerConfiguration getLeafQueueConfigs(
    CapacitySchedulerConfiguration templateConfig, String leafQueueName) {
  // Start from an empty Configuration: no default resources are loaded.
  CapacitySchedulerConfiguration leafQueueConfigTemplate =
      new CapacitySchedulerConfiguration(new Configuration(false), false);
  // Copy every template entry, replacing the template prefix in the key
  // with the name of the leaf queue being created.
  for (final Iterator<Map.Entry<String, String>> iterator =
       templateConfig.iterator(); iterator.hasNext(); ) {
    Map.Entry<String, String> confKeyValuePair = iterator.next();
    final String name = confKeyValuePair.getKey().replaceFirst(
        CapacitySchedulerConfiguration
            .AUTO_CREATED_LEAF_QUEUE_TEMPLATE_PREFIX,
        leafQueueName);
    leafQueueConfigTemplate.set(name, confKeyValuePair.getValue());
  }
  return leafQueueConfigTemplate;
}
{code}

The returned object contains only the template configs related to the auto-created queue, copied from the original Configuration object (which is loaded from capacity-scheduler.xml). The maximum-allocation calculation was refactored in YARN-9116:

{code:java}
private void setupMaximumAllocation(CapacitySchedulerConfiguration csConf) {
    String myQueuePath = getQueuePath();
    Resource clusterMax = ResourceUtils
        .fetchMaximumAllocationFromConfig(csConf);
    Resource queueMax = csConf.getQueueMaximumAllocation(myQueuePath);

    maximumAllocation = Resources.clone(
        parent == null ? clusterMax : parent.getMaximumAllocation());

    String errMsg =
        "Queue maximum allocation cannot be larger than the cluster setting"
            + " for queue " + myQueuePath
            + " max allocation per queue: %s"
            + " cluster setting: " + clusterMax;

    if (queueMax == Resources.none()) {
      // Handle backward compatibility
      long queueMemory = csConf.getQueueMaximumAllocationMb(myQueuePath);
      int queueVcores = csConf.getQueueMaximumAllocationVcores(myQueuePath);
      if (queueMemory != UNDEFINED) {
        maximumAllocation.setMemorySize(queueMemory);
      }

      if (queueVcores != UNDEFINED) {
        maximumAllocation.setVirtualCores(queueVcores);
      }

      if ((queueMemory != UNDEFINED && queueMemory > clusterMax.getMemorySize()
          || (queueVcores != UNDEFINED
          && queueVcores > clusterMax.getVirtualCores()))) {
        throw new IllegalArgumentException(
            String.format(errMsg, maximumAllocation));
      }
    } else {
      // Queue level maximum-allocation can't be larger than cluster setting
      for (ResourceInformation ri : queueMax.getResources()) {
        if (ri.compareTo(clusterMax.getResourceInformation(ri.getName())) > 0) {
          throw new IllegalArgumentException(String.format(errMsg, queueMax));
        }

        maximumAllocation.setResourceInformation(ri.getName(), ri);
      }
    }
  }
{code}

Let's consider the following scenarios:
# No maximum allocation is set through templates, neither via the new maximum-allocation property nor via the old maximum-allocation-mb/maximum-allocation-vcores properties: _queueMax_ gets the value Resources.none(), so the outer if condition evaluates to true, but both _queueMemory_ and _queueVcores_ are UNDEFINED. The _maximumAllocation_ is simply inherited from the parent and the _clusterMax_ comparison never triggers (the if that throws is skipped).
# One of the maximum-allocation properties is set: a comparison is executed to check that the value does not exceed the cluster-wide maximum. This is where the CapacitySchedulerConfiguration duplication in getLeafQueueConfigs comes into play. The cluster-wide maximum is a property that comes from the YarnConfiguration object, but the copied config object is built on a newly created Configuration object, so the override from yarn-site.xml is missing and _clusterMax_ falls back to the default value (<memory:8192, vCores:4> in the exception above).
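
The second scenario can be reproduced without a cluster. The following is a minimal sketch that uses plain maps as a stand-in for the Hadoop Configuration classes; the class name LostClusterMaxDemo, the helper clusterMaxMb, the queue name user1 and the 16384 MB override are assumptions made for illustration. Only the property names and the 8192 MB default are taken from YARN.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, map-based stand-in for the configuration objects in this issue.
class LostClusterMaxDemo {
  // Real YARN property name and default for the cluster-wide memory maximum.
  static final String CLUSTER_MAX_MB = "yarn.scheduler.maximum-allocation-mb";
  static final long DEFAULT_CLUSTER_MAX_MB = 8192L;
  static final String TEMPLATE_PREFIX = "leaf-queue-template";

  // Simplified lookup (memory only): fall back to the default when the key is absent.
  static long clusterMaxMb(Map<String, String> conf) {
    String value = conf.get(CLUSTER_MAX_MB);
    return value == null ? DEFAULT_CLUSTER_MAX_MB : Long.parseLong(value);
  }

  // Mirrors the shape of getLeafQueueConfigs: start from an empty config
  // and copy only the template entries, rewriting the key prefix.
  static Map<String, String> leafQueueConfigs(
      Map<String, String> template, String leafQueueName) {
    Map<String, String> leaf = new HashMap<>();
    for (Map.Entry<String, String> entry : template.entrySet()) {
      leaf.put(entry.getKey().replaceFirst(TEMPLATE_PREFIX, leafQueueName),
          entry.getValue());
    }
    return leaf;
  }

  public static void main(String[] args) {
    // Full configuration: the cluster-wide maximum is overridden (assumed value).
    Map<String, String> full = new HashMap<>();
    full.put(CLUSTER_MAX_MB, "16384");

    // Template entry asking for 10000 MB per auto-created leaf queue.
    Map<String, String> template = new HashMap<>();
    template.put("yarn.scheduler.capacity.root.users."
        + TEMPLATE_PREFIX + ".maximum-allocation-mb", "10000");

    Map<String, String> leaf = leafQueueConfigs(template, "user1");
    long queueMb = Long.parseLong(
        leaf.get("yarn.scheduler.capacity.root.users.user1.maximum-allocation-mb"));

    System.out.println("cluster max seen via full config: " + clusterMaxMb(full));
    System.out.println("cluster max seen via leaf copy:   " + clusterMaxMb(leaf));
    System.out.println("queue max rejected via leaf copy: "
        + (queueMb > clusterMaxMb(leaf)));
  }
}
```

With these values the lookup on the full configuration yields 16384, while the lookup on the leaf copy yields the default 8192, so the 10000 MB queue maximum is rejected even though it is below the configured cluster-wide maximum.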

There are two possible solutions to this problem: either the cluster-wide maximum allocation is copied into the cloned Configuration object, or, ideally, the duplication of the Configuration object is eliminated, as it has caused issues in the past as well.
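
The first option could look roughly like the sketch below. It is not the patch for this issue: it again uses plain maps instead of the Hadoop Configuration classes, and the class name CarryClusterMaxSketch, the extra fullConf parameter and the sample values are assumptions made for illustration. The two cluster-wide property names are the real YARN keys.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the first option: carry the cluster-wide maximums
// over into the per-leaf copy instead of losing them.
class CarryClusterMaxSketch {
  static final String TEMPLATE_PREFIX = "leaf-queue-template";
  // Real YARN keys for the cluster-wide maximum allocation.
  static final String[] CLUSTER_WIDE_KEYS = {
      "yarn.scheduler.maximum-allocation-mb",
      "yarn.scheduler.maximum-allocation-vcores"
  };

  static Map<String, String> leafQueueConfigs(Map<String, String> fullConf,
      Map<String, String> template, String leafQueueName) {
    Map<String, String> leaf = new HashMap<>();
    // Same copy step as before: template entries with the prefix rewritten.
    for (Map.Entry<String, String> entry : template.entrySet()) {
      leaf.put(entry.getKey().replaceFirst(TEMPLATE_PREFIX, leafQueueName),
          entry.getValue());
    }
    // Additional step: copy the cluster-wide overrides, if any are set.
    for (String key : CLUSTER_WIDE_KEYS) {
      String value = fullConf.get(key);
      if (value != null) {
        leaf.put(key, value);
      }
    }
    return leaf;
  }

  public static void main(String[] args) {
    Map<String, String> full = new HashMap<>();
    full.put("yarn.scheduler.maximum-allocation-mb", "16384"); // assumed override

    Map<String, String> template = new HashMap<>();
    template.put("yarn.scheduler.capacity.root.users."
        + TEMPLATE_PREFIX + ".maximum-allocation-mb", "10000");

    Map<String, String> leaf = leafQueueConfigs(full, template, "user1");
    System.out.println(leaf.get("yarn.scheduler.maximum-allocation-mb"));
  }
}
```

Copying individual keys keeps the change small but has to be extended whenever another cluster-wide property is read from the leaf copy, which is one reason removing the duplication is the preferable option.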





> CS considers only the default maximum-allocation property as a maximum when it creates dynamic queues
> -----------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10869
>                 URL: https://issues.apache.org/jira/browse/YARN-10869
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 3.3.1
>            Reporter: Benjamin Teke
>            Assignee: Benjamin Teke
>            Priority: Major
>


