Posted to yarn-issues@hadoop.apache.org by "Szilard Nemeth (Jira)" <ji...@apache.org> on 2022/02/16 13:49:00 UTC
[jira] [Updated] (YARN-10869) CS considers only the default maximum-allocation-mb/vcore property as a maximum when it creates dynamic queues

     [ https://issues.apache.org/jira/browse/YARN-10869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Szilard Nemeth updated YARN-10869:
----------------------------------
    Description: 
When auto-created queues are used, CS throws the following exception if a dynamic queue has its maximum allocation set via templates (yarn.scheduler.capacity.root.users.leaf-queue-template.maximum-allocation-mb) above the default 8 GB memory/4 cores, even though the default maximum allocation was overridden in yarn-site.xml:

{code:java}
java.lang.IllegalArgumentException: Queue maximum allocation cannot be larger than the cluster setting for queue root.users.root max allocation per queue: <memory:10000, vCores:4> cluster setting: <memory:8192, vCores:4>
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupMaximumAllocation(AbstractCSQueue.java:550)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java:413)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.setupQueueConfigs(LeafQueue.java:186)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:175)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:156)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractAutoCreatedLeafQueue.<init>(AbstractAutoCreatedLeafQueue.java:54)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedLeafQueue.<init>(AutoCreatedLeafQueue.java:45)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.createLegacyAutoQueue(CapacitySchedulerQueueManager.java:669)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.createQueue(CapacitySchedulerQueueManager.java:541)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getOrCreateQueueFromPlacementContext(CapacityScheduler.java:969)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplication(CapacityScheduler.java:1029)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1989)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1139)
resourcemanager      | 	at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1090)
{code}

The reason for this is the following:

In ManagedParentQueue#getLeafQueueConfigs a completely new CapacitySchedulerConfiguration gets created:

{code:java}
  public CapacitySchedulerConfiguration getLeafQueueConfigs(
      CapacitySchedulerConfiguration templateConfig, String leafQueueName) {
    // Starts from an empty Configuration, so nothing from the original
    // configuration is carried over except the template entries copied below.
    CapacitySchedulerConfiguration leafQueueConfigTemplate = new
        CapacitySchedulerConfiguration(new Configuration(false), false);
    for (final Iterator<Map.Entry<String, String>> iterator =
         templateConfig.iterator(); iterator.hasNext(); ) {
      Map.Entry<String, String> confKeyValuePair = iterator.next();
      // Rewrite the template prefix in the key to the concrete leaf queue name.
      final String name = confKeyValuePair.getKey().replaceFirst(
          CapacitySchedulerConfiguration
              .AUTO_CREATED_LEAF_QUEUE_TEMPLATE_PREFIX,
          leafQueueName);
      leafQueueConfigTemplate.set(name, confKeyValuePair.getValue());
    }
    return leafQueueConfigTemplate;
  }
{code}

The returned object contains only the template configs related to the auto-created queue, copied from the original Configuration object (which was loaded from capacity-scheduler.xml). The maximum-allocation calculation was refactored in YARN-9116:

{code:java}
  private void setupMaximumAllocation(CapacitySchedulerConfiguration csConf) {
    String myQueuePath = getQueuePath();
    Resource clusterMax = ResourceUtils
        .fetchMaximumAllocationFromConfig(csConf);
    Resource queueMax = csConf.getQueueMaximumAllocation(myQueuePath);

    maximumAllocation = Resources.clone(
        parent == null ? clusterMax : parent.getMaximumAllocation());

    String errMsg =
        "Queue maximum allocation cannot be larger than the cluster setting"
            + " for queue " + myQueuePath
            + " max allocation per queue: %s"
            + " cluster setting: " + clusterMax;

    if (queueMax == Resources.none()) {
      // Handle backward compatibility
      long queueMemory = csConf.getQueueMaximumAllocationMb(myQueuePath);
      int queueVcores = csConf.getQueueMaximumAllocationVcores(myQueuePath);
      if (queueMemory != UNDEFINED) {
        maximumAllocation.setMemorySize(queueMemory);
      }

      if (queueVcores != UNDEFINED) {
        maximumAllocation.setVirtualCores(queueVcores);
      }

      if ((queueMemory != UNDEFINED && queueMemory > clusterMax.getMemorySize()
          || (queueVcores != UNDEFINED
          && queueVcores > clusterMax.getVirtualCores()))) {
        throw new IllegalArgumentException(
            String.format(errMsg, maximumAllocation));
      }
    } else {
      // Queue level maximum-allocation can't be larger than cluster setting
      for (ResourceInformation ri : queueMax.getResources()) {
        if (ri.compareTo(clusterMax.getResourceInformation(ri.getName())) > 0) {
          throw new IllegalArgumentException(String.format(errMsg, queueMax));
        }

        maximumAllocation.setResourceInformation(ri.getName(), ri);
      }
    }
  }
{code}
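
To make the interaction between the two snippets above concrete, here is a minimal, self-contained sketch. It is a toy model, not Hadoop code: plain maps stand in for the Configuration objects, and the class and method names (TemplateCopyDemo, templateEntries, leafQueueConfigs, clusterMaxMb) as well as the 16384 override and the queue name "alice" are made up for illustration. Only the property names and the 8192 default come from the report.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: plain maps stand in for Hadoop Configuration objects. */
final class TemplateCopyDemo {
  // Property name and default are taken from the report; the rest is hypothetical.
  static final String CLUSTER_MAX_MB = "yarn.scheduler.maximum-allocation-mb";
  static final long DEFAULT_CLUSTER_MAX_MB = 8192L;
  static final String TEMPLATE_PREFIX = "leaf-queue-template";

  /** Keeps only template entries (assumed to resemble how templateConfig is built). */
  static Map<String, String> templateEntries(Map<String, String> full) {
    Map<String, String> template = new HashMap<>();
    for (Map.Entry<String, String> e : full.entrySet()) {
      if (e.getKey().contains(TEMPLATE_PREFIX)) {
        template.put(e.getKey(), e.getValue());
      }
    }
    return template;
  }

  /** Mirrors the copy loop: starts from an empty config and renames the keys. */
  static Map<String, String> leafQueueConfigs(
      Map<String, String> templateConfig, String leafQueueName) {
    Map<String, String> leafConf = new HashMap<>();
    for (Map.Entry<String, String> e : templateConfig.entrySet()) {
      leafConf.put(
          e.getKey().replaceFirst(TEMPLATE_PREFIX, leafQueueName), e.getValue());
    }
    return leafConf;
  }

  /** Falls back to the default when the key is absent. */
  static long clusterMaxMb(Map<String, String> conf) {
    String value = conf.get(CLUSTER_MAX_MB);
    return value == null ? DEFAULT_CLUSTER_MAX_MB : Long.parseLong(value);
  }

  public static void main(String[] args) {
    Map<String, String> full = new HashMap<>();
    full.put(CLUSTER_MAX_MB, "16384"); // hypothetical yarn-site.xml override
    full.put("yarn.scheduler.capacity.root.users.leaf-queue-template"
        + ".maximum-allocation-mb", "10000");

    Map<String, String> leaf = leafQueueConfigs(templateEntries(full), "alice");
    System.out.println("original sees cluster max: " + clusterMaxMb(full));
    System.out.println("copy sees cluster max:     " + clusterMaxMb(leaf));
  }
}
```

In this model the copied map carries the queue-level value but not the cluster-wide key, so any lookup of the cluster maximum on the copy falls back to the default.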

Consider the following scenarios:
# No maximum allocation is set, neither through templates nor through the old maximum-allocation-mb/vcore properties: _queueMax_ gets the value Resources.none(), so the first if condition evaluates to true, but both _queueMemory_ and _queueVcores_ are UNDEFINED. The _maximumAllocation_ is simply inherited from the parent and no _clusterMax_ comparison is done (the if containing the comparison evaluates to false).
# One of the maximum-allocation-mb/vcore properties is set: a comparison is executed to check that the value does not exceed the cluster-wide maximum. This is where the CapacitySchedulerConfiguration duplication in getLeafQueueConfigs comes into the picture. The cluster-wide maximum is a property that comes from the YarnConfiguration object, and because the copied config object wraps a newly created Configuration object, the lookup yields only the default value.
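
The two scenarios can be sketched with a simplified, memory-only version of the backward-compatibility branch. This is an illustration under assumptions, not the real method: MaxAllocationCheckDemo, effectiveMaxMb and rejects are hypothetical names, UNDEFINED is modelled as -1, and 16384 is an invented cluster override. The vcore check would follow the same pattern.

```java
/** Toy model of the backward-compatibility branch, memory only. */
final class MaxAllocationCheckDemo {
  static final long UNDEFINED = -1L; // sentinel value assumed for this sketch

  /** Returns the effective maximum, or throws if the queue exceeds the cluster value. */
  static long effectiveMaxMb(long parentMaxMb, long queueMb, long clusterMaxMb) {
    long maximumAllocationMb = parentMaxMb; // inherited from the parent by default
    if (queueMb != UNDEFINED) {
      maximumAllocationMb = queueMb;
    }
    // The cluster comparison only runs when a queue-level value is set.
    if (queueMb != UNDEFINED && queueMb > clusterMaxMb) {
      throw new IllegalArgumentException(
          "Queue maximum allocation cannot be larger than the cluster setting:"
              + " queue=" + queueMb + " cluster=" + clusterMaxMb);
    }
    return maximumAllocationMb;
  }

  /** Convenience wrapper: true when the check throws. */
  static boolean rejects(long parentMaxMb, long queueMb, long clusterMaxMb) {
    try {
      effectiveMaxMb(parentMaxMb, queueMb, clusterMaxMb);
      return false;
    } catch (IllegalArgumentException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    long defaultClusterMax = 8192L;     // what the copied config reports
    long overriddenClusterMax = 16384L; // hypothetical yarn-site.xml value

    // Scenario 1: nothing set on the queue, the parent value is inherited.
    System.out.println(effectiveMaxMb(16384L, UNDEFINED, defaultClusterMax));
    // Scenario 2: template sets 10000 MB; the outcome depends on which
    // cluster maximum the check sees.
    System.out.println(rejects(16384L, 10000L, defaultClusterMax));
    System.out.println(rejects(16384L, 10000L, overriddenClusterMax));
  }
}
```

The sketch shows why the first scenario never fails even with a stale cluster maximum, while the second one fails only when the check is fed the default instead of the overridden value.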

There are multiple possible solutions to this problem: either the cluster-wide maximum allocation is migrated to the cloned Configuration object, or the original Configuration object is used when checking the maximum allocation.

YARN-9569 solved this issue partially, but the old yarn.scheduler.maximum-allocation-mb/vcores properties are not migrated.
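
As a rough illustration of the first option (migrating the cluster-wide maximum into the copied configuration), the sketch below again uses plain maps instead of Configuration objects. It is not the actual patch: MigrateClusterMaxDemo and withClusterMax are hypothetical names, and the values are invented.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: copies the cluster-wide maximum keys into the leaf config. */
final class MigrateClusterMaxDemo {
  static final String[] CLUSTER_MAX_KEYS = {
    "yarn.scheduler.maximum-allocation-mb",
    "yarn.scheduler.maximum-allocation-vcores"
  };

  /** Returns a copy of leafConf that also carries any cluster maximum set in original. */
  static Map<String, String> withClusterMax(
      Map<String, String> original, Map<String, String> leafConf) {
    Map<String, String> migrated = new HashMap<>(leafConf);
    for (String key : CLUSTER_MAX_KEYS) {
      String value = original.get(key);
      if (value != null) { // only migrate explicit overrides
        migrated.put(key, value);
      }
    }
    return migrated;
  }

  public static void main(String[] args) {
    Map<String, String> original = new HashMap<>();
    original.put("yarn.scheduler.maximum-allocation-mb", "16384"); // hypothetical

    Map<String, String> leafConf = new HashMap<>();
    leafConf.put(
        "yarn.scheduler.capacity.root.users.alice.maximum-allocation-mb", "10000");

    System.out.println(withClusterMax(original, leafConf));
  }
}
```

The alternative option needs no copying at all: the check would read the cluster maximum from the original configuration rather than from the per-queue copy.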


> CS considers only the default maximum-allocation-mb/vcore property as a maximum when it creates dynamic queues
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-10869
>                 URL: https://issues.apache.org/jira/browse/YARN-10869
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 3.3.1
>            Reporter: Benjamin Teke
>            Assignee: Benjamin Teke
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.3.2
>
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
