You are viewing a plain text version of this content. The canonical link for it is here.
Posted to yarn-issues@hadoop.apache.org by "Szilard Nemeth (Jira)" <ji...@apache.org> on 2022/02/16 13:49:00 UTC
[jira] [Updated] (YARN-10869) CS considers only the default maximum-allocation-mb/vcore property as a maximum when it creates dynamic queues
[ https://issues.apache.org/jira/browse/YARN-10869?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Szilard Nemeth updated YARN-10869:
----------------------------------
Description:
When using auto created queues even though the default maximum allocation was overridden in yarn-site.xml, CS will throw the following exception if a dynamic queue has the maximum allocation set via templates (yarn.scheduler.capacity.root.users.leaf-queue-template.maximum-allocation-mb) above the default 8 GB memory/4 cores:
{code:java}
java.lang.IllegalArgumentException: Queue maximum allocation cannot be larger than the cluster setting for queue root.users.root max allocation per queue: <memory:10000, vCores:4> cluster setting: <memory:8192, vCores:4>
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupMaximumAllocation(AbstractCSQueue.java:550)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java:413)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.setupQueueConfigs(LeafQueue.java:186)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:175)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:156)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractAutoCreatedLeafQueue.<init>(AbstractAutoCreatedLeafQueue.java:54)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedLeafQueue.<init>(AutoCreatedLeafQueue.java:45)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.createLegacyAutoQueue(CapacitySchedulerQueueManager.java:669)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.createQueue(CapacitySchedulerQueueManager.java:541)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getOrCreateQueueFromPlacementContext(CapacityScheduler.java:969)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplication(CapacityScheduler.java:1029)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1989)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1139)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1090)
{code}
The reason for this is the following:
In ManagedParent#getLeafQueueConfigs a completely new CapacitySchedulerConfiguration gets created:
{code:java}
public CapacitySchedulerConfiguration getLeafQueueConfigs(
CapacitySchedulerConfiguration templateConfig, String leafQueueName) {
CapacitySchedulerConfiguration leafQueueConfigTemplate = new
CapacitySchedulerConfiguration(new Configuration(false), false);
for (final Iterator<Map.Entry<String, String>> iterator =
templateConfig.iterator(); iterator.hasNext(); ) {
Map.Entry<String, String> confKeyValuePair = iterator.next();
final String name = confKeyValuePair.getKey().replaceFirst(
CapacitySchedulerConfiguration
.AUTO_CREATED_LEAF_QUEUE_TEMPLATE_PREFIX,
leafQueueName);
leafQueueConfigTemplate.set(name, confKeyValuePair.getValue());
}
return leafQueueConfigTemplate;
}
}
{code}
This only contains the template configs related to the auto created queue, copied from the original Configuration object (and loaded from capacity-scheduler.xml). The maximum-allocation calculation was refactored in YARN-9116:
{code:java}
private void setupMaximumAllocation(CapacitySchedulerConfiguration csConf) {
String myQueuePath = getQueuePath();
Resource clusterMax = ResourceUtils
.fetchMaximumAllocationFromConfig(csConf);
Resource queueMax = csConf.getQueueMaximumAllocation(myQueuePath);
maximumAllocation = Resources.clone(
parent == null ? clusterMax : parent.getMaximumAllocation());
String errMsg =
"Queue maximum allocation cannot be larger than the cluster setting"
+ " for queue " + myQueuePath
+ " max allocation per queue: %s"
+ " cluster setting: " + clusterMax;
if (queueMax == Resources.none()) {
// Handle backward compatibility
long queueMemory = csConf.getQueueMaximumAllocationMb(myQueuePath);
int queueVcores = csConf.getQueueMaximumAllocationVcores(myQueuePath);
if (queueMemory != UNDEFINED) {
maximumAllocation.setMemorySize(queueMemory);
}
if (queueVcores != UNDEFINED) {
maximumAllocation.setVirtualCores(queueVcores);
}
if ((queueMemory != UNDEFINED && queueMemory > clusterMax.getMemorySize()
|| (queueVcores != UNDEFINED
&& queueVcores > clusterMax.getVirtualCores()))) {
throw new IllegalArgumentException(
String.format(errMsg, maximumAllocation));
}
} else {
// Queue level maximum-allocation can't be larger than cluster setting
for (ResourceInformation ri : queueMax.getResources()) {
if (ri.compareTo(clusterMax.getResourceInformation(ri.getName())) > 0) {
throw new IllegalArgumentException(String.format(errMsg, queueMax));
}
maximumAllocation.setResourceInformation(ri.getName(), ri);
}
}
}
{code}
Let's consider the following scenarios:
# No maximum-allocation is set through templates, neither through the old maximum-allocation-mb/vcore property: _queueMax_ will get the value Resources.none(), so its if condition evaluates to true but both _queueMemory_ and _queueVcores_ will be UNDEFINED. The _maximumAllocation_ will simply be inherited from the parent and no _clusterMax_ comparison will be done (the second if will be skipped).
# One of the maximum-allocation-mb/vcore properties is set: a comparison will be executed to check whether the value is indeed lower than the cluster-wide maximum. Here comes the getLeafQueueConfigs' CapacitySchedulerConfiguration duplication into the picture. Since the cluster-wide maximum is a property that comes from the YarnConfiguration object and the copied config object gets a newly created Configuration object it'll only contain the default properties.
There are multiple solutions to this problem: either the cluster-wide maximum allocation should be migrated to the cloned Configuration object or when checking the maximum allocation the original Configuration object should be used.
YARN-9569 solved this issue partially, but the old yarn.scheduler.maximum-allocation-mb/vcore is not migrated.
was:
When using auto created queues even though the default maximum allocation was overridden in yarn-site.xml CS will throw the following exception if a dynamic queue has the maximum allocation set via templates (yarn.scheduler.capacity.root.users.leaf-queue-template.maximum-allocation-mb) above the default 8 GB memory/4 cores:
{code:java}
java.lang.IllegalArgumentException: Queue maximum allocation cannot be larger than the cluster setting for queue root.users.root max allocation per queue: <memory:10000, vCores:4> cluster setting: <memory:8192, vCores:4>
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupMaximumAllocation(AbstractCSQueue.java:550)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java:413)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.setupQueueConfigs(LeafQueue.java:186)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:175)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:156)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractAutoCreatedLeafQueue.<init>(AbstractAutoCreatedLeafQueue.java:54)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedLeafQueue.<init>(AutoCreatedLeafQueue.java:45)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.createLegacyAutoQueue(CapacitySchedulerQueueManager.java:669)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.createQueue(CapacitySchedulerQueueManager.java:541)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getOrCreateQueueFromPlacementContext(CapacityScheduler.java:969)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplication(CapacityScheduler.java:1029)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1989)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1139)
resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1090)
{code}
The reason for this is the following:
In ManagedParent#getLeafQueueConfigs a completely new CapacitySchedulerConfiguration gets created:
{code:java}
public CapacitySchedulerConfiguration getLeafQueueConfigs(
CapacitySchedulerConfiguration templateConfig, String leafQueueName) {
CapacitySchedulerConfiguration leafQueueConfigTemplate = new
CapacitySchedulerConfiguration(new Configuration(false), false);
for (final Iterator<Map.Entry<String, String>> iterator =
templateConfig.iterator(); iterator.hasNext(); ) {
Map.Entry<String, String> confKeyValuePair = iterator.next();
final String name = confKeyValuePair.getKey().replaceFirst(
CapacitySchedulerConfiguration
.AUTO_CREATED_LEAF_QUEUE_TEMPLATE_PREFIX,
leafQueueName);
leafQueueConfigTemplate.set(name, confKeyValuePair.getValue());
}
return leafQueueConfigTemplate;
}
}
{code}
This only contains the template configs related to the auto created queue, copied from the original Configuration object (and loaded from capacity-scheduler.xml). The maximum-allocation calculation was refactored in YARN-9116:
{code:java}
private void setupMaximumAllocation(CapacitySchedulerConfiguration csConf) {
String myQueuePath = getQueuePath();
Resource clusterMax = ResourceUtils
.fetchMaximumAllocationFromConfig(csConf);
Resource queueMax = csConf.getQueueMaximumAllocation(myQueuePath);
maximumAllocation = Resources.clone(
parent == null ? clusterMax : parent.getMaximumAllocation());
String errMsg =
"Queue maximum allocation cannot be larger than the cluster setting"
+ " for queue " + myQueuePath
+ " max allocation per queue: %s"
+ " cluster setting: " + clusterMax;
if (queueMax == Resources.none()) {
// Handle backward compatibility
long queueMemory = csConf.getQueueMaximumAllocationMb(myQueuePath);
int queueVcores = csConf.getQueueMaximumAllocationVcores(myQueuePath);
if (queueMemory != UNDEFINED) {
maximumAllocation.setMemorySize(queueMemory);
}
if (queueVcores != UNDEFINED) {
maximumAllocation.setVirtualCores(queueVcores);
}
if ((queueMemory != UNDEFINED && queueMemory > clusterMax.getMemorySize()
|| (queueVcores != UNDEFINED
&& queueVcores > clusterMax.getVirtualCores()))) {
throw new IllegalArgumentException(
String.format(errMsg, maximumAllocation));
}
} else {
// Queue level maximum-allocation can't be larger than cluster setting
for (ResourceInformation ri : queueMax.getResources()) {
if (ri.compareTo(clusterMax.getResourceInformation(ri.getName())) > 0) {
throw new IllegalArgumentException(String.format(errMsg, queueMax));
}
maximumAllocation.setResourceInformation(ri.getName(), ri);
}
}
}
{code}
Let's consider the following scenarios:
# No maximum-allocation is set through templates, neither through the old maximum-allocation-mb/vcore property: _queueMax_ will get the value Resources.none(), so its if condition evaluates to true but both _queueMemory_ and _queueVcores_ will be UNDEFINED. The _maximumAllocation_ will simply be inherited from the parent and no _clusterMax_ comparison will be done (the second if will be skipped).
# One of the maximum-allocation-mb/vcore properties is set: a comparison will be executed to check whether the value is indeed lower than the cluster-wide maximum. Here comes the getLeafQueueConfigs' CapacitySchedulerConfiguration duplication into the picture. Since the cluster-wide maximum is a property that comes from the YarnConfiguration object and the copied config object gets a newly created Configuration object it'll only contain the default properties.
There are multiple solutions to this problem: either the cluster-wide maximum allocation should be migrated to the cloned Configuration object or when checking the maximum allocation the original Configuration object should be used.
YARN-9569 solved this issue partially, but the old yarn.scheduler.maximum-allocation-mb/vcore is not migrated.
> CS considers only the default maximum-allocation-mb/vcore property as a maximum when it creates dynamic queues
> --------------------------------------------------------------------------------------------------------------
>
> Key: YARN-10869
> URL: https://issues.apache.org/jira/browse/YARN-10869
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler
> Affects Versions: 3.3.1
> Reporter: Benjamin Teke
> Assignee: Benjamin Teke
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
>
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> When using auto created queues even though the default maximum allocation was overridden in yarn-site.xml, CS will throw the following exception if a dynamic queue has the maximum allocation set via templates (yarn.scheduler.capacity.root.users.leaf-queue-template.maximum-allocation-mb) above the default 8 GB memory/4 cores:
> {code:java}
> java.lang.IllegalArgumentException: Queue maximum allocation cannot be larger than the cluster setting for queue root.users.root max allocation per queue: <memory:10000, vCores:4> cluster setting: <memory:8192, vCores:4>
> resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupMaximumAllocation(AbstractCSQueue.java:550)
> resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractCSQueue.setupQueueConfigs(AbstractCSQueue.java:413)
> resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.setupQueueConfigs(LeafQueue.java:186)
> resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:175)
> resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue.<init>(LeafQueue.java:156)
> resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AbstractAutoCreatedLeafQueue.<init>(AbstractAutoCreatedLeafQueue.java:54)
> resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.AutoCreatedLeafQueue.<init>(AutoCreatedLeafQueue.java:45)
> resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.createLegacyAutoQueue(CapacitySchedulerQueueManager.java:669)
> resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacitySchedulerQueueManager.createQueue(CapacitySchedulerQueueManager.java:541)
> resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.getOrCreateQueueFromPlacementContext(CapacityScheduler.java:969)
> resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.addApplication(CapacityScheduler.java:1029)
> resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:1989)
> resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler.handle(CapacityScheduler.java:171)
> resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1139)
> resourcemanager | at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$RMAppRecoveredTransition.transition(RMAppImpl.java:1090)
> {code}
> The reason for this is the following:
> In ManagedParent#getLeafQueueConfigs a completely new CapacitySchedulerConfiguration gets created:
> {code:java}
> public CapacitySchedulerConfiguration getLeafQueueConfigs(
> CapacitySchedulerConfiguration templateConfig, String leafQueueName) {
> CapacitySchedulerConfiguration leafQueueConfigTemplate = new
> CapacitySchedulerConfiguration(new Configuration(false), false);
> for (final Iterator<Map.Entry<String, String>> iterator =
> templateConfig.iterator(); iterator.hasNext(); ) {
> Map.Entry<String, String> confKeyValuePair = iterator.next();
> final String name = confKeyValuePair.getKey().replaceFirst(
> CapacitySchedulerConfiguration
> .AUTO_CREATED_LEAF_QUEUE_TEMPLATE_PREFIX,
> leafQueueName);
> leafQueueConfigTemplate.set(name, confKeyValuePair.getValue());
> }
> return leafQueueConfigTemplate;
> }
> }
> {code}
> This only contains the template configs related to the auto created queue, copied from the original Configuration object (and loaded from capacity-scheduler.xml). The maximum-allocation calculation was refactored in YARN-9116:
> {code:java}
> private void setupMaximumAllocation(CapacitySchedulerConfiguration csConf) {
> String myQueuePath = getQueuePath();
> Resource clusterMax = ResourceUtils
> .fetchMaximumAllocationFromConfig(csConf);
> Resource queueMax = csConf.getQueueMaximumAllocation(myQueuePath);
> maximumAllocation = Resources.clone(
> parent == null ? clusterMax : parent.getMaximumAllocation());
> String errMsg =
> "Queue maximum allocation cannot be larger than the cluster setting"
> + " for queue " + myQueuePath
> + " max allocation per queue: %s"
> + " cluster setting: " + clusterMax;
> if (queueMax == Resources.none()) {
> // Handle backward compatibility
> long queueMemory = csConf.getQueueMaximumAllocationMb(myQueuePath);
> int queueVcores = csConf.getQueueMaximumAllocationVcores(myQueuePath);
> if (queueMemory != UNDEFINED) {
> maximumAllocation.setMemorySize(queueMemory);
> }
> if (queueVcores != UNDEFINED) {
> maximumAllocation.setVirtualCores(queueVcores);
> }
> if ((queueMemory != UNDEFINED && queueMemory > clusterMax.getMemorySize()
> || (queueVcores != UNDEFINED
> && queueVcores > clusterMax.getVirtualCores()))) {
> throw new IllegalArgumentException(
> String.format(errMsg, maximumAllocation));
> }
> } else {
> // Queue level maximum-allocation can't be larger than cluster setting
> for (ResourceInformation ri : queueMax.getResources()) {
> if (ri.compareTo(clusterMax.getResourceInformation(ri.getName())) > 0) {
> throw new IllegalArgumentException(String.format(errMsg, queueMax));
> }
> maximumAllocation.setResourceInformation(ri.getName(), ri);
> }
> }
> }
> {code}
> Let's consider the following scenarios:
> # No maximum-allocation is set through templates, neither through the old maximum-allocation-mb/vcore property: _queueMax_ will get the value Resources.none(), so its if condition evaluates to true but both _queueMemory_ and _queueVcores_ will be UNDEFINED. The _maximumAllocation_ will simply be inherited from the parent and no _clusterMax_ comparison will be done (the second if will be skipped).
> # One of the maximum-allocation-mb/vcore properties is set: a comparison will be executed to check whether the value is indeed lower than the cluster-wide maximum. Here comes the getLeafQueueConfigs' CapacitySchedulerConfiguration duplication into the picture. Since the cluster-wide maximum is a property that comes from the YarnConfiguration object and the copied config object gets a newly created Configuration object it'll only contain the default properties.
> There are multiple solutions to this problem: either the cluster-wide maximum allocation should be migrated to the cloned Configuration object or when checking the maximum allocation the original Configuration object should be used.
> YARN-9569 solved this issue partially, but the old yarn.scheduler.maximum-allocation-mb/vcore is not migrated.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org