Posted to yarn-dev@hadoop.apache.org by "Eric Payne (Jira)" <ji...@apache.org> on 2021/11/11 18:55:00 UTC
[jira] [Resolved] (YARN-10848) Vcore allocation problem with DefaultResourceCalculator
[ https://issues.apache.org/jira/browse/YARN-10848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Eric Payne resolved YARN-10848.
-------------------------------
Resolution: Not A Problem
I am closing this JIRA based on the above discussion.
> Vcore allocation problem with DefaultResourceCalculator
> -------------------------------------------------------
>
> Key: YARN-10848
> URL: https://issues.apache.org/jira/browse/YARN-10848
> Project: Hadoop YARN
> Issue Type: Bug
> Components: capacity scheduler, capacityscheduler
> Reporter: Peter Bacsko
> Assignee: Minni Mittal
> Priority: Major
> Labels: pull-request-available
> Attachments: TestTooManyContainers.java
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> If we use DefaultResourceCalculator, then Capacity Scheduler keeps allocating containers even if we run out of vcores.
> CS checks the available resources in two places. The first check is in {{CapacityScheduler.allocateContainerOnSingleNode()}}:
> {noformat}
> if (calculator.computeAvailableContainers(Resources
> .add(node.getUnallocatedResource(), node.getTotalKillableResources()),
> minimumAllocation) <= 0) {
> LOG.debug("This node " + node.getNodeID() + " doesn't have sufficient "
> + "available or preemptible resource for minimum allocation");
> {noformat}
> The second, which is more important, is located in {{RegularContainerAllocator.assignContainer()}}:
> {noformat}
> if (!Resources.fitsIn(rc, capability, totalResource)) {
> LOG.warn("Node : " + node.getNodeID()
> + " does not have sufficient resource for ask : " + pendingAsk
> + " node total capability : " + node.getTotalResource());
> // Skip this locality request
> ActivitiesLogger.APP.recordSkippedAppActivityWithoutAllocation(
> activitiesManager, node, application, schedulerKey,
> ActivityDiagnosticConstant.
> NODE_TOTAL_RESOURCE_INSUFFICIENT_FOR_REQUEST
> + getResourceDiagnostics(capability, totalResource),
> ActivityLevel.NODE);
> return ContainerAllocation.LOCALITY_SKIPPED;
> }
> {noformat}
> Here, {{rc}} is the resource calculator instance; the other two values are:
> {noformat}
> Resource capability = pendingAsk.getPerAllocationResource();
> Resource available = node.getUnallocatedResource();
> {noformat}
> There is a repro unit test attached to this case that demonstrates the problem. The root cause is that we pass the resource calculator to {{Resources.fitsIn()}}. Instead, we should use the overloaded version that takes no calculator, just like in {{FSAppAttempt.assignContainer()}}:
> {noformat}
> // Can we allocate a container on this node?
> if (Resources.fitsIn(capability, available)) {
> // Inform the application of the new container for this request
> RMContainer allocatedContainer =
> allocate(type, node, schedulerKey, pendingAsk,
> reservedContainer);
> {noformat}
> In CS, if we switch to DominantResourceCalculator OR use {{Resources.fitsIn()}} without the calculator in {{RegularContainerAllocator.assignContainer()}}, that fixes the failing unit test (see {{testTooManyContainers()}} in {{TestTooManyContainers.java}}).
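For readers of the archive, the difference between the two checks described in the quoted report can be sketched as follows. This is a minimal, self-contained illustration and not the YARN code: the class {{FitsInSketch}}, the nested {{Res}} type, and both method names are hypothetical stand-ins. It assumes only that a calculator-based fit check under DefaultResourceCalculator compares memory, while the calculator-free check compares every resource dimension.

```java
public class FitsInSketch {

    /** Hypothetical stand-in for a YARN Resource: memory in MB plus vcores. */
    public static final class Res {
        public final long memoryMb;
        public final int vcores;

        public Res(long memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    /** Models a memory-only fit check (the assumed DefaultResourceCalculator behavior). */
    public static boolean fitsInMemoryOnly(Res ask, Res available) {
        return ask.memoryMb <= available.memoryMb;
    }

    /** Models a fit check over every dimension (the assumed calculator-free behavior). */
    public static boolean fitsInAllDimensions(Res ask, Res available) {
        return ask.memoryMb <= available.memoryMb
            && ask.vcores <= available.vcores;
    }

    public static void main(String[] args) {
        // A node with plenty of free memory but no free vcores.
        Res ask = new Res(1024, 1);
        Res available = new Res(8192, 0);

        System.out.println("memory-only check:   " + fitsInMemoryOnly(ask, available));
        System.out.println("all-dimension check: " + fitsInAllDimensions(ask, available));
    }
}
```

With these sample values the memory-only check accepts the ask and the all-dimension check rejects it, which matches the behavior the report describes: containers keep being placed on a node that has no vcores left.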
--
This message was sent by Atlassian Jira
(v8.20.1#820001)