Posted to yarn-dev@hadoop.apache.org by "Adam Antal (Jira)" <ji...@apache.org> on 2020/04/24 11:15:00 UTC
[jira] [Created] (YARN-10243) Rack-only localization constraint for
MR AM is broken for CapacityScheduler
Adam Antal created YARN-10243:
---------------------------------
Summary: Rack-only localization constraint for MR AM is broken for CapacityScheduler
Key: YARN-10243
URL: https://issues.apache.org/jira/browse/YARN-10243
Project: Hadoop YARN
Issue Type: Bug
Components: capacity scheduler, capacityscheduler
Affects Versions: 3.2.0
Reporter: Adam Antal
Reproduction: Start an MR sleep job with strict locality configured for the AM (for instance, {{-Dmapreduce.job.am.strict-locality=/rack1}}). If CapacityScheduler is used, the job hangs, stuck in the SCHEDULED state.
Root cause: if no other resources are requested (such as node locality or another constraint), the scheduling opportunities counter is never incremented, so the following check always returns false. The constraint is therefore skipped on every scheduling attempt, resulting in an infinite loop:
{code:java}
// If we are here, we do need containers on this rack for RACK_LOCAL req
if (type == NodeType.RACK_LOCAL) {
  // 'Delay' rack-local just a little bit...
  long missedOpportunities =
      application.getSchedulingOpportunities(schedulerKey);
  return getActualNodeLocalityDelay() < missedOpportunities;
}
{code}
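To illustrate why the check never passes, here is a minimal standalone sketch that models only the comparison quoted above. It is not the scheduler code: the class {{RackLocalDelaySketch}}, both methods, and the delay value of 40 are hypothetical and chosen purely for illustration, and only the case of a positive delay is modelled.

```java
public class RackLocalDelaySketch {

    // Hypothetical stand-in for the quoted comparison; not the actual scheduler method.
    static boolean canAssignRackLocal(long actualNodeLocalityDelay, long missedOpportunities) {
        return actualNodeLocalityDelay < missedOpportunities;
    }

    // Simulates a number of scheduling passes and counts how many of them
    // would allow a rack-local assignment. When counterIncrements is false,
    // the missed-opportunities counter stays at zero, as described in this issue.
    static int countAllowedPasses(long delay, int passes, boolean counterIncrements) {
        long missedOpportunities = 0;
        int allowed = 0;
        for (int i = 0; i < passes; i++) {
            if (canAssignRackLocal(delay, missedOpportunities)) {
                allowed++;
            } else if (counterIncrements) {
                // A skipped pass counts as a missed opportunity.
                missedOpportunities++;
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        long delay = 40; // assumed positive delay, illustrative value only

        // Counter never incremented: the check fails on every pass.
        System.out.println("stuck counter:   " + countAllowedPasses(delay, 1000, false));

        // Counter incremented on each skipped pass: the check eventually passes.
        System.out.println("working counter: " + countAllowedPasses(delay, 1000, true));
    }
}
```

With the counter stuck at zero, the comparison is false for any positive delay no matter how many passes are made, which matches the hang described above. With a counter that grows on each skipped pass, the comparison becomes true once the counter exceeds the delay.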
Workaround: set {{yarn.scheduler.capacity.node-locality-delay}} to zero to force this rule to be processed immediately.