Posted to yarn-dev@hadoop.apache.org by "Steven Rand (JIRA)" <ji...@apache.org> on 2018/02/09 04:43:00 UTC

[jira] [Created] (YARN-7911) Method identifyContainersToPreempt uses ResourceRequest#getRelaxLocality incorrectly

Steven Rand created YARN-7911:
---------------------------------

             Summary: Method identifyContainersToPreempt uses ResourceRequest#getRelaxLocality incorrectly
                 Key: YARN-7911
                 URL: https://issues.apache.org/jira/browse/YARN-7911
             Project: Hadoop YARN
          Issue Type: Bug
          Components: fairscheduler, resourcemanager
    Affects Versions: 3.1.0
            Reporter: Steven Rand
            Assignee: Steven Rand


After YARN-7655, {{identifyContainersToPreempt}} expands the search space to all nodes when three conditions hold: we had previously considered only a subset of nodes in order to satisfy a {{NODE_LOCAL}} or {{RACK_LOCAL}} ResourceRequest (RR), the containers chosen for preemption as a result include AM containers, and the RR allows locality to be relaxed:

{code}
        // Don't preempt AM containers just to satisfy local requests if relax
        // locality is enabled.
        if (bestContainers != null
                && bestContainers.numAMContainers > 0
                && !ResourceRequest.isAnyLocation(rr.getResourceName())
                && rr.getRelaxLocality()) {
          bestContainers = identifyContainersToPreemptForOneContainer(
                  scheduler.getNodeTracker().getAllNodes(), rr);
        }
{code}

This turns out to be based on a misunderstanding of what {{rr.getRelaxLocality()}} means. I had believed that it meant locality can be relaxed _from_ that level. However, it actually means that locality can be relaxed _to_ that level: https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-api/src/main/java/org/apache/hadoop/yarn/api/records/ResourceRequest.java#L450.

For example, suppose we have {{relaxLocality}} set to {{true}} at the node level, but {{false}} at the rack and {{ANY}} levels. This says that locality cannot be relaxed to the rack level. However, after YARN-7655 the code interprets {{relaxLocality}} being {{true}} at the node level as permission to search all nodes for containers to preempt.
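To make the example concrete, here is a minimal sketch that models the three RRs with a simplified, hypothetical {{Rr}} record (not YARN's real {{ResourceRequest}} API). The host name {{host1}} and rack name {{/rack1}} are made up for illustration; {{"*"}} is the resource name YARN uses for {{ANY}}. The sketch contrasts the post-YARN-7655 condition, which reads the flag on the node-level RR itself, with the flag on the rack-level RR, which is what actually governs relaxing to the rack.

{code:java}
// Hypothetical, simplified model of the example above; not YARN's API.
final class RelaxLocalityExample {

    static final String ANY = "*"; // resource name YARN uses for the ANY level

    /** Simplified stand-in for one ResourceRequest: resource name plus relaxLocality flag. */
    record Rr(String resourceName, boolean relaxLocality) {}

    /** Mirrors the post-YARN-7655 condition: reads the flag on the local RR itself. */
    static boolean currentCheckExpands(Rr localRr) {
        return !ANY.equals(localRr.resourceName()) && localRr.relaxLocality();
    }

    /**
     * The flag on an RR says whether locality may be relaxed TO that RR's level,
     * so relaxing from a node to its rack is governed by the rack-level RR's flag.
     */
    static boolean mayRelaxToRack(Rr rackRr) {
        return rackRr.relaxLocality();
    }

    public static void main(String[] args) {
        Rr node = new Rr("host1", true);   // relaxLocality = true at the node level
        Rr rack = new Rr("/rack1", false); // false at the rack level
        Rr any = new Rr(ANY, false);       // false at the ANY level

        // Prints true: the current check would search all nodes.
        System.out.println("current check expands search: " + currentCheckExpands(node));
        // Prints false: the request must not be relaxed to the rack, let alone to ANY.
        System.out.println("may relax to rack: " + mayRelaxToRack(rack));
        System.out.println("may relax to ANY: " + any.relaxLocality());
    }
}
{code}

The two checks disagree on exactly the configuration described above, which is the bug.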

What we should do instead is check whether {{relaxLocality}} is enabled on the corresponding RR at the next level up. If we're considering a node-level RR, we should find the rack-level RR for that node's rack and check whether {{relaxLocality}} is enabled on it. Similarly, if we're considering a rack-level RR, we should check the corresponding {{ANY}}-level RR.
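A rough sketch of the proposed next-level lookup follows, again using hypothetical, simplified types rather than YARN's real classes. It assumes the app's RRs at one priority are available in a map keyed by resource name, and that a node-to-rack map is available; both inputs are assumptions for illustration. It also assumes that a missing next-level RR should be treated as "do not relax", which is a conservative choice and not something decided in this issue.

{code:java}
import java.util.Map;

// Hypothetical sketch of the proposed next-level check; simplified types, not YARN's API.
final class NextLevelRelaxCheck {

    enum Level { NODE, RACK, ANY }

    /** Simplified stand-in for one ResourceRequest at a single priority. */
    record Rr(Level level, String resourceName, boolean relaxLocality) {}

    static final String ANY_NAME = "*";

    // Hypothetical inputs: the app's RRs keyed by resource name, and each node's rack.
    private final Map<String, Rr> requestsByName;
    private final Map<String, String> rackOfNode;

    NextLevelRelaxCheck(Map<String, Rr> requestsByName, Map<String, String> rackOfNode) {
        this.requestsByName = requestsByName;
        this.rackOfNode = rackOfNode;
    }

    /**
     * True only if the RR one level above {@code rr} exists and has relaxLocality
     * enabled. A missing next-level RR is treated as "do not relax" (an assumption).
     */
    boolean nextLevelAllowsRelax(Rr rr) {
        String nextName = switch (rr.level()) {
            case NODE -> rackOfNode.get(rr.resourceName()); // node -> its rack's RR
            case RACK -> ANY_NAME;                          // rack -> the ANY RR
            case ANY -> null;                               // already covers every node
        };
        if (nextName == null) {
            return false;
        }
        Rr next = requestsByName.get(nextName);
        return next != null && next.relaxLocality();
    }

    public static void main(String[] args) {
        Map<String, Rr> rrs = Map.of(
                "host1", new Rr(Level.NODE, "host1", true),
                "/rack1", new Rr(Level.RACK, "/rack1", false),
                ANY_NAME, new Rr(Level.ANY, ANY_NAME, false));
        NextLevelRelaxCheck check = new NextLevelRelaxCheck(rrs, Map.of("host1", "/rack1"));

        // The node-level flag is true, but the rack-level RR forbids relaxing.
        System.out.println(check.nextLevelAllowsRelax(rrs.get("host1"))); // prints false
        // The ANY-level RR forbids relaxing beyond the rack.
        System.out.println(check.nextLevelAllowsRelax(rrs.get("/rack1"))); // prints false
    }
}
{code}

With the configuration from the example above, neither the node-level nor the rack-level RR would trigger the expanded search, which is the intended behavior.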

It may also be better to use {{FSAppAttempt#getAllowedLocalityLevel}} instead of explicitly checking {{relaxLocality}}, but I'm not sure which is correct.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)