Posted to yarn-issues@hadoop.apache.org by "Brian Goerlitz (Jira)" <ji...@apache.org> on 2023/02/03 17:50:00 UTC

[jira] [Created] (YARN-11428) FairScheduler: Expected preemption may not happen if node has enough free resources

Brian Goerlitz created YARN-11428:
-------------------------------------

             Summary: FairScheduler: Expected preemption may not happen if node has enough free resources
                 Key: YARN-11428
                 URL: https://issues.apache.org/jira/browse/YARN-11428
             Project: Hadoop YARN
          Issue Type: Bug
            Reporter: Brian Goerlitz


An application can be FairShare starved under the following conditions:
 * Intra-queue preemption is needed in order for a new application to receive resources
 * The first NodeManager checked for preemption already has idle resources greater than the resources required by the new application
 * Containers belonging to a different queue, which is using no more than its fair share, are running on that node

 

Illustration using a single-node cluster for simplicity:
{noformat}
yarn.nodemanager.resource.memory-mb = 9216
yarn.nodemanager.resource.cpu-vcores = 18
yarn.scheduler.fair.preemption = true
yarn.scheduler.fair.preemption.cluster-utilization-threshold = 0.5
{noformat}
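With these settings, preemption is only considered once cluster utilization crosses the 0.5 threshold. A minimal sketch of that gate (simplified; the class and method names here are illustrative, not the actual Hadoop source):

```java
// Sketch of the cluster-utilization gate on preemption, assuming the
// threshold applies when either resource's utilization crosses it.
public class PreemptionGate {
    static boolean shouldAttemptPreemption(long usedMb, long capacityMb,
                                           int usedVcores, int capacityVcores,
                                           double threshold) {
        // Preemption is attempted once used memory OR used vcores
        // reach threshold * capacity.
        return usedMb >= threshold * capacityMb
            || usedVcores >= threshold * capacityVcores;
    }

    public static void main(String[] args) {
        // Node from the illustration: 9216 MB, 18 vcores, threshold 0.5.
        // 6144 MB used >= 4608 MB (half of 9216), so preemption is considered.
        System.out.println(shouldAttemptPreemption(6144, 9216, 8, 18, 0.5));
    }
}
```

In the scenario below, app1 and app2 together push utilization past this threshold, so the preemption thread does run; the bug is in what it does next.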
 

FairScheduler config

 
{code:java}
<allocations>
...
        <queue name="default">
            <weight>1.0</weight>
            <schedulingPolicy>drf</schedulingPolicy>
        </queue>
        <queue name="limited">
            <maxResources>memory-mb=33.0%, vcores=33.0%</maxResources>
            <weight>1.0</weight>
            <schedulingPolicy>drf</schedulingPolicy>
        </queue>
    <defaultFairSharePreemptionTimeout>5</defaultFairSharePreemptionTimeout>
    <defaultFairSharePreemptionThreshold>1.0</defaultFairSharePreemptionThreshold>
    <defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
...
</allocations>
{code}
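The arithmetic behind this configuration, as a sketch (the truncating rounding here is an assumption for illustration; Hadoop's exact rounding may differ slightly):

```java
// Illustrative resource arithmetic for the config above: a 9216 MB / 18 vcore
// cluster, root.limited capped at 33% of each resource, equal queue weights.
public class FairShareMath {
    // root.limited's maxResources: 33% of each cluster resource.
    static long limitedMaxMb(long clusterMb) {
        return (long) (clusterMb * 0.33); // 9216 * 0.33 = 3041 MB
    }

    static int limitedMaxVcores(int clusterVcores) {
        return (int) (clusterVcores * 0.33); // 18 * 0.33 = 5 vcores
    }

    // With equal weights (1.0 each) and both queues active, each queue's
    // instantaneous fair share is half the cluster.
    static long fairShareMb(long clusterMb) {
        return clusterMb / 2; // 4608 MB
    }

    public static void main(String[] args) {
        System.out.println("limited cap: " + limitedMaxMb(9216) + " MB, "
            + limitedMaxVcores(18) + " vcores; fair share: "
            + fairShareMb(9216) + " MB");
    }
}
```

So root.limited is capped well below its 4608 MB fair share, which is what forces preemption to be the only path for app3 to reach its share.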
 

 

Procedure:
 # Launch an application (app1) in root.limited which consumes the queue's max resources
 # Launch an application (app2) in root.default which consumes no more than the queue's fair share
 # Launch another application (app3) in root.limited with a container size smaller than the remaining cluster capacity

 

Expected result:

Resources from app1 should be preempted and provided to app3 until app3 has its fair share.

 

In actuality, this does not always happen. When {{FSPreemptionThread}} iterates over the containers on the node, if the first container belongs to app2, it will not be eligible for preemption (as app2 would go below its fair share). Because the node already had enough capacity for the new container, the next container in the list is not checked and an empty {{PreemptableContainers}} is returned. The list contains no AM containers, so in a multinode scenario, no other nodes will be checked either. No container will be preempted, and until the usage scenario changes, app3 is unable to obtain its fair share of resources.
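The faulty scan can be sketched as follows (a simplified illustration of the logic described above, not the actual {{FSPreemptionThread}} source; class, field, and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the per-node container scan described above.
public class PreemptionScanSketch {
    static class Container {
        final long mb;
        final boolean preemptable; // false for app2's containers, which sit at/below fair share
        Container(long mb, boolean preemptable) {
            this.mb = mb;
            this.preemptable = preemptable;
        }
    }

    /** Returns the containers to preempt, or null if the request cannot be satisfied. */
    static List<Container> identify(List<Container> running, long nodeFreeMb,
                                    long requestMb) {
        List<Container> toPreempt = new ArrayList<>();
        long capacity = nodeFreeMb; // idle space plus whatever we plan to preempt
        for (Container c : running) {
            if (c.preemptable) {
                toPreempt.add(c);
                capacity += c.mb;
            }
            // BUG illustrated: the node's idle space alone can already cover
            // the request, so after skipping app2's non-preemptable container
            // we return an EMPTY list. Later containers (e.g. app1's) are
            // never examined, and nothing is ever preempted.
            if (capacity >= requestMb) {
                return toPreempt;
            }
        }
        return null; // not enough even after preempting everything eligible
    }

    public static void main(String[] args) {
        List<Container> running = List.of(
            new Container(2048, false),  // app2's container, checked first
            new Container(2048, true)); // app1's container, never reached
        // Node has 4096 MB idle; app3 requests 2048 MB.
        System.out.println(identify(running, 4096, 2048).isEmpty());
    }
}
```

The empty result also contains no AM containers, which is why (as noted above) no other node is considered as a better candidate in the multi-node case.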



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
