Posted to yarn-issues@hadoop.apache.org by "Dmitry (Jira)" <ji...@apache.org> on 2022/06/23 01:19:00 UTC

[jira] [Created] (YARN-11194) FairShare preemption doesn't enforce fairness between sibling in some cases

Dmitry created YARN-11194:
-----------------------------

             Summary: FairShare preemption doesn't enforce fairness between sibling in some cases
                 Key: YARN-11194
                 URL: https://issues.apache.org/jira/browse/YARN-11194
             Project: Hadoop YARN
          Issue Type: Bug
          Components: fairscheduler, scheduler preemption
    Affects Versions: 3.2.1
         Environment: hadoop yarn 3.2.1
            Reporter: Dmitry


Queues hierarchy:

root (cluster: 30GB, 30 vcores)
 * q1 (maxResources: 10GB, 10 vcores)
 ** q1.1 (weight: 1)
 ** q1.2 (weight: 9)
 * q2
 * q3

 

Steps:
 # app1 with a demand of 100GB/100 vcores is added to q1.1 and gets 10GB/10 vcores
 ## q1 reaches its max
 # app2 with a demand of 1000GB/1000 vcores is added to q2 and gets 20GB/20 vcores
 ## the cluster now runs at 100% capacity
 # app3 with a demand of 100GB/100 vcores is added to q1.2

{*}Expected{*}: fair share preemption preempts containers so that app3 (q1.2) gets 9GB/9 vcores. It needs to preempt from app1 (q1.1) so that q1 doesn't exceed its max resources.
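
For reference, the expected 9GB follows from splitting q1's capped resources between its children in proportion to their weights. The snippet below is only a back-of-the-envelope sketch of that arithmetic. The helper name `weighted_split` is made up for illustration; this is not FairScheduler code, and it ignores minimum shares, demand, and the configured scheduling policy.

```python
def weighted_split(total, weights):
    """Split `total` among children in proportion to their weights."""
    weight_sum = sum(weights.values())
    return {name: total * w / weight_sum for name, w in weights.items()}

# q1 is capped at 10GB (maxResources); the weights come from the hierarchy above.
Q1_MAX_GB = 10
shares = weighted_split(Q1_MAX_GB, {"q1.1": 1, "q1.2": 9})

for queue, share in shares.items():
    print(f"{queue}: {share:g}GB")
# q1.1: 1GB
# q1.2: 9GB
```

The same ratio applies to vcores, which gives the 9GB/9 vcores expected for q1.2.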

{*}Observed{*}: app3 is starving

Some observations:
 # We see some preemption happening from app2 (q2) that matches app3's starvation (9GB/9 vcores in this case). This may suggest that app3 preempts from app2 but can't use the preempted containers due to this [check|https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/main/java/org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FSAppAttempt.java#L1098]
 # Eliminating max on q1 helps to resolve the issue
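
The hypothesis in observation 1 can be illustrated with a toy model. This is a sketch under stated assumptions, not the actual FSAppAttempt logic: it assumes that a container can be assigned to an app only if every capped ancestor queue stays within its maxResources, and it tracks memory only. The names `MAX_GB`, `PARENT`, `queue_usage`, and `can_assign` are hypothetical.

```python
# Toy model (memory only, in GB) of the state after step 3.
MAX_GB = {"root": 30, "q1": 10}  # cluster size and q1's maxResources
PARENT = {"q1.1": "q1", "q1.2": "q1", "q1": "root", "q2": "root"}

def ancestors(queue):
    """Yield the queue itself and each of its ancestors up to root."""
    while queue is not None:
        yield queue
        queue = PARENT.get(queue)

def queue_usage(usage, queue):
    """Total usage of all leaf queues at or below `queue`."""
    return sum(gb for leaf, gb in usage.items() if queue in ancestors(leaf))

def can_assign(usage, leaf, gb):
    """Assumed check: the assignment must keep every capped ancestor within its max."""
    return all(
        queue_usage(usage, q) + gb <= MAX_GB[q]
        for q in ancestors(leaf)
        if q in MAX_GB
    )

# app1 (q1.1) holds 10GB, app2 (q2) holds 20GB, app3 (q1.2) holds nothing.
usage = {"q1.1": 10, "q1.2": 0, "q2": 20}

# Preempting 9GB from q2 frees cluster capacity, but q1 is still at its max.
after_q2_preemption = {**usage, "q2": 11}
print(can_assign(after_q2_preemption, "q1.2", 9))  # False

# Preempting 9GB from the sibling q1.1 makes room under q1's max.
after_sibling_preemption = {**usage, "q1.1": 1}
print(can_assign(after_sibling_preemption, "q1.2", 9))  # True
```

Under these assumptions, containers preempted from q2 can never satisfy app3, which would be consistent with both observations: app3 stays starved while q1 is capped, and removing the cap lets the freed capacity be used.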

Notes:
 # This is an oversimplified version of our production setup. I can provide more details if needed.
 # I have a heap dump of the issue that I can't share because of our policy, but I can look up some state if needed.

 

Thanks!



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
