Posted to yarn-issues@hadoop.apache.org by "Karthik Kambatla (JIRA)" <ji...@apache.org> on 2016/03/02 02:36:18 UTC

[jira] [Updated] (YARN-3414) FairScheduler's preemption may cause livelock

     [ https://issues.apache.org/jira/browse/YARN-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Kambatla updated YARN-3414:
-----------------------------------
    Issue Type: Sub-task  (was: Bug)
        Parent: YARN-4752

> FairScheduler's preemption may cause livelock
> ---------------------------------------------
>
>                 Key: YARN-3414
>                 URL: https://issues.apache.org/jira/browse/YARN-3414
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: fairscheduler
>    Affects Versions: 2.6.0
>            Reporter: Peng Zhang
>
> I hit this problem in our cluster: it causes a livelock between preemption and scheduling.
> The queue hierarchy is as follows:
> {noformat}
>                       root
>               /        |        \
>           queue-1    queue-2    queue-3     
>           /    \
> queue-1-1      queue-1-2
> {noformat}
> # Assume the cluster has 100G of memory.
> # Assume queue-1 has a max resource limit of 20G.
> # queue-1-1 becomes active and gets at most 20G of memory (equal to its fair share).
> # queue-2 then becomes active and requires 30G of memory (less than its fair share).
> # queue-3 becomes active and can be assigned all the remaining resources, 50G of memory (more than its fair share). At this point the three queues' fair shares are (20, 40, 40) and their usage is (20, 30, 50).
> # queue-1-2 becomes active, which triggers a new preemption request (10G of memory; intuitively, it can only preempt from its sibling queue-1-1).
> # In practice, preemption starts from root, finds that queue-3 is the most over its fair share, and preempts some resources from queue-3.
> # During scheduling, however, the scheduler finds that queue-1 has already reached its max resource limit (which equals its fair share) and cannot assign the resources to it, so they are assigned to queue-3 again.
> The last two steps then repeat indefinitely.
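
The cycle in the last two steps can be reproduced with a toy model. The sketch below is not FairScheduler code: the Queue class, preempt_from_root, schedule and simulate are hypothetical names, only the three top-level queues are modelled, queue-3's demand (100G) and the "most underserved first" offer order are assumptions, and the numbers are the ones from the report.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    name: str
    fair_share: int    # GB
    usage: int         # GB currently allocated
    demand: int        # GB the queue wants in total (assumed values)
    max_resource: int  # GB hard cap


def preempt_from_root(queues, needed):
    """Take up to `needed` GB from the queue furthest over its fair share."""
    victim = max(queues, key=lambda q: q.usage - q.fair_share)
    taken = min(needed, max(victim.usage - victim.fair_share, 0))
    victim.usage -= taken
    return victim.name, taken


def schedule(queues, free):
    """Offer freed memory to queues, most underserved first (assumed order),
    skipping any queue that is at its cap or has no unmet demand."""
    for q in sorted(queues, key=lambda q: q.usage - q.fair_share):
        grant = min(free, q.max_resource - q.usage, q.demand - q.usage)
        if grant > 0:
            q.usage += grant
            return q.name, grant
    return None, 0


def simulate(rounds=3):
    queues = [
        # queue-1: capped at 20G; queue-1-1 holds 20G, queue-1-2 wants 10G more
        Queue("queue-1", fair_share=20, usage=20, demand=30, max_resource=20),
        Queue("queue-2", fair_share=40, usage=30, demand=30, max_resource=100),
        Queue("queue-3", fair_share=40, usage=50, demand=100, max_resource=100),
    ]
    history = []
    for _ in range(rounds):
        # queue-1-2's 10G preemption request is resolved from root
        victim, taken = preempt_from_root(queues, needed=10)
        receiver, granted = schedule(queues, free=taken)
        history.append((victim, taken, receiver, granted))
    return history, {q.name: q.usage for q in queues}


if __name__ == "__main__":
    history, usage = simulate()
    for victim, taken, receiver, granted in history:
        print(f"preempted {taken}G from {victim}, assigned {granted}G to {receiver}")
    print(usage)
```

In this model every round preempts 10G from queue-3 and hands the same 10G back to queue-3, so usage never moves away from (20, 30, 50) and queue-1-2 stays starved.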



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)