Posted to yarn-issues@hadoop.apache.org by "Karthik Kambatla (JIRA)" <ji...@apache.org> on 2016/03/02 02:36:18 UTC
[jira] [Updated] (YARN-3414) FairScheduler's preemption may cause livelock
[ https://issues.apache.org/jira/browse/YARN-3414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karthik Kambatla updated YARN-3414:
-----------------------------------
Issue Type: Sub-task (was: Bug)
Parent: YARN-4752
> FairScheduler's preemption may cause livelock
> ---------------------------------------------
>
> Key: YARN-3414
> URL: https://issues.apache.org/jira/browse/YARN-3414
> Project: Hadoop YARN
> Issue Type: Sub-task
> Components: fairscheduler
> Affects Versions: 2.6.0
> Reporter: Peng Zhang
>
> I ran into this problem in our cluster: it causes a livelock between preemption and scheduling.
> The queue hierarchy is as follows:
> {noformat}
>                  root
>              /    |    \
>       queue-1  queue-2  queue-3
>        /   \
> queue-1-1  queue-1-2
> {noformat}
> # Assume the cluster has 100G of memory in total.
> # Assume queue-1 has a max resource limit of 20G.
> # queue-1-1 becomes active and gets at most 20G of memory (equal to its fair share).
> # queue-2 then becomes active and requires 30G of memory (less than its fair share).
> # queue-3 becomes active and can be assigned all remaining resources, 50G of memory (more than its fair share). At this point the three queues' fair shares are (20, 40, 40) and their usage is (20, 30, 50).
> # queue-1-2 becomes active, which triggers a new preemption request (10G of memory; intuitively, it can only preempt from its sibling queue-1-1).
> # In practice, preemption starts from root, finds that queue-3 is the most over its fair share, and preempts some resources from queue-3.
> # During scheduling, however, the scheduler finds that queue-1 itself has already reached its max fair share and cannot assign resources to it. The resources are then assigned to queue-3 again.
> The scheduler then keeps alternating between the last two steps.
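
The cycle in the scenario above can be reproduced with a small toy model. This is only a sketch of the reported behavior, not FairScheduler code. The function names (`preempt_from_root`, `schedule`, `simulate`), the `ACCEPT_LIMIT` table, and the "lowest usage-to-fair-share ratio first" ordering are assumptions made for illustration. The numbers (fair shares, usage, the 20G limit on queue-1, the 10G request) come from the report.

```python
# Toy model of the reported livelock; memory is in GB.
# All names here are hypothetical and are not FairScheduler APIs.

FAIR_SHARE = {"queue-1": 20, "queue-2": 40, "queue-3": 40}
INITIAL_USAGE = {"queue-1": 20, "queue-2": 30, "queue-3": 50}

# Assumed upper bound on what each top-level queue will accept:
# queue-1 is capped by its 20G max resource limit, queue-2 by its
# 30G demand, and queue-3 is assumed to take anything up to the cluster size.
ACCEPT_LIMIT = {"queue-1": 20, "queue-2": 30, "queue-3": 100}


def preempt_from_root(usage, amount):
    """Preempt from the top-level queue that is most over its fair share."""
    victim = max(usage, key=lambda q: usage[q] - FAIR_SHARE[q])
    overage = max(usage[victim] - FAIR_SHARE[victim], 0)
    taken = min(amount, overage)
    usage[victim] -= taken
    return victim, taken


def schedule(usage, free):
    """Offer free memory to queues, lowest usage/fair-share ratio first."""
    grants = {}
    for queue in sorted(usage, key=lambda q: usage[q] / FAIR_SHARE[q]):
        give = min(free, ACCEPT_LIMIT[queue] - usage[queue])
        usage[queue] += give
        free -= give
        grants[queue] = give
    return grants


def simulate(rounds=3, starved_demand=10):
    """Run preempt/schedule rounds for queue-1-2's 10G request."""
    usage = dict(INITIAL_USAGE)
    history = []
    for _ in range(rounds):
        victim, freed = preempt_from_root(usage, starved_demand)
        grants = schedule(usage, freed)
        history.append((victim, freed, grants, dict(usage)))
    return history


if __name__ == "__main__":
    for victim, freed, grants, usage in simulate():
        print(f"preempted {freed}G from {victim}; grants={grants}; usage={usage}")
```

In this model every round preempts from queue-3 and hands the same memory back to queue-3, because queue-1 is already at its limit, so queue-1-2 never receives anything and the usage returns to (20, 30, 50) each time.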
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)