Posted to yarn-issues@hadoop.apache.org by "ben yang (Jira)" <ji...@apache.org> on 2022/06/21 04:39:00 UTC

[jira] [Updated] (YARN-11191) Global Scheduler refreshQueue cause deadLock

     [ https://issues.apache.org/jira/browse/YARN-11191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ben yang updated YARN-11191:
----------------------------
    Description: 
This is a potential bug that may affect every cluster with preemption enabled. In our current version, with preemption enabled, the CapacityScheduler calls the refreshQueue method of the PreemptionManager whenever it refreshes its queues. This process holds the PreemptionManager write lock and requires the csqueue read lock. Meanwhile, ParentQueue.canAssignToThisQueue holds the csqueue read lock and requires the PreemptionManager read lock.

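A deadlock is possible here because of one rule that read locks follow under the non-fair policy: when a lock is already held by a reader and the first request in the lock's wait queue is a write-lock request, other read-lock requests cannot acquire the lock. A pending csqueue write-lock request can therefore stall the refresh thread's csqueue read-lock request, even though only readers hold that lock.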
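The reader-blocking rule can be reproduced outside YARN. The following is a minimal sketch, assuming the locks involved behave like a default (non-fair) java.util.concurrent.locks.ReentrantReadWriteLock on an OpenJDK runtime; the class and method names are hypothetical and are not taken from the Hadoop code base.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class, not part of Hadoop.
class QueuedWriterBlocksReaders {
    /** Returns true if a second reader got the read lock while a writer was queued. */
    static boolean secondReaderAcquired() {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock(); // non-fair by default
        boolean[] acquired = new boolean[1];

        Thread writer = new Thread(() -> {
            lock.writeLock().lock();   // queues behind the first reader
            lock.writeLock().unlock();
        });
        Thread secondReader = new Thread(() -> {
            try {
                // The timed tryLock respects the wait queue; the untimed tryLock() may barge.
                acquired[0] = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
                if (acquired[0]) {
                    lock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        lock.readLock().lock(); // first reader: the calling thread
        try {
            writer.start();
            while (!lock.hasQueuedThread(writer)) {
                Thread.sleep(10); // wait until the writer is in the wait queue
            }
            Thread.sleep(50);     // give the writer time to finish enqueueing
            secondReader.start();
            secondReader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            lock.readLock().unlock(); // lets the queued writer proceed
        }
        return acquired[0];
    }

    public static void main(String[] args) {
        System.out.println("second reader acquired: " + secondReaderAcquired());
    }
}
```

On OpenJDK the second reader is expected to time out even though the lock is held only by a reader. Note that this is implementation behavior intended to avoid writer starvation; the class documentation does not promise a particular ordering for non-fair locks.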

So the potential deadlock is:
{code:java}
CapacityScheduler.refreshQueue: hold: RMScheduler.writeLock, PreemptionManager.writeLock
                                require: csqueue.readLock
CapacityScheduler.schedule: hold: csqueue.readLock
                            require: PreemptionManager.readLock
other thread (completeContainer etc.): require: csqueue.writeLock
{code}
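The cycle in the listing above can be sketched with three plain threads and two locks. This is only an illustration under the assumption that both locks are default (non-fair) ReentrantReadWriteLock instances on an OpenJDK runtime; the thread bodies are hypothetical stand-ins for the refresh, scheduling, and completeContainer paths, not the actual YARN code, and the RMScheduler write lock is left out because it does not take part in the cycle.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo class, not part of Hadoop.
class RefreshQueueLockCycle {
    /** Returns true if all three threads end up parked at the same time. */
    static boolean cycleObserved() {
        ReentrantReadWriteLock preemptionLock = new ReentrantReadWriteLock();
        ReentrantReadWriteLock queueLock = new ReentrantReadWriteLock();
        CountDownLatch queueReadHeld = new CountDownLatch(1);
        CountDownLatch preemptionWriteHeld = new CountDownLatch(1);

        // Scheduling path: holds the queue read lock, wants the preemption read lock.
        Thread scheduler = new Thread(() -> {
            queueLock.readLock().lock();
            try {
                queueReadHeld.countDown();
                preemptionWriteHeld.await();
                preemptionLock.readLock().lockInterruptibly(); // parks behind refresh
                preemptionLock.readLock().unlock();
            } catch (InterruptedException e) {
                // interrupted by the driver below to unwind the demo
            } finally {
                queueLock.readLock().unlock();
            }
        }, "scheduler");

        // Other path (completeContainer etc.): wants the queue write lock.
        Thread completer = new Thread(() -> {
            try {
                queueReadHeld.await();
                queueLock.writeLock().lockInterruptibly(); // parks behind scheduler
                queueLock.writeLock().unlock();
            } catch (InterruptedException e) {
                // interrupted by the driver below
            }
        }, "completer");

        // Refresh path: holds the preemption write lock, wants the queue read lock.
        Thread refresh = new Thread(() -> {
            preemptionLock.writeLock().lock();
            try {
                preemptionWriteHeld.countDown();
                while (!queueLock.hasQueuedThread(completer)) {
                    Thread.sleep(5); // wait until the writer is queued
                }
                Thread.sleep(50);    // give the writer time to finish enqueueing
                queueLock.readLock().lockInterruptibly(); // parks behind completer
                queueLock.readLock().unlock();
            } catch (InterruptedException e) {
                // interrupted by the driver below
            } finally {
                preemptionLock.writeLock().unlock();
            }
        }, "refresh");

        scheduler.start();
        completer.start();
        refresh.start();

        boolean stuck = false;
        try {
            long deadline = System.nanoTime() + 5_000_000_000L; // give up after 5 s
            while (!stuck && System.nanoTime() < deadline) {
                Thread.sleep(20);
                stuck = preemptionLock.hasQueuedThread(scheduler)
                        && queueLock.hasQueuedThread(refresh)
                        && queueLock.hasQueuedThread(completer);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Break the cycle so the demo terminates.
            scheduler.interrupt();
            completer.interrupt();
            refresh.interrupt();
        }
        return stuck;
    }

    public static void main(String[] args) {
        System.out.println("cycle observed: " + cycleObserved());
    }
}
```

The sketch uses lockInterruptibly() only so that the driver can unwind the threads afterwards; with plain lock() calls, as in a real scheduler, nothing would break the cycle.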

> Global Scheduler refreshQueue cause deadLock 
> ---------------------------------------------
>
>                 Key: YARN-11191
>                 URL: https://issues.apache.org/jira/browse/YARN-11191
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacity scheduler
>    Affects Versions: 2.9.0, 3.3.0
>            Reporter: ben yang
>            Priority: Major
>


