Posted to yarn-issues@hadoop.apache.org by "Wangda Tan (JIRA)" <ji...@apache.org> on 2014/12/05 02:31:12 UTC

[jira] [Commented] (YARN-2925) Internal fields in LeafQueue access should be protected when accessed from FiCaSchedulerApp to calculate Headroom

    [ https://issues.apache.org/jira/browse/YARN-2925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14234898#comment-14234898 ] 

Wangda Tan commented on YARN-2925:
----------------------------------

We cannot simply add the synchronized modifier to the LeafQueue accessors for the internal fields used to compute user-limit and headroom; doing so would lead to deadlock.
Assume:
- Thread 1 is CS's message handler. It processes a node's heartbeat and tries to allocate some containers. It acquires LeafQueue's synchronized lock first, then the corresponding FiCaSchedulerApp's synchronized lock.
- Thread 2 is ApplicationMasterService.allocate. It calls CS.allocate, which first acquires FiCaSchedulerApp's synchronized lock, then LeafQueue's synchronized lock.
Because the two threads take the same two locks in opposite order, threads 1 and 2 can deadlock.
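
To make the lock-ordering problem concrete, here is a minimal sketch, not the actual scheduler code. The class LockOrderDemo and its two ReentrantLocks are hypothetical stand-ins for the LeafQueue and FiCaSchedulerApp monitors. It uses tryLock with a timeout so the demo terminates and reports the stall instead of hanging the way real synchronized methods would.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; the two locks stand in for the LeafQueue and
// FiCaSchedulerApp monitors described above.
class LockOrderDemo {

    // Returns how many of the two threads failed to get their second lock.
    static int countBlockedThreads() {
        ReentrantLock queueLock = new ReentrantLock();
        ReentrantLock appLock = new ReentrantLock();
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        // Thread 1 (heartbeat path): queue lock first, then app lock.
        Thread t1 = new Thread(() ->
            acquireInOrder(queueLock, appLock, bothHoldFirst, bothTried, blocked));
        // Thread 2 (allocate path): app lock first, then queue lock.
        Thread t2 = new Thread(() ->
            acquireInOrder(appLock, queueLock, bothHoldFirst, bothTried, blocked));

        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return blocked.get();
    }

    private static void acquireInOrder(ReentrantLock first, ReentrantLock second,
                                       CountDownLatch bothHoldFirst,
                                       CountDownLatch bothTried,
                                       AtomicInteger blocked) {
        first.lock();
        try {
            // Wait until both threads hold their first lock.
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            // With plain synchronized this step would block forever.
            if (second.tryLock(100, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet();
            }
            // Keep holding the first lock until both attempts have finished.
            bothTried.countDown();
            bothTried.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }
}
```

In this sketch both threads fail to take their second lock, which is the situation that becomes a permanent deadlock when the locks are synchronized monitors with no timeout.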

Basically, we have two choices to solve this problem while avoiding the deadlock mentioned above:
- Add the synchronized modifier to CapacityScheduler.allocate, so that write operations on LeafQueue are protected by the CapacityScheduler lock. But in real-world use, CapacityScheduler.allocate is called by every application within a short period, so locking the whole CS seems too inefficient here.
- Add a fine-grained lock in LeafQueue that protects only the resource/capacity related fields. With this, the fields are protected and the CS lock is avoided altogether, so I prefer the 2nd way.
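
A rough sketch of what the 2nd option could look like, assuming a read/write lock dedicated to the resource fields. The class GuardedQueueUsage, its fields, and the use of plain megabyte counts are all hypothetical simplifications for illustration; this is not the actual LeafQueue code or the eventual patch.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical holder for the resource/capacity fields of a queue.
// Only these fields are guarded, so readers never need the queue-wide
// or scheduler-wide monitor.
class GuardedQueueUsage {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final long limitMb;
    private long usedMb;

    GuardedQueueUsage(long limitMb) {
        this.limitMb = limitMb;
    }

    // Write path, e.g. called when a container is allocated.
    void allocate(long mb) {
        lock.writeLock().lock();
        try {
            usedMb += mb;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Write path, e.g. called when a container completes.
    void release(long mb) {
        lock.writeLock().lock();
        try {
            usedMb = Math.max(0, usedMb - mb);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Read path, e.g. called from the application side to compute headroom.
    // Takes only the fine-grained read lock, so it cannot take part in the
    // queue/app lock-ordering cycle.
    long headroom() {
        lock.readLock().lock();
        try {
            return Math.max(0, limitMb - usedMb);
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Because headroom() never acquires the queue or application monitor, the allocate path can call it while holding the application lock without creating the opposite-order acquisition described above.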

> Internal fields in LeafQueue access should be protected when accessed from FiCaSchedulerApp to calculate Headroom
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: YARN-2925
>                 URL: https://issues.apache.org/jira/browse/YARN-2925
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Critical
>
> Since YARN-2644, FiCaSchedulerApp calculates an up-to-date headroom before sending the Allocation response back to the AM.
> Headroom calculation happens on the LeafQueue side and uses fields such as used resource. But it is not protected by any LeafQueue lock, so the result might be corrupted if someone else is modifying those fields at the same time.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)