Posted to dev@ambari.apache.org by "Jonathan Hurley (JIRA)" <ji...@apache.org> on 2015/01/28 06:47:34 UTC

[jira] [Commented] (AMBARI-9368) Deadlock Between Dependent Cluster/Service/Component/Host Implementations

    [ https://issues.apache.org/jira/browse/AMBARI-9368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294738#comment-14294738 ] 

Jonathan Hurley commented on AMBARI-9368:
-----------------------------------------

@Greg - The thread dumps above show some locking in code that no longer exists or is no longer invoked in that manner. The deadlocks that we were seeing involved the inter-relationships between service components and hosts. In my thread dump, qtp572501352-34 tries to save a ServiceComponentHostImpl and, in the course of saving it, acquires the write lock on that ServiceComponentHostImpl. At the same time, qtp572501352-104 tries to read from ServiceComponentImpl; it acquires that object's read lock, but is then blocked because ServiceComponentImpl also reads ServiceComponentHostImpl. So qtp572501352-104 is now waiting for the read lock on ServiceComponentHostImpl.

qtp572501352-34 is in the middle of writing to ServiceComponentHostImpl when a refresh is called on its parent, ServiceComponentImpl, which has to wait to acquire the write lock because qtp572501352-104 is holding the read lock.
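
The lock ordering described above can be reproduced in isolation. The following is a minimal sketch, not Ambari code: the class name LockInversionDemo and the two lock variables are hypothetical stand-ins for the locks inside ServiceComponentImpl and ServiceComponentHostImpl. It uses tryLock with a timeout instead of lock() so that the demonstration terminates rather than hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class LockInversionDemo {
    /**
     * Returns {readerBlocked, writerBlocked}. Each thread takes its first lock,
     * waits until the other thread holds its own first lock, then tries the second.
     */
    static boolean[] reproduce() {
        // Hypothetical stand-ins for the per-object locks named in the thread dump.
        ReentrantReadWriteLock componentLock = new ReentrantReadWriteLock();
        ReentrantReadWriteLock hostComponentLock = new ReentrantReadWriteLock();
        CountDownLatch firstLocksHeld = new CountDownLatch(2);
        CountDownLatch attemptsDone = new CountDownLatch(2);
        boolean[] blocked = new boolean[2];

        // Plays the role of qtp572501352-104: component read lock, then host component read lock.
        Thread reader = new Thread(() -> {
            componentLock.readLock().lock();
            try {
                firstLocksHeld.countDown();
                firstLocksHeld.await();
                if (hostComponentLock.readLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                    hostComponentLock.readLock().unlock();
                } else {
                    blocked[0] = true; // with lock() this thread would wait forever
                }
                // Keep holding the first lock until the other thread has also tried.
                attemptsDone.countDown();
                attemptsDone.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                componentLock.readLock().unlock();
            }
        }, "reader");

        // Plays the role of qtp572501352-34: host component write lock, then component write lock.
        Thread writer = new Thread(() -> {
            hostComponentLock.writeLock().lock();
            try {
                firstLocksHeld.countDown();
                firstLocksHeld.await();
                if (componentLock.writeLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                    componentLock.writeLock().unlock();
                } else {
                    blocked[1] = true; // with lock() this thread would wait forever
                }
                attemptsDone.countDown();
                attemptsDone.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                hostComponentLock.writeLock().unlock();
            }
        }, "writer");

        reader.start();
        writer.start();
        try {
            reader.join();
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return blocked;
    }

    public static void main(String[] args) {
        boolean[] blocked = reproduce();
        System.out.println("reader blocked: " + blocked[0] + ", writer blocked: " + blocked[1]);
    }
}
```

Both attempts time out because each thread is waiting on a lock that the other will not release until its own attempt finishes; replacing tryLock with lock() turns those timeouts into the permanent hang seen in the thread dump.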

My patch will address this by acquiring fewer read and write locks on static data, and by taking the cluster global lock only when the cluster is potentially changing.
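
As a rough illustration of that idea (this is not the actual patch, and every class, field, and method body here is a hypothetical simplification), the sketch below has a component and a host component share one global ReentrantReadWriteLock. Immutable data is read without locking, and because both call paths use the same reentrant lock, there is no second lock to acquire in the opposite order.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class GlobalLockSketch {
    static final class Component {
        private final ReentrantReadWriteLock clusterGlobalLock;
        private final String name;            // static data: final, read without a lock
        private HostComponent hostComponent;  // wired once, before any thread starts
        private int refreshCount;             // guarded by clusterGlobalLock

        Component(String name, ReentrantReadWriteLock clusterGlobalLock) {
            this.name = name;
            this.clusterGlobalLock = clusterGlobalLock;
        }

        String getName() {
            return name; // no lock needed for immutable data
        }

        String convertToResponse() {
            clusterGlobalLock.readLock().lock();
            try {
                // getState() re-acquires the same read lock, which is reentrant.
                return name + ":" + hostComponent.getState();
            } finally {
                clusterGlobalLock.readLock().unlock();
            }
        }

        void refresh() {
            clusterGlobalLock.writeLock().lock();
            try {
                refreshCount++;
            } finally {
                clusterGlobalLock.writeLock().unlock();
            }
        }
    }

    static final class HostComponent {
        private final ReentrantReadWriteLock clusterGlobalLock;
        private final Component parent;
        private String state = "INSTALLED";   // guarded by clusterGlobalLock

        HostComponent(Component parent, ReentrantReadWriteLock clusterGlobalLock) {
            this.parent = parent;
            this.clusterGlobalLock = clusterGlobalLock;
        }

        String getState() {
            clusterGlobalLock.readLock().lock();
            try {
                return state;
            } finally {
                clusterGlobalLock.readLock().unlock();
            }
        }

        void persist(String newState) {
            clusterGlobalLock.writeLock().lock();
            try {
                state = newState;
                // refresh() re-acquires the same write lock, which is reentrant.
                parent.refresh();
            } finally {
                clusterGlobalLock.writeLock().unlock();
            }
        }
    }

    /** Runs a reader and a writer concurrently; returns true if both finish in time. */
    static boolean runConcurrently(int iterations) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        Component component = new Component("DATANODE", lock);
        HostComponent host = new HostComponent(component, lock);
        component.hostComponent = host;

        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Future<?> reader = pool.submit(() -> {
                for (int i = 0; i < iterations; i++) {
                    component.convertToResponse();
                }
            });
            Future<?> writer = pool.submit(() -> {
                for (int i = 0; i < iterations; i++) {
                    host.persist("STARTED");
                }
            });
            reader.get(10, TimeUnit.SECONDS);
            writer.get(10, TimeUnit.SECONDS);
            return true;
        } catch (Exception e) {
            return false;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The trade-off in a design like this is coarser locking: writers serialize against every reader of mutable state, which is why keeping static data out of the locked paths matters.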

> Deadlock Between Dependent Cluster/Service/Component/Host Implementations
> -------------------------------------------------------------------------
>
>                 Key: AMBARI-9368
>                 URL: https://issues.apache.org/jira/browse/AMBARI-9368
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 1.6.1
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: jstack.29096, monitor_lock-1-pid10099.txt, monitor_lock-2-pid10099.txt, monitor_lock-3-pid10099.txt
>
>
> Looks like a textbook deadlock. Why jstack doesn't report it, I don't know.
> Call Hierarchy
> {code}
> qtp572501352-104
>   ServiceComponentImpl.convertToResponse readWriteLock.readLock().lock() ACQUIRED
>     ServiceComponentHostImpl.getState() readLock.lock() BLOCKED
>   
> qtp572501352-34
>   ServiceComponentHostImpl.persist() writeLock.lock() ACQUIRED
>     ServiceComponentImpl.refresh()  readWriteLock.writeLock() BLOCKED
> {code} 
>    
> Deadlock Order
> {code}
> qtp572501352-104
>   ServiceComponentImpl.convertToResponse readWriteLock.readLock().lock() ACQUIRED
> qtp572501352-34
>   ServiceComponentHostImpl.persist() writeLock.lock() ACQUIRED
>   ServiceComponentImpl.refresh()  readWriteLock.writeLock() BLOCKED
> qtp572501352-104
>   ServiceComponentHostImpl.getState() readLock.lock() BLOCKED
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)