Posted to dev@lucene.apache.org by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org> on 2013/10/14 11:21:46 UTC

[jira] [Resolved] (SOLR-5215) Deadlock in Solr Cloud ConnectionManager

     [ https://issues.apache.org/jira/browse/SOLR-5215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar resolved SOLR-5215.
-----------------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 4.6)
                   5.0
                   4.5

This fix was released in 4.5

> Deadlock in Solr Cloud ConnectionManager
> ----------------------------------------
>
>                 Key: SOLR-5215
>                 URL: https://issues.apache.org/jira/browse/SOLR-5215
>             Project: Solr
>          Issue Type: Bug
>          Components: clients - java, SolrCloud
>    Affects Versions: 4.2.1
>         Environment: Linux 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
> java version "1.6.0_18"
> Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
> Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
>            Reporter: Ricardo Merizalde
>            Assignee: Mark Miller
>             Fix For: 4.5, 5.0
>
>         Attachments: SOLR-5215.patch
>
>
> We are constantly seeing deadlocks in our production application servers.
> The problem seems to be that thread A:
> - tries to process an event and acquires the ConnectionManager lock
> - the update callback acquires connectionUpdateLock and invokes waitForConnected
> - waitForConnected tries to acquire the ConnectionManager lock (which the thread already holds)
> - waitForConnected calls wait and releases the ConnectionManager lock (but still holds connectionUpdateLock)
> Then thread B:
> - tries to process an event and acquires the ConnectionManager lock
> - the update callback tries to acquire connectionUpdateLock but blocks while holding the ConnectionManager lock, preventing thread A from getting out of the wait state.
>  
> Here is part of the thread dump:
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x0000000059965800 nid=0x3e81 waiting for monitor entry [0x0000000057169000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:71)
>         - waiting to lock <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>         
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x000000005ad40000 nid=0x3e67 waiting for monitor entry [0x000000004dbd4000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
>         - waiting to lock <0x00002aab1b0e0f78> (a java.lang.Object)
>         at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
>         at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
>         - locked <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>         
> "http-0.0.0.0-8080-82-EventThread" daemon prio=10 tid=0x00002aac4c2f7000 nid=0x3d9a waiting for monitor entry [0x0000000042821000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:165)
>         - locked <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
>         - locked <0x00002aab1b0e0f78> (a java.lang.Object)
>         at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
>         at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
>         - locked <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
>         
> Found one Java-level deadlock:
> =============================
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x000000005c7694b0 (object 0x00002aab1b0e0ce0, a org.apache.solr.common.cloud.ConnectionManager),
>   which is held by "http-0.0.0.0-8080-82-EventThread"
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x00002aac4c314978 (object 0x00002aab1b0e0f78, a java.lang.Object),
>   which is held by "http-0.0.0.0-8080-82-EventThread"
> "http-0.0.0.0-8080-82-EventThread":
>   waiting to lock monitor 0x000000005c7694b0 (object 0x00002aab1b0e0ce0, a org.apache.solr.common.cloud.ConnectionManager),
>   which is held by "http-0.0.0.0-8080-82-EventThread"
>   
>   
> Java stack information for the threads listed above:
> ===================================================
> "http-0.0.0.0-8080-82-EventThread":
>         at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:71)
>         - waiting to lock <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> "http-0.0.0.0-8080-82-EventThread":
>         at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
>         - waiting to lock <0x00002aab1b0e0f78> (a java.lang.Object)
>         at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
>         at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
>         - locked <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495)
> "http-0.0.0.0-8080-82-EventThread":
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.solr.common.cloud.ConnectionManager.waitForConnected(ConnectionManager.java:165)
>         - locked <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.solr.common.cloud.ConnectionManager$1.update(ConnectionManager.java:98)
>         - locked <0x00002aab1b0e0f78> (a java.lang.Object)
>         at org.apache.solr.common.cloud.DefaultConnectionStrategy.reconnect(DefaultConnectionStrategy.java:46)
>         at org.apache.solr.common.cloud.ConnectionManager.process(ConnectionManager.java:91)
>         - locked <0x00002aab1b0e0ce0> (a org.apache.solr.common.cloud.ConnectionManager)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:519)
>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:495) 
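
The lock ordering described in the report can be reproduced in isolation. The following is a minimal sketch, not the Solr code: the class NestedMonitorSketch, its method bodies, the latch, and the timeouts are all hypothetical stand-ins, and only the names process, update, waitForConnected and connectionUpdateLock are borrowed from the report. It assumes that the only relevant behavior is that wait() releases the object's own monitor while the thread keeps holding the second lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class NestedMonitorSketch {
    // Hypothetical stand-in for the second lock named in the report.
    private final Object connectionUpdateLock = new Object();
    // Test scaffolding only: lets thread B start after A is about to wait.
    private final CountDownLatch firstThreadAboutToWait = new CountDownLatch(1);

    // Stand-in for process(): runs the callback while holding this monitor.
    synchronized void process() throws InterruptedException {
        update();
    }

    // Stand-in for the update callback: takes the second lock, then waits.
    private void update() throws InterruptedException {
        synchronized (connectionUpdateLock) {
            waitForConnected(500);
        }
    }

    // Re-enters this monitor; wait() releases it but NOT connectionUpdateLock.
    private synchronized void waitForConnected(long millis) throws InterruptedException {
        firstThreadAboutToWait.countDown();
        wait(millis); // after the timeout the thread must re-acquire this monitor
    }

    // Returns true if both threads end up BLOCKED, i.e. the lockout occurred.
    static boolean reproduce() {
        NestedMonitorSketch m = new NestedMonitorSketch();
        Thread a = new Thread(() -> {
            try {
                m.process();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "A");
        Thread b = new Thread(() -> {
            try {
                m.firstThreadAboutToWait.await(); // A holds both locks by now
                m.process(); // gets the monitor once A waits, then blocks on the second lock
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "B");
        // Daemon threads so the stuck threads do not keep the JVM alive.
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();
        try {
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (System.nanoTime() < deadline) {
                if (a.getState() == Thread.State.BLOCKED
                        && b.getState() == Thread.State.BLOCKED) {
                    return true;
                }
                Thread.sleep(20);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("lockout reproduced: " + reproduce());
    }
}
```

In this sketch thread A ends up inside wait() unable to re-acquire the monitor that B holds, while B is stuck on the lock that A still holds, which matches the shape of the second and third stacks in the dump above. The sketch says nothing about how the attached patch resolves the problem.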



