Posted to dev@curator.apache.org by "Amir Gur (JIRA)" <ji...@apache.org> on 2015/03/24 10:45:52 UTC

[jira] [Updated] (CURATOR-194) Deadlock in ConnectionState.checkTimeouts

     [ https://issues.apache.org/jira/browse/CURATOR-194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Amir Gur updated CURATOR-194:
-----------------------------
    Summary: Deadlock in ConnectionState.checkTimeouts  (was: Deadlock ConnectionState.checkTimeouts)

> Deadlock in ConnectionState.checkTimeouts
> -----------------------------------------
>
>                 Key: CURATOR-194
>                 URL: https://issues.apache.org/jira/browse/CURATOR-194
>             Project: Apache Curator
>          Issue Type: Bug
>          Components: Client
>    Affects Versions: 2.6.0
>            Reporter: Amir Gur
>
> When ConnectionState.checkTimeouts actually detects a timeout, it calls 'reset',
> which (through HandleHolder and ZooKeeper.close) calls org.apache.zookeeper.ClientCnxn.close, which sends a ZooDefs.OpCode.closeSession request.
> The calling thread then waits on the request packet until SendThread calls 'notifyAll' on that packet.
> Meanwhile, SendThread is blocked trying to enter the synchronized method 'ConnectionState.checkTimeouts', whose monitor the first thread still holds.
> So SendThread never notifies the packet, and both threads hang.
> Here is the thread dump:
> "job-scheduler_Worker-19-CheckHealthTask" prio=10 tid=0x00007f260609c000 nid=0x5a97 in Object.wait() [0x00007f25723e1000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         - waiting on <0x0000000725fc0580> (a org.apache.zookeeper.ClientCnxn$Packet)
>         at java.lang.Object.wait(Object.java:503)
>         at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1342)
>         - locked <0x0000000725fc0580> (a org.apache.zookeeper.ClientCnxn$Packet)
>         at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1314)
>         at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:677)
>         - locked <0x0000000723949c88> (a org.apache.zookeeper.ZooKeeper)
>         at org.apache.curator.HandleHolder.internalClose(HandleHolder.java:139)
>         at org.apache.curator.HandleHolder.closeAndReset(HandleHolder.java:77)
>         at org.apache.curator.ConnectionState.reset(ConnectionState.java:218)
>         - locked <0x000000071651de48> (a org.apache.curator.ConnectionState)
>         at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:194)
>         - locked <0x000000071651de48> (a org.apache.curator.ConnectionState)
>         at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
>         at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:474)
>         at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:172)
>         at org.apache.curator.framework.imps.ExistsBuilderImpl$2.call(ExistsBuilderImpl.java:161)
>         at org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
>         at org.apache.curator.framework.imps.ExistsBuilderImpl.pathInForeground(ExistsBuilderImpl.java:157)
>         at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:148)
>         at org.apache.curator.framework.imps.ExistsBuilderImpl.forPath(ExistsBuilderImpl.java:36)
>         at com.alu.dal.zooKeeper.ZooKeeperSession.checkHealth(ZooKeeperSession.java:350)
>         at com.alu.dal.zooKeeper.ZooKeeperSession.check(ZooKeeperSession.java:86)
>         at com.alu.orchestration.cluster.ClusterInstanceServiceImpl.checkQuorum(ClusterInstanceServiceImpl.java:464)
>         at com.alu.orchestration.cluster.ClusterInstanceServiceImpl.checkHealthState(ClusterInstanceServiceImpl.java:400)
>         at com.alu.tasks.health.CheckHealthTaskImpl.doWork(CheckHealthTaskImpl.java:37)
>         at com.alu.scheduler.JobSchedulerDetails$QuartzJob.executeInternal(JobSchedulerDetails.java:95)
>         at org.springframework.scheduling.quartz.QuartzJobBean.execute(QuartzJobBean.java:114)
>         at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
>         at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
> "localhost-startStop-1-SendThread(11.1.1.11:2181)" daemon prio=10 tid=0x00007f257c61a000 nid=0x7c3 waiting for monitor entry [0x00007f2562e65000]
>    java.lang.Thread.State: BLOCKED (on object monitor)
>         at org.apache.curator.ConnectionState.checkTimeouts(ConnectionState.java:177)
>         - waiting to lock <0x000000071651de48> (a org.apache.curator.ConnectionState)
>         at org.apache.curator.ConnectionState.getZooKeeper(ConnectionState.java:88)
>         at org.apache.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:115)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.performBackgroundOperation(CuratorFrameworkImpl.java:793)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.doSyncForSuspendedConnection(CuratorFrameworkImpl.java:668)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.access$800(CuratorFrameworkImpl.java:58)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl$7.retriesExhausted(CuratorFrameworkImpl.java:664)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.checkBackgroundRetry(CuratorFrameworkImpl.java:683)
>         at org.apache.curator.framework.imps.CuratorFrameworkImpl.processBackgroundOperation(CuratorFrameworkImpl.java:496)
>         at org.apache.curator.framework.imps.BackgroundSyncImpl$1.processResult(BackgroundSyncImpl.java:50)
>         at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:609)
>         at org.apache.zookeeper.ClientCnxn$EventThread.queuePacket(ClientCnxn.java:478)
>         - locked <0x0000000714935b18> (a java.util.concurrent.LinkedBlockingQueue)
>         at org.apache.zookeeper.ClientCnxn.finishPacket(ClientCnxn.java:630)
>         at org.apache.zookeeper.ClientCnxn.conLossPacket(ClientCnxn.java:648)
>         at org.apache.zookeeper.ClientCnxn.access$2400(ClientCnxn.java:85)
>         at org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1194)
>         - locked <0x000000071b205bf0> (a java.util.LinkedList)
>         at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1122)
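
The interleaving described above can be reduced to a small standalone model. This is a minimal sketch, not Curator or ZooKeeper code: the class CloseUnderLockSketch, the Packet stand-in and the runClose method are hypothetical, and a timeout is added to the packet wait so the demo terminates (in the dump above, ClientCnxn.submitRequest is in an untimed Object.wait()). With holdStateLockWhileWaiting = true, the closing thread waits on the packet while still holding the ConnectionState-like monitor, as the worker thread does in the dump. With false, it only takes the monitor briefly and waits after releasing it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical, simplified model of the CURATOR-194 interleaving.
// Names echo the thread dump, but none of this is Curator or ZooKeeper code.
public class CloseUnderLockSketch {

    // Stand-in for ClientCnxn$Packet: the closing thread waits on it until it is finished.
    static final class Packet {
        private boolean finished;

        synchronized void finish() {
            finished = true;
            notifyAll();
        }

        // Waits up to timeoutMs (assumption added for the demo; the wait in the dump is untimed).
        // Returns true if finish() was called in time.
        synchronized boolean await(long timeoutMs) throws InterruptedException {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
            while (!finished) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return false;
                }
                wait(remainingMs); // releases only the Packet monitor, not any outer monitor
            }
            return true;
        }
    }

    // Returns true if the simulated close completed within timeoutMs.
    static boolean runClose(boolean holdStateLockWhileWaiting, long timeoutMs) {
        final Object connectionState = new Object(); // stand-in for the ConnectionState monitor
        final Packet closePacket = new Packet();      // stand-in for the closeSession packet
        final CountDownLatch stateLocked = new CountDownLatch(1);

        // Stand-in for SendThread: it must enter a block synchronized on the same
        // monitor (checkTimeouts in the dump) before it can finish the packet.
        Thread sendThread = new Thread(() -> {
            try {
                stateLocked.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            synchronized (connectionState) {
                // simulated checkTimeouts() work
            }
            closePacket.finish();
        }, "SendThread-sketch");
        sendThread.setDaemon(true);
        sendThread.start();

        try {
            boolean completed;
            if (holdStateLockWhileWaiting) {
                // Pattern from the dump: checkTimeouts() -> reset() -> close() waits
                // on the packet while the ConnectionState monitor is still held.
                synchronized (connectionState) {
                    stateLocked.countDown();
                    completed = closePacket.await(timeoutMs);
                }
            } else {
                // "Open call": take the monitor only briefly, then wait after releasing it.
                synchronized (connectionState) {
                    stateLocked.countDown();
                }
                completed = closePacket.await(timeoutMs);
            }
            sendThread.join(timeoutMs);
            return completed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("wait while holding monitor, completed = " + runClose(true, 200));
        System.out.println("open call, completed = " + runClose(false, 2000));
    }
}
```

In the first case the stand-in SendThread cannot finish the packet until the timed wait gives up, so the close reports failure. In the second it should complete. Releasing a monitor before making a blocking call (an "open call") is one general way to break this kind of cycle. This sketch does not say how, or whether, Curator applied it for this issue.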



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)