Posted to user@zookeeper.apache.org by Gunnar Wagenknecht <gu...@wagenknecht.org> on 2011/06/21 09:14:31 UTC

ZooKeeper Clients waiting forever (hanging threads)

Hi,

I have an issue with ZK clients waiting forever. The stack for the
waiting threads looks like the following.

> java.lang.Thread.State: WAITING (on object monitor)
>  at java.lang.Object.wait(Native Method)
>  at java.lang.Object.wait(Object.java:485)
>  at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1317)
>  - locked <0x00002aab19a019b0> (a org.apache.zookeeper.ClientCnxn$Packet)
>  at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1241)
>  at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1271)
>  ...


Looking at the thread dump further, I noticed many more hung threads, all
with a similar call stack (but different client calls).

> java.lang.Thread.State: WAITING (on object monitor)
>  at java.lang.Object.wait(Native Method)
>  at java.lang.Object.wait(Object.java:485)
>  at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1317)
>  - locked <0x00002aab19a013a8> (a org.apache.zookeeper.ClientCnxn$Packet)
>  at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:804)
>  ...

Looking at the logs, it seems that all this started with a connection
loss during the night.

03:31:32.855 [Worker-76] WARN  ... KeeperErrorCode = ConnectionLoss ...
03:31:32.867 [Worker-65] WARN  ... KeeperErrorCode = ConnectionLoss ...

However, I then found this, logged about a minute later:

03:32:49.417 [ZooKeeper Gate Connect Thread-SendThread(zk-03:2181)] ERROR org.apache.zookeeper.ClientCnxn - from ZooKeeper Gate Connect Thread-SendThread(zk-03:2181)
java.lang.OutOfMemoryError: Java heap space
        at java.util.HashMap.resize(HashMap.java:462) ~[na:1.6.0_24]
        at java.util.HashMap.addEntry(HashMap.java:755) ~[na:1.6.0_24]
        at java.util.HashMap.put(HashMap.java:385) ~[na:1.6.0_24]
        at java.util.HashSet.add(HashSet.java:200) ~[na:1.6.0_24]
        at java.util.AbstractCollection.addAll(AbstractCollection.java:305) ~[na:1.6.0_24]
        at org.apache.zookeeper.ZooKeeper$ZKWatchManager.materialize(ZooKeeper.java:165) ~[na:na]
        at org.apache.zookeeper.ClientCnxn$EventThread.queueEvent(ClientCnxn.java:474) ~[na:na]
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1172) ~[na:na]


Could this OutOfMemoryError in the SendThread have caused a race
condition in the ZK client?
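
The suspected failure mode can be sketched in a few lines of Java. This
is not ZooKeeper's code and does not claim to show what ClientCnxn
actually does; it is only a minimal model under the assumption that a
caller waits on a packet's monitor until a sender thread marks it
finished. Packet, awaitReply and runScenario are hypothetical names made
up for this sketch. Unlike the untimed Object.wait() in the stack traces
above, the sketch waits with a deadline so that it terminates.

```java
import java.util.concurrent.TimeUnit;

public class HangSketch {
    /** Hypothetical stand-in for a request packet; callers wait on its monitor. */
    static final class Packet {
        boolean finished; // guarded by the packet's own monitor
    }

    /**
     * Hypothetical stand-in for a blocking submit: waits until the packet is
     * finished or the deadline passes. Returns true only if a reply arrived.
     * With an untimed wait() a missing notify would block the caller forever.
     */
    static boolean awaitReply(Packet p, long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        synchronized (p) {
            while (!p.finished) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return false; // nobody finished the packet in time
                }
                p.wait(remainingMs);
            }
            return true;
        }
    }

    /** Runs one request; if senderDies is true the sender throws before notifying. */
    static boolean runScenario(boolean senderDies, long timeoutMs) throws InterruptedException {
        Packet packet = new Packet();
        Thread sender = new Thread(() -> {
            if (senderDies) {
                // Simulated error: the thread ends without finishing the packet.
                throw new OutOfMemoryError("simulated for the sketch");
            }
            synchronized (packet) {
                packet.finished = true;
                packet.notifyAll();
            }
        }, "sender-sketch");
        sender.setUncaughtExceptionHandler((t, e) -> { /* keep demo output quiet */ });
        sender.start();
        return awaitReply(packet, timeoutMs);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("healthy sender: replied=" + runScenario(false, 500));
        System.out.println("dead sender:    replied=" + runScenario(true, 500));
    }
}
```

In the second scenario the waiting thread is never notified and only
returns because of the deadline. Whether the real client can end up in
the equivalent state after an Error in its SendThread is exactly the
open question.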

-Gunnar

-- 
Gunnar Wagenknecht
gunnar@wagenknecht.org
http://wagenknecht.org/