Posted to dev@zookeeper.apache.org by Patrick Hunt <ph...@apache.org> on 2010/08/12 19:00:52 UTC

Re: zookeeper seems to hang

Great bug report Ted, the stack trace in particular is very useful.

It looks like a timing bug where the client is not shutting down cleanly 
on the close call. I reviewed the code in question but nothing pops out 
at me. Also the logs just show us shutting down, nothing else from zk in 
there.
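
As an illustration of how a caller could guard against this kind of hang, here is a minimal sketch that runs a blocking close on a separate daemon thread and gives up after a timeout. It is not ZooKeeper or HBase code: the class BoundedClose, the method closeWithTimeout and the timeout value are hypothetical names chosen for this example, and the stuck close is simulated with a plain Object.wait() like the one in the stack trace below.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of ZooKeeper or HBase.
final class BoundedClose {

    /**
     * Runs closeAction on a daemon thread and waits at most timeoutMs for it.
     * Returns true if it finished in time, false if it timed out or threw.
     */
    static boolean closeWithTimeout(Callable<Void> closeAction, long timeoutMs)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-close");
            t.setDaemon(true); // a stuck close must not keep the JVM alive
            return t;
        });
        Future<Void> result = executor.submit(closeAction);
        try {
            result.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            result.cancel(true); // interrupts the worker; Object.wait() is interruptible
            return false;
        } catch (ExecutionException e) {
            return false; // the close action itself failed
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        final Object packet = new Object(); // stands in for the packet being waited on
        boolean closed = closeWithTimeout(() -> {
            synchronized (packet) {
                while (true) {
                    packet.wait(); // simulated hang: nobody ever calls notify
                }
            }
        }, 500);
        System.out.println("close finished in time: " + closed);
    }
}
```

In this simulation the close never completes, so closeWithTimeout returns false once the timeout expires. A real caller would pass something that invokes its own close call; whether abandoning that call is safe depends on the application, so treat this as a stopgap rather than a fix for the underlying bug.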

Create a JIRA issue and attach all the detail you have available.

Patrick

On 08/11/2010 03:21 PM, Ted Yu wrote:
> Hi,
> Using HBase 0.20.6 (with HBASE-2473), we encountered a situation where the
> region server process was shutting down and seemed to hang.
>
> Here is the bottom of region server log:
> http://pastebin.com/YYawJ4jA
>
> zookeeper-3.2.2 is used.
>
> Your comment is welcome.
>
> Here is the relevant portion of the jstack output (I tried twice to attach it
> to my email to dev@hbase.apache.org, but failed):
>
> "DestroyJavaVM" prio=10 tid=0x00002aabb849c800 nid=0x6c60 waiting on condition [0x0000000000000000]
>     java.lang.Thread.State: RUNNABLE
>
> "regionserver/10.32.42.245:60020" prio=10 tid=0x00002aabb84ce000 nid=0x6c81 in Object.wait() [0x0000000043755000]
>     java.lang.Thread.State: WAITING (on object monitor)
>          at java.lang.Object.wait(Native Method)
>          - waiting on <0x00002aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
>          at java.lang.Object.wait(Object.java:485)
>          at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099)
>          - locked <0x00002aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
>          at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077)
>          at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505)
>          - locked <0x00002aaabf5e0c30> (a org.apache.zookeeper.ZooKeeper)
>          at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681)
>          at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654)
>          at java.lang.Thread.run(Thread.java:619)
>
> "main-EventThread" daemon prio=10 tid=0x0000000043474000 nid=0x6c80 waiting on condition [0x00000000413f3000]
>     java.lang.Thread.State: WAITING (parking)
>          at sun.misc.Unsafe.park(Native Method)
>          - parking to wait for <0x00002aaabf6e9150> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>          at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>          at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>          at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
>          at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)
>
> "RMI TCP Accept-0" daemon prio=10 tid=0x00002aabb822c800 nid=0x6c7d runnable [0x0000000040752000]
>     java.lang.Thread.State: RUNNABLE
>          at java.net.PlainSocketImpl.socketAccept(Native Method)
>          at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
>          - locked <0x00002aaabf585578> (a java.net.SocksSocketImpl)
>          at java.net.ServerSocket.implAccept(ServerSocket.java:453)
>          at java.net.ServerSocket.accept(ServerSocket.java:421)
>          at sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:34)
>          at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
>          at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
>          at java.lang.Thread.run(Thread.java:619)
>

Re: zookeeper seems to hang

Posted by Ted Yu <yu...@gmail.com>.
Please see:
https://issues.apache.org/jira/browse/ZOOKEEPER-846

On Thu, Aug 12, 2010 at 10:00 AM, Patrick Hunt <ph...@apache.org> wrote:

> Great bug report Ted, the stack trace in particular is very useful.
>
> It looks like a timing bug where the client is not shutting down cleanly on
> the close call. I reviewed the code in question but nothing pops out at me.
> Also the logs just show us shutting down, nothing else from zk in there.
>
> Create a JIRA issue and attach all the detail you have available.
>
> Patrick
>
>
> On 08/11/2010 03:21 PM, Ted Yu wrote:
>
>> Hi,
>> Using HBase 0.20.6 (with HBASE-2473), we encountered a situation where the
>> region server process was shutting down and seemed to hang.
>>
>> Here is the bottom of region server log:
>> http://pastebin.com/YYawJ4jA
>>
>> zookeeper-3.2.2 is used.
>>
>> Your comment is welcome.
>>
>> Here is the relevant portion of the jstack output (I tried twice to attach it
>> to my email to dev@hbase.apache.org, but failed):
>>
>> "DestroyJavaVM" prio=10 tid=0x00002aabb849c800 nid=0x6c60 waiting on condition [0x0000000000000000]
>>    java.lang.Thread.State: RUNNABLE
>>
>> "regionserver/10.32.42.245:60020" prio=10 tid=0x00002aabb84ce000 nid=0x6c81 in Object.wait() [0x0000000043755000]
>>    java.lang.Thread.State: WAITING (on object monitor)
>>         at java.lang.Object.wait(Native Method)
>>         - waiting on <0x00002aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
>>         at java.lang.Object.wait(Object.java:485)
>>         at org.apache.zookeeper.ClientCnxn.submitRequest(ClientCnxn.java:1099)
>>         - locked <0x00002aaab76633c0> (a org.apache.zookeeper.ClientCnxn$Packet)
>>         at org.apache.zookeeper.ClientCnxn.close(ClientCnxn.java:1077)
>>         at org.apache.zookeeper.ZooKeeper.close(ZooKeeper.java:505)
>>         - locked <0x00002aaabf5e0c30> (a org.apache.zookeeper.ZooKeeper)
>>         at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.close(ZooKeeperWrapper.java:681)
>>         at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:654)
>>         at java.lang.Thread.run(Thread.java:619)
>>
>> "main-EventThread" daemon prio=10 tid=0x0000000043474000 nid=0x6c80 waiting on condition [0x00000000413f3000]
>>    java.lang.Thread.State: WAITING (parking)
>>         at sun.misc.Unsafe.park(Native Method)
>>         - parking to wait for <0x00002aaabf6e9150> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>>         at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
>>         at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1987)
>>         at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:399)
>>         at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:414)
>>
>> "RMI TCP Accept-0" daemon prio=10 tid=0x00002aabb822c800 nid=0x6c7d runnable [0x0000000040752000]
>>    java.lang.Thread.State: RUNNABLE
>>         at java.net.PlainSocketImpl.socketAccept(Native Method)
>>         at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:390)
>>         - locked <0x00002aaabf585578> (a java.net.SocksSocketImpl)
>>         at java.net.ServerSocket.implAccept(ServerSocket.java:453)
>>         at java.net.ServerSocket.accept(ServerSocket.java:421)
>>         at sun.management.jmxremote.LocalRMIServerSocketFactory$1.accept(LocalRMIServerSocketFactory.java:34)
>>         at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.executeAcceptLoop(TCPTransport.java:369)
>>         at sun.rmi.transport.tcp.TCPTransport$AcceptLoop.run(TCPTransport.java:341)
>>         at java.lang.Thread.run(Thread.java:619)
>>
