Posted to dev@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2015/08/21 19:47:46 UTC

[jira] [Created] (HBASE-14284) In TRUNK, AsyncRpcClient does not timeout; hangs TestDistributedLogReplay, etc.

stack created HBASE-14284:
-----------------------------

             Summary: In TRUNK, AsyncRpcClient does not timeout; hangs TestDistributedLogReplay, etc.
                 Key: HBASE-14284
                 URL: https://issues.apache.org/jira/browse/HBASE-14284
             Project: HBase
          Issue Type: Bug
            Reporter: stack
            Assignee: stack


TestDistributedLogReplay puts up regionservers with *40* priority handlers each, so TDLR runs with many hundreds of threads. Trying to figure out why 40, I see that with fewer handlers the test can hang, with all client use stuck and never timing out:

{code}
"RS:2;localhost:58498" prio=5 tid=0x00007fd284d4e800 nid=0x416af in Object.wait() [0x000000012952e000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at java.lang.Object.wait(Object.java:461)
	at io.netty.util.concurrent.DefaultPromise.await0(DefaultPromise.java:355)
	- locked <0x00000007dff93ea0> (a org.apache.hadoop.hbase.ipc.AsyncCall)
	at io.netty.util.concurrent.DefaultPromise.await(DefaultPromise.java:266)
	at io.netty.util.concurrent.AbstractFuture.get(AbstractFuture.java:42)
	at org.apache.hadoop.hbase.ipc.AsyncRpcClient.call(AsyncRpcClient.java:231)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:214)
	at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:288)
	at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$BlockingStub.regionServerReport(RegionServerStatusProtos.java:8994)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:1148)
	at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:957)
	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.runRegionServer(MiniHBaseCluster.java:156)
	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.access$000(MiniHBaseCluster.java:108)
	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer$1.run(MiniHBaseCluster.java:140)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:356)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1594)
	at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:279)
	at org.apache.hadoop.hbase.MiniHBaseCluster$MiniHBaseClusterRegionServer.run(MiniHBaseCluster.java:138)
	at java.lang.Thread.run(Thread.java:744)

{code}

We never recover.
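
For illustration only, here is a minimal sketch of the difference between an unbounded and a bounded wait on a future. It is not the AsyncRpcClient code and does not use Netty; it uses the JDK's CompletableFuture as a stand-in for the promise in the trace above, and the names BoundedWaitSketch and callWithTimeout are made up for this example. The assumption is that a caller which passes a deadline to get() gets control back when the response never arrives, whereas a no-argument get() would wait indefinitely.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical example class; not part of HBase or Netty.
public class BoundedWaitSketch {

    /**
     * Waits for the response for at most timeoutMs milliseconds.
     * Returns the response, or null if the deadline passed first.
     */
    static String callWithTimeout(CompletableFuture<String> response, long timeoutMs)
            throws InterruptedException, ExecutionException {
        try {
            // Bounded wait: unlike response.get(), this cannot block forever.
            return response.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Cancel so the pending call does not linger after the caller gives up.
            response.cancel(true);
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        // A call whose response never arrives, standing in for the stuck report.
        CompletableFuture<String> neverCompleted = new CompletableFuture<>();
        String result = callWithTimeout(neverCompleted, 100);
        System.out.println(result == null ? "timed out" : result);

        // A call that has already completed returns its value immediately.
        CompletableFuture<String> done = CompletableFuture.completedFuture("ok");
        System.out.println(callWithTimeout(done, 100));
    }
}
```

A real client would more likely surface the timeout to the caller as an exception than return null; null is used here only to keep the sketch short.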


