Posted to user@hbase.apache.org by 聪聪 <17...@qq.com> on 2015/10/26 14:28:58 UTC

Re: Re: Hbase cluster is suddenly unable to respond

Thank you very much!


------------------ Original Message ------------------
From: "Ted Yu" <yu...@gmail.com>
Date: Monday, October 26, 2015, 8:28 PM
To: "user" <us...@hbase.apache.org>

Subject: Re: Re: Hbase cluster is suddenly unable to respond



The fix from HBASE-11277 may solve your problem - if you collect a stack trace during the hang, we would have more clues.

I suggest upgrading to a newer release such as 1.1.2 or 0.98.15

Cheers
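
(For reference, a thread dump of a hung regionserver can be captured with the JDK's jps/jstack tools while the hang is in progress; <pid> is a placeholder:)

    jps | grep HRegionServer        # find the regionserver pid
    jstack <pid> > rs.jstack        # repeat a few times, a few seconds apart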

> On Oct 26, 2015, at 12:42 AM, 聪聪 <17...@qq.com> wrote:
> 
> hi, Ted:
> 
> 
> The HBase version I use is hbase-0.96.0.
> Around 17:33, this warn log also appeared on the other region servers. I don't know whether that is normal or not. At that time the web UI could not be opened. I restarted the regionserver and then HBase went back to normal. Could this be bug HBASE-11277?
> 
> 
> The regionserver logs contain almost nothing but this warn log.
> The master log is as follows:
> 2015-10-21 22:15:43,575 INFO  [CatalogJanitor-l-namenode2:60000] master.CatalogJanitor: Scanned 672 catalog row(s), gc'd 0 unreferenced merged region(s) and 1 unreferenced parent region(s)
> 2015-10-23 17:47:25,617 INFO  [RpcServer.handler=28,port=60000] master.HMaster: Client=hbase//192.168.39.19 set balanceSwitch=false
> 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=24,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60292","starttimems":1445593715207,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":130525,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60286","starttimems":1445593654945,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=24,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130953 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60292: output error
> 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130945 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60286: output error
> 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.handler=6,port=60000: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> ------------------ Original Message ------------------
> From: "Ted Yu" <yu...@gmail.com>
> Date: Friday, October 23, 2015, 11:39 PM
> To: "user@hbase.apache.org" <us...@hbase.apache.org>
> 
> Subject: Re: Hbase cluster is suddenly unable to respond
> 
> 
> 
> Were other region servers functioning normally around 17:33 ?
> 
> Which hbase release are you using ?
> 
> Can you pastebin more of the region server log ?
> 
> Thanks
> 
>> On Fri, Oct 23, 2015 at 8:28 AM, 聪聪 <17...@qq.com> wrote:
>> 
>> hi, all:
>> 
>> 
>> This afternoon, the whole HBase cluster suddenly became unable to respond. After I restarted one regionserver, the cluster recovered. I don't know the cause of the trouble. I hope I can get help from you.
>> 
>> 
>> The regionserver log is as follows:
>> 2015-10-23 17:28:49,335 INFO  [regionserver60020.logRoller] wal.FSHLog: moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com,60020,1442810406218/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689 whose highest sequenceid is 9071525521 to /hbase/oldWALs/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
>> 2015-10-23 17:33:31,375 WARN  [RpcServer.reader=8,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
>> java.io.IOException: Connection reset by peer
>>        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>>        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>>        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>>        at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>>        at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
>>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>        at java.lang.Thread.run(Thread.java:744)
>> 2015-10-23 17:33:31,779 WARN  [RpcServer.reader=2,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
>> java.io.IOException: Connection reset by peer
>>        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>>        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>>        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>>        at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>>        at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
>>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>        at java.lang.Thread.run(Thread.java:744)

Re: Hbase cluster is suddenly unable to respond

Posted by 吴国泉wgq <wg...@qunar.com>.
hi all:
       There are two conditions that can cause the region server to crash (we have met both):
       1. NIO: out of direct memory
       2. zookeeper session timeout

       You can find the reason in the region server log (or the .out file) or the GC log.

       If it is "out of direct memory", you will see "kill -9 XXX" in regionserverXX.out.
       Then change "-XX:+DisableExplicitGC" to "-XX:+ExplicitGCInvokesConcurrent".
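
       A minimal sketch of that change, assuming the flag lives in HBASE_OPTS in conf/hbase-env.sh (your existing options will differ):

       # conf/hbase-env.sh -- the NIO layer calls System.gc() to reclaim direct
       # buffers when a reservation fails; -XX:+DisableExplicitGC turns that call
       # into a no-op, so run the explicit GC concurrently instead of disabling it:
       export HBASE_OPTS="$HBASE_OPTS -XX:+ExplicitGCInvokesConcurrent"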


       If it is the "zookeeper session timeout", you will see "XX seconds timeout to zookeeper" in regionserverXX.log.
       Here is the point: 'zookeeper.session.timeout' in hbase-site.xml does not work if you use a zookeeper that is not managed by hbase.
       "maxSessionTimeout" in zoo.cfg is the property that really controls the timeout; by default it is 40s. That may be the reason why 'zookeeper.session.timeout' does not work.

      GC can stop the world. You can tune the GC, or raise 'maxSessionTimeout', to make sure HBase won't shut down the region server within an acceptable timeout.
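
      For example, with an external zookeeper both sides have to be raised together; the 90s below is an illustrative value, not a recommendation:

      # zoo.cfg on the zookeeper ensemble (server-side ceiling for all sessions)
      maxSessionTimeout=90000

      <!-- hbase-site.xml: the timeout hbase asks for; honored only up to the ceiling above -->
      <property>
        <name>zookeeper.session.timeout</name>
        <value>90000</value>
      </property>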





On Aug 6, 2016, at 2:08 PM, kiran <ki...@gmail.com> wrote:

We are also facing the same issue. Please tell us what the solution is. I
have increased the rpc timeout and reduced the caching, but with no effect.
We are using hbase 0.98.7. Please suggest a workaround, as we are facing
the issue very frequently now and we are having downtime in production.
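
(For context, those two knobs are usually set like this on a 0.96/0.98-era client; a minimal sketch with illustrative values, not the poster's actual code:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class ScanTuning {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.rpc.timeout", 120000);      // ms; raises the per-call rpc timeout
    HTable table = new HTable(conf, "testtable");  // table name is illustrative
    Scan scan = new Scan();
    scan.setCaching(10);                           // fewer rows per next() RPC keeps each call short
    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result r : scanner) {
        // process r
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}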

On Fri, Oct 30, 2015 at 9:21 AM, 聪聪 <17...@qq.com> wrote:

There is a loop that views the child nodes in the code: http://paste2.org/zm8GE7xH




------------------ Original Message ------------------
From: "蒲聪-北京" <17...@qq.com>
Date: Friday, October 30, 2015, 10:24 AM
To: "user" <us...@hbase.apache.org>

Subject: Re: Re: Hbase cluster is suddenly unable to respond





The client code is http://paste2.org/p3BXkKtV


Is the client version compatible with the server?
I see that the client version is hbase-0.96.1.1-hadoop2




------------------ Original Message ------------------
From: "Ted Yu" <yu...@gmail.com>
Date: Friday, October 30, 2015, 12:08 AM
To: "user@hbase.apache.org" <us...@hbase.apache.org>

Subject: Re: Re: Hbase cluster is suddenly unable to respond



Client side, have they tried increasing direct memory size ?
-XX:MaxDirectMemorySize=

Do you know how wide the rows returned may get ?

Cheers
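
(The flag takes a size; for example, in the client JVM's options -- CATALINA_OPTS if the client runs inside tomcat. The 256m is illustrative and depends on row width and concurrency:)

    -XX:MaxDirectMemorySize=256m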

On Thu, Oct 29, 2015 at 9:03 AM, 聪聪 <17...@qq.com> wrote:

Developers report that their client gets the following error:

[2015/10/29 19:20:42.260][WARN][RpcClient:724] IPC Client (1904394969) connection to l-hbase28.data.cn8.qunar.com/192.168.44.32:60020 from tomcat: unexpected exception receiving call responses
java.lang.OutOfMemoryError: Direct buffer memory
    at java.nio.Bits.reserveMemory(Bits.java:633) ~[na:1.6.0_20]
    at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:95) ~[na:1.6.0_20]
    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288) ~[na:1.6.0_20]
    at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:57) ~[na:1.6.0_20]
    at sun.nio.ch.IOUtil.read(IOUtil.java:205) ~[na:1.6.0_20]
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236) ~[na:1.6.0_20]
    at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) ~[hadoop-common-2.2.0.jar:na]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) ~[hadoop-common-2.2.0.jar:na]
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) ~[hadoop-common-2.2.0.jar:na]
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) ~[hadoop-common-2.2.0.jar:na]
    at java.io.FilterInputStream.read(FilterInputStream.java:116) ~[na:1.6.0_20]
    at java.io.FilterInputStream.read(FilterInputStream.java:116) ~[na:1.6.0_20]
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:555) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:256) ~[na:1.6.0_20]
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317) ~[na:1.6.0_20]
    at java.io.DataInputStream.read(DataInputStream.java:132) ~[na:1.6.0_20]
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192) ~[hadoop-common-2.2.0.jar:na]
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1101) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]







------------------ Original Message ------------------
From: "Ted Yu" <yu...@gmail.com>
Date: Thursday, October 29, 2015, 10:48 PM
To: "user@hbase.apache.org" <us...@hbase.apache.org>

Subject: Re: Re: Hbase cluster is suddenly unable to respond



I took a look at the jstack.
The threads involving RpcServer$Connection.readAndProcess() were in
RUNNABLE state, not BLOCKED or IN_NATIVE state - as described in HBASE-11277.

The protobuf exception shown in your earlier email corresponded to the
following in hbase-protocol/src/main/protobuf/Client.proto :

message GetRequest {
 required RegionSpecifier region = 1;
 required Get get = 2;
}

Are all your hbase clients running in the same version ?

Cheers

On Thu, Oct 29, 2015 at 7:28 AM, 聪聪 <17...@qq.com> wrote:

The regionserver jstack log is http://paste2.org/yLDJeXgL




------------------ Original Message ------------------
From: "蒲聪-北京" <17...@qq.com>
Date: Thursday, October 29, 2015, 9:10 PM
To: "user" <us...@hbase.apache.org>

Subject: Re: Re: Hbase cluster is suddenly unable to respond



hi Ted:


Yesterday around 14:40, one of the regionservers hung once again. At that time
the web UI could not be opened and the HBase cluster was unable to respond.
Very anxious, hoping to get help!


The jstack log is as follows:
"RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007f4466146800
nid=0x12d3 runnable [0x00007f3bebe58000]
  java.lang.Thread.State: RUNNABLE
   at sun.nio.ch.NativeThread.current(Native Method)
   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
   - locked <0x00007f3d27360fb0> (a java.lang.Object)
   - locked <0x00007f3d27360f90> (a java.lang.Object)
   at
org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
   at
org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
   at

org.apache.hadoop.hbase.ipc.RpcServer$Connection.
readAndProcess(RpcServer.java:1476)
   at
org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(
RpcServer.java:770)
   at

org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.
doRunLoop(RpcServer.java:563)
   - locked <0x00007f3c584ce990> (a
org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
   at

org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(
RpcServer.java:538)
   at

java.util.concurrent.ThreadPoolExecutor.runWorker(
ThreadPoolExecutor.java:1145)
   at

java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)


"RpcServer.reader=3,port=60020" daemon prio=10 tid=0x00007f4466145000
nid=0x12d2 runnable [0x00007f3bebf59000]
  java.lang.Thread.State: RUNNABLE
   at sun.nio.ch.NativeThread.current(Native Method)
   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
   - locked <0x00007f3d27360530> (a java.lang.Object)
   - locked <0x00007f3d27360510> (a java.lang.Object)
   at
org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
   at
org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
   at

org.apache.hadoop.hbase.ipc.RpcServer$Connection.
readAndProcess(RpcServer.java:1476)
   at
org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(
RpcServer.java:770)
   at

org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.
doRunLoop(RpcServer.java:563)
   - locked <0x00007f3c584cf7d8> (a
org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
   at

org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(
RpcServer.java:538)
   at

java.util.concurrent.ThreadPoolExecutor.runWorker(
ThreadPoolExecutor.java:1145)
   at

java.util.concurrent.ThreadPoolExecutor$Worker.run(
ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:744)





region server log:
2015-10-28 14:38:19,801 INFO  [Thread-15] regionserver.HRegion: Finished memstore flush of ~3.6 M/3820648, currentsize=536/536 for region order_history,2801xyz140618175732642$3,1418829598639.afc853471a8cd4184bc9e7be00b8eea0. in 45ms, sequenceid=9599960557, compaction requested=true
2015-10-28 14:38:32,693 ERROR [RpcServer.handler=3,port=60020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
    at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
    at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
2015-10-28 14:38:32,693 ERROR [RpcServer.handler=9,port=60020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
    at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
    at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)

2015-10-28 14:38:32,696 WARN  [RpcServer.reader=2,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
    at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
    at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
    at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
    at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
2015-10-28 14:38:32,696 WARN  [RpcServer.reader=1,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
    at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
    at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
    at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
    at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)









--
Thank you
Kiran Sarvabhotla

-----Even a correct decision is wrong when it is taken late




Re: Re: Hbase cluster is suddenly unable to respond

Posted by Ted Yu <yu...@gmail.com>.
Can you provide jstack of region server(s) ?

Was there anything interesting in the logs ?

Thanks

BTW 0.98.7 is quite old. Please consider upgrading.

On Fri, Aug 5, 2016 at 11:10 PM, kiran <ki...@gmail.com> wrote:

> Hbase client and server are on the same version, 0.98.7. We are having complete
> downtime of about 30 min and high CPU usage on the node and network in the
> cluster.
>

Re: Re: Hbase cluster is suddenly unable to respond

Posted by kiran <ki...@gmail.com>.
Hbase client and server are on the same version, 0.98.7. We are having complete
downtime of about 30 min and high CPU usage on the node and network in the
cluster.

On Sat, Aug 6, 2016 at 11:38 AM, kiran <ki...@gmail.com> wrote:

> We are also facing the same issue. Please tell us what is the solution. I
> have increased the rpc timeout and caching is reduced but with no effect.
> We are using hbase 0.98.7. Please suggest a work around as we are facing
> the issue very frequently now and we are having downtime in production.
>
> On Fri, Oct 30, 2015 at 9:21 AM, 聪聪 <17...@qq.com> wrote:
>
>> There is a view child nodes loop code  http://paste2.org/zm8GE7xH
>>
>>
>>
>>
>> ------------------ 原始邮件 ------------------
>> 发件人: "蒲聪-北京";<17...@qq.com>;
>> 发送时间: 2015年10月30日(星期五) 上午10:24
>> 收件人: "user"<us...@hbase.apache.org>;
>>
>> 主题: 回复: 回复: Hbase cluster is suddenly unable to respond
>>
>>
>>
>>
>>
>> The client code is http://paste2.org/p3BXkKtV
>>
>>
>> Is the client version compatible with it?
>> I see  that the client version is hbase0.96.1.1-hadoop2
>>
>>
>>
>>
>> ------------------ 原始邮件 ------------------
>> 发件人: "Ted Yu";<yu...@gmail.com>;
>> 发送时间: 2015年10月30日(星期五) 凌晨0:08
>> 收件人: "user@hbase.apache.org"<us...@hbase.apache.org>;
>>
>> 主题: Re: 回复: Hbase cluster is suddenly unable to respond
>>
>>
>>
>> Client side, have they tried increasing direct memory size ?
>> -XX:MaxDirectMemorySize=
>>
>> Do you know how wide the rows returned may get ?
>>
>> Cheers
>>
>> On Thu, Oct 29, 2015 at 9:03 AM, 聪聪 <17...@qq.com> wrote:
>>
>> > Developers feedback their client has the following error:
>> >
>> > [2015/10/29 19:20:42.260][WARN][RpcClient:724] IPC Client (1904394969)
>> > connection to l-hbase28.data.cn8.qunar.com/192.168.44.32:60020 from
>> > tomcat: unexpected exception receiving call responses
>> >
>> > java.lang.OutOfMemoryError: Direct buffer memory
>> >
>> >     at java.nio.Bits.reserveMemory(Bits.java:633) ~[na:1.6.0_20]
>> >
>> >     at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:95)
>> > ~[na:1.6.0_20]
>> >
>> >     at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
>> > ~[na:1.6.0_20]
>> >
>> >     at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:57)
>> > ~[na:1.6.0_20]
>> >
>> >     at sun.nio.ch.IOUtil.read(IOUtil.java:205) ~[na:1.6.0_20]
>> >
>> >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
>> > ~[na:1.6.0_20]
>> >
>> >     at
>> > org.apache.hadoop.net.SocketInputStream$Reader.performIO(
>> SocketInputStream.java:57)
>> > ~[hadoop-common-2.2.0.jar:na]
>> >
>> >     at
>> > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithT
>> imeout.java:142)
>> > ~[hadoop-common-2.2.0.jar:na]
>> >
>> >     at
>> > org.apache.hadoop.net.SocketInputStream.read(SocketInputStre
>> am.java:161)
>> > ~[hadoop-common-2.2.0.jar:na]
>> >
>> >     at
>> > org.apache.hadoop.net.SocketInputStream.read(SocketInputStre
>> am.java:131)
>> > ~[hadoop-common-2.2.0.jar:na]
>> >
>> >     at java.io.FilterInputStream.read(FilterInputStream.java:116)
>> > ~[na:1.6.0_20]
>> >
>> >     at java.io.FilterInputStream.read(FilterInputStream.java:116)
>> > ~[na:1.6.0_20]
>> >
>> >     at
>> > org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputSt
>> ream.read(RpcClient.java:555)
>> > ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>> >
>> >     at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>> > ~[na:1.6.0_20]
>> >
>> >     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>> > ~[na:1.6.0_20]
>> >
>> >     at java.io.DataInputStream.read(DataInputStream.java:132)
>> > ~[na:1.6.0_20]
>> >
>> >     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
>> > ~[hadoop-common-2.2.0.jar:na]
>> >
>> >     at
>> > org.apache.hadoop.hbase.ipc.RpcClient$Connection.readRespons
>> e(RpcClient.java:1101)
>> > ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>> >
>> >     at
>> > org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClie
>> nt.java:721)
>> > ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > ------------------ 原始邮件 ------------------
>> > 发件人: "Ted Yu";<yu...@gmail.com>;
>> > 发送时间: 2015年10月29日(星期四) 晚上10:48
>> > 收件人: "user@hbase.apache.org"<us...@hbase.apache.org>;
>> >
>> > 主题: Re: 回复: Hbase cluster is suddenly unable to respond
>> >
>> >
>> >
>> > I took a look at the jstack.
>> > The threads involving RpcServer$Connection.readAndProcess() were in
>> > RUNNABLE state, not BLOCKED or IN_NATIVE state - as described in
>> > HBASE-11277
>> > .
>> >
>> > The protobuf exception shown in your earlier email corresponded to the
>> > following in hbase-protocol/src/main/protobuf/Client.proto :
>> >
>> > message GetRequest {
>> >   required RegionSpecifier region = 1;
>> >   required Get get = 2;
>> > }
>> >
>> > Are all your hbase clients running in the same version ?
>> >
>> > Cheers
>> >
>> > On Thu, Oct 29, 2015 at 7:28 AM, 聪聪 <17...@qq.com> wrote:
>> >
>> > > the regionserver jstack log is    http://paste2.org/yLDJeXgL
>> > >
>> > >
>> > >
>> > >
>> > > ------------------ 原始邮件 ------------------
>> > > 发件人: "蒲聪-北京";<17...@qq.com>;
>> > > 发送时间: 2015年10月29日(星期四) 晚上9:10
>> > > 收件人: "user"<us...@hbase.apache.org>;
>> > >
>> > > 主题: 回复: 回复: Hbase cluster is suddenly unable to respond
>> > >
>> > >
>> > >
>> > > hi Ted:
>> > >
>> > >
>> > > Yesterday around 14:40,one of regionservers hang once against.At that
>> > time
>> > > I saw web ui can not open.Hbase cluster is  unable to respond.Very
>> > anxious,
>> > > hoping to get help!
>> > >
>> > >
>> > > jstack log is as follows:
>> > > "RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007f4466146800
>> > > nid=0x12d3 runnable [0x00007f3bebe58000]
>> > >    java.lang.Thread.State: RUNNABLE
>> > >     at sun.nio.ch.NativeThread.current(Native Method)
>> > >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
>> > >     - locked <0x00007f3d27360fb0> (a java.lang.Object)
>> > >     - locked <0x00007f3d27360f90> (a java.lang.Object)
>> > >     at
>> > org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
>> > >     at
>> > > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.
>> java:2368)
>> > >     at
>> > >
>> > org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProc
>> ess(RpcServer.java:1476)
>> > >     at
>> > > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcSer
>> ver.java:770)
>> > >     at
>> > >
>> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunL
>> oop(RpcServer.java:563)
>> > >     - locked <0x00007f3c584ce990> (a
>> > > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
>> > >     at
>> > >
>> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(Rp
>> cServer.java:538)
>> > >     at
>> > >
>> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>> Executor.java:1145)
>> > >     at
>> > >
>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>> lExecutor.java:615)
>> > >     at java.lang.Thread.run(Thread.java:744)
>> > >
>> > >
>> > > "RpcServer.reader=3,port=60020" daemon prio=10 tid=0x00007f4466145000
>> > > nid=0x12d2 runnable [0x00007f3bebf59000]
>> > >    java.lang.Thread.State: RUNNABLE
>> > >     at sun.nio.ch.NativeThread.current(Native Method)
>> > >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
>> > >     - locked <0x00007f3d27360530> (a java.lang.Object)
>> > >     - locked <0x00007f3d27360510> (a java.lang.Object)
>> > >     at
>> > org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
>> > >     at
>> > > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.
>> java:2368)
>> > >     at
>> > >
>> > org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProc
>> ess(RpcServer.java:1476)
>> > >     at
>> > > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcSer
>> ver.java:770)
>> > >     at
>> > >
>> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunL
>> oop(RpcServer.java:563)
>> > >     - locked <0x00007f3c584cf7d8> (a
>> > > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
>> > >     at
>> > >
>> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(Rp
>> cServer.java:538)
>> > >     at
>> > >
>> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>> Executor.java:1145)
>> > >     at
>> > >
>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>> lExecutor.java:615)
>> > >     at java.lang.Thread.run(Thread.java:744)
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > region server log:
>> > > 2015-10-28 14:38:19,801 INFO  [Thread-15] regionserver.HRegion: Finished memstore flush of ~3.6 M/3820648, currentsize=536/536 for region order_history,2801xyz140618175732642$3,1418829598639.afc853471a8cd4184bc9e7be00b8eea0. in 45ms, sequenceid=9599960557, compaction requested=true
>> > > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=3,port=60020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
>> > >     at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
>> > >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
>> > >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
>> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
>> > > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=9,port=60020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
>> > >     at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
>> > >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
>> > >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
>> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
>> > >
>> > > 2015-10-28 14:38:32,696 WARN  [RpcServer.reader=2,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
>> > > com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
>> > >     at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
>> > >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
>> > >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
>> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
>> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
>> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
>> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> > >     at java.lang.Thread.run(Thread.java:744)
>> > > 2015-10-28 14:38:32,696 WARN  [RpcServer.reader=1,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
>> > > com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
>> > >     at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
>> > >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
>> > >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
>> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
>> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
>> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
>> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> > >     at java.lang.Thread.run(Thread.java:744)
>> > >
>> > >
>> > >
>> > > ------------------ Original Message ------------------
>> > > From: "蒲聪-北京";<17...@qq.com>;
>> > > Sent: Monday, October 26, 2015, 9:28 PM
>> > > To: "user"<us...@hbase.apache.org>;
>> > >
>> > > Subject: Re: Re: Hbase cluster is suddenly unable to respond
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > Thank you very much!
>> > >
>> > >
>> > > ------------------ Original Message ------------------
>> > > From: "Ted Yu";<yu...@gmail.com>;
>> > > Sent: Monday, October 26, 2015, 8:28 PM
>> > > To: "user"<us...@hbase.apache.org>;
>> > >
>> > > Subject: Re: Re: Hbase cluster is suddenly unable to respond
>> > >
>> > >
>> > >
>> > > The fix from HBASE-11277 may solve your problem - if you collect stack
>> > > trace during the hang, we would have more clue.
>> > >
>> > > I suggest upgrading to newer release such as 1.1.2 or 0.98.15
>> > >
>> > > Cheers
>> > >
>> > > > On Oct 26, 2015, at 12:42 AM, 聪聪 <17...@qq.com> wrote:
>> > > >
>> > > > hi Ted:
>> > > >
>> > > >
>> > > > The HBase version I use is hbase-0.96.0.
>> > > > Around 17:33, other region servers also showed this warn log. I don't
>> > > > know if that is normal or not. At that time I saw the web UI could not
>> > > > open. I restarted the region server, and then HBase went back to normal.
>> > > > Is it possibly bug HBASE-11277?
>> > > >
>> > > >
>> > > > The region server logs mostly contain this warn log;
>> > > > the master log is as follows:
>> > > > 2015-10-21 22:15:43,575 INFO  [CatalogJanitor-l-namenode2:60000] master.CatalogJanitor: Scanned 672 catalog row(s), gc'd 0 unreferenced merged region(s) and 1 unreferenced parent region(s)
>> > > > 2015-10-23 17:47:25,617 INFO  [RpcServer.handler=28,port=60000] master.HMaster: Client=hbase//192.168.39.19 set balanceSwitch=false
>> > > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=24,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60292","starttimems":1445593715207,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
>> > > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":130525,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60286","starttimems":1445593654945,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
>> > > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=24,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130953 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60292: output error
>> > > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130945 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60286: output error
>> > > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.handler=6,port=60000: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > ------------------ Original Message ------------------
>> > > > From: "Ted Yu";<yu...@gmail.com>;
>> > > > Sent: Friday, October 23, 2015, 11:39 PM
>> > > > To: "user@hbase.apache.org"<us...@hbase.apache.org>;
>> > > >
>> > > > Subject: Re: Hbase cluster is suddenly unable to respond
>> > > >
>> > > >
>> > > >
>> > > > Were other region servers functioning normally around 17:33 ?
>> > > >
>> > > > Which hbase release are you using ?
>> > > >
>> > > > Can you pastebin more of the region server log ?
>> > > >
>> > > > Thanks
>> > > >
>> > > >> On Fri, Oct 23, 2015 at 8:28 AM, 聪聪 <17...@qq.com> wrote:
>> > > >>
>> > > >> hi all:
>> > > >>
>> > > >>
>> > > >> This afternoon, the whole HBase cluster suddenly became unable to
>> > > >> respond. After I restarted a region server, the cluster recovered.
>> > > >> I don't know the cause of the trouble. I hope I can get help from you.
>> > > >>
>> > > >>
>> > > >> The region server log is as follows:
>> > > >> 2015-10-23 17:28:49,335 INFO  [regionserver60020.logRoller] wal.FSHLog: moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com,60020,1442810406218/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689 whose highest sequenceid is 9071525521 to /hbase/oldWALs/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
>> > > >> 2015-10-23 17:33:31,375 WARN  [RpcServer.reader=8,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
>> > > >> java.io.IOException: Connection reset by peer
>> > > >>        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> > > >>        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>> > > >>        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>> > > >>        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>> > > >>        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
>> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>> > > >>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> > > >>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> > > >>        at java.lang.Thread.run(Thread.java:744)
>> > > >> 2015-10-23 17:33:31,779 WARN  [RpcServer.reader=2,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
>> > > >> java.io.IOException: Connection reset by peer
>> > > >>        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> > > >>        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>> > > >>        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>> > > >>        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>> > > >>        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
>> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>> > > >>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> > > >>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> > > >>        at java.lang.Thread.run(Thread.java:744)
>> > >
>> >
>>
>
>
>
> --
> Thank you
> Kiran Sarvabhotla
>
> -----Even a correct decision is wrong when it is taken late
>
>


-- 
Thank you
Kiran Sarvabhotla

-----Even a correct decision is wrong when it is taken late

Re: Re: Hbase cluster is suddenly unable to respond

Posted by kiran <ki...@gmail.com>.
We are also facing the same issue; please tell us what the solution is. I
have increased the RPC timeout and reduced the scan caching, but with no
effect. We are using HBase 0.98.7. Please suggest a workaround, as we are
hitting the issue very frequently now and having downtime in production.
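
For concreteness, a minimal sketch of that kind of client-side tuning on a 0.98 client. The property names are real HBase settings, but every value here is illustrative, not a recommendation from this thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;

Configuration conf = HBaseConfiguration.create();
// Give slow scan RPCs more time before the client gives up (milliseconds).
conf.setInt("hbase.rpc.timeout", 120000);
// Keep the scanner lease period in step with the RPC timeout.
conf.setInt("hbase.client.scanner.timeout.period", 120000);

Scan scan = new Scan();
// Fewer rows per next() call, so each scan RPC finishes sooner.
scan.setCaching(100);

OutOfOrderScannerNextException typically appears when one next() call times out on the client and is retried after the server-side scanner has already advanced its call sequence, which is why the timeout and the caching interact.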

On Fri, Oct 30, 2015 at 9:21 AM, 聪聪 <17...@qq.com> wrote:

> There is a loop in the code that views the child nodes: http://paste2.org/zm8GE7xH
>
>
>
>
> ------------------ Original Message ------------------
> From: "蒲聪-北京";<17...@qq.com>;
> Sent: Friday, October 30, 2015, 10:24 AM
> To: "user"<us...@hbase.apache.org>;
>
> Subject: Re: Re: Hbase cluster is suddenly unable to respond
>
>
>
>
>
> The client code is http://paste2.org/p3BXkKtV
>
>
> Is the client version compatible with it?
> I see that the client version is hbase-0.96.1.1-hadoop2.
>
>
>
>
> ------------------ Original Message ------------------
> From: "Ted Yu";<yu...@gmail.com>;
> Sent: Friday, October 30, 2015, 12:08 AM
> To: "user@hbase.apache.org"<us...@hbase.apache.org>;
>
> Subject: Re: Re: Hbase cluster is suddenly unable to respond
>
>
>
> Client side, have they tried increasing direct memory size ?
> -XX:MaxDirectMemorySize=
>
> Do you know how wide the rows returned may get ?
>
> Cheers
>
> On Thu, Oct 29, 2015 at 9:03 AM, 聪聪 <17...@qq.com> wrote:
>
> > Developers report that their client gets the following error:
> >
> > [2015/10/29 19:20:42.260][WARN][RpcClient:724] IPC Client (1904394969)
> > connection to l-hbase28.data.cn8.qunar.com/192.168.44.32:60020 from
> > tomcat: unexpected exception receiving call responses
> >
> > java.lang.OutOfMemoryError: Direct buffer memory
> >     at java.nio.Bits.reserveMemory(Bits.java:633) ~[na:1.6.0_20]
> >     at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:95) ~[na:1.6.0_20]
> >     at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288) ~[na:1.6.0_20]
> >     at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:57) ~[na:1.6.0_20]
> >     at sun.nio.ch.IOUtil.read(IOUtil.java:205) ~[na:1.6.0_20]
> >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236) ~[na:1.6.0_20]
> >     at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) ~[hadoop-common-2.2.0.jar:na]
> >     at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) ~[hadoop-common-2.2.0.jar:na]
> >     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) ~[hadoop-common-2.2.0.jar:na]
> >     at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) ~[hadoop-common-2.2.0.jar:na]
> >     at java.io.FilterInputStream.read(FilterInputStream.java:116) ~[na:1.6.0_20]
> >     at java.io.FilterInputStream.read(FilterInputStream.java:116) ~[na:1.6.0_20]
> >     at org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:555) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
> >     at java.io.BufferedInputStream.read1(BufferedInputStream.java:256) ~[na:1.6.0_20]
> >     at java.io.BufferedInputStream.read(BufferedInputStream.java:317) ~[na:1.6.0_20]
> >     at java.io.DataInputStream.read(DataInputStream.java:132) ~[na:1.6.0_20]
> >     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192) ~[hadoop-common-2.2.0.jar:na]
> >     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1101) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
> >     at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
> >
> >
> >
> >
> >
> >
> >
> > ------------------ Original Message ------------------
> > From: "Ted Yu";<yu...@gmail.com>;
> > Sent: Thursday, October 29, 2015, 10:48 PM
> > To: "user@hbase.apache.org"<us...@hbase.apache.org>;
> >
> > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> >
> >
> >
> > I took a look at the jstack.
> > The threads involving RpcServer$Connection.readAndProcess() were in
> > RUNNABLE state, not BLOCKED or IN_NATIVE state - as described in
> > HBASE-11277
> > .
> >
> > The protobuf exception shown in your earlier email corresponded to the
> > following in hbase-protocol/src/main/protobuf/Client.proto :
> >
> > message GetRequest {
> >   required RegionSpecifier region = 1;
> >   required Get get = 2;
> > }
> >
> > Are all your hbase clients running in the same version ?
> >
> > Cheers
> >
> > On Thu, Oct 29, 2015 at 7:28 AM, 聪聪 <17...@qq.com> wrote:
> >
> > > The region server jstack log is http://paste2.org/yLDJeXgL
> > >
> > >
> > >
> > >
> > > ------------------ Original Message ------------------
> > > From: "蒲聪-北京";<17...@qq.com>;
> > > Sent: Thursday, October 29, 2015, 9:10 PM
> > > To: "user"<us...@hbase.apache.org>;
> > >
> > > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> > >
> > >
> > >
> > > hi Ted:
> > >
> > >
> > > Yesterday around 14:40, one of the region servers hung once again. At that
> > > time I saw the web UI could not open and the HBase cluster was unable to
> > > respond. Very anxious, hoping to get help!
> > >
> > >
> > > jstack log is as follows:
> > > "RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007f4466146800
> > > nid=0x12d3 runnable [0x00007f3bebe58000]
> > >    java.lang.Thread.State: RUNNABLE
> > >     at sun.nio.ch.NativeThread.current(Native Method)
> > >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
> > >     - locked <0x00007f3d27360fb0> (a java.lang.Object)
> > >     - locked <0x00007f3d27360f90> (a java.lang.Object)
> > >     at
> > org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> > >     at
> > > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > >     at
> > >
> > org.apache.hadoop.hbase.ipc.RpcServer$Connection.
> readAndProcess(RpcServer.java:1476)
> > >     at
> > > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(
> RpcServer.java:770)
> > >     at
> > >
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.
> doRunLoop(RpcServer.java:563)
> > >     - locked <0x00007f3c584ce990> (a
> > > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> > >     at
> > >
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(
> RpcServer.java:538)
> > >     at
> > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
> > >     at
> > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
> > >     at java.lang.Thread.run(Thread.java:744)
> > >
> > >
> > > "RpcServer.reader=3,port=60020" daemon prio=10 tid=0x00007f4466145000
> > > nid=0x12d2 runnable [0x00007f3bebf59000]
> > >    java.lang.Thread.State: RUNNABLE
> > >     at sun.nio.ch.NativeThread.current(Native Method)
> > >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
> > >     - locked <0x00007f3d27360530> (a java.lang.Object)
> > >     - locked <0x00007f3d27360510> (a java.lang.Object)
> > >     at
> > org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> > >     at
> > > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > >     at
> > >
> > org.apache.hadoop.hbase.ipc.RpcServer$Connection.
> readAndProcess(RpcServer.java:1476)
> > >     at
> > > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(
> RpcServer.java:770)
> > >     at
> > >
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.
> doRunLoop(RpcServer.java:563)
> > >     - locked <0x00007f3c584cf7d8> (a
> > > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> > >     at
> > >
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(
> RpcServer.java:538)
> > >     at
> > >
> > java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
> > >     at
> > >
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
> > >     at java.lang.Thread.run(Thread.java:744)
> > >
> > >
> > >
> > >
> > >
> > > region server log:
> > > 2015-10-28 14:38:19,801 INFO  [Thread-15] regionserver.HRegion: Finished memstore flush of ~3.6 M/3820648, currentsize=536/536 for region order_history,2801xyz140618175732642$3,1418829598639.afc853471a8cd4184bc9e7be00b8eea0. in 45ms, sequenceid=9599960557, compaction requested=true
> > > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=3,port=60020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
> > >     at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> > >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> > >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> > > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=9,port=60020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
> > >     at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> > >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> > >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> > >
> > > 2015-10-28 14:38:32,696 WARN  [RpcServer.reader=2,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> > > com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
> > >     at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> > >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> > >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >     at java.lang.Thread.run(Thread.java:744)
> > > 2015-10-28 14:38:32,696 WARN  [RpcServer.reader=1,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> > > com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
> > >     at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> > >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> > >     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > >     at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > >     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >     at java.lang.Thread.run(Thread.java:744)
> > >
> > >
> > >
> > > ------------------ Original Message ------------------
> > > From: "蒲聪-北京";<17...@qq.com>;
> > > Sent: Monday, October 26, 2015, 9:28 PM
> > > To: "user"<us...@hbase.apache.org>;
> > >
> > > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> > >
> > >
> > >
> > >
> > >
> > > Thank you very much!
> > >
> > >
> > > ------------------ Original Message ------------------
> > > From: "Ted Yu";<yu...@gmail.com>;
> > > Sent: Monday, October 26, 2015, 8:28 PM
> > > To: "user"<us...@hbase.apache.org>;
> > >
> > > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> > >
> > >
> > >
> > > The fix from HBASE-11277 may solve your problem - if you collect stack
> > > trace during the hang, we would have more clue.
> > >
> > > I suggest upgrading to newer release such as 1.1.2 or 0.98.15
> > >
> > > Cheers
> > >
> > > > On Oct 26, 2015, at 12:42 AM, 聪聪 <17...@qq.com> wrote:
> > > >
> > > > hi Ted:
> > > >
> > > >
> > > > The HBase version I use is hbase-0.96.0.
> > > > Around 17:33, other region servers also showed this warn log. I don't
> > > > know if that is normal or not. At that time I saw the web UI could not
> > > > open. I restarted the region server, and then HBase went back to normal.
> > > > Is it possibly bug HBASE-11277?
> > > >
> > > >
> > > > The region server logs mostly contain this warn log;
> > > > the master log is as follows:
> > > > 2015-10-21 22:15:43,575 INFO  [CatalogJanitor-l-namenode2:60000] master.CatalogJanitor: Scanned 672 catalog row(s), gc'd 0 unreferenced merged region(s) and 1 unreferenced parent region(s)
> > > > 2015-10-23 17:47:25,617 INFO  [RpcServer.handler=28,port=60000] master.HMaster: Client=hbase//192.168.39.19 set balanceSwitch=false
> > > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=24,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60292","starttimems":1445593715207,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> > > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":130525,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60286","starttimems":1445593654945,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> > > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=24,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130953 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60292: output error
> > > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130945 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60286: output error
> > > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.handler=6,port=60000: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > ------------------ Original Message ------------------
> > > > From: "Ted Yu";<yu...@gmail.com>;
> > > > Sent: Friday, October 23, 2015, 11:39 PM
> > > > To: "user@hbase.apache.org"<us...@hbase.apache.org>;
> > > >
> > > > Subject: Re: Hbase cluster is suddenly unable to respond
> > > >
> > > >
> > > >
> > > > Were other region servers functioning normally around 17:33 ?
> > > >
> > > > Which hbase release are you using ?
> > > >
> > > > Can you pastebin more of the region server log ?
> > > >
> > > > Thanks
> > > >
> > > >> On Fri, Oct 23, 2015 at 8:28 AM, 聪聪 <17...@qq.com> wrote:
> > > >>
> > > >> hi all:
> > > >>
> > > >>
> > > >> This afternoon, the whole HBase cluster suddenly became unable to
> > > >> respond. After I restarted a region server, the cluster recovered.
> > > >> I don't know the cause of the trouble. I hope I can get help from you.
> > > >>
> > > >>
> > > >> The region server log is as follows:
> > > >> 2015-10-23 17:28:49,335 INFO  [regionserver60020.logRoller] wal.FSHLog: moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com,60020,1442810406218/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689 whose highest sequenceid is 9071525521 to /hbase/oldWALs/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
> > > >> 2015-10-23 17:33:31,375 WARN  [RpcServer.reader=8,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> > > >> java.io.IOException: Connection reset by peer
> > > >>        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > > >>        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > > >>        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > > >>        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> > > >>        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > > >>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > >>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >>        at java.lang.Thread.run(Thread.java:744)
> > > >> 2015-10-23 17:33:31,779 WARN  [RpcServer.reader=2,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> > > >> java.io.IOException: Connection reset by peer
> > > >>        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > > >>        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > > >>        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > > >>        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> > > >>        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > > >>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > > >>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > >>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >>        at java.lang.Thread.run(Thread.java:744)
> > >
> >
>



-- 
Thank you
Kiran Sarvabhotla

-----Even a correct decision is wrong when it is taken late

Re: Re: Hbase cluster is suddenly unable to respond

Posted by 聪聪 <17...@qq.com>.
There is a loop in the code that views the child nodes: http://paste2.org/zm8GE7xH
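
Since the paste may not stay available, here, purely as an illustration, is what such a child-node viewing loop often looks like; every name, path, and value below is hypothetical and not taken from that paste:

import java.util.List;
import org.apache.zookeeper.ZooKeeper;

public class ChildViewer {
    public static void main(String[] args) throws Exception {
        // Hypothetical polling loop over a znode's children.
        ZooKeeper zk = new ZooKeeper("zkhost:2181", 30000, null);
        while (true) {
            List<String> children = zk.getChildren("/hbase/rs", false);  // no watch set
            System.out.println("children: " + children);
            Thread.sleep(1000);  // without this pause the loop hammers ZooKeeper
        }
    }
}

If the real loop runs without a pause, or opens a fresh connection on every iteration, it could contribute to the connection churn visible in the region server logs in this thread.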




------------------ Original Message ------------------
From: "蒲聪-北京";<17...@qq.com>;
Sent: Friday, October 30, 2015, 10:24 AM
To: "user"<us...@hbase.apache.org>;

Subject: Re: Re: Hbase cluster is suddenly unable to respond





The client code is http://paste2.org/p3BXkKtV


Is the client version compatible with it?
I see that the client version is hbase-0.96.1.1-hadoop2.




------------------ Original Message ------------------
From: "Ted Yu";<yu...@gmail.com>;
Sent: Friday, October 30, 2015, 12:08 AM
To: "user@hbase.apache.org"<us...@hbase.apache.org>;

Subject: Re: Re: Hbase cluster is suddenly unable to respond



Client side, have they tried increasing direct memory size ?
-XX:MaxDirectMemorySize=

Do you know how wide the rows returned may get ?

Cheers
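
Two knobs follow from that advice; the values below are illustrative, not recommendations from this thread. The direct-memory ceiling is a plain JVM flag on the client side, e.g. -XX:MaxDirectMemorySize=256m (for a Tomcat-hosted client it would go into CATALINA_OPTS). And if rows can get very wide, a scan can chunk them per RPC; a minimal sketch:

import org.apache.hadoop.hbase.client.Scan;

Scan scan = new Scan();
// Cap the number of columns of a single row returned per Result, so one
// very wide row no longer has to fit into a single RPC read buffer.
scan.setBatch(100);
// And fetch fewer rows per next() call.
scan.setCaching(10);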

On Thu, Oct 29, 2015 at 9:03 AM, 聪聪 <17...@qq.com> wrote:

> Developers report that their client gets the following error:
>
> [2015/10/29 19:20:42.260][WARN][RpcClient:724] IPC Client (1904394969)
> connection to l-hbase28.data.cn8.qunar.com/192.168.44.32:60020 from
> tomcat: unexpected exception receiving call responses
>
> java.lang.OutOfMemoryError: Direct buffer memory
>
>     at java.nio.Bits.reserveMemory(Bits.java:633) ~[na:1.6.0_20]
>
>     at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:95)
> ~[na:1.6.0_20]
>
>     at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
> ~[na:1.6.0_20]
>
>     at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:57)
> ~[na:1.6.0_20]
>
>     at sun.nio.ch.IOUtil.read(IOUtil.java:205) ~[na:1.6.0_20]
>
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
> ~[na:1.6.0_20]
>
>     at
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
> ~[hadoop-common-2.2.0.jar:na]
>
>     at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> ~[hadoop-common-2.2.0.jar:na]
>
>     at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> ~[hadoop-common-2.2.0.jar:na]
>
>     at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> ~[hadoop-common-2.2.0.jar:na]
>
>     at java.io.FilterInputStream.read(FilterInputStream.java:116)
> ~[na:1.6.0_20]
>
>     at java.io.FilterInputStream.read(FilterInputStream.java:116)
> ~[na:1.6.0_20]
>
>     at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:555)
> ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
> ~[na:1.6.0_20]
>
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> ~[na:1.6.0_20]
>
>     at java.io.DataInputStream.read(DataInputStream.java:132)
> ~[na:1.6.0_20]
>
>     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
> ~[hadoop-common-2.2.0.jar:na]
>
>     at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1101)
> ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>
>     at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721)
> ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>
>
>
>
>
>
>
> ------------------ Original Message ------------------
> From: "Ted Yu";<yu...@gmail.com>;
> Sent: Thursday, October 29, 2015, 10:48 PM
> To: "user@hbase.apache.org"<us...@hbase.apache.org>;
>
> Subject: Re: Re: Hbase cluster is suddenly unable to respond
>
>
>
> I took a look at the jstack.
> The threads involving RpcServer$Connection.readAndProcess() were in
> RUNNABLE state, not BLOCKED or IN_NATIVE state - as described in
> HBASE-11277
> .
>
> The protobuf exception shown in your earlier email corresponded to the
> following in hbase-protocol/src/main/protobuf/Client.proto :
>
> message GetRequest {
>   required RegionSpecifier region = 1;
>   required Get get = 2;
> }
>
> Are all your hbase clients running in the same version ?
>
> Cheers
>
> On Thu, Oct 29, 2015 at 7:28 AM, 聪聪 <17...@qq.com> wrote:
>
> > The region server jstack log is http://paste2.org/yLDJeXgL
> >
> >
> >
> >
> > ------------------ Original Message ------------------
> > From: "蒲聪-北京";<17...@qq.com>;
> > Sent: Thursday, October 29, 2015, 9:10 PM
> > To: "user"<us...@hbase.apache.org>;
> >
> > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> >
> >
> >
> > hi Ted:
> >
> >
> > Yesterday around 14:40, one of the region servers hung once again. At that
> > time I saw the web UI could not open and the HBase cluster was unable to
> > respond. Very anxious, hoping to get help!
> >
> >
> > jstack log is as follows:
> > "RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007f4466146800
> > nid=0x12d3 runnable [0x00007f3bebe58000]
> >    java.lang.Thread.State: RUNNABLE
> >     at sun.nio.ch.NativeThread.current(Native Method)
> >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
> >     - locked <0x00007f3d27360fb0> (a java.lang.Object)
> >     - locked <0x00007f3d27360f90> (a java.lang.Object)
> >     at
> org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >     - locked <0x00007f3c584ce990> (a
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:744)
> >
> >
> > "RpcServer.reader=3,port=60020" daemon prio=10 tid=0x00007f4466145000
> > nid=0x12d2 runnable [0x00007f3bebf59000]
> >    java.lang.Thread.State: RUNNABLE
> >     at sun.nio.ch.NativeThread.current(Native Method)
> >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
> >     - locked <0x00007f3d27360530> (a java.lang.Object)
> >     - locked <0x00007f3d27360510> (a java.lang.Object)
> >     at
> org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >     - locked <0x00007f3c584cf7d8> (a
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:744)
> >
> >
> >
> >
> >
> > region server log :
> > 2015-10-28 14:38:19,801 INFO  [Thread-15] regionserver.HRegion: Finished
> > memstore flush of ~3.6 M/3820648, currentsize=536/536 for region
> >
> order_history,2801xyz140618175732642$3,1418829598639.afc853471a8cd4184bc9e7be00b8eea0.
> > in 45ms, sequenceid=9599960557, compaction requested=true
> > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=3,port=60020]
> > regionserver.HRegionServer:
> > org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException:
> > Expected nextCallSeq: 1 But the nextCallSeq got from client: 0;
> > request=scanner_id: 16740356019163164014 number_of_rows: 10
> close_scanner:
> > false next_call_seq: 0
> >     at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> >     at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=9,port=60020]
> > regionserver.HRegionServer:
> > org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException:
> > Expected nextCallSeq: 1 But the nextCallSeq got from client: 0;
> > request=scanner_id: 16740356019163164014 number_of_rows: 10
> close_scanner:
> > false next_call_seq: 0
> >     at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> >     at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> >
> > 2015-10-28 14:38:32,696 WARN  [RpcServer.reader=2,port=60020]
> > ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> > com.google.protobuf.UninitializedMessageException: Message missing
> > required fields: region, get
> >     at
> >
> com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> >     at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> >     at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:744)
> > 2015-10-28 14:38:32,696 WARN  [RpcServer.reader=1,port=60020]
> > ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> > com.google.protobuf.UninitializedMessageException: Message missing
> > required fields: region, get
> >     at
> >
> com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> >     at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> >     at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:744)
> >
> >
> >
> > ------------------ Original Message ------------------
> > From: "蒲聪-北京";<17...@qq.com>;
> > Sent: Monday, October 26, 2015, 9:28 PM
> > To: "user"<us...@hbase.apache.org>;
> >
> > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> >
> >
> >
> >
> >
> > Thank you very much!
> >
> >
> > ------------------ Original Message ------------------
> > From: "Ted Yu";<yu...@gmail.com>;
> > Sent: Monday, October 26, 2015, 8:28 PM
> > To: "user"<us...@hbase.apache.org>;
> >
> > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> >
> >
> >
> > The fix from HBASE-11277 may solve your problem - if you collect stack
> > trace during the hang, we would have more clue.
> >
> > I suggest upgrading to newer release such as 1.1.2 or 0.98.15
> >
> > Cheers
> >
> > > On Oct 26, 2015, at 12:42 AM, 聪聪 <17...@qq.com> wrote:
> > >
> > > hi Ted:
> > >
> > >
> > > The HBase version I use is hbase-0.96.0.
> > > Around 17:33, other region servers also showed this warn log. I don't
> > > know if that is normal or not. At that time I saw the web UI could not
> > > open. I restarted the region server, and then HBase went back to normal.
> > > Is it possibly bug HBASE-11277?
> > >
> > >
> > > The region server logs mostly contain this warn log;
> > > the master log is as follows:
> > > 2015-10-21 22:15:43,575 INFO  [CatalogJanitor-l-namenode2:60000]
> > master.CatalogJanitor: Scanned 672 catalog row(s), gc'd 0 unreferenced
> > merged region(s) and 1 unreferenced parent region(s)
> > > 2015-10-23 17:47:25,617 INFO  [RpcServer.handler=28,port=60000]
> > master.HMaster: Client=hbase//192.168.39.19 set balanceSwitch=false
> > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=24,port=60000]
> > ipc.RpcServer: (responseTooSlow):
> >
> {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"
> > 192.168.39.22:60292
> >
> ","starttimems":1445593715207,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000]
> > ipc.RpcServer: (responseTooSlow):
> >
> {"processingtimems":130525,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"
> > 192.168.39.22:60286
> >
> ","starttimems":1445593654945,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=24,port=60000]
> > ipc.RpcServer: RpcServer.respondercallId: 130953 service: MasterService
> > methodName: ListTableDescriptorsByNamespace size: 48 connection:
> > 192.168.39.22:60292: output error
> > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000]
> > ipc.RpcServer: RpcServer.respondercallId: 130945 service: MasterService
> > methodName: ListTableDescriptorsByNamespace size: 48 connection:
> > 192.168.39.22:60286: output error
> > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000]
> > ipc.RpcServer: RpcServer.handler=6,port=60000: caught a
> > ClosedChannelException, this means that the server was processing a
> request
> > but the client went away. The error message was: null
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > ------------------ Original Message ------------------
> > > From: "Ted Yu";<yu...@gmail.com>;
> > > Sent: Friday, October 23, 2015, 11:39 PM
> > > To: "user@hbase.apache.org"<us...@hbase.apache.org>;
> > >
> > > Subject: Re: Hbase cluster is suddenly unable to respond
> > >
> > >
> > >
> > > Were other region servers functioning normally around 17:33 ?
> > >
> > > Which hbase release are you using ?
> > >
> > > Can you pastebin more of the region server log ?
> > >
> > > Thanks
> > >
> > >> On Fri, Oct 23, 2015 at 8:28 AM, 聪聪 <17...@qq.com> wrote:
> > >>
> > > >> hi all:
> > > >>
> > > >>
> > > >> This afternoon, the whole HBase cluster suddenly became unable to
> > > >> respond. After I restarted a region server, the cluster recovered.
> > > >> I don't know the cause of the trouble. I hope I can get help from you.
> > >>
> > >>
> > > >> The region server log is as follows:
> > >> 2015-10-23 17:28:49,335 INFO  [regionserver60020.logRoller]
> wal.FSHLog:
> > >> moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com
> > >> ,60020,1442810406218/l-hbase30.data.cn8.qunar.com
> > %2C60020%2C1442810406218.1445580462689
> > >> whose highest sequenceid is 9071525521 to /hbase/oldWALs/
> > >> l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
> > >> 2015-10-23 17:33:31,375 WARN  [RpcServer.reader=8,port=60020]
> > >> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> > >> java.io.IOException: Connection reset by peer
> > >>        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > >>        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > >>        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > >>        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> > >>        at
> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> > >>        at
> > >> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > >>        at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> > >>        at
> > >>
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > >>        at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > >>        at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > >>        at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >>        at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >>        at java.lang.Thread.run(Thread.java:744)
> > >> 2015-10-23 17:33:31,779 WARN  [RpcServer.reader=2,port=60020]
> > >> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> > >> java.io.IOException: Connection reset by peer
> > >>        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > >>        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > >>        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > >>        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> > >>        at
> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> > >>        at
> > >> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > >>        at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> > >>        at
> > >>
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > >>        at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > >>        at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > >>        at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >>        at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >>        at java.lang.Thread.run(Thread.java:744)
> >
>

Re: Re: Hbase cluster is suddenly unable to respond

Posted by 聪聪 <17...@qq.com>.
The client code is http://paste2.org/p3BXkKtV


Is the client version compatible with it?
I see that the client version is hbase-0.96.1.1-hadoop2.
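
One cheap way to confirm which client build is actually on the classpath is to log it at startup; a minimal sketch using the VersionInfo utility that ships with HBase:

import org.apache.hadoop.hbase.util.VersionInfo;

public class ClientVersionCheck {
    public static void main(String[] args) {
        // Prints the hbase-client version actually loaded at runtime.
        System.out.println("HBase client version: " + VersionInfo.getVersion());
        System.out.println("Revision: " + VersionInfo.getRevision());
    }
}

A 0.96.1.1 client against a 0.96.0 server should be wire-compatible, since both speak the same protobuf-based RPC, but pinning down the exact jars on every client is a reasonable first step when requests arrive with missing required fields.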




------------------ Original Message ------------------
From: "Ted Yu";<yu...@gmail.com>;
Sent: Friday, October 30, 2015, 12:08 AM
To: "user@hbase.apache.org"<us...@hbase.apache.org>;

Subject: Re: Re: Hbase cluster is suddenly unable to respond



Client side, have they tried increasing direct memory size ?
-XX:MaxDirectMemorySize=

Do you know how wide the rows returned may get ?

Cheers

On Thu, Oct 29, 2015 at 9:03 AM, 聪聪 <17...@qq.com> wrote:

> Developers report that their client gets the following error:
>
> [2015/10/29 19:20:42.260][WARN][RpcClient:724] IPC Client (1904394969)
> connection to l-hbase28.data.cn8.qunar.com/192.168.44.32:60020 from
> tomcat: unexpected exception receiving call responses
>
> java.lang.OutOfMemoryError: Direct buffer memory
>
>     at java.nio.Bits.reserveMemory(Bits.java:633) ~[na:1.6.0_20]
>
>     at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:95)
> ~[na:1.6.0_20]
>
>     at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
> ~[na:1.6.0_20]
>
>     at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:57)
> ~[na:1.6.0_20]
>
>     at sun.nio.ch.IOUtil.read(IOUtil.java:205) ~[na:1.6.0_20]
>
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
> ~[na:1.6.0_20]
>
>     at
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
> ~[hadoop-common-2.2.0.jar:na]
>
>     at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> ~[hadoop-common-2.2.0.jar:na]
>
>     at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> ~[hadoop-common-2.2.0.jar:na]
>
>     at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> ~[hadoop-common-2.2.0.jar:na]
>
>     at java.io.FilterInputStream.read(FilterInputStream.java:116)
> ~[na:1.6.0_20]
>
>     at java.io.FilterInputStream.read(FilterInputStream.java:116)
> ~[na:1.6.0_20]
>
>     at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:555)
> ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>
>     at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
> ~[na:1.6.0_20]
>
>     at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> ~[na:1.6.0_20]
>
>     at java.io.DataInputStream.read(DataInputStream.java:132)
> ~[na:1.6.0_20]
>
>     at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
> ~[hadoop-common-2.2.0.jar:na]
>
>     at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1101)
> ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>
>     at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721)
> ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>
>
>
>
>
>
>
> ------------------ 原始邮件 ------------------
> 发件人: "Ted Yu";<yu...@gmail.com>;
> 发送时间: 2015年10月29日(星期四) 晚上10:48
> 收件人: "user@hbase.apache.org"<us...@hbase.apache.org>;
>
> 主题: Re: 回复: Hbase cluster is suddenly unable to respond
>
>
>
> I took a look at the jstack.
> The threads involving RpcServer$Connection.readAndProcess() were in
> RUNNABLE state, not BLOCKED or IN_NATIVE state - as described in
> HBASE-11277
> .
>
> The protobuf exception shown in your earlier email corresponded to the
> following in hbase-protocol/src/main/protobuf/Client.proto :
>
> message GetRequest {
>   required RegionSpecifier region = 1;
>   required Get get = 2;
> }
>
> Are all your hbase clients running in the same version ?
>
> Cheers
>
> On Thu, Oct 29, 2015 at 7:28 AM, 聪聪 <17...@qq.com> wrote:
>
> > the regionserver jstack log is    http://paste2.org/yLDJeXgL
> >
> >
> >
> >
> > ------------------ 原始邮件 ------------------
> > 发件人: "蒲聪-北京";<17...@qq.com>;
> > 发送时间: 2015年10月29日(星期四) 晚上9:10
> > 收件人: "user"<us...@hbase.apache.org>;
> >
> > 主题: 回复: 回复: Hbase cluster is suddenly unable to respond
> >
> >
> >
> > hi Ted:
> >
> >
> > Yesterday around 14:40,one of regionservers hang once against.At that
> time
> > I saw web ui can not open.Hbase cluster is  unable to respond.Very
> anxious,
> > hoping to get help!
> >
> >
> > jstack log is as follows:
> > "RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007f4466146800
> > nid=0x12d3 runnable [0x00007f3bebe58000]
> >    java.lang.Thread.State: RUNNABLE
> >     at sun.nio.ch.NativeThread.current(Native Method)
> >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
> >     - locked <0x00007f3d27360fb0> (a java.lang.Object)
> >     - locked <0x00007f3d27360f90> (a java.lang.Object)
> >     at
> org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >     - locked <0x00007f3c584ce990> (a
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:744)
> >
> >
> > "RpcServer.reader=3,port=60020" daemon prio=10 tid=0x00007f4466145000
> > nid=0x12d2 runnable [0x00007f3bebf59000]
> >    java.lang.Thread.State: RUNNABLE
> >     at sun.nio.ch.NativeThread.current(Native Method)
> >     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
> >     - locked <0x00007f3d27360530> (a java.lang.Object)
> >     - locked <0x00007f3d27360510> (a java.lang.Object)
> >     at
> org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >     - locked <0x00007f3c584cf7d8> (a
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:744)
> >
> >
> >
> >
> >
> > region server log :
> > 2015-10-28 14:38:19,801 INFO  [Thread-15] regionserver.HRegion: Finished
> > memstore flush of ~3.6 M/3820648, currentsize=536/536 for region
> >
> order_history,2801xyz140618175732642$3,1418829598639.afc853471a8cd4184bc9e7be00b8eea0.
> > in 45ms, sequenceid=9599960557, compaction requested=true
> > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=3,port=60020]
> > regionserver.HRegionServer:
> > org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException:
> > Expected nextCallSeq: 1 But the nextCallSeq got from client: 0;
> > request=scanner_id: 16740356019163164014 number_of_rows: 10
> close_scanner:
> > false next_call_seq: 0
> >     at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> >     at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=9,port=60020]
> > regionserver.HRegionServer:
> > org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException:
> > Expected nextCallSeq: 1 But the nextCallSeq got from client: 0;
> > request=scanner_id: 16740356019163164014 number_of_rows: 10
> close_scanner:
> > false next_call_seq: 0
> >     at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> >     at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> >     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> >
> > 2015-10-28 14:38:32,696 WARN  [RpcServer.reader=2,port=60020]
> > ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> > com.google.protobuf.UninitializedMessageException: Message missing
> > required fields: region, get
> >     at
> >
> com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> >     at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> >     at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:744)
> > 2015-10-28 14:38:32,696 WARN  [RpcServer.reader=1,port=60020]
> > ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> > com.google.protobuf.UninitializedMessageException: Message missing
> > required fields: region, get
> >     at
> >
> com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> >     at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> >     at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> >     at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >     at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >     at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >     at java.lang.Thread.run(Thread.java:744)
> >
> >
> >
> > ------------------ 原始邮件 ------------------
> > 发件人: "蒲聪-北京";<17...@qq.com>;
> > 发送时间: 2015年10月26日(星期一) 晚上9:28
> > 收件人: "user"<us...@hbase.apache.org>;
> >
> > 主题: 回复: 回复: Hbase cluster is suddenly unable to respond
> >
> >
> >
> >
> >
> > Thank you very much!
> >
> >
> > ------------------ 原始邮件 ------------------
> > 发件人: "Ted Yu";<yu...@gmail.com>;
> > 发送时间: 2015年10月26日(星期一) 晚上8:28
> > 收件人: "user"<us...@hbase.apache.org>;
> >
> > 主题: Re: 回复: Hbase cluster is suddenly unable to respond
> >
> >
> >
> > The fix from HBASE-11277 may solve your problem - if you collect stack
> > trace during the hang, we would have more clue.
> >
> > I suggest upgrading to newer release such as 1.1.2 or 0.98.15
> >
> > Cheers
> >
> > > On Oct 26, 2015, at 12:42 AM, 聪聪 <17...@qq.com> wrote:
> > >
> > > hi,Ted:
> > >
> > >
> > > I use the HBase version is hbase-0.96.0.
> > > Around 17:33,other region servers also appeared in this warn log.I
> don't
> > know if it's normal or not.At that time I saw web ui can not open.I
> restart
> > the regionserver  then hbase back to normal. Is it possible  bug
> > HBASE-11277?
> > >
> > >
> > > Regionserver on the log basically almost  this warn log
> > > mater on the log  is as follows:
> > > 2015-10-21 22:15:43,575 INFO  [CatalogJanitor-l-namenode2:60000]
> > master.CatalogJanitor: Scanned 672 catalog row(s), gc'd 0 unreferenced
> > merged region(s) and 1 unreferenced parent region(s)
> > > 2015-10-23 17:47:25,617 INFO  [RpcServer.handler=28,port=60000]
> > master.HMaster: Client=hbase//192.168.39.19 set balanceSwitch=false
> > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=24,port=60000]
> > ipc.RpcServer: (responseTooSlow):
> >
> {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"
> > 192.168.39.22:60292
> >
> ","starttimems":1445593715207,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000]
> > ipc.RpcServer: (responseTooSlow):
> >
> {"processingtimems":130525,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"
> > 192.168.39.22:60286
> >
> ","starttimems":1445593654945,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=24,port=60000]
> > ipc.RpcServer: RpcServer.respondercallId: 130953 service: MasterService
> > methodName: ListTableDescriptorsByNamespace size: 48 connection:
> > 192.168.39.22:60292: output error
> > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000]
> > ipc.RpcServer: RpcServer.respondercallId: 130945 service: MasterService
> > methodName: ListTableDescriptorsByNamespace size: 48 connection:
> > 192.168.39.22:60286: output error
> > > 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000]
> > ipc.RpcServer: RpcServer.handler=6,port=60000: caught a
> > ClosedChannelException, this means that the server was processing a
> request
> > but the client went away. The error message was: null
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > ------------------ 原始邮件 ------------------
> > > 发件人: "Ted Yu";<yu...@gmail.com>;
> > > 发送时间: 2015年10月23日(星期五) 晚上11:39
> > > 收件人: "user@hbase.apache.org"<us...@hbase.apache.org>;
> > >
> > > 主题: Re: Hbase cluster is suddenly unable to respond
> > >
> > >
> > >
> > > Were other region servers functioning normally around 17:33 ?
> > >
> > > Which hbase release are you using ?
> > >
> > > Can you pastebin more of the region server log ?
> > >
> > > Thanks
> > >
> > >> On Fri, Oct 23, 2015 at 8:28 AM, 聪聪 <17...@qq.com> wrote:
> > >>
> > >> hi,all:
> > >>
> > >>
> > >> This afternoon,The whole Hbase cluster is suddenly unable to
> > respond.after
> > >> I restart a regionserver after,the cluster has recovered.I don't know
> > the
> > >> cause of the trouble.I hope I can get help from you.
> > >>
> > >>
> > >> Regionserver on the log is as follows:
> > >> 2015-10-23 17:28:49,335 INFO  [regionserver60020.logRoller]
> wal.FSHLog:
> > >> moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com
> > >> ,60020,1442810406218/l-hbase30.data.cn8.qunar.com
> > %2C60020%2C1442810406218.1445580462689
> > >> whose highest sequenceid is 9071525521 to /hbase/oldWALs/
> > >> l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
> > >> 2015-10-23 17:33:31,375 WARN  [RpcServer.reader=8,port=60020]
> > >> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> > >> java.io.IOException: Connection reset by peer
> > >>        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > >>        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > >>        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > >>        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> > >>        at
> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> > >>        at
> > >> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > >>        at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> > >>        at
> > >>
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > >>        at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > >>        at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > >>        at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >>        at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >>        at java.lang.Thread.run(Thread.java:744)
> > >> 2015-10-23 17:33:31,779 WARN  [RpcServer.reader=2,port=60020]
> > >> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> > >> java.io.IOException: Connection reset by peer
> > >>        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > >>        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > >>        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > >>        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> > >>        at
> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> > >>        at
> > >> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > >>        at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> > >>        at
> > >>
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > >>        at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > >>        at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > >>        at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >>        at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >>        at java.lang.Thread.run(Thread.java:744)
> >
>

Re: Re: Hbase cluster is suddenly unable to respond

Posted by Ted Yu <yu...@gmail.com>.
Client side, have they tried increasing direct memory size?
-XX:MaxDirectMemorySize=

Do you know how wide the rows returned may get?
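
If the rows can get very wide, here is a minimal sketch of bounding how
much data a single scan RPC returns, against the 0.96 client API (the
table name is taken from your region server log; the numbers are made up):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class BoundedScan {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "order_history");
    try {
      Scan scan = new Scan();
      scan.setCaching(10);  // rows fetched per next() RPC
      scan.setBatch(100);   // cells per Result, so one very wide row is
                            // returned in several smaller pieces
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result r : scanner) {
          // process r
        }
      } finally {
        scanner.close();
      }
    } finally {
      table.close();
    }
  }
}

That, plus a larger -XX:MaxDirectMemorySize on the client JVM, keeps any
single response from demanding a huge direct buffer.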

Cheers


Re: Re: Hbase cluster is suddenly unable to respond

Posted by 聪聪 <17...@qq.com>.
The developers report that their client hits the following error:

[2015/10/29 19:20:42.260][WARN][RpcClient:724] IPC Client (1904394969) connection to l-hbase28.data.cn8.qunar.com/192.168.44.32:60020 from tomcat: unexpected exception receiving call responses

java.lang.OutOfMemoryError: Direct buffer memory
    at java.nio.Bits.reserveMemory(Bits.java:633) ~[na:1.6.0_20]
    at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:95) ~[na:1.6.0_20]
    at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288) ~[na:1.6.0_20]
    at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:57) ~[na:1.6.0_20]
    at sun.nio.ch.IOUtil.read(IOUtil.java:205) ~[na:1.6.0_20]
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236) ~[na:1.6.0_20]
    at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) ~[hadoop-common-2.2.0.jar:na]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) ~[hadoop-common-2.2.0.jar:na]
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) ~[hadoop-common-2.2.0.jar:na]
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) ~[hadoop-common-2.2.0.jar:na]
    at java.io.FilterInputStream.read(FilterInputStream.java:116) ~[na:1.6.0_20]
    at java.io.FilterInputStream.read(FilterInputStream.java:116) ~[na:1.6.0_20]
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:555) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:256) ~[na:1.6.0_20]
    at java.io.BufferedInputStream.read(BufferedInputStream.java:317) ~[na:1.6.0_20]
    at java.io.DataInputStream.read(DataInputStream.java:132) ~[na:1.6.0_20]
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192) ~[hadoop-common-2.2.0.jar:na]
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1101) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
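
The trace points at sun.nio.ch.Util.getTemporaryDirectBuffer: when NIO
reads from a socket into a heap buffer, it copies through a temporary
direct buffer at least as large as the requested read, so one very wide
response can demand a direct allocation beyond the configured limit.
A minimal sketch of the failure mode (the sizes are made up; run with
-XX:MaxDirectMemorySize=16m to reproduce):

import java.nio.ByteBuffer;

public class DirectBufferDemo {
  public static void main(String[] args) {
    // Fits under a 16 MB direct memory cap.
    ByteBuffer small = ByteBuffer.allocateDirect(8 * 1024 * 1024);
    System.out.println("allocated " + small.capacity() + " bytes direct");
    // Exceeds the cap: throws java.lang.OutOfMemoryError: Direct buffer
    // memory, the same error shown above.
    ByteBuffer big = ByteBuffer.allocateDirect(32 * 1024 * 1024);
    System.out.println("allocated " + big.capacity() + " bytes direct");
  }
}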








Re: Re: Hbase cluster is suddenly unable to respond

Posted by Ted Yu <yu...@gmail.com>.
I took a look at the jstack. The threads involving
RpcServer$Connection.readAndProcess() were in RUNNABLE state, not the
BLOCKED or IN_NATIVE state described in HBASE-11277.

The protobuf exception shown in your earlier email corresponds to the
following in hbase-protocol/src/main/protobuf/Client.proto:

message GetRequest {
  required RegionSpecifier region = 1;
  required Get get = 2;
}

Are all your hbase clients running the same version?
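
For illustration, a minimal sketch against the 0.96 generated classes of
when that exception is thrown (the region and row values are made up):

import com.google.protobuf.ByteString;
import com.google.protobuf.UninitializedMessageException;
import org.apache.hadoop.hbase.protobuf.generated.ClientProtos;
import org.apache.hadoop.hbase.protobuf.generated.HBaseProtos.RegionSpecifier;

public class GetRequestDemo {
  public static void main(String[] args) {
    try {
      // Neither required field set: fails exactly like the server log.
      ClientProtos.GetRequest.newBuilder().build();
    } catch (UninitializedMessageException e) {
      // Prints: Message missing required fields: region, get
      System.out.println(e.getMessage());
    }
    // A well-formed request sets both required fields.
    RegionSpecifier region = RegionSpecifier.newBuilder()
        .setType(RegionSpecifier.RegionSpecifierType.REGION_NAME)
        .setValue(ByteString.copyFromUtf8("some-region-name"))
        .build();
    ClientProtos.Get get = ClientProtos.Get.newBuilder()
        .setRow(ByteString.copyFromUtf8("some-row"))
        .build();
    ClientProtos.GetRequest req = ClientProtos.GetRequest.newBuilder()
        .setRegion(region)
        .setGet(get)
        .build();
    System.out.println(req.isInitialized());
  }
}

Seeing that exception on the server is what you would expect if the
request bytes were garbled, for example by a version mismatch or a
desynchronized stream.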

Cheers

On Thu, Oct 29, 2015 at 7:28 AM, 聪聪 <17...@qq.com> wrote:

> the regionserver jstack log is    http://paste2.org/yLDJeXgL
>
>
>
>
> ------------------ 原始邮件 ------------------
> 发件人: "蒲聪-北京";<17...@qq.com>;
> 发送时间: 2015年10月29日(星期四) 晚上9:10
> 收件人: "user"<us...@hbase.apache.org>;
>
> 主题: 回复: 回复: Hbase cluster is suddenly unable to respond
>
>
>
> hi Ted:
>
>
> Yesterday around 14:40,one of regionservers hang once against.At that time
> I saw web ui can not open.Hbase cluster is  unable to respond.Very anxious,
> hoping to get help!
>
>
> jstack log is as follows:
> "RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007f4466146800
> nid=0x12d3 runnable [0x00007f3bebe58000]
>    java.lang.Thread.State: RUNNABLE
>     at sun.nio.ch.NativeThread.current(Native Method)
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
>     - locked <0x00007f3d27360fb0> (a java.lang.Object)
>     - locked <0x00007f3d27360f90> (a java.lang.Object)
>     at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
>     at
> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>     at
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
>     at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>     at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>     - locked <0x00007f3c584ce990> (a
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
>     at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
>
>
> "RpcServer.reader=3,port=60020" daemon prio=10 tid=0x00007f4466145000
> nid=0x12d2 runnable [0x00007f3bebf59000]
>    java.lang.Thread.State: RUNNABLE
>     at sun.nio.ch.NativeThread.current(Native Method)
>     at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
>     - locked <0x00007f3d27360530> (a java.lang.Object)
>     - locked <0x00007f3d27360510> (a java.lang.Object)
>     at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
>     at
> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>     at
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
>     at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>     at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>     - locked <0x00007f3c584cf7d8> (a
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
>     at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
>
>
>
>
>
> region server log :
> 2015-10-28 14:38:19,801 INFO  [Thread-15] regionserver.HRegion: Finished memstore flush of ~3.6 M/3820648, currentsize=536/536 for region order_history,2801xyz140618175732642$3,1418829598639.afc853471a8cd4184bc9e7be00b8eea0. in 45ms, sequenceid=9599960557, compaction requested=true
> 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=3,port=60020] regionserver.HRegionServer:
> org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
>     at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=9,port=60020] regionserver.HRegionServer:
> org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
>     at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
>
> 2015-10-28 14:38:32,696 WARN  [RpcServer.reader=2,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
>     at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
>     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
>     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
>     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
>     at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>     at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>     at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)
> 2015-10-28 14:38:32,696 WARN  [RpcServer.reader=1,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
>     at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
>     at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
>     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
>     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
>     at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
>     at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>     at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>     at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:744)

Re: Re: Hbase cluster is suddenly unable to respond

Posted by 聪聪 <17...@qq.com>.
The region server jstack log is at http://paste2.org/yLDJeXgL





Re: Re: Hbase cluster is suddenly unable to respond

Posted by 聪聪 <17...@qq.com>.
Hi Ted:


Yesterday around 14:40, one of the region servers hung once again. At that time I saw that the web UI could not be opened and the HBase cluster was unable to respond. Very anxious, hoping to get help!


The jstack log is as follows:
"RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007f4466146800 nid=0x12d3 runnable [0x00007f3bebe58000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.NativeThread.current(Native Method)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
    - locked <0x00007f3d27360fb0> (a java.lang.Object)
    - locked <0x00007f3d27360f90> (a java.lang.Object)
    at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
    at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
    at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
    - locked <0x00007f3c584ce990> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)


"RpcServer.reader=3,port=60020" daemon prio=10 tid=0x00007f4466145000 nid=0x12d2 runnable [0x00007f3bebf59000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.NativeThread.current(Native Method)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
    - locked <0x00007f3d27360530> (a java.lang.Object)
    - locked <0x00007f3d27360510> (a java.lang.Object)
    at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
    at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
    at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
    - locked <0x00007f3c584cf7d8> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
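
These reader threads are RUNNABLE inside SocketChannelImpl.read() while holding their Reader locks, which is consistent with the HBASE-11277 symptom: a reader stuck on one connection stalls every other connection assigned to that reader. If running jstack against the hung process is inconvenient, an equivalent dump can be captured from inside the JVM with the standard java.lang.management API; the following is a minimal, self-contained sketch (not HBase code, the class name is illustrative):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Minimal sketch: print the state and stack of every live thread from
// inside the JVM, similar in spirit to running jstack externally.
public class ThreadDumpSketch {
  public static void main(String[] args) {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    // true, true => include locked monitors and ownable synchronizers,
    // which is what produces the "- locked <0x...>" lines seen above.
    for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
      // ThreadInfo.toString() truncates very deep stacks; use
      // info.getStackTrace() when the full trace is needed.
      System.out.print(info);
    }
  }
}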





Region server log:
2015-10-28 14:38:19,801 INFO  [Thread-15] regionserver.HRegion: Finished memstore flush of ~3.6 M/3820648, currentsize=536/536 for region order_history,2801xyz140618175732642$3,1418829598639.afc853471a8cd4184bc9e7be00b8eea0. in 45ms, sequenceid=9599960557, compaction requested=true
2015-10-28 14:38:32,693 ERROR [RpcServer.handler=3,port=60020] regionserver.HRegionServer:
org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
    at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
    at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
2015-10-28 14:38:32,693 ERROR [RpcServer.handler=9,port=60020] regionserver.HRegionServer:
org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
    at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
    at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
    at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)

2015-10-28 14:38:32,696 WARN  [RpcServer.reader=2,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
    at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
    at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
    at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
    at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
2015-10-28 14:38:32,696 WARN  [RpcServer.reader=1,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
    at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
    at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
    at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
    at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
    at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
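
The OutOfOrderScannerNextException entries mean the server had already advanced this scanner's nextCallSeq to 1 when the client sent another next() carrying sequence 0; that mismatch typically happens when a client times out and retries the same next() call. The usual client-side recovery is to close the broken scanner and reopen the scan from the last row seen, rather than retrying next(). A minimal sketch against the 0.96-era client API follows; the table name "order_history" is taken from the flush line above, and everything else is illustrative.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: reopen a scan after OutOfOrderScannerNextException instead of
// retrying next() on the same, now out-of-sync, scanner.
public class ScanReopenSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "order_history");
    byte[] lastRow = null;
    boolean finished = false;
    while (!finished) {
      Scan scan = new Scan();
      scan.setCaching(10);               // matches number_of_rows: 10 above
      if (lastRow != null) {
        // resume just past the last row already processed
        scan.setStartRow(Bytes.add(lastRow, new byte[] { 0 }));
      }
      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result r = scanner.next(); r != null; r = scanner.next()) {
          lastRow = r.getRow();          // remember progress
          // ... process r ...
        }
        finished = true;
      } catch (OutOfOrderScannerNextException e) {
        // nextCallSeq mismatch: loop around and reopen from lastRow
      } finally {
        scanner.close();
      }
    }
    table.close();
  }
}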



------------------ Original Message ------------------
From: "蒲聪-北京" <17...@qq.com>;
Sent: Monday, October 26, 2015, 9:28 PM
To: "user" <us...@hbase.apache.org>;

Subject: Re: Re: Hbase cluster is suddenly unable to respond





Thank you very much!


------------------ Original Message ------------------
From: "Ted Yu" <yu...@gmail.com>;
Sent: Monday, October 26, 2015, 8:28 PM
To: "user" <us...@hbase.apache.org>;

Subject: Re: Re: Hbase cluster is suddenly unable to respond



The fix from HBASE-11277 may solve your problem - if you collect a stack trace during the hang, we would have more clues.

I suggest upgrading to a newer release such as 1.1.2 or 0.98.15.

Cheers

> On Oct 26, 2015, at 12:42 AM, 聪聪 <17...@qq.com> wrote:
> 
> Hi Ted:
> 
> 
> The HBase version I use is hbase-0.96.0.
> Around 17:33, this warn log also appeared on other region servers. I don't know whether that is normal. At that time I saw the web UI could not be opened. I restarted the region server and then HBase went back to normal. Could this be bug HBASE-11277?
> 
> 
> The region server logs contain mostly this warn log.
> The master log is as follows:
> 2015-10-21 22:15:43,575 INFO  [CatalogJanitor-l-namenode2:60000] master.CatalogJanitor: Scanned 672 catalog row(s), gc'd 0 unreferenced merged region(s) and 1 unreferenced parent region(s)
> 2015-10-23 17:47:25,617 INFO  [RpcServer.handler=28,port=60000] master.HMaster: Client=hbase//192.168.39.19 set balanceSwitch=false
> 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=24,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60292","starttimems":1445593715207,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":130525,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60286","starttimems":1445593654945,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=24,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130953 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60292: output error
> 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130945 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60286: output error
> 2015-10-23 17:49:45,513 WARN  [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.handler=6,port=60000: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> ------------------ Original Message ------------------
> From: "Ted Yu" <yu...@gmail.com>;
> Sent: Friday, October 23, 2015, 11:39 PM
> To: "user@hbase.apache.org" <us...@hbase.apache.org>;
> 
> Subject: Re: Hbase cluster is suddenly unable to respond
> 
> 
> 
> Were other region servers functioning normally around 17:33?
> 
> Which HBase release are you using?
> 
> Can you pastebin more of the region server log?
> 
> Thanks
> 
>> On Fri, Oct 23, 2015 at 8:28 AM, 聪聪 <17...@qq.com> wrote:
>> 
>> Hi all:
>> 
>> 
>> This afternoon, the whole HBase cluster suddenly became unable to respond.
>> After I restarted a region server, the cluster recovered. I don't know the
>> cause of the trouble. I hope I can get help from you.
>> 
>> 
>> The region server log is as follows:
>> 2015-10-23 17:28:49,335 INFO  [regionserver60020.logRoller] wal.FSHLog: moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com,60020,1442810406218/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689 whose highest sequenceid is 9071525521 to /hbase/oldWALs/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
>> 2015-10-23 17:33:31,375 WARN  [RpcServer.reader=8,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
>> java.io.IOException: Connection reset by peer
>>        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>>        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>>        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>>        at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>>        at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
>>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>        at java.lang.Thread.run(Thread.java:744)
>> 2015-10-23 17:33:31,779 WARN  [RpcServer.reader=2,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
>> java.io.IOException: Connection reset by peer
>>        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>>        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>>        at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>>        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>>        at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>>        at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
>>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>>        at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>>        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>>        at java.lang.Thread.run(Thread.java:744)
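

The "Connection reset by peer" warnings here, like the ClosedChannelException on the master, are secondary symptoms: clients give up on a wedged server and abandon their sockets. Raising client-side timeouts only buys diagnosis time and does not fix the hang itself, but if more client patience is wanted while investigating, the relevant Configuration knobs look roughly like this; the property names are standard HBase client settings, and the values are examples rather than recommendations.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// Sketch: more patient client settings while a server-side hang is being
// diagnosed. The values below are illustrative only.
public class ClientTimeoutSketch {
  public static Configuration patientConf() {
    Configuration conf = HBaseConfiguration.create();
    conf.setInt("hbase.rpc.timeout", 120000);                     // per-RPC timeout (ms)
    conf.setLong("hbase.client.scanner.timeout.period", 120000);  // scanner next() lease (ms)
    conf.setInt("hbase.client.retries.number", 10);               // client retry budget
    return conf;
  }
}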