Posted to user@hbase.apache.org by 聪聪 <17...@qq.com> on 2015/10/23 17:28:10 UTC
Hbase cluster is suddenly unable to respond
hi,all:
This afternoon, the whole HBase cluster suddenly became unable to respond. After I restarted one regionserver, the cluster recovered. I don't know the cause of the trouble and hope I can get some help from you.
The regionserver log is as follows:
2015-10-23 17:28:49,335 INFO [regionserver60020.logRoller] wal.FSHLog: moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com,60020,1442810406218/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689 whose highest sequenceid is 9071525521 to /hbase/oldWALs/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
2015-10-23 17:33:31,375 WARN [RpcServer.reader=8,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
2015-10-23 17:33:31,779 WARN [RpcServer.reader=2,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:197)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Re: Hbase cluster is suddenly unable to respond
Posted by 吴国泉wgq <wg...@qunar.com>.
hi all:
We have met two conditions that caused a region server crash:
1. NIO out of direct memory
2. ZooKeeper session timeout
You can find the reason in the region server log (or the .out file) or in the GC log.
If it is out of direct memory, you will see "kill -9 XXX" in regionserverXX.out. Then change "-XX:+DisableExplicitGC" to "-XX:+ExplicitGCInvokesConcurrent".
If it is a ZooKeeper session timeout, you will see "XX seconds timeout to zookeeper" in regionserverXX.log.
Here is the point: 'zookeeper.session.timeout' in hbase-site.xml does not take effect if you use a ZooKeeper ensemble that is not managed by HBase. In that case, "maxSessionTimeout" in zoo.cfg is the property that really controls the timeout; its default is 40s. That may be why 'zookeeper.session.timeout' does not work.
GC can stop the world. You can tune the GC, or increase 'maxSessionTimeout', to make sure HBase won't shut down the region server during an acceptable pause.
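The advice above can be sketched as two small config changes. This is an illustrative sketch only: the variable name is the standard hbase-env.sh one, but the maxSessionTimeout value shown is an assumption, not a tuned recommendation.

```shell
# hbase-env.sh (illustrative sketch): region server JVM options.
# Replacing -XX:+DisableExplicitGC with -XX:+ExplicitGCInvokesConcurrent lets
# explicit System.gc() calls (used during NIO direct-buffer cleanup) run as a
# concurrent cycle instead of being ignored or causing a stop-the-world full GC.
export HBASE_REGIONSERVER_OPTS="${HBASE_REGIONSERVER_OPTS:-} -XX:+ExplicitGCInvokesConcurrent"

# zoo.cfg, on the externally managed ZooKeeper ensemble (value in ms is an
# illustrative assumption). This caps whatever session timeout the HBase
# client requests via zookeeper.session.timeout:
#   maxSessionTimeout=90000
```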
On Aug 6, 2016, at 2:08 PM, kiran <ki...@gmail.com> wrote:
We are also facing the same issue. Please tell us what the solution is. I
have increased the RPC timeout and reduced caching, but with no effect.
We are using HBase 0.98.7. Please suggest a workaround, as we are facing
the issue very frequently now and we are having downtime in production.
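For reference, the two knobs mentioned above live in the client-side hbase-site.xml. The property names are the standard ones; the values below are illustrative assumptions, not tuned recommendations:

```xml
<!-- client-side hbase-site.xml (illustrative sketch) -->
<configuration>
  <!-- How long a client RPC may run before timing out, in ms. -->
  <property>
    <name>hbase.rpc.timeout</name>
    <value>120000</value>
  </property>
  <!-- Rows fetched per scanner next() call; smaller values keep each
       scan RPC (and the buffers it needs) smaller. -->
  <property>
    <name>hbase.client.scanner.caching</name>
    <value>100</value>
  </property>
</configuration>
```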
On Fri, Oct 30, 2015 at 9:21 AM, 聪聪 <17...@qq.com> wrote:
There is a loop in the code that views the child nodes: http://paste2.org/zm8GE7xH
------------------ Original Message ------------------
From: "蒲聪-北京" <17...@qq.com>;
Sent: Friday, October 30, 2015, 10:24 AM
To: "user" <us...@hbase.apache.org>;
Subject: Re: Re: Hbase cluster is suddenly unable to respond
The client code is http://paste2.org/p3BXkKtV
Is the client version compatible with it?
I see that the client version is hbase0.96.1.1-hadoop2
------------------ Original Message ------------------
From: "Ted Yu" <yu...@gmail.com>;
Sent: Friday, October 30, 2015, 12:08 AM
To: user@hbase.apache.org;
Subject: Re: Re: Hbase cluster is suddenly unable to respond
Client side, have they tried increasing the direct memory size?
-XX:MaxDirectMemorySize=
Do you know how wide the returned rows may get?
Cheers
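The flag suggested above is set on the client JVM. A minimal sketch, assuming the client is launched from a script (the variable name and the 256m value are illustrative assumptions):

```shell
# Illustrative sketch: raise the cap on NIO direct buffers for the HBase
# client JVM. "java.lang.OutOfMemoryError: Direct buffer memory" means this
# cap was exhausted. Append the flag to whatever options the client JVM is
# launched with, e.g. in its startup script:
CLIENT_JVM_OPTS="${CLIENT_JVM_OPTS:-} -XX:MaxDirectMemorySize=256m"
echo "client JVM opts:$CLIENT_JVM_OPTS"
```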
On Thu, Oct 29, 2015 at 9:03 AM, 聪聪 <17...@qq.com> wrote:
Developers report that their client gets the following error:
[2015/10/29 19:20:42.260][WARN][RpcClient:724] IPC Client (1904394969) connection to l-hbase28.data.cn8.qunar.com/192.168.44.32:60020 from tomcat: unexpected exception receiving call responses
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:633) ~[na:1.6.0_20]
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:95) ~[na:1.6.0_20]
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288) ~[na:1.6.0_20]
at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:57) ~[na:1.6.0_20]
at sun.nio.ch.IOUtil.read(IOUtil.java:205) ~[na:1.6.0_20]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236) ~[na:1.6.0_20]
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) ~[hadoop-common-2.2.0.jar:na]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) ~[hadoop-common-2.2.0.jar:na]
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) ~[hadoop-common-2.2.0.jar:na]
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) ~[hadoop-common-2.2.0.jar:na]
at java.io.FilterInputStream.read(FilterInputStream.java:116) ~[na:1.6.0_20]
at java.io.FilterInputStream.read(FilterInputStream.java:116) ~[na:1.6.0_20]
at org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:555) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
at java.io.BufferedInputStream.read1(BufferedInputStream.java:256) ~[na:1.6.0_20]
at java.io.BufferedInputStream.read(BufferedInputStream.java:317) ~[na:1.6.0_20]
at java.io.DataInputStream.read(DataInputStream.java:132) ~[na:1.6.0_20]
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192) ~[hadoop-common-2.2.0.jar:na]
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1101) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
------------------ Original Message ------------------
From: "Ted Yu" <yu...@gmail.com>;
Sent: Thursday, October 29, 2015, 10:48 PM
To: user@hbase.apache.org;
Subject: Re: Re: Hbase cluster is suddenly unable to respond
I took a look at the jstack.
The threads involving RpcServer$Connection.readAndProcess() were in
RUNNABLE state, not the BLOCKED or IN_NATIVE state described in
HBASE-11277.
The protobuf exception shown in your earlier email corresponded to the
following in hbase-protocol/src/main/protobuf/Client.proto:
message GetRequest {
  required RegionSpecifier region = 1;
  required Get get = 2;
}
Are all your HBase clients running the same version?
Cheers
On Thu, Oct 29, 2015 at 7:28 AM, 聪聪 <17...@qq.com> wrote:
the regionserver jstack log is http://paste2.org/yLDJeXgL
------------------ Original Message ------------------
From: "蒲聪-北京" <17...@qq.com>;
Sent: Thursday, October 29, 2015, 9:10 PM
To: "user" <us...@hbase.apache.org>;
Subject: Re: Re: Hbase cluster is suddenly unable to respond
hi Ted:
Yesterday around 14:40, one of the regionservers hung once again. At that time I saw the web UI could not open and the HBase cluster was unable to respond. Very anxious, hoping to get help!
The jstack log is as follows:
"RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007f4466146800 nid=0x12d3 runnable [0x00007f3bebe58000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.NativeThread.current(Native Method)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
- locked <0x00007f3d27360fb0> (a java.lang.Object)
- locked <0x00007f3d27360f90> (a java.lang.Object)
at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
- locked <0x00007f3c584ce990> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
"RpcServer.reader=3,port=60020" daemon prio=10 tid=0x00007f4466145000 nid=0x12d2 runnable [0x00007f3bebf59000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.NativeThread.current(Native Method)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
- locked <0x00007f3d27360530> (a java.lang.Object)
- locked <0x00007f3d27360510> (a java.lang.Object)
at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
- locked <0x00007f3c584cf7d8> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
region server log:
2015-10-28 14:38:19,801 INFO [Thread-15] regionserver.HRegion: Finished memstore flush of ~3.6 M/3820648, currentsize=536/536 for region order_history,2801xyz140618175732642$3,1418829598639.afc853471a8cd4184bc9e7be00b8eea0. in 45ms, sequenceid=9599960557, compaction requested=true
2015-10-28 14:38:32,693 ERROR [RpcServer.handler=3,port=60020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
2015-10-28 14:38:32,693 ERROR [RpcServer.handler=9,port=60020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
2015-10-28 14:38:32,696 WARN [RpcServer.reader=2,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
2015-10-28 14:38:32,696 WARN [RpcServer.reader=1,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
------------------ Original Message ------------------
From: "蒲聪-北京" <17...@qq.com>;
Sent: Monday, October 26, 2015, 9:28 PM
To: "user" <us...@hbase.apache.org>;
Subject: Re: Re: Hbase cluster is suddenly unable to respond
Thank you very much!
------------------ Original Message ------------------
From: "Ted Yu" <yu...@gmail.com>;
Sent: Monday, October 26, 2015, 8:28 PM
To: "user" <us...@hbase.apache.org>;
Subject: Re: Re: Hbase cluster is suddenly unable to respond
The fix from HBASE-11277 may solve your problem; if you collect a stack trace during the hang, we would have more clues.
I suggest upgrading to a newer release such as 1.1.2 or 0.98.15.
Cheers
On Oct 26, 2015, at 12:42 AM, 聪聪 <17...@qq.com> wrote:
hi, Ted:
The HBase version I use is hbase-0.96.0.
Around 17:33, the other region servers also showed this warn log. I don't know whether that is normal or not. At that time I saw the web UI could not open. I restarted the regionserver and then HBase went back to normal. Could it be bug HBASE-11277?
The regionserver logs contain basically almost only this warn log.
The master log is as follows:
2015-10-21 22:15:43,575 INFO [CatalogJanitor-l-namenode2:60000] master.CatalogJanitor: Scanned 672 catalog row(s), gc'd 0 unreferenced merged region(s) and 1 unreferenced parent region(s)
2015-10-23 17:47:25,617 INFO [RpcServer.handler=28,port=60000] master.HMaster: Client=hbase//192.168.39.19 set balanceSwitch=false
2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60292","starttimems":1445593715207,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":130525,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60286","starttimems":1445593654945,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130953 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60292: output error
2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130945 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60286: output error
2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.handler=6,port=60000: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
------------------ Original Message ------------------
From: "Ted Yu" <yu...@gmail.com>;
Sent: Friday, October 23, 2015, 11:39 PM
To: user@hbase.apache.org;
Subject: Re: Hbase cluster is suddenly unable to respond
Were other region servers functioning normally around 17:33 ?
Which hbase release are you using ?
Can you pastebin more of the region server log ?
Thanks
--
Thank you
Kiran Sarvabhotla
-----Even a correct decision is wrong when it is taken late
Re: Re: Hbase cluster is suddenly unable to respond
Posted by Ted Yu <yu...@gmail.com>.
Can you provide jstack of region server(s) ?
Was there anything interesting in the logs ?
Thanks
BTW 0.98.7 is quite old. Please consider upgrading.
On Fri, Aug 5, 2016 at 11:10 PM, kiran <ki...@gmail.com> wrote:
> HBase client and server are on the same version, 0.98.7. We are having complete
> downtime of about 30 min, and high CPU usage on the node and network in the
> cluster.
>
> On Sat, Aug 6, 2016 at 11:38 AM, kiran <ki...@gmail.com>
> wrote:
>
> > We are also facing the same issue. Please tell us what is the solution. I
> > have increased the rpc timeout and caching is reduced but with no effect.
> > We are using hbase 0.98.7. Please suggest a work around as we are facing
> > the issue very frequently now and we are having downtime in production.
> >
> > On Fri, Oct 30, 2015 at 9:21 AM, 聪聪 <17...@qq.com> wrote:
> >
> >> There is a view child nodes loop code http://paste2.org/zm8GE7xH
> >>
> >>
> >>
> >>
> >> ------------------ 原始邮件 ------------------
> >> 发件人: "蒲聪-北京";<17...@qq.com>;
> >> 发送时间: 2015年10月30日(星期五) 上午10:24
> >> 收件人: "user"<us...@hbase.apache.org>;
> >>
> >> 主题: 回复: 回复: Hbase cluster is suddenly unable to respond
> >>
> >>
> >>
> >>
> >>
> >> The client code is http://paste2.org/p3BXkKtV
> >>
> >>
> >> Is the client version compatible with it?
> >> I see that the client version is hbase0.96.1.1-hadoop2
> >>
> >>
> >>
> >>
> >> ------------------ 原始邮件 ------------------
> >> 发件人: "Ted Yu";<yu...@gmail.com>;
> >> 发送时间: 2015年10月30日(星期五) 凌晨0:08
> >> 收件人: "user@hbase.apache.org"<us...@hbase.apache.org>;
> >>
> >> 主题: Re: 回复: Hbase cluster is suddenly unable to respond
> >>
> >>
> >>
> >> Client side, have they tried increasing direct memory size ?
> >> -XX:MaxDirectMemorySize=
> >>
> >> Do you know how wide the rows returned may get ?
> >>
> >> Cheers
> >>
> >> On Thu, Oct 29, 2015 at 9:03 AM, 聪聪 <17...@qq.com> wrote:
> >>
> >> > Developers feedback their client has the following error:
> >> >
> >> > [2015/10/29 19:20:42.260][WARN][RpcClient:724] IPC Client
> (1904394969)
> >> > connection to l-hbase28.data.cn8.qunar.com/192.168.44.32:60020 from
> >> > tomcat: unexpected exception receiving call responses
> >> >
> >> > java.lang.OutOfMemoryError: Direct buffer memory
> >> >
> >> > at java.nio.Bits.reserveMemory(Bits.java:633) ~[na:1.6.0_20]
> >> >
> >> > at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:95)
> >> > ~[na:1.6.0_20]
> >> >
> >> > at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
> >> > ~[na:1.6.0_20]
> >> >
> >> > at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:57)
> >> > ~[na:1.6.0_20]
> >> >
> >> > at sun.nio.ch.IOUtil.read(IOUtil.java:205) ~[na:1.6.0_20]
> >> >
> >> > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
> >> > ~[na:1.6.0_20]
> >> >
> >> > at
> >> > org.apache.hadoop.net.SocketInputStream$Reader.performIO(
> >> SocketInputStream.java:57)
> >> > ~[hadoop-common-2.2.0.jar:na]
> >> >
> >> > at
> >> > org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithT
> >> imeout.java:142)
> >> > ~[hadoop-common-2.2.0.jar:na]
> >> >
> >> > at
> >> > org.apache.hadoop.net.SocketInputStream.read(SocketInputStre
> >> am.java:161)
> >> > ~[hadoop-common-2.2.0.jar:na]
> >> >
> >> > at
> >> > org.apache.hadoop.net.SocketInputStream.read(SocketInputStre
> >> am.java:131)
> >> > ~[hadoop-common-2.2.0.jar:na]
> >> >
> >> > at java.io.FilterInputStream.read(FilterInputStream.java:116)
> >> > ~[na:1.6.0_20]
> >> >
> >> > at java.io.FilterInputStream.read(FilterInputStream.java:116)
> >> > ~[na:1.6.0_20]
> >> >
> >> > at
> >> > org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputSt
> >> ream.read(RpcClient.java:555)
> >> > ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
> >> >
> >> > at java.io.BufferedInputStream.read1(BufferedInputStream.
> java:256)
> >> > ~[na:1.6.0_20]
> >> >
> >> > at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> >> > ~[na:1.6.0_20]
> >> >
> >> > at java.io.DataInputStream.read(DataInputStream.java:132)
> >> > ~[na:1.6.0_20]
> >> >
> >> > at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
> >> > ~[hadoop-common-2.2.0.jar:na]
> >> >
> >> > at
> >> > org.apache.hadoop.hbase.ipc.RpcClient$Connection.readRespons
> >> e(RpcClient.java:1101)
> >> > ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
> >> >
> >> > at
> >> > org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClie
> >> nt.java:721)
> >> > ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >
> >> > ------------------ 原始邮件 ------------------
> >> > 发件人: "Ted Yu";<yu...@gmail.com>;
> >> > 发送时间: 2015年10月29日(星期四) 晚上10:48
> >> > 收件人: "user@hbase.apache.org"<us...@hbase.apache.org>;
> >> >
> >> > 主题: Re: 回复: Hbase cluster is suddenly unable to respond
> >> >
> >> >
> >> >
> >> > I took a look at the jstack.
> >> > The threads involving RpcServer$Connection.readAndProcess() were in
> >> > RUNNABLE state, not BLOCKED or IN_NATIVE state - as described in
> >> > HBASE-11277
> >> > .
> >> >
> >> > The protobuf exception shown in your earlier email corresponded to the
> >> > following in hbase-protocol/src/main/protobuf/Client.proto :
> >> >
> >> > message GetRequest {
> >> > required RegionSpecifier region = 1;
> >> > required Get get = 2;
> >> > }
> >> >
> >> > Are all your hbase clients running in the same version ?
> >> >
> >> > Cheers
> >> >
> >> > On Thu, Oct 29, 2015 at 7:28 AM, 聪聪 <17...@qq.com> wrote:
> >> >
> >> > > the regionserver jstack log is http://paste2.org/yLDJeXgL
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > ------------------ 原始邮件 ------------------
> >> > > 发件人: "蒲聪-北京";<17...@qq.com>;
> >> > > 发送时间: 2015年10月29日(星期四) 晚上9:10
> >> > > 收件人: "user"<us...@hbase.apache.org>;
> >> > >
> >> > > 主题: 回复: 回复: Hbase cluster is suddenly unable to respond
> >> > >
> >> > >
> >> > >
> >> > > hi Ted:
> >> > >
> >> > >
> >> > > Yesterday around 14:40,one of regionservers hang once against.At
> that
> >> > time
> >> > > I saw web ui can not open.Hbase cluster is unable to respond.Very
> >> > anxious,
> >> > > hoping to get help!
> >> > >
> >> > >
> >> > > jstack log is as follows:
> >> > > "RpcServer.reader=4,port=60020" daemon prio=10
> tid=0x00007f4466146800
> >> > > nid=0x12d3 runnable [0x00007f3bebe58000]
> >> > > java.lang.Thread.State: RUNNABLE
> >> > > at sun.nio.ch.NativeThread.current(Native Method)
> >> > > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:
> 325)
> >> > > - locked <0x00007f3d27360fb0> (a java.lang.Object)
> >> > > - locked <0x00007f3d27360f90> (a java.lang.Object)
> >> > > at
> >> > org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> >> > > at
> >> > > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.
> >> java:2368)
> >> > > at
> >> > >
> >> > org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProc
> >> ess(RpcServer.java:1476)
> >> > > at
> >> > > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcSer
> >> ver.java:770)
> >> > > at
> >> > >
> >> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunL
> >> oop(RpcServer.java:563)
> >> > > - locked <0x00007f3c584ce990> (a
> >> > > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> >> > > at
> >> > >
> >> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(Rp
> >> cServer.java:538)
> >> > > at
> >> > >
> >> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
> >> Executor.java:1145)
> >> > > at
> >> > >
> >> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
> >> lExecutor.java:615)
> >> > > at java.lang.Thread.run(Thread.java:744)
> >> > >
> >> > >
> >> > > "RpcServer.reader=3,port=60020" daemon prio=10
> tid=0x00007f4466145000
> >> > > nid=0x12d2 runnable [0x00007f3bebf59000]
> >> > > java.lang.Thread.State: RUNNABLE
> >> > > at sun.nio.ch.NativeThread.current(Native Method)
> >> > > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:
> 325)
> >> > > - locked <0x00007f3d27360530> (a java.lang.Object)
> >> > > - locked <0x00007f3d27360510> (a java.lang.Object)
> >> > > at
> >> > org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> >> > > at
> >> > > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.
> >> java:2368)
> >> > > at
> >> > >
> >> > org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProc
> >> ess(RpcServer.java:1476)
> >> > > at
> >> > > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcSer
> >> ver.java:770)
> >> > > at
> >> > >
> >> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunL
> >> oop(RpcServer.java:563)
> >> > > - locked <0x00007f3c584cf7d8> (a
> >> > > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> >> > > at
> >> > >
> >> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(Rp
> >> cServer.java:538)
> >> > > at
> >> > >
> >> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
> >> Executor.java:1145)
> >> > > at
> >> > >
> >> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
> >> lExecutor.java:615)
> >> > > at java.lang.Thread.run(Thread.java:744)
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > region server log :
> >> > > 2015-10-28 14:38:19,801 INFO [Thread-15] regionserver.HRegion: Finished memstore flush of ~3.6 M/3820648, currentsize=536/536 for region order_history,2801xyz140618175732642$3,1418829598639.afc853471a8cd4184bc9e7be00b8eea0. in 45ms, sequenceid=9599960557, compaction requested=true
> >> > > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=3,port=60020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
> >> > >         at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> >> > >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> >> > >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> >> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> >> > > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=9,port=60020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
> >> > >         at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> >> > >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> >> > >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> >> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> >> > >
> >> > > 2015-10-28 14:38:32,696 WARN [RpcServer.reader=2,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> >> > > com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
> >> > >         at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> >> > >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> >> > >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> >> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> >> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> >> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> >> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> > >         at java.lang.Thread.run(Thread.java:744)
> >> > > 2015-10-28 14:38:32,696 WARN [RpcServer.reader=1,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> >> > > com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
> >> > >         at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> >> > >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> >> > >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> >> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> >> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> >> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> >> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> > >         at java.lang.Thread.run(Thread.java:744)
> >> > >
> >> > >
> >> > >
> >> > > ------------------ Original Message ------------------
> >> > > From: "蒲聪-北京";<17...@qq.com>;
> >> > > Sent: Monday, October 26, 2015, 9:28 PM
> >> > > To: "user"<us...@hbase.apache.org>;
> >> > >
> >> > > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > Thank you very much!
> >> > >
> >> > >
> >> > > ------------------ Original Message ------------------
> >> > > From: "Ted Yu";<yu...@gmail.com>;
> >> > > Sent: Monday, October 26, 2015, 8:28 PM
> >> > > To: "user"<us...@hbase.apache.org>;
> >> > >
> >> > > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> >> > >
> >> > >
> >> > >
> >> > > The fix from HBASE-11277 may solve your problem - if you collect a
> >> > > stack trace during the hang, we would have more clues.
> >> > >
> >> > > I suggest upgrading to a newer release such as 1.1.2 or 0.98.15.
> >> > >
> >> > > Cheers
> >> > >
> >> > > > On Oct 26, 2015, at 12:42 AM, 聪聪 <17...@qq.com> wrote:
> >> > > >
> >> > > > hi,Ted:
> >> > > >
> >> > > >
> >> > > > The HBase version I use is hbase-0.96.0.
> >> > > > Around 17:33, the other region servers also showed this WARN log; I
> >> > > > don't know whether that is normal. At that time the web UI could not
> >> > > > be opened. I restarted the regionserver and HBase went back to
> >> > > > normal. Could this be bug HBASE-11277?
> >> > > >
> >> > > >
> >> > > > The regionserver log contains mostly this WARN message.
> >> > > > The master log is as follows:
> >> > > > 2015-10-21 22:15:43,575 INFO [CatalogJanitor-l-namenode2:60000] master.CatalogJanitor: Scanned 672 catalog row(s), gc'd 0 unreferenced merged region(s) and 1 unreferenced parent region(s)
> >> > > > 2015-10-23 17:47:25,617 INFO [RpcServer.handler=28,port=60000] master.HMaster: Client=hbase//192.168.39.19 set balanceSwitch=false
> >> > > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60292","starttimems":1445593715207,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> >> > > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":130525,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60286","starttimems":1445593654945,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> >> > > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130953 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60292: output error
> >> > > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130945 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60286: output error
> >> > > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.handler=6,port=60000: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > > ------------------ Original Message ------------------
> >> > > > From: "Ted Yu";<yu...@gmail.com>;
> >> > > > Sent: Friday, October 23, 2015, 11:39 PM
> >> > > > To: "user@hbase.apache.org"<us...@hbase.apache.org>;
> >> > > >
> >> > > > Subject: Re: Hbase cluster is suddenly unable to respond
> >> > > >
> >> > > >
> >> > > >
> >> > > > Were other region servers functioning normally around 17:33 ?
> >> > > >
> >> > > > Which hbase release are you using ?
> >> > > >
> >> > > > Can you pastebin more of the region server log ?
> >> > > >
> >> > > > Thanks
> >> > > >
> >> > > >> On Fri, Oct 23, 2015 at 8:28 AM, 聪聪 <17...@qq.com> wrote:
> >> > > >>
> >> > > >> Hi all:
> >> > > >>
> >> > > >>
> >> > > >> This afternoon, the whole HBase cluster suddenly became unable to
> >> > > >> respond. After I restarted one regionserver, the cluster recovered.
> >> > > >> I don't know the cause of the trouble. I hope I can get help from you.
> >> > > >>
> >> > > >>
> >> > > >> The regionserver log is as follows:
> >> > > >> 2015-10-23 17:28:49,335 INFO [regionserver60020.logRoller] wal.FSHLog: moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com,60020,1442810406218/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689 whose highest sequenceid is 9071525521 to /hbase/oldWALs/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
> >> > > >> 2015-10-23 17:33:31,375 WARN [RpcServer.reader=8,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> >> > > >> java.io.IOException: Connection reset by peer
> >> > > >>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> >> > > >>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> >> > > >>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> >> > > >>         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> >> > > >>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> >> > > >>         at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> >> > > >>         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> >> > > >>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >> > > >>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >> > > >>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >> > > >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> > > >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> > > >>         at java.lang.Thread.run(Thread.java:744)
> >> > > >> 2015-10-23 17:33:31,779 WARN [RpcServer.reader=2,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> >> > > >> java.io.IOException: Connection reset by peer
> >> > > >>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> >> > > >>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> >> > > >>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> >> > > >>         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> >> > > >>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> >> > > >>         at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> >> > > >>         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> >> > > >>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >> > > >>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >> > > >>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >> > > >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> > > >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> > > >>         at java.lang.Thread.run(Thread.java:744)
> >> > >
> >> >
> >>
> >
> >
> >
> > --
> > Thank you
> > Kiran Sarvabhotla
> >
> > -----Even a correct decision is wrong when it is taken late
> >
> >
>
>
> --
> Thank you
> Kiran Sarvabhotla
>
> -----Even a correct decision is wrong when it is taken late
>
Re: Re: Hbase cluster is suddenly unable to respond
Posted by kiran <ki...@gmail.com>.
The HBase client and server are on the same version, 0.98.7. We are having a
complete outage of about 30 minutes, with high CPU usage on the node and high
network usage across the cluster.
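Editor's note: the larger RPC timeout and smaller caching mentioned above target the usual mechanism behind OutOfOrderScannerNextException seen earlier in this thread: if the client spends longer than the scanner lease processing one batch of rows, the server expires the scanner, and the client's retried next() arrives with a stale call sequence. A rough back-of-envelope sketch of that race, with purely illustrative numbers (none of them come from this thread):

```python
# Rough sketch: will a scan trip the server-side scanner lease?
# All numbers below are illustrative assumptions, not values from this thread.

def time_between_next_calls_ms(caching_rows, per_row_processing_ms):
    """Client processes `caching_rows` rows between scanner next() RPCs."""
    return caching_rows * per_row_processing_ms

def lease_expires(caching_rows, per_row_processing_ms, lease_timeout_ms=60_000):
    # 60 s is the hbase.client.scanner.timeout.period default in 0.98-era
    # releases, as far as I recall; check your release's hbase-default.xml.
    return time_between_next_calls_ms(caching_rows, per_row_processing_ms) > lease_timeout_ms

# With caching=1000 and 100 ms of client work per row, the gap between next()
# calls is 100 s - longer than the 60 s lease, so the server can expire the
# scanner and the retry raises OutOfOrderScannerNextException.
print(lease_expires(1000, 100))   # True
# Reducing caching to 100 shrinks the gap to 10 s, well inside the lease.
print(lease_expires(100, 100))    # False
```

This is why reducing caching (or raising the scanner timeout on both client and server) can help; it does not fix the protobuf decode errors discussed elsewhere in the thread.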
On Sat, Aug 6, 2016 at 11:38 AM, kiran <ki...@gmail.com> wrote:
> We are also facing the same issue. Please tell us what the solution is. I
> have increased the RPC timeout and reduced the caching, but with no effect.
> We are using HBase 0.98.7. Please suggest a workaround, as we are hitting
> the issue very frequently now and having downtime in production.
>
> On Fri, Oct 30, 2015 at 9:21 AM, 聪聪 <17...@qq.com> wrote:
>
>> The code with the loop that views the child nodes is at http://paste2.org/zm8GE7xH
>>
>>
>>
>>
>> ------------------ Original Message ------------------
>> From: "蒲聪-北京";<17...@qq.com>;
>> Sent: Friday, October 30, 2015, 10:24 AM
>> To: "user"<us...@hbase.apache.org>;
>>
>> Subject: Re: Re: Hbase cluster is suddenly unable to respond
>>
>>
>>
>>
>>
>> The client code is http://paste2.org/p3BXkKtV
>>
>>
>> Is the client version compatible with it?
>> I see that the client version is hbase-0.96.1.1-hadoop2.
>>
>>
>>
>>
>> ------------------ Original Message ------------------
>> From: "Ted Yu";<yu...@gmail.com>;
>> Sent: Friday, October 30, 2015, 12:08 AM
>> To: "user@hbase.apache.org"<us...@hbase.apache.org>;
>>
>> Subject: Re: Re: Hbase cluster is suddenly unable to respond
>>
>>
>>
>> Client side, have they tried increasing direct memory size ?
>> -XX:MaxDirectMemorySize=
>>
>> Do you know how wide the rows returned may get ?
>>
>> Cheers
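Editor's note: the client-side java.lang.OutOfMemoryError: Direct buffer memory reported below fits this suggestion: each NIO socket read grabs a temporary direct buffer roughly as large as the response being read, so wide rows plus many concurrently reading connections can exceed the JVM's direct-memory cap. A small sketch of the sizing arithmetic, with assumed (not measured) numbers:

```python
# Sketch: estimating the direct-memory headroom an HBase client may need.
# sun.nio.ch reads allocate temporary direct buffers roughly the size of the
# largest response being read, per concurrently reading connection.
# The numbers below are illustrative assumptions, not values from this thread.

def direct_memory_needed_bytes(max_response_bytes, concurrent_connections):
    return max_response_bytes * concurrent_connections

# Ten connections each reading a 64 MB scan response need ~640 MB of direct
# memory; if -XX:MaxDirectMemorySize (or its default) is lower than that,
# allocation fails with "Direct buffer memory".
need = direct_memory_needed_bytes(64 * 1024 * 1024, 10)
print(need // (1024 * 1024))  # 640
```

Mitigations are either raising the cap (for example -XX:MaxDirectMemorySize=1g on the client JVM) or shrinking responses by lowering scanner caching or row width, as discussed in this thread.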
>>
>> On Thu, Oct 29, 2015 at 9:03 AM, 聪聪 <17...@qq.com> wrote:
>>
>> > The developers report that their client gets the following error:
>> >
>> > [2015/10/29 19:20:42.260][WARN][RpcClient:724] IPC Client (1904394969) connection to l-hbase28.data.cn8.qunar.com/192.168.44.32:60020 from tomcat: unexpected exception receiving call responses
>> >
>> > java.lang.OutOfMemoryError: Direct buffer memory
>> >         at java.nio.Bits.reserveMemory(Bits.java:633) ~[na:1.6.0_20]
>> >         at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:95) ~[na:1.6.0_20]
>> >         at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288) ~[na:1.6.0_20]
>> >         at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:57) ~[na:1.6.0_20]
>> >         at sun.nio.ch.IOUtil.read(IOUtil.java:205) ~[na:1.6.0_20]
>> >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236) ~[na:1.6.0_20]
>> >         at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) ~[hadoop-common-2.2.0.jar:na]
>> >         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) ~[hadoop-common-2.2.0.jar:na]
>> >         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) ~[hadoop-common-2.2.0.jar:na]
>> >         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) ~[hadoop-common-2.2.0.jar:na]
>> >         at java.io.FilterInputStream.read(FilterInputStream.java:116) ~[na:1.6.0_20]
>> >         at java.io.FilterInputStream.read(FilterInputStream.java:116) ~[na:1.6.0_20]
>> >         at org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:555) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>> >         at java.io.BufferedInputStream.read1(BufferedInputStream.java:256) ~[na:1.6.0_20]
>> >         at java.io.BufferedInputStream.read(BufferedInputStream.java:317) ~[na:1.6.0_20]
>> >         at java.io.DataInputStream.read(DataInputStream.java:132) ~[na:1.6.0_20]
>> >         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192) ~[hadoop-common-2.2.0.jar:na]
>> >         at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1101) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>> >         at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > ------------------ Original Message ------------------
>> > From: "Ted Yu";<yu...@gmail.com>;
>> > Sent: Thursday, October 29, 2015, 10:48 PM
>> > To: "user@hbase.apache.org"<us...@hbase.apache.org>;
>> >
>> > Subject: Re: Re: Hbase cluster is suddenly unable to respond
>> >
>> >
>> >
>> > I took a look at the jstack.
>> > The threads involving RpcServer$Connection.readAndProcess() were in
>> > RUNNABLE state, not BLOCKED or IN_NATIVE state, as described in
>> > HBASE-11277.
>> >
>> > The protobuf exception shown in your earlier email corresponded to the
>> > following in hbase-protocol/src/main/protobuf/Client.proto :
>> >
>> > message GetRequest {
>> > required RegionSpecifier region = 1;
>> > required Get get = 2;
>> > }
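Editor's note: the UninitializedMessageException earlier in the thread is protobuf's generated GetRequest builder refusing to build() because the two required fields above never arrived - usually a sign of truncated or mis-framed request bytes (for example from mismatched client and server versions) rather than bad data. A schematic plain-Python analogy of that required-field check (not the real com.google.protobuf API):

```python
# Schematic sketch of proto2 "required field" validation - a plain-Python
# analogy of GetRequest.Builder.build(), not the real protobuf library.

REQUIRED_FIELDS = ("region", "get")  # from message GetRequest in Client.proto

class UninitializedMessageError(Exception):
    """Stands in for com.google.protobuf.UninitializedMessageException."""

def build_get_request(fields):
    """Fail, as the generated builder does, if a required field is unset."""
    missing = [f for f in REQUIRED_FIELDS if f not in fields]
    if missing:
        raise UninitializedMessageError(
            "Message missing required fields: " + ", ".join(missing))
    return dict(fields)

build_get_request({"region": "r1", "get": {"row": b"k"}})  # builds fine
try:
    build_get_request({})  # a truncated/garbled request decodes to no fields
except UninitializedMessageError as e:
    print(e)  # Message missing required fields: region, get
```

So the server-side error message names the schema fields that were absent after decoding, which is why Ted's question about client versions is the natural follow-up.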
>> >
>> > Are all your hbase clients running in the same version ?
>> >
>> > Cheers
>> >
>> > On Thu, Oct 29, 2015 at 7:28 AM, 聪聪 <17...@qq.com> wrote:
>> >
>> > > the regionserver jstack log is http://paste2.org/yLDJeXgL
>> > >
>> > >
>> > >
>> > >
>> > > ------------------ Original Message ------------------
>> > > From: "蒲聪-北京";<17...@qq.com>;
>> > > Sent: Thursday, October 29, 2015, 9:10 PM
>> > > To: "user"<us...@hbase.apache.org>;
>> > >
>> > > Subject: Re: Re: Hbase cluster is suddenly unable to respond
>> > >
>> > >
>> > >
>> > > hi Ted:
>> > >
>> > >
>> > > Yesterday around 14:40, one of the regionservers hung once again. At
>> > > that time the web UI could not be opened and the HBase cluster was
>> > > unable to respond. We are very anxious and hoping to get help!
>> > >
>> > >
>> > > jstack log is as follows:
>> > > "RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007f4466146800 nid=0x12d3 runnable [0x00007f3bebe58000]
>> > >    java.lang.Thread.State: RUNNABLE
>> > >         at sun.nio.ch.NativeThread.current(Native Method)
>> > >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
>> > >         - locked <0x00007f3d27360fb0> (a java.lang.Object)
>> > >         - locked <0x00007f3d27360f90> (a java.lang.Object)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>> > >         - locked <0x00007f3c584ce990> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> > >         at java.lang.Thread.run(Thread.java:744)
>> > >
>> > >
>> > > "RpcServer.reader=3,port=60020" daemon prio=10 tid=0x00007f4466145000 nid=0x12d2 runnable [0x00007f3bebf59000]
>> > >    java.lang.Thread.State: RUNNABLE
>> > >         at sun.nio.ch.NativeThread.current(Native Method)
>> > >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
>> > >         - locked <0x00007f3d27360530> (a java.lang.Object)
>> > >         - locked <0x00007f3d27360510> (a java.lang.Object)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>> > >         - locked <0x00007f3c584cf7d8> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> > >         at java.lang.Thread.run(Thread.java:744)
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > region server log :
>> > > 2015-10-28 14:38:19,801 INFO [Thread-15] regionserver.HRegion: Finished memstore flush of ~3.6 M/3820648, currentsize=536/536 for region order_history,2801xyz140618175732642$3,1418829598639.afc853471a8cd4184bc9e7be00b8eea0. in 45ms, sequenceid=9599960557, compaction requested=true
>> > > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=3,port=60020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
>> > >         at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
>> > >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
>> > > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=9,port=60020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
>> > >         at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
>> > >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
>> > >
>> > > 2015-10-28 14:38:32,696 WARN [RpcServer.reader=2,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
>> > > com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
>> > >         at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
>> > >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
>> > >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> > >         at java.lang.Thread.run(Thread.java:744)
>> > > 2015-10-28 14:38:32,696 WARN [RpcServer.reader=1,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
>> > > com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
>> > >         at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
>> > >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
>> > >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> > >         at java.lang.Thread.run(Thread.java:744)
>> > >
>> > >
>> > >
>> > > ------------------ Original Message ------------------
>> > > From: "蒲聪-北京";<17...@qq.com>;
>> > > Sent: Monday, October 26, 2015, 9:28 PM
>> > > To: "user"<us...@hbase.apache.org>;
>> > >
>> > > Subject: Re: Re: Hbase cluster is suddenly unable to respond
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > Thank you very much!
>> > >
>> > >
>> > > ------------------ Original Message ------------------
>> > > From: "Ted Yu";<yu...@gmail.com>;
>> > > Sent: Monday, October 26, 2015, 8:28 PM
>> > > To: "user"<us...@hbase.apache.org>;
>> > >
>> > > Subject: Re: Re: Hbase cluster is suddenly unable to respond
>> > >
>> > >
>> > >
>> > > The fix from HBASE-11277 may solve your problem - if you collect a
>> > > stack trace during the hang, we would have more clues.
>> > >
>> > > I suggest upgrading to a newer release such as 1.1.2 or 0.98.15.
>> > >
>> > > Cheers
>> > >
>> > > > On Oct 26, 2015, at 12:42 AM, 聪聪 <17...@qq.com> wrote:
>> > > >
>> > > > hi,Ted:
>> > > >
>> > > >
>> > > > The HBase version I use is hbase-0.96.0.
>> > > > Around 17:33, the other region servers also showed this WARN log; I
>> > > > don't know whether that is normal. At that time the web UI could not
>> > > > be opened. I restarted the regionserver and HBase went back to
>> > > > normal. Could this be bug HBASE-11277?
>> > > >
>> > > >
>> > > > The regionserver log contains mostly this WARN message.
>> > > > The master log is as follows:
>> > > > 2015-10-21 22:15:43,575 INFO [CatalogJanitor-l-namenode2:60000] master.CatalogJanitor: Scanned 672 catalog row(s), gc'd 0 unreferenced merged region(s) and 1 unreferenced parent region(s)
>> > > > 2015-10-23 17:47:25,617 INFO [RpcServer.handler=28,port=60000] master.HMaster: Client=hbase//192.168.39.19 set balanceSwitch=false
>> > > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60292","starttimems":1445593715207,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
>> > > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":130525,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60286","starttimems":1445593654945,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
>> > > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130953 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60292: output error
>> > > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130945 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60286: output error
>> > > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.handler=6,port=60000: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > ------------------ Original Message ------------------
>> > > > From: "Ted Yu";<yu...@gmail.com>;
>> > > > Sent: Friday, October 23, 2015, 11:39 PM
>> > > > To: "user@hbase.apache.org"<us...@hbase.apache.org>;
>> > > >
>> > > > Subject: Re: Hbase cluster is suddenly unable to respond
>> > > >
>> > > >
>> > > >
>> > > > Were other region servers functioning normally around 17:33 ?
>> > > >
>> > > > Which hbase release are you using ?
>> > > >
>> > > > Can you pastebin more of the region server log ?
>> > > >
>> > > > Thanks
>> > > >
>> > > >> On Fri, Oct 23, 2015 at 8:28 AM, 聪聪 <17...@qq.com> wrote:
>> > > >>
>> > > >> Hi all:
>> > > >>
>> > > >>
>> > > >> This afternoon, the whole HBase cluster suddenly became unable to
>> > > >> respond. After I restarted one regionserver, the cluster recovered.
>> > > >> I don't know the cause of the trouble. I hope I can get help from you.
>> > > >>
>> > > >>
>> > > >> Regionserver on the log is as follows:
>> > > >> 2015-10-23 17:28:49,335 INFO [regionserver60020.logRoller]
>> > wal.FSHLog:
>> > > >> moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com
>> > > >> ,60020,1442810406218/l-hbase30.data.cn8.qunar.com
>> > > %2C60020%2C1442810406218.1445580462689
>> > > >> whose highest sequenceid is 9071525521 to /hbase/oldWALs/
>> > > >> l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
>> > > >> 2015-10-23 17:33:31,375 WARN [RpcServer.reader=8,port=60020]
>> > > >> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read:
>> 0
>> > > >> java.io.IOException: Connection reset by peer
>> > > >> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> > > >> at sun.nio.ch.SocketDispatcher.re
>> ad(SocketDispatcher.java:39)
>> > > >> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>> > > >> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>> > > >> at
>> > sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>> > > >> at
>> > > >> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.
>> java:2368)
>> > > >> at
>> > > >>
>> > >
>> > org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProc
>> ess(RpcServer.java:1403)
>> > > >> at
>> > > >>
>> > > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcSer
>> ver.java:770)
>> > > >> at
>> > > >>
>> > >
>> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunL
>> oop(RpcServer.java:563)
>> > > >> at
>> > > >>
>> > >
>> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(Rp
>> cServer.java:538)
>> > > >> at
>> > > >>
>> > >
>> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>> Executor.java:1145)
>> > > >> at
>> > > >>
>> > >
>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>> lExecutor.java:615)
>> > > >> at java.lang.Thread.run(Thread.java:744)
>> > > >> 2015-10-23 17:33:31,779 WARN [RpcServer.reader=2,port=60020]
>> > > >> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read:
>> 0
>> > > >> java.io.IOException: Connection reset by peer
>> > > >> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> > > >> at sun.nio.ch.SocketDispatcher.re
>> ad(SocketDispatcher.java:39)
>> > > >> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>> > > >> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>> > > >> at
>> > sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>> > > >> at
>> > > >> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.
>> java:2368)
>> > > >> at
>> > > >>
>> > >
>> > org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProc
>> ess(RpcServer.java:1403)
>> > > >> at
>> > > >>
>> > > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcSer
>> ver.java:770)
>> > > >> at
>> > > >>
>> > >
>> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunL
>> oop(RpcServer.java:563)
>> > > >> at
>> > > >>
>> > >
>> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(Rp
>> cServer.java:538)
>> > > >> at
>> > > >>
>> > >
>> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
>> Executor.java:1145)
>> > > >> at
>> > > >>
>> > >
>> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
>> lExecutor.java:615)
>> > > >> at java.lang.Thread.run(Thread.java:744)
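The repeated "Connection reset by peer" in the quoted log means only that a client dropped its TCP connection while the RpcServer reader thread was mid-read; it is a symptom of clients going away, not itself the root cause. A minimal, HBase-free reproduction (a sketch assuming typical JVM/TCP behavior on Linux; `ResetDemo` and the port choice are illustrative, not anything from the thread):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

// The client closes its socket with SO_LINGER=0, which makes close() send a
// TCP RST instead of a normal FIN. The server's next read() on that
// connection then fails with "Connection reset" rather than returning
// end-of-stream -- the same IOException the RpcServer reader logged.
public class ResetDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0);            // any free port
        Socket client = new Socket("127.0.0.1", server.getLocalPort());
        Socket accepted = server.accept();
        client.setSoLinger(true, 0);                          // close() sends RST
        client.close();
        Thread.sleep(200);                                    // let the RST arrive
        try {
            accepted.getInputStream().read();
            System.out.println("read returned EOF");
        } catch (IOException e) {
            System.out.println("IOException: connection reset");
        }
        accepted.close();
        server.close();
    }
}
```

Seeing this WARN on many connections at once usually points at the clients all failing the same way (as the later direct-memory OutOfMemoryError in this thread suggests), not at a network problem on the server.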
--
Thank you
Kiran Sarvabhotla
-----Even a correct decision is wrong when it is taken late
Re: Re: Hbase cluster is suddenly unable to respond
Posted by kiran <ki...@gmail.com>.
We are also facing the same issue; please tell us the solution. I have
increased the RPC timeout and reduced the scan caching, but with no effect.
We are using HBase 0.98.7. Please suggest a workaround, as we are hitting
the issue very frequently now and it is causing downtime in production.
On Fri, Oct 30, 2015 at 9:21 AM, 聪聪 <17...@qq.com> wrote:
> The code that lists the child nodes in a loop is at http://paste2.org/zm8GE7xH
>
>
>
>
> ------------------ Original Message ------------------
> From: "蒲聪-北京" <17...@qq.com>
> Sent: Friday, October 30, 2015, 10:24 AM
> To: "user" <us...@hbase.apache.org>
>
> Subject: Re: Re: Hbase cluster is suddenly unable to respond
>
>
>
>
>
> The client code is http://paste2.org/p3BXkKtV
>
>
> Is the client version compatible with the server?
> I see that the client is hbase-0.96.1.1-hadoop2.
>
>
>
>
> ------------------ Original Message ------------------
> From: "Ted Yu" <yu...@gmail.com>
> Sent: Friday, October 30, 2015, 12:08 AM
> To: "user@hbase.apache.org" <us...@hbase.apache.org>
>
> Subject: Re: Re: Hbase cluster is suddenly unable to respond
>
>
>
> Client side, have they tried increasing the direct memory size?
> -XX:MaxDirectMemorySize=
>
> Do you know how wide the returned rows may get?
>
> Cheers
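Ted's suggestion above targets the client-side `java.lang.OutOfMemoryError: Direct buffer memory`: the NIO socket code allocates temporary direct (off-heap) buffers sized to the response, and the pool they come from is capped by `-XX:MaxDirectMemorySize` (when the flag is unset, the cap roughly tracks the maximum heap size). A self-contained sketch of what the flag governs, with no HBase dependency (`DirectBufferDemo` is an illustrative name, not code from the thread):

```java
import java.nio.ByteBuffer;

// Every ByteBuffer.allocateDirect() call draws from the off-heap pool capped
// by -XX:MaxDirectMemorySize; once the cap is exceeded the JVM throws
// "OutOfMemoryError: Direct buffer memory", as in the client log below.
// Raising the cap looks like (values are illustrative):
//   java -XX:MaxDirectMemorySize=1g -cp app.jar com.example.Client
public class DirectBufferDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MiB off-heap
        System.out.println(buf.isDirect());                  // true
        System.out.println(buf.capacity());                  // 1048576
    }
}
```

Reducing scan caching attacks the same problem from the other side: fewer rows per RPC response means smaller temporary direct buffers on the client.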
>
> On Thu, Oct 29, 2015 at 9:03 AM, 聪聪 <17...@qq.com> wrote:
>
> > Developers report that their client gets the following error:
> >
> > [2015/10/29 19:20:42.260][WARN][RpcClient:724] IPC Client (1904394969) connection to l-hbase28.data.cn8.qunar.com/192.168.44.32:60020 from tomcat: unexpected exception receiving call responses
> >
> > java.lang.OutOfMemoryError: Direct buffer memory
> >         at java.nio.Bits.reserveMemory(Bits.java:633) ~[na:1.6.0_20]
> >         at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:95) ~[na:1.6.0_20]
> >         at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288) ~[na:1.6.0_20]
> >         at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:57) ~[na:1.6.0_20]
> >         at sun.nio.ch.IOUtil.read(IOUtil.java:205) ~[na:1.6.0_20]
> >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236) ~[na:1.6.0_20]
> >         at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) ~[hadoop-common-2.2.0.jar:na]
> >         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) ~[hadoop-common-2.2.0.jar:na]
> >         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) ~[hadoop-common-2.2.0.jar:na]
> >         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) ~[hadoop-common-2.2.0.jar:na]
> >         at java.io.FilterInputStream.read(FilterInputStream.java:116) ~[na:1.6.0_20]
> >         at java.io.FilterInputStream.read(FilterInputStream.java:116) ~[na:1.6.0_20]
> >         at org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:555) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
> >         at java.io.BufferedInputStream.read1(BufferedInputStream.java:256) ~[na:1.6.0_20]
> >         at java.io.BufferedInputStream.read(BufferedInputStream.java:317) ~[na:1.6.0_20]
> >         at java.io.DataInputStream.read(DataInputStream.java:132) ~[na:1.6.0_20]
> >         at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192) ~[hadoop-common-2.2.0.jar:na]
> >         at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1101) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
> >         at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
> >
> >
> >
> >
> >
> >
> > ------------------ Original Message ------------------
> > From: "Ted Yu" <yu...@gmail.com>
> > Sent: Thursday, October 29, 2015, 10:48 PM
> > To: "user@hbase.apache.org" <us...@hbase.apache.org>
> >
> > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> >
> >
> >
> > I took a look at the jstack.
> > The threads involving RpcServer$Connection.readAndProcess() were in
> > RUNNABLE state, not BLOCKED or IN_NATIVE state - as described in
> > HBASE-11277
> > .
> >
> > The protobuf exception shown in your earlier email corresponded to the
> > following in hbase-protocol/src/main/protobuf/Client.proto :
> >
> > message GetRequest {
> > required RegionSpecifier region = 1;
> > required Get get = 2;
> > }
> >
> > Are all your HBase clients running the same version?
> >
> > Cheers
> >
> > On Thu, Oct 29, 2015 at 7:28 AM, 聪聪 <17...@qq.com> wrote:
> >
> > > the regionserver jstack log is http://paste2.org/yLDJeXgL
> > >
> > >
> > >
> > >
> > > ------------------ Original Message ------------------
> > > From: "蒲聪-北京" <17...@qq.com>
> > > Sent: Thursday, October 29, 2015, 9:10 PM
> > > To: "user" <us...@hbase.apache.org>
> > >
> > > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> > >
> > >
> > >
> > > hi Ted:
> > >
> > > Yesterday around 14:40, one of the regionservers hung once again. At that
> > > time the web UI would not open and the HBase cluster was unable to
> > > respond. We are very anxious and hoping for help!
> > >
> > >
> > > jstack log is as follows:
> > > "RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007f4466146800 nid=0x12d3 runnable [0x00007f3bebe58000]
> > >    java.lang.Thread.State: RUNNABLE
> > >         at sun.nio.ch.NativeThread.current(Native Method)
> > >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
> > >         - locked <0x00007f3d27360fb0> (a java.lang.Object)
> > >         - locked <0x00007f3d27360f90> (a java.lang.Object)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > >         - locked <0x00007f3c584ce990> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >         at java.lang.Thread.run(Thread.java:744)
> > >
> > > "RpcServer.reader=3,port=60020" daemon prio=10 tid=0x00007f4466145000 nid=0x12d2 runnable [0x00007f3bebf59000]
> > >    java.lang.Thread.State: RUNNABLE
> > >         at sun.nio.ch.NativeThread.current(Native Method)
> > >         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
> > >         - locked <0x00007f3d27360530> (a java.lang.Object)
> > >         - locked <0x00007f3d27360510> (a java.lang.Object)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > >         - locked <0x00007f3c584cf7d8> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >         at java.lang.Thread.run(Thread.java:744)
> > >
> > >
> > >
> > >
> > >
> > > region server log:
> > > 2015-10-28 14:38:19,801 INFO [Thread-15] regionserver.HRegion: Finished memstore flush of ~3.6 M/3820648, currentsize=536/536 for region order_history,2801xyz140618175732642$3,1418829598639.afc853471a8cd4184bc9e7be00b8eea0. in 45ms, sequenceid=9599960557, compaction requested=true
> > > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=3,port=60020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
> > >         at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> > >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> > > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=9,port=60020] regionserver.HRegionServer: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
> > >         at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> > >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> > >
> > > 2015-10-28 14:38:32,696 WARN [RpcServer.reader=2,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> > > com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
> > >         at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> > >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> > >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >         at java.lang.Thread.run(Thread.java:744)
> > > 2015-10-28 14:38:32,696 WARN [RpcServer.reader=1,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> > > com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
> > >         at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> > >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> > >         at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > >         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > >         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >         at java.lang.Thread.run(Thread.java:744)
> > >
> > >
> > >
> > > ------------------ Original Message ------------------
> > > From: "蒲聪-北京" <17...@qq.com>
> > > Sent: Monday, October 26, 2015, 9:28 PM
> > > To: "user" <us...@hbase.apache.org>
> > >
> > > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> > >
> > >
> > >
> > >
> > >
> > > Thank you very much!
> > >
> > >
> > > ------------------ Original Message ------------------
> > > From: "Ted Yu" <yu...@gmail.com>
> > > Sent: Monday, October 26, 2015, 8:28 PM
> > > To: "user" <us...@hbase.apache.org>
> > >
> > > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> > >
> > >
> > >
> > > The fix from HBASE-11277 may solve your problem; if you collect a stack
> > > trace during the hang, we will have more clues.
> > >
> > > I suggest upgrading to a newer release such as 1.1.2 or 0.98.15
> > >
> > > Cheers
> > >
> > > > On Oct 26, 2015, at 12:42 AM, 聪聪 <17...@qq.com> wrote:
> > > >
> > > > hi, Ted:
> > > >
> > > > The HBase version I use is hbase-0.96.0.
> > > > Around 17:33, the other region servers also showed this WARN log; I
> > > > don't know whether that is normal. At that time the web UI would not
> > > > open. I restarted the regionserver and then HBase went back to normal.
> > > > Could this be bug HBASE-11277?
> > > >
> > > >
> > > > The regionserver log consists almost entirely of this WARN entry.
> > > > The master log is as follows:
> > > > 2015-10-21 22:15:43,575 INFO [CatalogJanitor-l-namenode2:60000] master.CatalogJanitor: Scanned 672 catalog row(s), gc'd 0 unreferenced merged region(s) and 1 unreferenced parent region(s)
> > > > 2015-10-23 17:47:25,617 INFO [RpcServer.handler=28,port=60000] master.HMaster: Client=hbase//192.168.39.19 set balanceSwitch=false
> > > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60292","starttimems":1445593715207,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> > > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":130525,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60286","starttimems":1445593654945,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> > > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130953 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60292: output error
> > > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130945 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60286: output error
> > > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.handler=6,port=60000: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > ------------------ Original Message ------------------
> > > > From: "Ted Yu" <yu...@gmail.com>
> > > > Sent: Friday, October 23, 2015, 11:39 PM
> > > > To: "user@hbase.apache.org" <us...@hbase.apache.org>
> > > >
> > > > Subject: Re: Hbase cluster is suddenly unable to respond
> > > >
> > > >
> > > >
> > > > Were other region servers functioning normally around 17:33 ?
> > > >
> > > > Which hbase release are you using ?
> > > >
> > > > Can you pastebin more of the region server log ?
> > > >
> > > > Thanks
> > > >
> > > >> On Fri, Oct 23, 2015 at 8:28 AM, 聪聪 <17...@qq.com> wrote:
> > > >>
> > > >> hi, all:
> > > >>
> > > >> This afternoon the whole HBase cluster suddenly became unable to respond. After I restarted a regionserver, the cluster recovered. I don't know the cause of the trouble. I hope I can get help from you.
> > > >>
> > > >> The regionserver log is as follows:
> > > >> 2015-10-23 17:28:49,335 INFO [regionserver60020.logRoller] wal.FSHLog: moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com,60020,1442810406218/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689 whose highest sequenceid is 9071525521 to /hbase/oldWALs/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
> > > >> 2015-10-23 17:33:31,375 WARN [RpcServer.reader=8,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> > > >> java.io.IOException: Connection reset by peer
> > > >>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > > >>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > > >>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > > >>         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> > > >>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> > > >>         at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > > >>         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> > > >>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > > >>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > > >>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > > >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >>         at java.lang.Thread.run(Thread.java:744)
> > > >> 2015-10-23 17:33:31,779 WARN [RpcServer.reader=2,port=60020] ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> > > >> java.io.IOException: Connection reset by peer
> > > >>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > > >>         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > > >>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > > >>         at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> > > >>         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> > > >>         at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > > >>         at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> > > >>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > > >>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > > >>         at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > > >>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > > >>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > > >>         at java.lang.Thread.run(Thread.java:744)
> > >
> >
>
--
Thank you
Kiran Sarvabhotla
-----Even a correct decision is wrong when it is taken late
Re: Re: Hbase cluster is suddenly unable to respond
Posted by 聪聪 <17...@qq.com>.
The code that lists the child nodes in a loop is at http://paste2.org/zm8GE7xH
------------------ Original Message ------------------
From: "蒲聪-北京" <17...@qq.com>
Sent: Friday, October 30, 2015, 10:24 AM
To: "user" <us...@hbase.apache.org>
Subject: Re: Re: Hbase cluster is suddenly unable to respond
The client code is http://paste2.org/p3BXkKtV
Is the client version compatible with the server?
I see that the client is hbase-0.96.1.1-hadoop2.
------------------ Original Message ------------------
From: "Ted Yu" <yu...@gmail.com>
Sent: Friday, October 30, 2015, 12:08 AM
To: "user@hbase.apache.org" <us...@hbase.apache.org>
Subject: Re: Re: Hbase cluster is suddenly unable to respond
Client side, have they tried increasing the direct memory size?
-XX:MaxDirectMemorySize=
Do you know how wide the returned rows may get?
Cheers
On Thu, Oct 29, 2015 at 9:03 AM, 聪聪 <17...@qq.com> wrote:
> Developers report that their client gets the following error:
>
> [2015/10/29 19:20:42.260][WARN][RpcClient:724] IPC Client (1904394969)
> connection to l-hbase28.data.cn8.qunar.com/192.168.44.32:60020 from
> tomcat: unexpected exception receiving call responses
>
> java.lang.OutOfMemoryError: Direct buffer memory
>
> at java.nio.Bits.reserveMemory(Bits.java:633) ~[na:1.6.0_20]
>
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:95)
> ~[na:1.6.0_20]
>
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
> ~[na:1.6.0_20]
>
> at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:57)
> ~[na:1.6.0_20]
>
> at sun.nio.ch.IOUtil.read(IOUtil.java:205) ~[na:1.6.0_20]
>
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
> ~[na:1.6.0_20]
>
> at
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
> ~[hadoop-common-2.2.0.jar:na]
>
> at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> ~[hadoop-common-2.2.0.jar:na]
>
> at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> ~[hadoop-common-2.2.0.jar:na]
>
> at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> ~[hadoop-common-2.2.0.jar:na]
>
> at java.io.FilterInputStream.read(FilterInputStream.java:116)
> ~[na:1.6.0_20]
>
> at java.io.FilterInputStream.read(FilterInputStream.java:116)
> ~[na:1.6.0_20]
>
> at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:555)
> ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
> ~[na:1.6.0_20]
>
> at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> ~[na:1.6.0_20]
>
> at java.io.DataInputStream.read(DataInputStream.java:132)
> ~[na:1.6.0_20]
>
> at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
> ~[hadoop-common-2.2.0.jar:na]
>
> at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1101)
> ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>
> at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721)
> ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>
>
>
>
>
>
>
> ------------------ Original Message ------------------
> From: "Ted Yu" <yu...@gmail.com>
> Sent: Thursday, October 29, 2015, 10:48 PM
> To: "user@hbase.apache.org" <us...@hbase.apache.org>
>
> Subject: Re: Re: Hbase cluster is suddenly unable to respond
>
>
>
> I took a look at the jstack.
> The threads involving RpcServer$Connection.readAndProcess() were in
> RUNNABLE state, not BLOCKED or IN_NATIVE state - as described in
> HBASE-11277
> .
>
> The protobuf exception shown in your earlier email corresponded to the
> following in hbase-protocol/src/main/protobuf/Client.proto :
>
> message GetRequest {
> required RegionSpecifier region = 1;
> required Get get = 2;
> }
>
> Are all your HBase clients running the same version?
>
> Cheers
>
> On Thu, Oct 29, 2015 at 7:28 AM, 聪聪 <17...@qq.com> wrote:
>
> > the regionserver jstack log is http://paste2.org/yLDJeXgL
> >
> >
> >
> >
> > ------------------ Original Message ------------------
> > From: "蒲聪-北京" <17...@qq.com>
> > Sent: Thursday, October 29, 2015, 9:10 PM
> > To: "user" <us...@hbase.apache.org>
> >
> > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> >
> >
> >
> > hi Ted:
> >
> > Yesterday around 14:40, one of the regionservers hung once again. At that
> > time the web UI would not open and the HBase cluster was unable to
> > respond. We are very anxious and hoping for help!
> >
> >
> > jstack log is as follows:
> > "RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007f4466146800
> > nid=0x12d3 runnable [0x00007f3bebe58000]
> > java.lang.Thread.State: RUNNABLE
> > at sun.nio.ch.NativeThread.current(Native Method)
> > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
> > - locked <0x00007f3d27360fb0> (a java.lang.Object)
> > - locked <0x00007f3d27360f90> (a java.lang.Object)
> > at
> org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > - locked <0x00007f3c584ce990> (a
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:744)
> >
> >
> > "RpcServer.reader=3,port=60020" daemon prio=10 tid=0x00007f4466145000
> > nid=0x12d2 runnable [0x00007f3bebf59000]
> > java.lang.Thread.State: RUNNABLE
> > at sun.nio.ch.NativeThread.current(Native Method)
> > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
> > - locked <0x00007f3d27360530> (a java.lang.Object)
> > - locked <0x00007f3d27360510> (a java.lang.Object)
> > at
> org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > - locked <0x00007f3c584cf7d8> (a
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:744)
> >
> >
> >
> >
> >
> > region server log :
> > 2015-10-28 14:38:19,801 INFO [Thread-15] regionserver.HRegion: Finished
> > memstore flush of ~3.6 M/3820648, currentsize=536/536 for region
> >
> order_history,2801xyz140618175732642$3,1418829598639.afc853471a8cd4184bc9e7be00b8eea0.
> > in 45ms, sequenceid=9599960557, compaction requested=true
> > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=3,port=60020]
> > regionserver.HRegionServer:
> > org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException:
> > Expected nextCallSeq: 1 But the nextCallSeq got from client: 0;
> > request=scanner_id: 16740356019163164014 number_of_rows: 10
> close_scanner:
> > false next_call_seq: 0
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> > at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=9,port=60020]
> > regionserver.HRegionServer:
> > org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException:
> > Expected nextCallSeq: 1 But the nextCallSeq got from client: 0;
> > request=scanner_id: 16740356019163164014 number_of_rows: 10
> close_scanner:
> > false next_call_seq: 0
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> > at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> >
> > 2015-10-28 14:38:32,696 WARN [RpcServer.reader=2,port=60020]
> > ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> > com.google.protobuf.UninitializedMessageException: Message missing
> > required fields: region, get
> > at
> >
> com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> > at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> > at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:744)
> > 2015-10-28 14:38:32,696 WARN [RpcServer.reader=1,port=60020]
> > ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> > com.google.protobuf.UninitializedMessageException: Message missing
> > required fields: region, get
> > at
> >
> com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> > at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> > at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:744)
> >
> >
> >
> > ------------------ Original Message ------------------
> > From: "蒲聪-北京";<17...@qq.com>;
> > Sent: Monday, October 26, 2015, 9:28 PM
> > To: "user"<us...@hbase.apache.org>;
> >
> > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> >
> >
> >
> >
> >
> > Thank you very much!
> >
> >
> > ------------------ Original Message ------------------
> > From: "Ted Yu";<yu...@gmail.com>;
> > Sent: Monday, October 26, 2015, 8:28 PM
> > To: "user"<us...@hbase.apache.org>;
> >
> > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> >
> >
> >
> > The fix from HBASE-11277 may solve your problem - if you collect stack
> > trace during the hang, we would have more clue.
> >
> > I suggest upgrading to newer release such as 1.1.2 or 0.98.15
> >
> > Cheers
> >
> > > On Oct 26, 2015, at 12:42 AM, 聪聪 <17...@qq.com> wrote:
> > >
> > > hi, Ted:
> > >
> > >
> > > The HBase version I use is hbase-0.96.0.
> > > Around 17:33 the other region servers also showed this warning in their
> > > logs; I don't know whether that is normal. At that time the web UI would
> > > not open. After I restarted the regionserver, HBase went back to normal.
> > > Could this be bug HBASE-11277?
> > >
> > >
> > > The regionserver logs consist almost entirely of this warning.
> > > The master log is as follows:
> > > 2015-10-21 22:15:43,575 INFO [CatalogJanitor-l-namenode2:60000]
> > master.CatalogJanitor: Scanned 672 catalog row(s), gc'd 0 unreferenced
> > merged region(s) and 1 unreferenced parent region(s)
> > > 2015-10-23 17:47:25,617 INFO [RpcServer.handler=28,port=60000]
> > master.HMaster: Client=hbase//192.168.39.19 set balanceSwitch=false
> > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000]
> > ipc.RpcServer: (responseTooSlow):
> >
> {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"
> > 192.168.39.22:60292
> >
> ","starttimems":1445593715207,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000]
> > ipc.RpcServer: (responseTooSlow):
> >
> {"processingtimems":130525,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"
> > 192.168.39.22:60286
> >
> ","starttimems":1445593654945,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000]
> > ipc.RpcServer: RpcServer.respondercallId: 130953 service: MasterService
> > methodName: ListTableDescriptorsByNamespace size: 48 connection:
> > 192.168.39.22:60292: output error
> > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000]
> > ipc.RpcServer: RpcServer.respondercallId: 130945 service: MasterService
> > methodName: ListTableDescriptorsByNamespace size: 48 connection:
> > 192.168.39.22:60286: output error
> > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000]
> > ipc.RpcServer: RpcServer.handler=6,port=60000: caught a
> > ClosedChannelException, this means that the server was processing a
> request
> > but the client went away. The error message was: null
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > ------------------ Original Message ------------------
> > > From: "Ted Yu";<yu...@gmail.com>;
> > > Sent: Friday, October 23, 2015, 11:39 PM
> > > To: "user@hbase.apache.org"<us...@hbase.apache.org>;
> > >
> > > Subject: Re: Hbase cluster is suddenly unable to respond
> > >
> > >
> > >
> > > Were other region servers functioning normally around 17:33 ?
> > >
> > > Which hbase release are you using ?
> > >
> > > Can you pastebin more of the region server log ?
> > >
> > > Thanks
> > >
> > >> On Fri, Oct 23, 2015 at 8:28 AM, 聪聪 <17...@qq.com> wrote:
> > >>
> > >> hi,all:
> > >>
> > >>
> > >> This afternoon the whole HBase cluster suddenly became unable to
> > >> respond. After I restarted one regionserver, the cluster recovered.
> > >> I don't know the cause of the trouble and hope I can get help from you.
> > >>
> > >>
> > >> Regionserver on the log is as follows:
> > >> 2015-10-23 17:28:49,335 INFO [regionserver60020.logRoller]
> wal.FSHLog:
> > >> moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com
> > >> ,60020,1442810406218/l-hbase30.data.cn8.qunar.com
> > %2C60020%2C1442810406218.1445580462689
> > >> whose highest sequenceid is 9071525521 to /hbase/oldWALs/
> > >> l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
> > >> 2015-10-23 17:33:31,375 WARN [RpcServer.reader=8,port=60020]
> > >> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> > >> java.io.IOException: Connection reset by peer
> > >> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > >> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > >> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > >> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> > >> at
> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> > >> at
> > >> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> > >> at
> > >>
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > >> at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >> at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >> at java.lang.Thread.run(Thread.java:744)
> > >> 2015-10-23 17:33:31,779 WARN [RpcServer.reader=2,port=60020]
> > >> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> > >> java.io.IOException: Connection reset by peer
> > >> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > >> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > >> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > >> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> > >> at
> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> > >> at
> > >> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> > >> at
> > >>
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > >> at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >> at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >> at java.lang.Thread.run(Thread.java:744)
> >
>
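[Editor's note] The OutOfOrderScannerNextException in the logs above is raised by a server-side sanity check: the regionserver increments a per-scanner nextCallSeq after each successful next() batch, so a client retry (for example after an RPC timeout) that resends the previous sequence number no longer matches. A minimal sketch of the idea, not HBase's actual code:

```java
public class Main {
    // Sketch of the per-scanner sequence check; in HBase the server keeps a
    // nextCallSeq per open scanner and the client echoes it on each next() RPC.
    static boolean inOrder(long serverNextCallSeq, long clientNextCallSeq) {
        return serverNextCallSeq == clientNextCallSeq;
    }

    public static void main(String[] args) {
        long serverSeq = 0;
        // First next() call: client sends seq 0, server expects 0 -> accepted.
        System.out.println(inOrder(serverSeq, 0));
        serverSeq++; // server advances after delivering the batch
        // Client retries the same call (e.g. the response timed out in flight):
        // it resends seq 0, but the server now expects 1 -> rejected with
        // OutOfOrderScannerNextException.
        System.out.println(inOrder(serverSeq, 0));
    }
}
```

This is why the error usually points to a client-side scanner retry or timeout rather than data corruption on the server.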
Re: Re: Hbase cluster is suddenly unable to respond
Posted by 聪聪 <17...@qq.com>.
The client code is at http://paste2.org/p3BXkKtV
Is the client version compatible with the server?
I see that the client version is hbase-0.96.1.1-hadoop2.
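[Editor's note] One way to double-check which client jar an application actually ships is to split its classpath and look for the hbase-client artifact. The paths below are made-up examples; in a real deployment the classpath comes from the application's launch script:

```shell
# Hypothetical application classpath; substitute the real one from your launcher.
CLASSPATH="/app/lib/hbase-client-0.96.1.1-hadoop2.jar:/app/lib/hadoop-common-2.2.0.jar"

# One entry per line; the hbase-client jar's file name encodes the client version.
echo "$CLASSPATH" | tr ':' '\n' | grep hbase-client
```

The server in this thread runs 0.96.0, so a 0.96.1.1 client should be wire-compatible in principle, but mixing client versions across applications is worth ruling out given the malformed GetRequest errors.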
------------------ Original Message ------------------
From: "Ted Yu";<yu...@gmail.com>;
Sent: Friday, October 30, 2015, 12:08 AM
To: "user@hbase.apache.org"<us...@hbase.apache.org>;
Subject: Re: Re: Hbase cluster is suddenly unable to respond
On the client side, have they tried increasing the direct memory size?
-XX:MaxDirectMemorySize=
Do you know how wide the returned rows may get?
Cheers
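[Editor's note] For a standalone Java client, the flag Ted mentions is a plain JVM option passed at launch. A sketch of such a launch command follows; the 256m value, jar name, and main class are illustrative, not taken from this thread:

```shell
# -XX:MaxDirectMemorySize caps NIO direct (off-heap) buffer memory, the pool
# that "java.lang.OutOfMemoryError: Direct buffer memory" exhausts. If the
# flag is unset, the JVM picks a default tied to the heap size.
JVM_OPTS="-Xmx2g -XX:MaxDirectMemorySize=256m"
echo "java $JVM_OPTS -cp app.jar com.example.HBaseClientApp"
```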
On Thu, Oct 29, 2015 at 9:03 AM, 聪聪 <17...@qq.com> wrote:
> The developers report that their client hits the following error:
>
> [2015/10/29 19:20:42.260][WARN][RpcClient:724] IPC Client (1904394969)
> connection to l-hbase28.data.cn8.qunar.com/192.168.44.32:60020 from
> tomcat: unexpected exception receiving call responses
>
> java.lang.OutOfMemoryError: Direct buffer memory
>
> at java.nio.Bits.reserveMemory(Bits.java:633) ~[na:1.6.0_20]
>
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:95)
> ~[na:1.6.0_20]
>
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
> ~[na:1.6.0_20]
>
> at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:57)
> ~[na:1.6.0_20]
>
> at sun.nio.ch.IOUtil.read(IOUtil.java:205) ~[na:1.6.0_20]
>
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
> ~[na:1.6.0_20]
>
> at
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
> ~[hadoop-common-2.2.0.jar:na]
>
> at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> ~[hadoop-common-2.2.0.jar:na]
>
> at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> ~[hadoop-common-2.2.0.jar:na]
>
> at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> ~[hadoop-common-2.2.0.jar:na]
>
> at java.io.FilterInputStream.read(FilterInputStream.java:116)
> ~[na:1.6.0_20]
>
> at java.io.FilterInputStream.read(FilterInputStream.java:116)
> ~[na:1.6.0_20]
>
> at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:555)
> ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
> ~[na:1.6.0_20]
>
> at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> ~[na:1.6.0_20]
>
> at java.io.DataInputStream.read(DataInputStream.java:132)
> ~[na:1.6.0_20]
>
> at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
> ~[hadoop-common-2.2.0.jar:na]
>
> at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1101)
> ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>
> at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721)
> ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>
>
>
>
>
>
>
> ------------------ Original Message ------------------
> From: "Ted Yu";<yu...@gmail.com>;
> Sent: Thursday, October 29, 2015, 10:48 PM
> To: "user@hbase.apache.org"<us...@hbase.apache.org>;
>
> Subject: Re: Re: Hbase cluster is suddenly unable to respond
>
>
>
> I took a look at the jstack.
> The threads involving RpcServer$Connection.readAndProcess() were in
> RUNNABLE state, not the BLOCKED or IN_NATIVE state described in
> HBASE-11277.
>
> The protobuf exception shown in your earlier email corresponded to the
> following in hbase-protocol/src/main/protobuf/Client.proto :
>
> message GetRequest {
>   required RegionSpecifier region = 1;
>   required Get get = 2;
> }
>
> Are all your hbase clients running in the same version ?
>
> Cheers
>
> On Thu, Oct 29, 2015 at 7:28 AM, 聪聪 <17...@qq.com> wrote:
>
> > The regionserver jstack log is http://paste2.org/yLDJeXgL
> >
> >
> >
> >
> > ------------------ Original Message ------------------
> > From: "蒲聪-北京";<17...@qq.com>;
> > Sent: Thursday, October 29, 2015, 9:10 PM
> > To: "user"<us...@hbase.apache.org>;
> >
> > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> >
> >
> >
> > hi Ted:
> >
> >
> > Yesterday around 14:40, one of the regionservers hung again. At that
> > time the web UI would not open and the HBase cluster was unable to
> > respond. We are very anxious and hoping for help!
> >
> >
> > jstack log is as follows:
> > "RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007f4466146800
> > nid=0x12d3 runnable [0x00007f3bebe58000]
> > java.lang.Thread.State: RUNNABLE
> > at sun.nio.ch.NativeThread.current(Native Method)
> > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
> > - locked <0x00007f3d27360fb0> (a java.lang.Object)
> > - locked <0x00007f3d27360f90> (a java.lang.Object)
> > at
> org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > - locked <0x00007f3c584ce990> (a
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:744)
> >
> >
> > "RpcServer.reader=3,port=60020" daemon prio=10 tid=0x00007f4466145000
> > nid=0x12d2 runnable [0x00007f3bebf59000]
> > java.lang.Thread.State: RUNNABLE
> > at sun.nio.ch.NativeThread.current(Native Method)
> > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
> > - locked <0x00007f3d27360530> (a java.lang.Object)
> > - locked <0x00007f3d27360510> (a java.lang.Object)
> > at
> org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > - locked <0x00007f3c584cf7d8> (a
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:744)
> >
> >
> >
> >
> >
> > region server log :
> > 2015-10-28 14:38:19,801 INFO [Thread-15] regionserver.HRegion: Finished
> > memstore flush of ~3.6 M/3820648, currentsize=536/536 for region
> >
> order_history,2801xyz140618175732642$3,1418829598639.afc853471a8cd4184bc9e7be00b8eea0.
> > in 45ms, sequenceid=9599960557, compaction requested=true
> > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=3,port=60020]
> > regionserver.HRegionServer:
> > org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException:
> > Expected nextCallSeq: 1 But the nextCallSeq got from client: 0;
> > request=scanner_id: 16740356019163164014 number_of_rows: 10
> close_scanner:
> > false next_call_seq: 0
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> > at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=9,port=60020]
> > regionserver.HRegionServer:
> > org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException:
> > Expected nextCallSeq: 1 But the nextCallSeq got from client: 0;
> > request=scanner_id: 16740356019163164014 number_of_rows: 10
> close_scanner:
> > false next_call_seq: 0
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> > at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> >
> > 2015-10-28 14:38:32,696 WARN [RpcServer.reader=2,port=60020]
> > ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> > com.google.protobuf.UninitializedMessageException: Message missing
> > required fields: region, get
> > at
> >
> com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> > at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> > at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:744)
> > 2015-10-28 14:38:32,696 WARN [RpcServer.reader=1,port=60020]
> > ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> > com.google.protobuf.UninitializedMessageException: Message missing
> > required fields: region, get
> > at
> >
> com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> > at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> > at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:744)
> >
> >
> >
Re: Re: Hbase cluster is suddenly unable to respond
Posted by Ted Yu <yu...@gmail.com>.
On the client side, have they tried increasing the direct memory size?
-XX:MaxDirectMemorySize=
Do you know how wide the returned rows may get?
Cheers
On Thu, Oct 29, 2015 at 9:03 AM, 聪聪 <17...@qq.com> wrote:
> Developers feedback their client has the following error:
>
> [2015/10/29 19:20:42.260][WARN][RpcClient:724] IPC Client (1904394969)
> connection to l-hbase28.data.cn8.qunar.com/192.168.44.32:60020 from
> tomcat: unexpected exception receiving call responses
>
> java.lang.OutOfMemoryError: Direct buffer memory
>
> at java.nio.Bits.reserveMemory(Bits.java:633) ~[na:1.6.0_20]
>
> at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:95)
> ~[na:1.6.0_20]
>
> at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288)
> ~[na:1.6.0_20]
>
> at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:57)
> ~[na:1.6.0_20]
>
> at sun.nio.ch.IOUtil.read(IOUtil.java:205) ~[na:1.6.0_20]
>
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
> ~[na:1.6.0_20]
>
> at
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
> ~[hadoop-common-2.2.0.jar:na]
>
> at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> ~[hadoop-common-2.2.0.jar:na]
>
> at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> ~[hadoop-common-2.2.0.jar:na]
>
> at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> ~[hadoop-common-2.2.0.jar:na]
>
> at java.io.FilterInputStream.read(FilterInputStream.java:116)
> ~[na:1.6.0_20]
>
> at java.io.FilterInputStream.read(FilterInputStream.java:116)
> ~[na:1.6.0_20]
>
> at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:555)
> ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>
> at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
> ~[na:1.6.0_20]
>
> at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
> ~[na:1.6.0_20]
>
> at java.io.DataInputStream.read(DataInputStream.java:132)
> ~[na:1.6.0_20]
>
> at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192)
> ~[hadoop-common-2.2.0.jar:na]
>
> at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1101)
> ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>
> at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721)
> ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
>
>
>
>
>
>
>
> ------------------ 原始邮件 ------------------
> 发件人: "Ted Yu";<yu...@gmail.com>;
> 发送时间: 2015年10月29日(星期四) 晚上10:48
> 收件人: "user@hbase.apache.org"<us...@hbase.apache.org>;
>
> 主题: Re: 回复: Hbase cluster is suddenly unable to respond
>
>
>
> I took a look at the jstack.
> The threads involving RpcServer$Connection.readAndProcess() were in
> RUNNABLE state, not BLOCKED or IN_NATIVE state - as described in
> HBASE-11277
> .
>
> The protobuf exception shown in your earlier email corresponded to the
> following in hbase-protocol/src/main/protobuf/Client.proto :
>
> message GetRequest {
> required RegionSpecifier region = 1;
> required Get get = 2;
> }
>
> Are all your hbase clients running in the same version ?
>
> Cheers
>
> On Thu, Oct 29, 2015 at 7:28 AM, 聪聪 <17...@qq.com> wrote:
>
> > the regionserver jstack log is http://paste2.org/yLDJeXgL
> >
> >
> >
> >
> > ------------------ 原始邮件 ------------------
> > 发件人: "蒲聪-北京";<17...@qq.com>;
> > 发送时间: 2015年10月29日(星期四) 晚上9:10
> > 收件人: "user"<us...@hbase.apache.org>;
> >
> > 主题: 回复: 回复: Hbase cluster is suddenly unable to respond
> >
> >
> >
> > hi Ted:
> >
> >
> > Yesterday around 14:40,one of regionservers hang once against.At that
> time
> > I saw web ui can not open.Hbase cluster is unable to respond.Very
> anxious,
> > hoping to get help!
> >
> >
> > jstack log is as follows:
> > "RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007f4466146800
> > nid=0x12d3 runnable [0x00007f3bebe58000]
> > java.lang.Thread.State: RUNNABLE
> > at sun.nio.ch.NativeThread.current(Native Method)
> > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
> > - locked <0x00007f3d27360fb0> (a java.lang.Object)
> > - locked <0x00007f3d27360f90> (a java.lang.Object)
> > at
> org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > - locked <0x00007f3c584ce990> (a
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:744)
> >
> >
> > "RpcServer.reader=3,port=60020" daemon prio=10 tid=0x00007f4466145000
> > nid=0x12d2 runnable [0x00007f3bebf59000]
> > java.lang.Thread.State: RUNNABLE
> > at sun.nio.ch.NativeThread.current(Native Method)
> > at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
> > - locked <0x00007f3d27360530> (a java.lang.Object)
> > - locked <0x00007f3d27360510> (a java.lang.Object)
> > at
> org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > - locked <0x00007f3c584cf7d8> (a
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:744)
> >
> >
> >
> >
> >
> > region server log:
> > 2015-10-28 14:38:19,801 INFO [Thread-15] regionserver.HRegion: Finished
> > memstore flush of ~3.6 M/3820648, currentsize=536/536 for region
> >
> order_history,2801xyz140618175732642$3,1418829598639.afc853471a8cd4184bc9e7be00b8eea0.
> > in 45ms, sequenceid=9599960557, compaction requested=true
> > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=3,port=60020]
> > regionserver.HRegionServer:
> > org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException:
> > Expected nextCallSeq: 1 But the nextCallSeq got from client: 0;
> > request=scanner_id: 16740356019163164014 number_of_rows: 10
> close_scanner:
> > false next_call_seq: 0
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> > at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> > 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=9,port=60020]
> > regionserver.HRegionServer:
> > org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException:
> > Expected nextCallSeq: 1 But the nextCallSeq got from client: 0;
> > request=scanner_id: 16740356019163164014 number_of_rows: 10
> close_scanner:
> > false next_call_seq: 0
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> > at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
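The OutOfOrderScannerNextException above comes from a per-scanner sequence check: the server expects next_call_seq to advance by one on every successful scan call, and a client retry (for example after an RPC timeout) that re-sends the old sequence number is rejected. A minimal Python model of that guard (illustrative only, not HBase code):

```python
class OutOfOrderScannerNextException(Exception):
    """Mirrors org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException."""

class ToyScanner:
    """Toy model of the per-scanner sequence check in HRegionServer.scan()."""

    def __init__(self):
        self.next_call_seq = 0  # sequence number the server expects next

    def scan(self, client_seq, rows=10):
        if client_seq != self.next_call_seq:
            raise OutOfOrderScannerNextException(
                "Expected nextCallSeq: %d But the nextCallSeq got from "
                "client: %d" % (self.next_call_seq, client_seq))
        self.next_call_seq += 1
        return ["row-%d" % i for i in range(rows)]  # placeholder batch

scanner = ToyScanner()
scanner.scan(0)          # first call succeeds; the server now expects 1
try:
    scanner.scan(0)      # a retried RPC re-sends seq 0 and is rejected
except OutOfOrderScannerNextException as e:
    print(e)
```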
> >
> > 2015-10-28 14:38:32,696 WARN [RpcServer.reader=2,port=60020]
> > ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> > com.google.protobuf.UninitializedMessageException: Message missing
> > required fields: region, get
> > at
> >
> com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> > at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> > at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:744)
> > 2015-10-28 14:38:32,696 WARN [RpcServer.reader=1,port=60020]
> > ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> > com.google.protobuf.UninitializedMessageException: Message missing
> > required fields: region, get
> > at
> >
> com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> > at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> > at
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> > at
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > at
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > at java.lang.Thread.run(Thread.java:744)
> >
> >
> >
> > ------------------ Original Message ------------------
> > From: "蒲聪-北京";<17...@qq.com>;
> > Sent: Monday, October 26, 2015, 9:28 PM
> > To: "user"<us...@hbase.apache.org>;
> >
> > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> >
> >
> >
> >
> >
> > Thank you very much!
> >
> >
> > ------------------ Original Message ------------------
> > From: "Ted Yu";<yu...@gmail.com>;
> > Sent: Monday, October 26, 2015, 8:28 PM
> > To: "user"<us...@hbase.apache.org>;
> >
> > Subject: Re: Re: Hbase cluster is suddenly unable to respond
> >
> >
> >
> > The fix from HBASE-11277 may solve your problem - if you collect a stack
> > trace during the hang, we would have more clues.
> >
> > I suggest upgrading to a newer release such as 1.1.2 or 0.98.15
> >
> > Cheers
> >
> > > On Oct 26, 2015, at 12:42 AM, 聪聪 <17...@qq.com> wrote:
> > >
> > > hi,Ted:
> > >
> > >
> > > The HBase version I use is hbase-0.96.0.
> > > Around 17:33, the other region servers also showed this warning in
> > > their logs; I don't know whether that is normal. At that time I saw
> > > the web UI could not be opened. After I restarted the regionserver,
> > > HBase went back to normal. Could this be bug HBASE-11277?
> > >
> > >
> > > The regionserver log contains almost nothing but this warning.
> > > The master log is as follows:
> > > 2015-10-21 22:15:43,575 INFO [CatalogJanitor-l-namenode2:60000]
> > master.CatalogJanitor: Scanned 672 catalog row(s), gc'd 0 unreferenced
> > merged region(s) and 1 unreferenced parent region(s)
> > > 2015-10-23 17:47:25,617 INFO [RpcServer.handler=28,port=60000]
> > master.HMaster: Client=hbase//192.168.39.19 set balanceSwitch=false
> > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000]
> > ipc.RpcServer: (responseTooSlow):
> >
> {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"
> > 192.168.39.22:60292
> >
> ","starttimems":1445593715207,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000]
> > ipc.RpcServer: (responseTooSlow):
> >
> {"processingtimems":130525,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"
> > 192.168.39.22:60286
> >
> ","starttimems":1445593654945,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000]
> > ipc.RpcServer: RpcServer.respondercallId: 130953 service: MasterService
> > methodName: ListTableDescriptorsByNamespace size: 48 connection:
> > 192.168.39.22:60292: output error
> > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000]
> > ipc.RpcServer: RpcServer.respondercallId: 130945 service: MasterService
> > methodName: ListTableDescriptorsByNamespace size: 48 connection:
> > 192.168.39.22:60286: output error
> > > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000]
> > ipc.RpcServer: RpcServer.handler=6,port=60000: caught a
> > ClosedChannelException, this means that the server was processing a
> request
> > but the client went away. The error message was: null
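Those (responseTooSlow) warnings carry a JSON payload after the marker; a small hypothetical helper for pulling it out when scanning the master log (the marker and first-brace assumptions are mine, not a documented HBase log-format guarantee):

```python
import json

def parse_response_too_slow(line):
    """Extract the JSON payload from a '(responseTooSlow):' warning line.

    Hypothetical helper: it assumes the payload is the substring starting
    at the first '{' on a line containing the marker.
    """
    marker = "(responseTooSlow):"
    if marker not in line:
        return None
    return json.loads(line[line.index("{"):])

line = ('2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000] '
        'ipc.RpcServer: (responseTooSlow): {"processingtimems":70266,'
        '"call":"ListTableDescriptorsByNamespace(...)",'
        '"client":"192.168.39.22:60292","starttimems":1445593715207,'
        '"queuetimems":0,"class":"HMaster","responsesize":704,'
        '"method":"ListTableDescriptorsByNamespace"}')
info = parse_response_too_slow(line)
print(info["processingtimems"])  # 70266
```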
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > ------------------ Original Message ------------------
> > > From: "Ted Yu";<yu...@gmail.com>;
> > > Sent: Friday, October 23, 2015, 11:39 PM
> > > To: "user@hbase.apache.org"<us...@hbase.apache.org>;
> > >
> > > Subject: Re: Hbase cluster is suddenly unable to respond
> > >
> > >
> > >
> > > Were other region servers functioning normally around 17:33 ?
> > >
> > > Which hbase release are you using ?
> > >
> > > Can you pastebin more of the region server log ?
> > >
> > > Thanks
> > >
> > >> On Fri, Oct 23, 2015 at 8:28 AM, 聪聪 <17...@qq.com> wrote:
> > >>
> > >> hi,all:
> > >>
> > >>
> > >> This afternoon, the whole HBase cluster suddenly became unable to
> > >> respond. After I restarted a regionserver, the cluster recovered. I
> > >> don't know the cause of the trouble. I hope I can get help from you.
> > >>
> > >>
> > >> The regionserver log is as follows:
> > >> 2015-10-23 17:28:49,335 INFO [regionserver60020.logRoller]
> wal.FSHLog:
> > >> moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com
> > >> ,60020,1442810406218/l-hbase30.data.cn8.qunar.com
> > %2C60020%2C1442810406218.1445580462689
> > >> whose highest sequenceid is 9071525521 to /hbase/oldWALs/
> > >> l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
> > >> 2015-10-23 17:33:31,375 WARN [RpcServer.reader=8,port=60020]
> > >> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> > >> java.io.IOException: Connection reset by peer
> > >> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > >> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > >> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > >> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> > >> at
> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> > >> at
> > >> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> > >> at
> > >>
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > >> at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >> at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >> at java.lang.Thread.run(Thread.java:744)
> > >> 2015-10-23 17:33:31,779 WARN [RpcServer.reader=2,port=60020]
> > >> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> > >> java.io.IOException: Connection reset by peer
> > >> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> > >> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> > >> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> > >> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> > >> at
> sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> > >> at
> > >> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> > >> at
> > >>
> > org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> > >> at
> > >>
> >
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> > >> at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> > >> at
> > >>
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> > >> at java.lang.Thread.run(Thread.java:744)
> >
>
Re: Re: Hbase cluster is suddenly unable to respond
Posted by 聪聪 <17...@qq.com>.
The developers report that their client hits the following error:
[2015/10/29 19:20:42.260][WARN][RpcClient:724] IPC Client (1904394969) connection to l-hbase28.data.cn8.qunar.com/192.168.44.32:60020 from tomcat: unexpected exception receiving call responses
java.lang.OutOfMemoryError: Direct buffer memory
at java.nio.Bits.reserveMemory(Bits.java:633) ~[na:1.6.0_20]
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:95) ~[na:1.6.0_20]
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:288) ~[na:1.6.0_20]
at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:57) ~[na:1.6.0_20]
at sun.nio.ch.IOUtil.read(IOUtil.java:205) ~[na:1.6.0_20]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236) ~[na:1.6.0_20]
at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) ~[hadoop-common-2.2.0.jar:na]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142) ~[hadoop-common-2.2.0.jar:na]
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) ~[hadoop-common-2.2.0.jar:na]
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) ~[hadoop-common-2.2.0.jar:na]
at java.io.FilterInputStream.read(FilterInputStream.java:116) ~[na:1.6.0_20]
at java.io.FilterInputStream.read(FilterInputStream.java:116) ~[na:1.6.0_20]
at org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:555) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
at java.io.BufferedInputStream.read1(BufferedInputStream.java:256) ~[na:1.6.0_20]
at java.io.BufferedInputStream.read(BufferedInputStream.java:317) ~[na:1.6.0_20]
at java.io.DataInputStream.read(DataInputStream.java:132) ~[na:1.6.0_20]
at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:192) ~[hadoop-common-2.2.0.jar:na]
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1101) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721) ~[hbase-client-0.96.1.1-hadoop2.jar:0.96.1.1-hadoop2]
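The java.lang.OutOfMemoryError: Direct buffer memory above means the client JVM exhausted its off-heap (direct) buffer pool, which NIO socket reads draw from via sun.nio.ch.Util.getTemporaryDirectBuffer. One hedged mitigation, assuming the client genuinely needs more direct memory rather than leaking it, is to raise the client's direct-memory ceiling; the value and the entry-point class below are placeholders, not recommendations:

```shell
# Illustrative client launch only: 256m is a placeholder value and
# com.example.HBaseClientApp is a hypothetical entry point, not a real class.
java -XX:MaxDirectMemorySize=256m -cp "$CLASSPATH" com.example.HBaseClientApp
```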
Re: Re: Hbase cluster is suddenly unable to respond
Posted by Ted Yu <yu...@gmail.com>.
I took a look at the jstack.
The threads involving RpcServer$Connection.readAndProcess() were in
RUNNABLE state, not in the BLOCKED or IN_NATIVE state described in
HBASE-11277.
The protobuf exception shown in your earlier email corresponded to the
following in hbase-protocol/src/main/protobuf/Client.proto:
message GetRequest {
required RegionSpecifier region = 1;
required Get get = 2;
}
Are all your HBase clients running the same version?
Cheers
On Thu, Oct 29, 2015 at 7:28 AM, 聪聪 <17...@qq.com> wrote:
> the regionserver jstack log is http://paste2.org/yLDJeXgL
>
>
>
>
> ------------------ 原始邮件 ------------------
> 发件人: "蒲聪-北京";<17...@qq.com>;
> 发送时间: 2015年10月29日(星期四) 晚上9:10
> 收件人: "user"<us...@hbase.apache.org>;
>
> 主题: 回复: 回复: Hbase cluster is suddenly unable to respond
>
>
>
> hi Ted:
>
>
> Yesterday around 14:40,one of regionservers hang once against.At that time
> I saw web ui can not open.Hbase cluster is unable to respond.Very anxious,
> hoping to get help!
>
>
> jstack log is as follows:
> "RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007f4466146800
> nid=0x12d3 runnable [0x00007f3bebe58000]
> java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.NativeThread.current(Native Method)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
> - locked <0x00007f3d27360fb0> (a java.lang.Object)
> - locked <0x00007f3d27360f90> (a java.lang.Object)
> at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> at
> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> - locked <0x00007f3c584ce990> (a
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
>
>
> "RpcServer.reader=3,port=60020" daemon prio=10 tid=0x00007f4466145000
> nid=0x12d2 runnable [0x00007f3bebf59000]
> java.lang.Thread.State: RUNNABLE
> at sun.nio.ch.NativeThread.current(Native Method)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
> - locked <0x00007f3d27360530> (a java.lang.Object)
> - locked <0x00007f3d27360510> (a java.lang.Object)
> at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
> at
> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> - locked <0x00007f3c584cf7d8> (a
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
>
>
>
>
>
> region server log:
> 2015-10-28 14:38:19,801 INFO [Thread-15] regionserver.HRegion: Finished
> memstore flush of ~3.6 M/3820648, currentsize=536/536 for region
> order_history,2801xyz140618175732642$3,1418829598639.afc853471a8cd4184bc9e7be00b8eea0.
> in 45ms, sequenceid=9599960557, compaction requested=true
> 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=3,port=60020]
> regionserver.HRegionServer:
> org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException:
> Expected nextCallSeq: 1 But the nextCallSeq got from client: 0;
> request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner:
> false next_call_seq: 0
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
> 2015-10-28 14:38:32,693 ERROR [RpcServer.handler=9,port=60020]
> regionserver.HRegionServer:
> org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException:
> Expected nextCallSeq: 1 But the nextCallSeq got from client: 0;
> request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner:
> false next_call_seq: 0
> at
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
>
> 2015-10-28 14:38:32,696 WARN [RpcServer.reader=2,port=60020]
> ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> com.google.protobuf.UninitializedMessageException: Message missing
> required fields: region, get
> at
> com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> 2015-10-28 14:38:32,696 WARN [RpcServer.reader=1,port=60020]
> ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
> com.google.protobuf.UninitializedMessageException: Message missing
> required fields: region, get
> at
> com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
> at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
>
>
>
> ------------------ Original Message ------------------
> From: "蒲聪-北京" <17...@qq.com>
> Sent: Monday, October 26, 2015, 9:28 PM
> To: "user" <us...@hbase.apache.org>
>
> Subject: Re: Re: Hbase cluster is suddenly unable to respond
>
>
>
>
>
> Thank you very much!
>
>
> ------------------ Original Message ------------------
> From: "Ted Yu" <yu...@gmail.com>
> Sent: Monday, October 26, 2015, 8:28 PM
> To: "user" <us...@hbase.apache.org>
>
> Subject: Re: Re: Hbase cluster is suddenly unable to respond
>
>
>
> The fix from HBASE-11277 may solve your problem - if you collect a stack
> trace during the hang, we will have more clues.
>
> I suggest upgrading to a newer release such as 1.1.2 or 0.98.15.
>
> Cheers
>
> > On Oct 26, 2015, at 12:42 AM, 聪聪 <17...@qq.com> wrote:
> >
> > hi,Ted:
> >
> >
> > I am using HBase version 0.96.0.
> > Around 17:33, the other region servers also showed this warn log. I don't
> > know if that is normal or not. At that time I saw the web UI could not be
> > opened. I restarted the regionserver and then HBase went back to normal.
> > Could this be bug HBASE-11277?
> >
> >
> > The regionserver log basically contains only this warn log.
> > The master log is as follows:
> > 2015-10-21 22:15:43,575 INFO [CatalogJanitor-l-namenode2:60000]
> master.CatalogJanitor: Scanned 672 catalog row(s), gc'd 0 unreferenced
> merged region(s) and 1 unreferenced parent region(s)
> > 2015-10-23 17:47:25,617 INFO [RpcServer.handler=28,port=60000]
> master.HMaster: Client=hbase//192.168.39.19 set balanceSwitch=false
> > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000]
> ipc.RpcServer: (responseTooSlow):
> {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"
> 192.168.39.22:60292
> ","starttimems":1445593715207,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000]
> ipc.RpcServer: (responseTooSlow):
> {"processingtimems":130525,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"
> 192.168.39.22:60286
> ","starttimems":1445593654945,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000]
> ipc.RpcServer: RpcServer.respondercallId: 130953 service: MasterService
> methodName: ListTableDescriptorsByNamespace size: 48 connection:
> 192.168.39.22:60292: output error
> > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000]
> ipc.RpcServer: RpcServer.respondercallId: 130945 service: MasterService
> methodName: ListTableDescriptorsByNamespace size: 48 connection:
> 192.168.39.22:60286: output error
> > 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000]
> ipc.RpcServer: RpcServer.handler=6,port=60000: caught a
> ClosedChannelException, this means that the server was processing a request
> but the client went away. The error message was: null
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > ------------------ Original Message ------------------
> > From: "Ted Yu" <yu...@gmail.com>
> > Sent: Friday, October 23, 2015, 11:39 PM
> > To: "user@hbase.apache.org" <us...@hbase.apache.org>
> >
> > Subject: Re: Hbase cluster is suddenly unable to respond
> >
> >
> >
> > Were other region servers functioning normally around 17:33?
> >
> > Which HBase release are you using?
> >
> > Can you pastebin more of the region server log?
> >
> > Thanks
> >
> >> On Fri, Oct 23, 2015 at 8:28 AM, 聪聪 <17...@qq.com> wrote:
> >>
> >> hi,all:
> >>
> >>
> >> This afternoon, the whole HBase cluster suddenly became unable to
> >> respond. After I restarted a regionserver, the cluster recovered. I don't
> >> know the cause of the trouble. I hope I can get help from you.
> >>
> >>
> >> The regionserver log is as follows:
> >> 2015-10-23 17:28:49,335 INFO [regionserver60020.logRoller] wal.FSHLog:
> >> moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com
> >> ,60020,1442810406218/l-hbase30.data.cn8.qunar.com
> %2C60020%2C1442810406218.1445580462689
> >> whose highest sequenceid is 9071525521 to /hbase/oldWALs/
> >> l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
> >> 2015-10-23 17:33:31,375 WARN [RpcServer.reader=8,port=60020]
> >> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> >> java.io.IOException: Connection reset by peer
> >> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> >> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> >> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> >> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> >> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> >> at
> >> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> >> at
> >>
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> >> at
> >>
> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >> at
> >>
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >> at
> >>
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >> at
> >>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> at
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> at java.lang.Thread.run(Thread.java:744)
> >> 2015-10-23 17:33:31,779 WARN [RpcServer.reader=2,port=60020]
> >> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> >> java.io.IOException: Connection reset by peer
> >> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> >> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> >> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> >> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> >> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> >> at
> >> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> >> at
> >>
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> >> at
> >>
> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> >> at
> >>
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> >> at
> >>
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> >> at
> >>
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >> at
> >>
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >> at java.lang.Thread.run(Thread.java:744)
>
Re: Re: Hbase cluster is suddenly unable to respond
Posted by 聪聪 <17...@qq.com>.
the regionserver jstack log is http://paste2.org/yLDJeXgL
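Ted's earlier request, collecting a stack trace during the hang, generalizes to sampling jstack several times so thread states can be compared across samples. A sketch under stated assumptions: the sample_stacks helper, the jps-based PID lookup, and the /tmp output paths are illustrative, not from this thread:

```shell
# sample_stacks PID COUNT INTERVAL DIR
# Capture COUNT jstack samples of process PID, INTERVAL seconds apart,
# into DIR; failures (e.g. process already gone) are tolerated so the
# remaining samples are still taken.
sample_stacks() {
  pid=$1; count=$2; interval=$3; dir=$4
  i=1
  while [ "$i" -le "$count" ]; do
    jstack "$pid" > "$dir/rs-jstack-$pid-$i.txt" 2>/dev/null || true
    if [ "$i" -lt "$count" ]; then sleep "$interval"; fi
    i=$((i + 1))
  done
}

# Usage during a hang: find the HRegionServer PID with jps, then sample.
#   sample_stacks "$(jps | awk '/HRegionServer/ {print $1}')" 5 10 /tmp
```

Comparing several samples shows whether the reader threads stay RUNNABLE in readAndProcess(), which is what distinguishes this hang from the BLOCKED/IN_NATIVE pattern of HBASE-11277.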
------------------ Original Message ------------------
From: "蒲聪-北京" <17...@qq.com>
Sent: Thursday, October 29, 2015, 9:10 PM
To: "user" <us...@hbase.apache.org>
Subject: Re: Re: Hbase cluster is suddenly unable to respond
hi Ted:
Yesterday around 14:40, one of the regionservers hung once again. At that time I saw the web UI could not be opened and the HBase cluster was unable to respond. Very anxious, hoping to get help!
jstack log is as follows:
"RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007f4466146800 nid=0x12d3 runnable [0x00007f3bebe58000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.NativeThread.current(Native Method)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
- locked <0x00007f3d27360fb0> (a java.lang.Object)
- locked <0x00007f3d27360f90> (a java.lang.Object)
at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
- locked <0x00007f3c584ce990> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
"RpcServer.reader=3,port=60020" daemon prio=10 tid=0x00007f4466145000 nid=0x12d2 runnable [0x00007f3bebf59000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.NativeThread.current(Native Method)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
- locked <0x00007f3d27360530> (a java.lang.Object)
- locked <0x00007f3d27360510> (a java.lang.Object)
at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
- locked <0x00007f3c584cf7d8> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
region server log:
2015-10-28 14:38:19,801 INFO [Thread-15] regionserver.HRegion: Finished memstore flush of ~3.6 M/3820648, currentsize=536/536 for region order_history,2801xyz140618175732642$3,1418829598639.afc853471a8cd4184bc9e7be00b8eea0. in 45ms, sequenceid=9599960557, compaction requested=true
2015-10-28 14:38:32,693 ERROR [RpcServer.handler=3,port=60020] regionserver.HRegionServer:
org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
2015-10-28 14:38:32,693 ERROR [RpcServer.handler=9,port=60020] regionserver.HRegionServer:
org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
2015-10-28 14:38:32,696 WARN [RpcServer.reader=2,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
2015-10-28 14:38:32,696 WARN [RpcServer.reader=1,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
------------------ Original Message ------------------
From: "蒲聪-北京" <17...@qq.com>
Sent: Monday, October 26, 2015, 9:28 PM
To: "user" <us...@hbase.apache.org>
Subject: Re: Re: Hbase cluster is suddenly unable to respond
Thank you very much!
------------------ Original Message ------------------
From: "Ted Yu" <yu...@gmail.com>
Sent: Monday, October 26, 2015, 8:28 PM
To: "user" <us...@hbase.apache.org>
Subject: Re: Re: Hbase cluster is suddenly unable to respond
The fix from HBASE-11277 may solve your problem - if you collect a stack trace during the hang, we will have more clues.
I suggest upgrading to a newer release such as 1.1.2 or 0.98.15.
Cheers
> On Oct 26, 2015, at 12:42 AM, 聪聪 <17...@qq.com> wrote:
>
> hi,Ted:
>
>
> I am using HBase version 0.96.0.
> Around 17:33, the other region servers also showed this warn log. I don't know if that is normal or not. At that time I saw the web UI could not be opened. I restarted the regionserver and then HBase went back to normal. Could this be bug HBASE-11277?
>
>
> The regionserver log basically contains only this warn log.
> The master log is as follows:
> 2015-10-21 22:15:43,575 INFO [CatalogJanitor-l-namenode2:60000] master.CatalogJanitor: Scanned 672 catalog row(s), gc'd 0 unreferenced merged region(s) and 1 unreferenced parent region(s)
> 2015-10-23 17:47:25,617 INFO [RpcServer.handler=28,port=60000] master.HMaster: Client=hbase//192.168.39.19 set balanceSwitch=false
> 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60292","starttimems":1445593715207,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":130525,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60286","starttimems":1445593654945,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130953 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60292: output error
> 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130945 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60286: output error
> 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.handler=6,port=60000: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
>
>
>
>
>
>
>
>
>
>
>
> ------------------ Original Message ------------------
> From: "Ted Yu" <yu...@gmail.com>
> Sent: Friday, October 23, 2015, 11:39 PM
> To: "user@hbase.apache.org" <us...@hbase.apache.org>
>
> Subject: Re: Hbase cluster is suddenly unable to respond
>
>
>
> Were other region servers functioning normally around 17:33?
>
> Which HBase release are you using?
>
> Can you pastebin more of the region server log?
>
> Thanks
>
>> On Fri, Oct 23, 2015 at 8:28 AM, 聪聪 <17...@qq.com> wrote:
>>
>> hi,all:
>>
>>
>> This afternoon, the whole HBase cluster suddenly became unable to respond.
>> After I restarted a regionserver, the cluster recovered. I don't know the
>> cause of the trouble. I hope I can get help from you.
>>
>>
>> The regionserver log is as follows:
>> 2015-10-23 17:28:49,335 INFO [regionserver60020.logRoller] wal.FSHLog:
>> moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com
>> ,60020,1442810406218/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
>> whose highest sequenceid is 9071525521 to /hbase/oldWALs/
>> l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
>> 2015-10-23 17:33:31,375 WARN [RpcServer.reader=8,port=60020]
>> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
>> java.io.IOException: Connection reset by peer
>> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:744)
>> 2015-10-23 17:33:31,779 WARN [RpcServer.reader=2,port=60020]
>> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
>> java.io.IOException: Connection reset by peer
>> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:744)
Re: Re: Hbase cluster is suddenly unable to respond
Posted by 聪聪 <17...@qq.com>.
hi Ted:
Yesterday around 14:40, one of the regionservers hung once again. At that time I saw the web UI could not be opened and the HBase cluster was unable to respond. Very anxious, hoping to get help!
jstack log is as follows:
"RpcServer.reader=4,port=60020" daemon prio=10 tid=0x00007f4466146800 nid=0x12d3 runnable [0x00007f3bebe58000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.NativeThread.current(Native Method)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
- locked <0x00007f3d27360fb0> (a java.lang.Object)
- locked <0x00007f3d27360f90> (a java.lang.Object)
at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
- locked <0x00007f3c584ce990> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
"RpcServer.reader=3,port=60020" daemon prio=10 tid=0x00007f4466145000 nid=0x12d2 runnable [0x00007f3bebf59000]
java.lang.Thread.State: RUNNABLE
at sun.nio.ch.NativeThread.current(Native Method)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:325)
- locked <0x00007f3d27360530> (a java.lang.Object)
- locked <0x00007f3d27360510> (a java.lang.Object)
at org.apache.hadoop.hbase.ipc.RpcServer.channelIO(RpcServer.java:2402)
at org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1476)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
- locked <0x00007f3c584cf7d8> (a org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
region server log:
2015-10-28 14:38:19,801 INFO [Thread-15] regionserver.HRegion: Finished memstore flush of ~3.6 M/3820648, currentsize=536/536 for region order_history,2801xyz140618175732642$3,1418829598639.afc853471a8cd4184bc9e7be00b8eea0. in 45ms, sequenceid=9599960557, compaction requested=true
2015-10-28 14:38:32,693 ERROR [RpcServer.handler=3,port=60020] regionserver.HRegionServer:
org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
2015-10-28 14:38:32,693 ERROR [RpcServer.handler=9,port=60020] regionserver.HRegionServer:
org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 16740356019163164014 number_of_rows: 10 close_scanner: false next_call_seq: 0
at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3007)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2146)
at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1851)
2015-10-28 14:38:32,696 WARN [RpcServer.reader=2,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
2015-10-28 14:38:32,696 WARN [RpcServer.reader=1,port=60020] ipc.RpcServer: Unable to read call parameter from client 192.168.37.135
com.google.protobuf.UninitializedMessageException: Message missing required fields: region, get
at com.google.protobuf.AbstractMessage$Builder.newUninitializedMessageException(AbstractMessage.java:770)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4474)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$GetRequest$Builder.build(ClientProtos.java:4406)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processRequest(RpcServer.java:1689)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.processOneRpc(RpcServer.java:1631)
at org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1491)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
at org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
------------------ Original Message ------------------
From: "蒲聪-北京" <17...@qq.com>
Sent: Monday, October 26, 2015, 9:28 PM
To: "user" <us...@hbase.apache.org>
Subject: Re: Re: Hbase cluster is suddenly unable to respond
Thank you very much!
------------------ Original Message ------------------
From: "Ted Yu" <yu...@gmail.com>
Sent: Monday, October 26, 2015, 8:28 PM
To: "user" <us...@hbase.apache.org>
Subject: Re: Re: Hbase cluster is suddenly unable to respond
The fix from HBASE-11277 may solve your problem - if you collect a stack trace during the hang, we would have more clues.
I suggest upgrading to a newer release such as 1.1.2 or 0.98.15.
Cheers
> On Oct 26, 2015, at 12:42 AM, 聪聪 <17...@qq.com> wrote:
>
> Hi, Ted:
>
>
> I am using HBase 0.96.0.
> Around 17:33, the other region servers also showed this WARN log. I don't know whether that is normal. At the time, the web UI would not open. After I restarted the regionserver, HBase went back to normal. Could this be bug HBASE-11277?
>
>
> The regionserver log consists almost entirely of this WARN message.
> The master log is as follows:
> 2015-10-21 22:15:43,575 INFO [CatalogJanitor-l-namenode2:60000] master.CatalogJanitor: Scanned 672 catalog row(s), gc'd 0 unreferenced merged region(s) and 1 unreferenced parent region(s)
> 2015-10-23 17:47:25,617 INFO [RpcServer.handler=28,port=60000] master.HMaster: Client=hbase//192.168.39.19 set balanceSwitch=false
> 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60292","starttimems":1445593715207,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":130525,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60286","starttimems":1445593654945,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130953 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60292: output error
> 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130945 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60286: output error
> 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.handler=6,port=60000: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
>
> ------------------ Original Message ------------------
> From: "Ted Yu" <yu...@gmail.com>
> Sent: Friday, October 23, 2015, 11:39 PM
> To: "user@hbase.apache.org" <us...@hbase.apache.org>
>
> Subject: Re: Hbase cluster is suddenly unable to respond
>
>
>
> Were other region servers functioning normally around 17:33?
>
> Which HBase release are you using?
>
> Can you pastebin more of the region server log?
>
> Thanks
>
>> On Fri, Oct 23, 2015 at 8:28 AM, 聪聪 <17...@qq.com> wrote:
>>
>> Hi, all:
>>
>>
>> This afternoon, the whole HBase cluster suddenly became unable to
>> respond. After I restarted one regionserver, the cluster recovered. I
>> don't know the cause of the trouble and hope I can get help from you.
>>
>>
>> The regionserver log is as follows:
>> 2015-10-23 17:28:49,335 INFO [regionserver60020.logRoller] wal.FSHLog:
>> moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com
>> ,60020,1442810406218/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
>> whose highest sequenceid is 9071525521 to /hbase/oldWALs/
>> l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
>> 2015-10-23 17:33:31,375 WARN [RpcServer.reader=8,port=60020]
>> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
>> java.io.IOException: Connection reset by peer
>> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:744)
>> 2015-10-23 17:33:31,779 WARN [RpcServer.reader=2,port=60020]
>> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
>> java.io.IOException: Connection reset by peer
>> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:744)
Re: Re: Hbase cluster is suddenly unable to respond
Posted by 聪聪 <17...@qq.com>.
Thank you very much!
------------------ Original Message ------------------
From: "Ted Yu" <yu...@gmail.com>
Sent: Monday, October 26, 2015, 8:28 PM
To: "user" <us...@hbase.apache.org>
Subject: Re: Re: Hbase cluster is suddenly unable to respond
The fix from HBASE-11277 may solve your problem - if you collect a stack trace during the hang, we would have more clues.
I suggest upgrading to a newer release such as 1.1.2 or 0.98.15.
Cheers
> On Oct 26, 2015, at 12:42 AM, 聪聪 <17...@qq.com> wrote:
>
> Hi, Ted:
>
>
> I am using HBase 0.96.0.
> Around 17:33, the other region servers also showed this WARN log. I don't know whether that is normal. At the time, the web UI would not open. After I restarted the regionserver, HBase went back to normal. Could this be bug HBASE-11277?
>
>
> The regionserver log consists almost entirely of this WARN message.
> The master log is as follows:
> 2015-10-21 22:15:43,575 INFO [CatalogJanitor-l-namenode2:60000] master.CatalogJanitor: Scanned 672 catalog row(s), gc'd 0 unreferenced merged region(s) and 1 unreferenced parent region(s)
> 2015-10-23 17:47:25,617 INFO [RpcServer.handler=28,port=60000] master.HMaster: Client=hbase//192.168.39.19 set balanceSwitch=false
> 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60292","starttimems":1445593715207,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":130525,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60286","starttimems":1445593654945,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130953 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60292: output error
> 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130945 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60286: output error
> 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.handler=6,port=60000: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
>
> ------------------ Original Message ------------------
> From: "Ted Yu" <yu...@gmail.com>
> Sent: Friday, October 23, 2015, 11:39 PM
> To: "user@hbase.apache.org" <us...@hbase.apache.org>
>
> Subject: Re: Hbase cluster is suddenly unable to respond
>
>
>
> Were other region servers functioning normally around 17:33?
>
> Which HBase release are you using?
>
> Can you pastebin more of the region server log?
>
> Thanks
>
>> On Fri, Oct 23, 2015 at 8:28 AM, 聪聪 <17...@qq.com> wrote:
>>
>> Hi, all:
>>
>>
>> This afternoon, the whole HBase cluster suddenly became unable to
>> respond. After I restarted one regionserver, the cluster recovered. I
>> don't know the cause of the trouble and hope I can get help from you.
>>
>>
>> The regionserver log is as follows:
>> 2015-10-23 17:28:49,335 INFO [regionserver60020.logRoller] wal.FSHLog:
>> moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com
>> ,60020,1442810406218/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
>> whose highest sequenceid is 9071525521 to /hbase/oldWALs/
>> l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
>> 2015-10-23 17:33:31,375 WARN [RpcServer.reader=8,port=60020]
>> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
>> java.io.IOException: Connection reset by peer
>> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:744)
>> 2015-10-23 17:33:31,779 WARN [RpcServer.reader=2,port=60020]
>> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
>> java.io.IOException: Connection reset by peer
>> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:744)
Re: Re: Hbase cluster is suddenly unable to respond
Posted by Ted Yu <yu...@gmail.com>.
The fix from HBASE-11277 may solve your problem - if you collect a stack trace during the hang, we would have more clues.
I suggest upgrading to a newer release such as 1.1.2 or 0.98.15.
Cheers
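[Editor's note: the stack-trace collection suggested here can be scripted. The sketch below is illustrative only, not from the thread; it assumes a `jstack` from the running JDK is on the PATH and that the RegionServer process is identifiable via `pgrep -f HRegionServer`.]

```shell
#!/usr/bin/env bash
# Capture several stack dumps from a hung RegionServer so that
# threads stuck in RpcServer show up consistently across samples.

dump_name() {
  # Build a timestamped file name for one stack dump of pid $1.
  echo "rs-jstack-$1-$(date +%Y%m%d-%H%M%S).txt"
}

capture_stacks() {
  # Take $2 (default 5) jstack samples of pid $1, $3 (default 10)
  # seconds apart. "|| true" keeps going if one sample fails.
  local pid=$1 count=${2:-5} interval=${3:-10}
  for _ in $(seq "$count"); do
    jstack "$pid" > "$(dump_name "$pid")" 2>&1 || true
    sleep "$interval"
  done
}

# Usage, run on the RegionServer host while it is hung:
#   capture_stacks "$(pgrep -f HRegionServer)" 5 10
```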
> On Oct 26, 2015, at 12:42 AM, 聪聪 <17...@qq.com> wrote:
>
> Hi, Ted:
>
>
> I am using HBase 0.96.0.
> Around 17:33, the other region servers also showed this WARN log. I don't know whether that is normal. At the time, the web UI would not open. After I restarted the regionserver, HBase went back to normal. Could this be bug HBASE-11277?
>
>
> The regionserver log consists almost entirely of this WARN message.
> The master log is as follows:
> 2015-10-21 22:15:43,575 INFO [CatalogJanitor-l-namenode2:60000] master.CatalogJanitor: Scanned 672 catalog row(s), gc'd 0 unreferenced merged region(s) and 1 unreferenced parent region(s)
> 2015-10-23 17:47:25,617 INFO [RpcServer.handler=28,port=60000] master.HMaster: Client=hbase//192.168.39.19 set balanceSwitch=false
> 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60292","starttimems":1445593715207,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":130525,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60286","starttimems":1445593654945,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
> 2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130953 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60292: output error
> 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130945 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60286: output error
> 2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.handler=6,port=60000: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
>
> ------------------ Original Message ------------------
> From: "Ted Yu" <yu...@gmail.com>
> Sent: Friday, October 23, 2015, 11:39 PM
> To: "user@hbase.apache.org" <us...@hbase.apache.org>
>
> Subject: Re: Hbase cluster is suddenly unable to respond
>
>
>
> Were other region servers functioning normally around 17:33?
>
> Which HBase release are you using?
>
> Can you pastebin more of the region server log?
>
> Thanks
>
>> On Fri, Oct 23, 2015 at 8:28 AM, 聪聪 <17...@qq.com> wrote:
>>
>> Hi, all:
>>
>>
>> This afternoon, the whole HBase cluster suddenly became unable to
>> respond. After I restarted one regionserver, the cluster recovered. I
>> don't know the cause of the trouble and hope I can get help from you.
>>
>>
>> The regionserver log is as follows:
>> 2015-10-23 17:28:49,335 INFO [regionserver60020.logRoller] wal.FSHLog:
>> moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com
>> ,60020,1442810406218/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
>> whose highest sequenceid is 9071525521 to /hbase/oldWALs/
>> l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
>> 2015-10-23 17:33:31,375 WARN [RpcServer.reader=8,port=60020]
>> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
>> java.io.IOException: Connection reset by peer
>> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:744)
>> 2015-10-23 17:33:31,779 WARN [RpcServer.reader=2,port=60020]
>> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
>> java.io.IOException: Connection reset by peer
>> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
>> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
>> at
>> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> at java.lang.Thread.run(Thread.java:744)
Re: Hbase cluster is suddenly unable to respond
Posted by 聪聪 <17...@qq.com>.
Hi, Ted:
I am using HBase 0.96.0.
Around 17:33, the other region servers also showed this WARN log. I don't know whether that is normal. At the time, the web UI would not open. After I restarted the regionserver, HBase went back to normal. Could this be bug HBASE-11277?
The regionserver log consists almost entirely of this WARN message.
The master log is as follows:
2015-10-21 22:15:43,575 INFO [CatalogJanitor-l-namenode2:60000] master.CatalogJanitor: Scanned 672 catalog row(s), gc'd 0 unreferenced merged region(s) and 1 unreferenced parent region(s)
2015-10-23 17:47:25,617 INFO [RpcServer.handler=28,port=60000] master.HMaster: Client=hbase//192.168.39.19 set balanceSwitch=false
2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60292","starttimems":1445593715207,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: (responseTooSlow): {"processingtimems":130525,"call":"ListTableDescriptorsByNamespace(org.apache.hadoop.hbase.protobuf.generated.MasterProtos$ListTableDescriptorsByNamespaceRequest)","client":"192.168.39.22:60286","starttimems":1445593654945,"queuetimems":0,"class":"HMaster","responsesize":704,"method":"ListTableDescriptorsByNamespace"}
2015-10-23 17:49:45,513 WARN [RpcServer.handler=24,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130953 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60292: output error
2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.respondercallId: 130945 service: MasterService methodName: ListTableDescriptorsByNamespace size: 48 connection: 192.168.39.22:60286: output error
2015-10-23 17:49:45,513 WARN [RpcServer.handler=6,port=60000] ipc.RpcServer: RpcServer.handler=6,port=60000: caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
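[Editor's note: the (responseTooSlow) WARN lines above embed a JSON payload after a fixed marker, so the processing times can be extracted with sed to rank the slowest master RPCs. This is a rough sketch, not from the thread; it assumes the payload always follows the "(responseTooSlow): " marker as in the lines above.]

```shell
# Print the "processingtimems" value from each (responseTooSlow)
# line on stdin or in the given files; pipe through `sort -n` to
# rank the slowest calls.
slow_ms() {
  sed -n 's/.*(responseTooSlow): {.*"processingtimems":\([0-9][0-9]*\).*/\1/p' "$@"
}

# Example with a line shaped like the ones quoted in this thread:
echo '2015-10-23 17:49:45,513 WARN ipc.RpcServer: (responseTooSlow): {"processingtimems":70266,"call":"ListTableDescriptorsByNamespace"}' | slow_ms
# prints 70266
```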
------------------ Original Message ------------------
From: "Ted Yu" <yu...@gmail.com>
Sent: Friday, October 23, 2015, 11:39 PM
To: "user@hbase.apache.org" <us...@hbase.apache.org>
Subject: Re: Hbase cluster is suddenly unable to respond
Were other region servers functioning normally around 17:33?
Which HBase release are you using?
Can you pastebin more of the region server log?
Thanks
On Fri, Oct 23, 2015 at 8:28 AM, 聪聪 <17...@qq.com> wrote:
> Hi, all:
>
>
> This afternoon, the whole HBase cluster suddenly became unable to
> respond. After I restarted one regionserver, the cluster recovered. I
> don't know the cause of the trouble and hope I can get help from you.
>
>
> The regionserver log is as follows:
> 2015-10-23 17:28:49,335 INFO [regionserver60020.logRoller] wal.FSHLog:
> moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com
> ,60020,1442810406218/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
> whose highest sequenceid is 9071525521 to /hbase/oldWALs/
> l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
> 2015-10-23 17:33:31,375 WARN [RpcServer.reader=8,port=60020]
> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at
> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> 2015-10-23 17:33:31,779 WARN [RpcServer.reader=2,port=60020]
> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at
> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
Re: Hbase cluster is suddenly unable to respond
Posted by Ted Yu <yu...@gmail.com>.
Were other region servers functioning normally around 17:33?
Which HBase release are you using?
Can you pastebin more of the region server log?
Thanks
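[Editor's note: for the "pastebin more of the log" request, it helps to cut a bounded time window out of the regionserver log rather than pasting the whole file. The helper below is an illustrative sketch, not from the thread; the timestamps and log path in the usage comment are hypothetical.]

```shell
# Print the log lines between the first line matching START_REGEX
# and the next line matching END_REGEX (inclusive) from FILE.
# Plain sed address ranges; the regexes are substituted unquoted,
# so this is a sketch, not hardened against special characters.
log_window() {  # usage: log_window START_REGEX END_REGEX FILE
  sed -n "/$1/,/$2/p" "$3"
}

# Hypothetical usage around the 17:33 incident:
#   log_window '^2015-10-23 17:25' '^2015-10-23 17:45' \
#     /var/log/hbase/hbase-hbase-regionserver-l-hbase30.log > rs-window.log
```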
On Fri, Oct 23, 2015 at 8:28 AM, 聪聪 <17...@qq.com> wrote:
> Hi, all:
>
>
> This afternoon, the whole HBase cluster suddenly became unable to
> respond. After I restarted one regionserver, the cluster recovered. I
> don't know the cause of the trouble and hope I can get help from you.
>
>
> The regionserver log is as follows:
> 2015-10-23 17:28:49,335 INFO [regionserver60020.logRoller] wal.FSHLog:
> moving old hlog file /hbase/WALs/l-hbase30.data.cn8.qunar.com
> ,60020,1442810406218/l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
> whose highest sequenceid is 9071525521 to /hbase/oldWALs/
> l-hbase30.data.cn8.qunar.com%2C60020%2C1442810406218.1445580462689
> 2015-10-23 17:33:31,375 WARN [RpcServer.reader=8,port=60020]
> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at
> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> 2015-10-23 17:33:31,779 WARN [RpcServer.reader=2,port=60020]
> ipc.RpcServer: RpcServer.listener,port=60020: count of bytes read: 0
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at
> org.apache.hadoop.hbase.ipc.RpcServer.channelRead(RpcServer.java:2368)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Connection.readAndProcess(RpcServer.java:1403)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener.doRead(RpcServer.java:770)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.doRunLoop(RpcServer.java:563)
> at
> org.apache.hadoop.hbase.ipc.RpcServer$Listener$Reader.run(RpcServer.java:538)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)