You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "Andrew Purtell (JIRA)" <ji...@apache.org> on 2012/12/02 02:43:58 UTC
[jira] [Commented] (HBASE-7251) Avoid flood logs during client
disconnect during batch get operation
[ https://issues.apache.org/jira/browse/HBASE-7251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13508127#comment-13508127 ]
Andrew Purtell commented on HBASE-7251:
---------------------------------------
Should client state be an enum that can capture more states than a boolean?
bq. Importantly, server stop processing immediately if client timeout or disconnect.
The patch does this just for the case of Gets in HRegionServer#multi(). Should all ops have a similar treatment? And scanners too, especially if they have highly selective filters?
> Avoid flood logs during client disconnect during batch get operation
> --------------------------------------------------------------------
>
> Key: HBASE-7251
> URL: https://issues.apache.org/jira/browse/HBASE-7251
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.2
> Reporter: Fengdong Yu
> Labels: patch
> Fix For: 0.94.3
>
> Attachments: HBASE-7251.patch
>
>
> Background:
> A smart guy in the company want to read data from the HBASE in batch, the code like the following:(just demonstrate, not runnable):
> List<Get> gets = new ArrayList<Get>();
> for(int i =0; i < n; ++i){
> gets.add("some row key here");
> if (i % 10000 == 0){
> Results[] results = htable.get(gets);
> //process results here. so delete some code
> }
> }
> Yes, you know that, this guy forgot "gets.clear()" after each "htable.get()" operation in his code.
> One region server becomes very slow, and crashed after 30mins becauseof OOM, but we got 15GB log file.
> there are flood logs as following:
> ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
> org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call multi(org.apache.hadoop.hbase.client.MultiAction@49540d8d), rpc version=1, client version=29, methodsFingerPrint=-56040613 from 10.1.1.1:57933 after 3980 ms, since caller disconnected
> at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3468)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3425)
> at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3449)
> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4198)
> at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4171)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1993)
> at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3410)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
> at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1409)
> Fix:
> Server is stop "get" but cannot exit from the "for" loop, so write flood logs here.
> My patch just log one exception log instead of flood logs.
> Importantly, server stop processing immediately if client timeout or disconnect.
> Test:
> I used this guy's wrong code read data( NO "gets.clear()" ) from the HBASE, when it becomes very slow to get results, I pressed ctrl+C, then there is only ONE CallerDisconnectedException exception log and the server stop reading immediately, LOG generate the last log entry:
> WARN org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020 caught a ClosedChannelException, this means that the server was processing a request but the client went away. The error message was: null
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira