Posted to dev@hbase.apache.org by "Yu Li (JIRA)" <ji...@apache.org> on 2016/10/31 04:24:58 UTC
[jira] [Created] (HBASE-16973) Revisiting default value for hbase.client.scanner.caching
Yu Li created HBASE-16973:
-----------------------------
Summary: Revisiting default value for hbase.client.scanner.caching
Key: HBASE-16973
URL: https://issues.apache.org/jira/browse/HBASE-16973
Project: HBase
Issue Type: Bug
Reporter: Yu Li
Assignee: Yu Li
We are seeing the following log entries for a long-running scan:
{noformat}
2016-10-30 08:51:41,692 WARN [B.defaultRpcServer.handler=50,queue=12,port=16020] ipc.RpcServer:
(responseTooSlow-LongProcessTime): {"processingtimems":24329,
"call":"Scan(org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ScanRequest)",
"client":"11.251.157.108:50415","scandetails":"table: ae_product_image region: ae_product_image,494:
,1476872321454.33171a04a683c4404717c43ea4eb8978.","param":"scanner_id: 5333521 number_of_rows: 2147483647
close_scanner: false next_call_seq: 8 client_handles_partials: true client_handles_heartbeats: true",
"starttimems":1477788677363,"queuetimems":0,"class":"HRegionServer","responsesize":818,"method":"Scan"}
{noformat}
From this we found that "number_of_rows" is as large as {{Integer.MAX_VALUE}}.
We also observed a long filter list on the customized scan. After checking the application code, we confirmed that there is no {{Scan.setCaching}} call or {{hbase.client.scanner.caching}} setting on the client side, so with the default value the caching for the Scan ends up as {{Integer.MAX_VALUE}}, which is a big surprise.
After checking the code and commit history, I found that HBASE-11544 changed {{HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING}} from 100 to {{Integer.MAX_VALUE}}, and its release note says:
{noformat}
Scan caching default has been changed to Integer.Max_Value
This value works together with the new maxResultSize value from HBASE-12976 (defaults to 2MB)
Results returned from server on basis of size rather than number of rows
Provides better use of network since row size varies amongst tables
{noformat}
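I'm afraid this does not consider scans with filters, which may examine many rows but return only a small result. In that case neither the row limit nor the size limit is reached, and a single scan call can run for a long time, as in the log above (24329 ms of processing for a response of 818 bytes).
What's more, we still have the following comment and code in {{Scan.java}}: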
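To make the filter concern concrete, here is a toy model of a single scan call. It is not HBase code: the class {{ScanRpcModel}}, its method names, and the numbers are all hypothetical. It assumes a call returns once the rows collected reach the caching value or the bytes collected reach the max result size, and that rows rejected by a filter count toward neither limit.

```java
import java.util.function.LongPredicate;

/** Toy model of a single scan RPC. This is not HBase code; all names are hypothetical. */
final class ScanRpcModel {

    /** What one simulated call did before it returned. */
    static final class RpcResult {
        final long rowsExamined;
        final int rowsReturned;
        final long bytesReturned;

        RpcResult(long rowsExamined, int rowsReturned, long bytesReturned) {
            this.rowsExamined = rowsExamined;
            this.rowsReturned = rowsReturned;
            this.bytesReturned = bytesReturned;
        }
    }

    /**
     * Walks the region row by row and stops when either limit is reached
     * or the region is exhausted. Assumption: filtered-out rows add
     * nothing to the row count or the result size.
     */
    static RpcResult next(long regionRows, int caching, long maxResultSize,
                          int rowSizeBytes, LongPredicate filter) {
        long examined = 0;
        int returned = 0;
        long bytes = 0;
        for (long row = 0; row < regionRows; row++) {
            examined++;
            if (!filter.test(row)) {
                continue; // rejected by the filter: no progress toward either limit
            }
            returned++;
            bytes += rowSizeBytes;
            if (returned >= caching || bytes >= maxResultSize) {
                break; // row limit or size limit reached, the call returns
            }
        }
        return new RpcResult(examined, returned, bytes);
    }

    public static void main(String[] args) {
        long regionRows = 1_000_000L;
        long maxResultSize = 2L * 1024 * 1024;        // 2MB, as in the release note
        LongPredicate selective = row -> row % 1_000 == 0; // passes 1 row in 1000

        RpcResult unbounded = next(regionRows, Integer.MAX_VALUE, maxResultSize, 100, selective);
        RpcResult bounded = next(regionRows, 128, maxResultSize, 100, selective);

        System.out.printf("caching=MAX_VALUE: examined=%d returned=%d bytes=%d%n",
                unbounded.rowsExamined, unbounded.rowsReturned, unbounded.bytesReturned);
        System.out.printf("caching=128:       examined=%d returned=%d bytes=%d%n",
                bounded.rowsExamined, bounded.rowsReturned, bounded.bytesReturned);
    }
}
```

In this model the unbounded call walks the whole region because the small filtered result never approaches 2MB, while a caching value of 128 lets the call return after a fraction of the rows. The model ignores any time-based limits the real server may apply.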
{code}
/*
* -1 means no caching
*/
private int caching = -1;
{code}
But the implementation does not actually follow this comment: instead of no caching, we are caching {{Integer.MAX_VALUE}} rows.
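The mismatch can be sketched as follows. This is a simplified illustration, not the actual client code: the class {{CachingResolution}} and its method are hypothetical, and the resolution order (value set on the Scan, then the configured {{hbase.client.scanner.caching}}, then the built-in default) is my assumption about how the effective value is chosen.

```java
/** Simplified sketch of choosing the effective scanner caching; not the actual HBase client code. */
final class CachingResolution {

    /** Initial value of the caching field in Scan.java, documented there as "no caching". */
    static final int UNSET = -1;

    /**
     * Hypothetical helper. Assumed order: an explicit positive value on the
     * Scan wins, then the configured value (null when absent from the
     * configuration), then the built-in default.
     */
    static int effectiveCaching(int scanCaching, Integer configuredCaching, int builtInDefault) {
        if (scanCaching > 0) {
            return scanCaching;
        }
        if (configuredCaching != null) {
            return configuredCaching;
        }
        return builtInDefault;
    }

    public static void main(String[] args) {
        // Nothing set anywhere: -1 is treated as "use the default", not "no caching".
        System.out.println(effectiveCaching(UNSET, null, Integer.MAX_VALUE));
        // The same call with the default used before HBASE-11544.
        System.out.println(effectiveCaching(UNSET, null, 100));
        // A client-side configuration value overrides the built-in default.
        System.out.println(effectiveCaching(UNSET, 128, Integer.MAX_VALUE));
    }
}
```

Under these assumptions, -1 never means "no caching". It only means "fall back to the default", which is why the default value matters so much for clients that set nothing.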
So here I'd like to bring up two points:
1. Change the default value of {{HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING}} back to some small value like 128
2. Re-enforce the semantics of "no caching"