Posted to user@hbase.apache.org by Sachin Jain <sa...@gmail.com> on 2016/11/01 05:10:56 UTC

Default value of caching in Scanner

Hi,

I am using HBase v1.1.2 and have a few questions regarding full table scans:

1. When we instantiate a Scanner and do not set any caching on it, what
value does it pick by default?
- By looking at the code, I have found the following:

From the documentation at the top of the Scan.java class:

* To modify scanner caching for just this scan, use
* {@link #setCaching(int) setCaching}.
* If caching is NOT set, we will use the caching value of the hosting
* {@link Table}.

And

/**
 * Set the number of rows for caching that will be passed to scanners.
 * If not set, the Configuration setting
 * {@link HConstants#HBASE_CLIENT_SCANNER_CACHING} will apply.
 * Higher caching values will enable faster scanners but will use more memory.
 * @param caching the number of rows for caching
 */
public Scan setCaching(int caching) {
  this.caching = caching;
  return this;
}

And the default value in the HConstants file is:

public static final String HBASE_CLIENT_SCANNER_CACHING = "hbase.client.scanner.caching";
public static final int DEFAULT_HBASE_CLIENT_SCANNER_CACHING = 2147483647;


Does that mean the default value, i.e. the number of records read per scan,
is 2147483647? Can someone please clarify this?

2. Another question: I assume we have to set the caching value higher so
that we can reduce the number of RPC calls between the client and the region
server. If we increase the caching value, should we also increase the RPC
timeout and scannerTimeout values? Otherwise we may hit those thresholds
with the new caching value.

Thanks
-Sachin

Re: Default value of caching in Scanner

Posted by Sachin Jain <sa...@gmail.com>.
Thanks Yu!! This is very helpful.


Re: Default value of caching in Scanner

Posted by Yu Li <ca...@gmail.com>.
The brief answer is yes: by default the caching size is now
Integer.MAX_VALUE, which is a big difference from 0.98. This was changed by
HBASE-11544, and you can find the statement below at
http://hbase.apache.org/book.html:

hbase.client.scanner.caching
Description

Number of rows that we try to fetch when calling next on a scanner if it is
not served from (local, client) memory. This configuration works together
with hbase.client.scanner.max.result.size to try and use the network
efficiently. The default value is Integer.MAX_VALUE by default so that the
network will fill the chunk size defined by
hbase.client.scanner.max.result.size rather than be limited by a particular
number of rows since the size of rows varies table to table. If you know
ahead of time that you will not require more than a certain number of rows
from a scan, this configuration should be set to that row limit via
Scan#setCaching. Higher caching values will enable faster scanners but will
eat up more memory and some calls of next may take longer and longer times
when the cache is empty. Do not set this value such that the time between
invocations is greater than the scanner timeout; i.e.
hbase.client.scanner.timeout.period
Default

2147483647
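
To make the interaction between the two settings concrete, here is a rough
sketch (not HBase code) of how many rows a single scanner fetch returns when
the batch ends at whichever limit is hit first: the caching row count or the
hbase.client.scanner.max.result.size byte limit. The class and method names
are made up for illustration, and the 2 MB result size and the uniform 1 KB
row size are assumptions, so check the values in your own configuration.

```java
public class ScanBatchEstimate {

    /**
     * Simplified model: a batch ends when either the row limit (caching)
     * or the byte limit (max result size) is reached, whichever is first.
     * Assumes every row has the same size, which real tables do not.
     */
    public static long rowsPerBatch(int caching, long maxResultSizeBytes, long avgRowBytes) {
        // At least one row is always returned, even if it exceeds the byte limit.
        long rowsBySize = Math.max(1L, maxResultSizeBytes / avgRowBytes);
        return Math.min((long) caching, rowsBySize);
    }

    /** Number of fetches needed to read totalRows under the same model. */
    public static long batchesForScan(long totalRows, int caching,
                                      long maxResultSizeBytes, long avgRowBytes) {
        long perBatch = rowsPerBatch(caching, maxResultSizeBytes, avgRowBytes);
        return (totalRows + perBatch - 1) / perBatch; // ceiling division
    }

    public static void main(String[] args) {
        long maxResultSize = 2L * 1024 * 1024; // assumed 2 MB byte limit
        long avgRowBytes = 1024;               // assumed 1 KB per row

        // With caching = Integer.MAX_VALUE the byte limit decides the batch size.
        System.out.println(rowsPerBatch(Integer.MAX_VALUE, maxResultSize, avgRowBytes));
        // With a small caching value the row limit decides instead.
        System.out.println(rowsPerBatch(100, maxResultSize, avgRowBytes));
        // Fetches needed for a one-million-row scan in each case.
        System.out.println(batchesForScan(1_000_000L, Integer.MAX_VALUE, maxResultSize, avgRowBytes));
        System.out.println(batchesForScan(1_000_000L, 100, maxResultSize, avgRowBytes));
    }
}
```

Under these assumptions the default of 2147483647 never limits the batch by
itself; the byte limit does, which is why the default does not mean that two
billion rows are pulled in one call.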

Users will be able to control the time limit of each call from the client
configuration after HBASE-15593, but only once 1.3.0 is released (sorry,
but for all existing releases this can only be controlled through
server-side configuration, say half of hbase.client.scanner.timeout.period).
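
On the second question, the guidance quoted above says the time between
invocations of next that reach the server must stay below
hbase.client.scanner.timeout.period. A back-of-the-envelope check, again as
a sketch with made-up names rather than anything from the HBase client: it
assumes a 60000 ms scanner timeout and a fixed per-row processing time on
the client, and it ignores the time spent in the RPC itself.

```java
public class ScannerTimeoutCheck {

    /**
     * Largest number of rows per batch the client can work through
     * before the time since the last fetch reaches the scanner timeout.
     */
    public static long maxSafeRowsPerBatch(long scannerTimeoutMillis, double perRowMillis) {
        return (long) Math.floor(scannerTimeoutMillis / perRowMillis);
    }

    /** True if processing one full batch finishes before the timeout. */
    public static boolean fitsWithinTimeout(long rowsPerBatch, double perRowMillis,
                                            long scannerTimeoutMillis) {
        return rowsPerBatch * perRowMillis < scannerTimeoutMillis;
    }

    public static void main(String[] args) {
        long scannerTimeoutMillis = 60_000L; // assumed timeout, check your config
        double perRowMillis = 5.0;           // assumed client-side work per row

        System.out.println(maxSafeRowsPerBatch(scannerTimeoutMillis, perRowMillis));
        System.out.println(fitsWithinTimeout(2_048L, perRowMillis, scannerTimeoutMillis));
        System.out.println(fitsWithinTimeout(20_000L, perRowMillis, scannerTimeoutMillis));
    }
}
```

If the batch you expect does not fit, the options are a smaller caching
value (or result size) or a longer scanner timeout; which one is right
depends on the workload.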

We have been discussing this recently in
https://issues.apache.org/jira/browse/HBASE-16973; you can find more
details there.

Small world, isn't it? (Smile)

Best Regards,
Yu
