Posted to user@hbase.apache.org by Jean-Daniel Cryans <jd...@apache.org> on 2009/04/17 03:40:15 UTC

Tip when scanning and spending a lot of time on each row

Hey list,

Just a small tip for those who use scanners in HBase and whose
processing time is more than 2-3 seconds per row: lower
hbase.client.scanner.caching. When I wrote that feature, my tests
showed me that a value of 30 gives the best trade-off between speed
and memory consumption. 80% of the time, that's what you need. But in
the slow-row case I just described, you will very likely hit scanner
timeouts (or unknown scanner exceptions). Why? Some simple maths:

Default lease time: 60 secs
Example row processing time: 3 secs
Scanner prefetching value: 30

That means that you will fetch 30 rows in a single batch in the first
next(), then you will take the other 29 directly from the client
cache, then you will query a region server again for 30 more. Since
3*30 = 90 secs and that's > 60, the lease expires before that next
query and you get a scanner timeout. In one case recently, it was
taking me more than 2 minutes per row (RSS crawling), so timeouts
were inevitable.
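
To make the arithmetic easy to redo with your own numbers, here is a
minimal Java sketch. It assumes the simplified model described in this
message (the lease is only renewed when the client goes back to the
region server for the next batch). The class and method names are
hypothetical and are not part of the HBase API.

```java
// Illustrative sketch only: hypothetical names, not part of the HBase API.
// Assumes the lease is renewed only when the client fetches the next batch.
final class ScannerCachingMath {

    /** Seconds spent processing one cached batch, i.e. between two server fetches. */
    static double secondsBetweenFetches(int caching, double secondsPerRow) {
        return caching * secondsPerRow;
    }

    /** True if processing a full batch takes longer than the scanner lease. */
    static boolean leaseWouldExpire(int caching, double secondsPerRow, double leaseSeconds) {
        return secondsBetweenFetches(caching, secondsPerRow) > leaseSeconds;
    }

    /**
     * Largest caching value whose batch does not exceed the lease (no safety margin).
     * Returns 0 when even a single row takes longer than the lease.
     */
    static int maxSafeCaching(double secondsPerRow, double leaseSeconds) {
        if (secondsPerRow <= 0 || leaseSeconds <= 0) {
            throw new IllegalArgumentException("times must be positive");
        }
        return (int) Math.floor(leaseSeconds / secondsPerRow);
    }

    public static void main(String[] args) {
        // Numbers from the example: caching 30, 3 secs per row, 60 secs lease.
        System.out.println(secondsBetweenFetches(30, 3.0));
        System.out.println(leaseWouldExpire(30, 3.0, 60.0));
        System.out.println(maxSafeCaching(3.0, 60.0));
        // The RSS crawling case: 2 minutes per row.
        System.out.println(maxSafeCaching(120.0, 60.0));
    }
}
```

In practice you would pick a value comfortably below what
maxSafeCaching returns, since per-row processing time is rarely
constant.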

You can set this value in hbase-site.xml, on an HBaseConfiguration
object, or using HTable.setScannerCaching.
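
For the hbase-site.xml route, the entry would look roughly like the
fragment below. The value of 1 is only an illustration for very slow
per-row processing; pick whatever fits your own timings.

```xml
<!-- Goes inside the <configuration> element of hbase-site.xml. -->
<!-- The value 1 is an example for slow per-row processing, not a general recommendation. -->
<property>
  <name>hbase.client.scanner.caching</name>
  <value>1</value>
</property>
```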

J-D

Re: Tip when scanning and spending a lot of time on each row

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Lars,

Good idea. It's now on the troubleshooting page and
hbase.client.scanner.caching is set to 1 by default in trunk.

J-D

On Sun, Apr 19, 2009 at 3:52 PM, Lars George <la...@worldlingo.com> wrote:
> Hi J-D,
>
> This is really important news it seems as we had quite a few of those lately
> being reported with no apparent reason. Could you please add this to the
> Wiki troubleshooting (or similar) page?
>
> Regards,
> Lars

Re: Tip when scanning and spending a lot of time on each row

Posted by Lars George <la...@worldlingo.com>.
Hi J-D,

This is really important news it seems as we had quite a few of those 
lately being reported with no apparent reason. Could you please add this 
to the Wiki troubleshooting (or similar) page?

Regards,
Lars