Posted to user@hbase.apache.org by gortiz <go...@pragsis.com> on 2014/04/09 08:24:16 UTC

BlockCache for large scans.

I've been reading the HBase Definitive Guide and HBase in Action a
little. I found this question from Cloudera that I'm not sure about after
looking at some benchmarks and documentation for HBase. Could someone
explain it to me a little? I think that when you do a large scan you
should disable the block cache because the blocks are going to churn a
lot, so you won't get anything from the cache; I guess you would be
penalized for spending memory, GC and CPU on this task.

*You want to do a full table scan on your data. You decide to disable
block caching to see if this improves scan performance. Will disabling
block caching improve scan performance?*

A.
No. Disabling block caching does not improve scan performance.

B.
Yes. When you disable block caching, you free up that memory for other
operations. With a full table scan, you cannot take advantage of block
caching anyway because your entire table won't fit into cache.

C.
No. If you disable block caching, HBase must read each block index from
disk for each scan, thereby decreasing scan performance.

D.
Yes. When you disable block caching, you free up memory for MemStore,
which improves scan performance.


Re: BlockCache for large scans.

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Ok. I see it in TableInputFormat:
        // false by default, full table scans generate too much BC churn
        scan.setCacheBlocks((conf.getBoolean(SCAN_CACHEBLOCKS, false)));

So no need to do it in initTableMapperJob too, I guess...

Thanks,

JM
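
For illustration, a minimal, self-contained sketch of the setup being
discussed: the Scan handed to TableMapReduceUtil.initTableMapperJob has
block caching disabled explicitly, so the job does not depend on the
TableInputFormat default. The table name "mytable" and the MyMapper class
are hypothetical placeholders, not something taken from this thread.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.mapreduce.Job;

    public class FullTableScanJob {

      // Placeholder mapper; map() is omitted since only the job setup matters here.
      static class MyMapper extends TableMapper<ImmutableBytesWritable, Result> {
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "full-table-scan");
        job.setJarByClass(FullTableScanJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // fetch more rows per RPC for a full scan
        scan.setCacheBlocks(false);  // don't churn the block cache with scanned blocks

        TableMapReduceUtil.initTableMapperJob(
            "mytable", scan, MyMapper.class,
            ImmutableBytesWritable.class, Result.class, job);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }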



Re: BlockCache for large scans.

Posted by lars hofhansl <la...@apache.org>.
Yep. For all of our M/R jobs we do indeed disable the caching of blocks.
In fact TableInputFormat sets cache blocks to false currently anyway.


-- Lars




Re: BlockCache for large scans.

Posted by Stack <st...@duboce.net>.
On Fri, Apr 11, 2014 at 6:54 AM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi Lars,
>
> So just to continue on that, when we do MR jobs with HBase, this should
> be disabled too since we will read the entire table, right? Is this done by
> default or is it something the client should set up manually? In my own code
> I set this manually. I looked into TableMapReduceUtil.initTableMapperJob
> and there is nothing there. Should we not just set cacheBlocks to false in
> initTableMapperJob directly?
>

Yes.  Sounds right.
St.Ack

Re: BlockCache for large scans.

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Lars,

So just to continue on that, when we do MR jobs with HBase, this should
be disabled too since we will read the entire table, right? Is this done by
default or is it something the client should set up manually? In my own code
I set this manually. I looked into TableMapReduceUtil.initTableMapperJob
and there is nothing there. Should we not just set cacheBlocks to false in
initTableMapperJob directly?

JM



Re: BlockCache for large scans.

Posted by lars hofhansl <la...@apache.org>.
Generally (and this is database lore, not just HBase): if you use an LRU-type cache, your working set does not fit into the cache, and you repeatedly scan that working set, then you have created the worst-case scenario. The database does all the work of caching the blocks, and subsequent scans will need blocks that were just evicted towards the end of the previous scan.

For large scans where it is likely that the entire scan does not fit into the block cache, you should absolutely disable caching of the blocks traversed by this scan (i.e. scan.setCacheBlocks(false)). Index blocks are not affected; they are cached regardless.

-- Lars
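
As an illustration of the advice above, a minimal sketch of a large
client-side scan that bypasses the block cache. It assumes a table named
"mytable" and uses the Connection/Table client API; adjust for your HBase
version.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.client.Table;

    public class LargeScanExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("mytable"))) {

          Scan scan = new Scan();
          scan.setCaching(1000);       // rows fetched per RPC
          scan.setCacheBlocks(false);  // data blocks read by this scan are not added to
                                       // the block cache; index/bloom blocks stay cached

          try (ResultScanner scanner = table.getScanner(scan)) {
            long rows = 0;
            for (Result result : scanner) {
              rows++;                  // process each row here
            }
            System.out.println("scanned rows: " + rows);
          }
        }
      }
    }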




Re: BlockCache for large scans.

Posted by gortiz <go...@pragsis.com>.
But I think there's a direct relation implied between improving performance
in large scans and memory for the memstore. As far as I understand, the
memstore just works as a cache for write operations.


-- 
*Guillermo Ortiz*
/Big Data Developer/

Telf.: +34 917 680 490
Fax: +34 913 833 301
C/ Manuel Tovar, 49-53 - 28034 Madrid - Spain

_http://www.bidoop.es_


Re: BlockCache for large scans.

Posted by Ted Yu <yu...@gmail.com>.
Didn't quite get what you mean, Asaf.

If you're talking about HBASE-5349, please read the release notes for HBASE-5349.

By default, the memstore min/max range is initialized to the memstore percentage:

    globalMemStorePercentMinRange = conf.getFloat(MEMSTORE_SIZE_MIN_RANGE_KEY,
        globalMemStorePercent);
    globalMemStorePercentMaxRange = conf.getFloat(MEMSTORE_SIZE_MAX_RANGE_KEY,
        globalMemStorePercent);

Cheers
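
To make the fallback in the snippet above concrete, a standalone sketch:
when no explicit min/max range is configured, both bounds collapse to the
fixed global memstore percentage, so the HBASE-5349 tuner has no room to
move. The exact key strings used here are assumptions (taken from the
HBASE-5349 patch) and may differ between versions.

    import org.apache.hadoop.conf.Configuration;

    public class MemstoreRangeDefaults {
      static final String MEMSTORE_SIZE_KEY =
          "hbase.regionserver.global.memstore.size";
      static final String MEMSTORE_SIZE_MIN_RANGE_KEY =
          "hbase.regionserver.global.memstore.size.min.range";
      static final String MEMSTORE_SIZE_MAX_RANGE_KEY =
          "hbase.regionserver.global.memstore.size.max.range";

      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // 40% of the heap is the usual default for the global memstore share.
        float globalMemStorePercent = conf.getFloat(MEMSTORE_SIZE_KEY, 0.4f);
        // Mirrors the quoted code: min and max both default to the fixed percentage.
        float minRange = conf.getFloat(MEMSTORE_SIZE_MIN_RANGE_KEY, globalMemStorePercent);
        float maxRange = conf.getFloat(MEMSTORE_SIZE_MAX_RANGE_KEY, globalMemStorePercent);
        System.out.printf("memstore heap share range: [%.2f, %.2f]%n", minRange, maxRange);
      }
    }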



Re: BlockCache for large scans.

Posted by Asaf Mesika <as...@gmail.com>.
The Jira says it's enabled automatically. Is there an official write-up
explaining this feature?


Re: BlockCache for large scans.

Posted by gortiz <go...@pragsis.com>.
Pretty interesting link; I'll keep it in my favorites.



On 09/04/14 16:07, Ted Yu wrote:
> Please take a look at http://www.n10k.com/blog/blockcache-101/


Re: BlockCache for large scans.

Posted by Ted Yu <yu...@gmail.com>.
Please take a look at http://www.n10k.com/blog/blockcache-101/

For D, hbase.regionserver.global.memstore.size is specified as a percentage
of the heap, and it stays fixed unless you enable HBASE-5349 ('Automagically
tweak global memstore and block cache sizes based on workload').
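
For option D above, a small sketch of how the two heap pools are sized by
default. The 0.4 defaults for both shares and the rule that together they
must leave roughly 20% of the heap free are stated here as assumptions to
verify against your HBase version, not as facts from this thread.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class HeapSplitCheck {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        float memstoreShare   = conf.getFloat("hbase.regionserver.global.memstore.size", 0.4f);
        float blockCacheShare = conf.getFloat("hfile.block.cache.size", 0.4f);
        System.out.printf("memstore: %.0f%% of heap, block cache: %.0f%% of heap%n",
            memstoreShare * 100, blockCacheShare * 100);
        if (memstoreShare + blockCacheShare > 0.8f) {
          // A region server configured like this would refuse to start.
          System.out.println("combined share exceeds 0.8 of the heap");
        }
      }
    }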

