Posted to user@hbase.apache.org by Sam Seigal <se...@yahoo.com> on 2011/11/17 22:44:08 UTC

block caching

I have a table that I only use for generating indexes. It will rarely
have random reads, but will have M/R jobs running against it
constantly to generate indexes. Even on the index table, random reads
will be rare. It will mostly be used for scanning blocks of data.


According to "HBase: The Definitive Guide":

"As HBase reads entire blocks of data for efficient IO usage it
retains these blocks in an in-memory cache, so that subsequent reads
do not need any disk operation. The default of true enables the block
cache for every read operation. But if your use-case only ever has
sequential reads on a particular column family it is advisable to
disable it from polluting the block cache by setting the block cache
enabled flag to false. "
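
If I am reading that right, the per-family flag would be set along
these lines when the table is created (table and family names below
are just placeholders for the example, not my real schema):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateScanOnlyTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // placeholder names; the real table holds the data the M/R jobs scan
        HTableDescriptor table = new HTableDescriptor("source_table");
        HColumnDescriptor family = new HColumnDescriptor("data");

        // only sequential scans ever read this family, so keep its blocks
        // out of the region server block cache
        family.setBlockCacheEnabled(false);

        table.addFamily(family);
        admin.createTable(table);
    }
}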

"There are other options you can use to influence how the block cache
is used, for example during a scan operation. This is useful during
full table scans so that you do not cause a major churn on the cache.
See the section called “Configuration” for more information about this
feature."

"Scan instances can be set to use the block cache in the region server
via the setCacheBlocks() method. For scans used with MapReduce jobs,
this should be false. For frequently accessed rows, it is advisable to
use the block cache."


What is the reasoning behind the above? Why is using the block cache
for M/R jobs not a good idea when they are doing full table scans?
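
To make the question concrete: a job driver that follows that advice
would look roughly like this (the mapper, table name and output types
are placeholders, not my actual code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class IndexBuilder {

    // placeholder mapper; the real one turns each row into index entries
    static class IndexMapper extends TableMapper<NullWritable, NullWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result columns, Context context) {
            // build and write index entries from 'columns' here
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "index-builder");
        job.setJarByClass(IndexBuilder.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // larger batches per RPC for the sequential scan
        scan.setCacheBlocks(false);  // the advice in question: don't churn the block cache

        TableMapReduceUtil.initTableMapperJob(
            "source_table", scan, IndexMapper.class,
            NullWritable.class, NullWritable.class, job);

        job.setNumReduceTasks(0);                           // map-only job
        job.setOutputFormatClass(NullOutputFormat.class);   // output goes to HBase, not HDFS

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}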

Re: block caching

Posted by Jean-Daniel Cryans <jd...@apache.org>.
And it will probably evict everyone else that was already present.
Hello latency.

J-D

On Thu, Nov 17, 2011 at 2:08 PM, lars hofhansl <lh...@yahoo.com> wrote:
> Hi Sam,
> The idea is that the entire result of the scan will not fit into the cache if the scan covers a "reasonable" number of cells, and hence it is unlikely that another scan will hit cached blocks before they get evicted, especially when using an LRU cache.
>
> -- Lars

Re: block caching

Posted by lars hofhansl <lh...@yahoo.com>.
Hi Sam,
The idea is that the entire result of the scan will not fit into the cache if the scan covers a "reasonable" number of cells, and hence it is unlikely that another scan will hit cached blocks before they get evicted, especially when using an LRU cache.
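
To picture it: the scan's own blocks push out whatever was hot, and the
scan never comes back for them, so nobody wins. A toy stand-in for the
cache (just an access-ordered LinkedHashMap, nothing HBase-specific)
shows the churn:

import java.util.LinkedHashMap;
import java.util.Map;

public class LruChurnDemo {
    public static void main(String[] args) {
        final int capacity = 4;  // pretend the block cache holds 4 blocks

        // access-ordered LinkedHashMap acting as a tiny LRU cache
        Map<String, String> cache =
            new LinkedHashMap<String, String>(capacity, 0.75f, true) {
                protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                    return size() > capacity;
                }
            };

        // blocks that random readers keep hitting
        cache.put("hot-1", "...");
        cache.put("hot-2", "...");

        // a full scan streams many blocks through, each read exactly once
        for (int i = 0; i < 10; i++) {
            cache.put("scan-" + i, "...");
        }

        // hot-1/hot-2 are gone and the surviving scan blocks will never be
        // read again: pure churn, and extra latency for the random readers
        System.out.println(cache.keySet());  // [scan-6, scan-7, scan-8, scan-9]
    }
}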

-- Lars

