Posted to user@hbase.apache.org by Matt K <ma...@gmail.com> on 2014/06/02 21:55:25 UTC

hbase block-cache scan.setCaching(false) not being respected

Hi all,

We are running a number of Map/Reduce jobs on top of HBase. We are not
using HBase for any of its realtime capabilities, only for
batch-processing. So we aren't doing lookups, just scans.

Each one of our jobs has *scan.setCaching(false)* to turn off
block-caching, since each block will only be accessed once.

We recently started using Cloudera Manager, and I’m seeing something that
doesn’t add up. See image below. It’s clear from the graphs that Block
Cache is being used currently, and blocks are being cached and evicted.

We do have *hfile.block.cache.size* set to 0.4 (default), but my
understanding is that the jobs setting scan.setCaching(false) should
override this. Since it’s set in every job, there should be no blocks being
cached.

Can anyone help me understand what we’re seeing?

Thanks,

-Matt

[image: Inline image 1]

Re: hbase block-cache scan.setCaching(false) not being respected

Posted by Bryan Beaudreault <bb...@hubspot.com>.
The Block Cache is used for more than just scanned data blocks.
Additionally, *hfile.block.cache.size* is a server-side config, while
scan.setCacheBlocks(false) applies at the RPC level. So regardless of your
per-scan setting, the RegionServers will continue to allocate memory to the
block cache.

Check out
http://hbase.apache.org/book/regionserver.arch.html#block.cache.usage for
more details, specifically it lists other residents of the block cache.
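As a side note: since *hfile.block.cache.size* is server-side, the only way to shrink the allocation itself is in hbase-site.xml on the RegionServers. A sketch (the 0.2 value is purely illustrative for a batch-only cluster; 0.4 is the default fraction of heap):

```xml
<!-- hbase-site.xml on each RegionServer -->
<property>
  <name>hfile.block.cache.size</name>
  <!-- illustrative: fraction of heap reserved for the block cache -->
  <value>0.2</value>
</property>
```

Note that the block cache also holds index and bloom blocks (the "other residents" mentioned above), so disabling it entirely is generally not recommended even for pure batch workloads.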



Re: hbase block-cache scan.setCaching(false) not being respected

Posted by Matt K <ma...@gmail.com>.
Hi, thanks for the responses.

Ted - when I said "scan.setCaching", I meant "scan.setCacheBlocks(false)".
That's what I get for not copying/pasting directly from code :)

I added a link to the graphs here:
https://drive.google.com/file/d/0B3ZQ0nMNMFxCOHZNZVFsWEhCOUU/edit?usp=sharing

Bryan - I believe you're right, but wanted to confirm.

Thanks,
-Matt






Re: hbase block-cache scan.setCaching(false) not being respected

Posted by Ted Yu <yu...@gmail.com>.
Have you added the following when passing Scan to your job ?

scan.setCacheBlocks(false);

BTW image didn't go through.
Consider putting image on third-party site.
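For reference, a rough sketch of wiring both settings into a MapReduce job, which also shows the distinction between the two similarly named methods (assumes hbase-client/hbase-server on the classpath; table and mapper names are illustrative):

```java
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.mapreduce.Job;

Scan scan = new Scan();
scan.setCaching(500);        // rows fetched per RPC (throughput tuning; takes an int)
scan.setCacheBlocks(false);  // skip the server-side block cache (takes a boolean)

Job job = Job.getInstance(conf, "batch scan");  // conf: your HBase Configuration
TableMapReduceUtil.initTableMapperJob(
    "my_table",                     // illustrative table name
    scan,
    MyMapper.class,                 // illustrative TableMapper subclass
    ImmutableBytesWritable.class,   // mapper output key
    Result.class,                   // mapper output value
    job);
```

Note that even with setCacheBlocks(false), server-side graphs will still show block cache activity from other sources, as discussed elsewhere in this thread.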
