Posted to user@hbase.apache.org by Tianying Chang <ty...@gmail.com> on 2014/04/17 01:41:03 UTC

Will BloomFilter still be cached if setCacheBlocks(false) per Get()?

Hi,

We have a use case where some data is read mostly at random, so it pollutes
the block cache and causes heavy GC. It is better to turn off the block cache
for that data, so we are going to call setCacheBlocks(false) on those Gets. We
know that index blocks will still be cached, based on the code path below, so
we are safe there. But it is not clear whether BloomFilter blocks fall under
the lookupLevel < searchTreeLevel condition and so get cached as well.

          // Call HFile's caching block reader API. We always cache index
          // blocks, otherwise we might get terrible performance.
          boolean shouldCache = cacheBlocks || (lookupLevel < searchTreeLevel);
          BlockType expectedBlockType;
          if (lookupLevel < searchTreeLevel - 1) {
            expectedBlockType = BlockType.INTERMEDIATE_INDEX;
          } else if (lookupLevel == searchTreeLevel - 1) {
            expectedBlockType = BlockType.LEAF_INDEX;
          } else {
            // this also accounts for ENCODED_DATA
            expectedBlockType = BlockType.DATA;
          }

Or is it that the BloomFilter is part of the meta data, so it is always cached
on read even when per-family/per-query cacheBlocks is turned off? Am I
right?
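
[Editor's sketch] To make the quoted condition concrete, the snippet below
evaluates the same two expressions for every level of a hypothetical
three-level index with cacheBlocks set to false. The class name, the choice
of searchTreeLevel = 3, and the assumption that lookupLevel runs from 1 up to
searchTreeLevel are illustrative and not taken from the HBase source. Note
that this path only ever yields index or data block types, so it probably
says nothing about Bloom blocks either way.

```java
// Standalone model of the quoted snippet; not HBase code.
class IndexCacheDecision {
    // Only the block types that the quoted if/else chain can produce.
    enum BlockType { INTERMEDIATE_INDEX, LEAF_INDEX, DATA }

    // Same boolean expression as in the quoted code path.
    static boolean shouldCache(boolean cacheBlocks, int lookupLevel, int searchTreeLevel) {
        return cacheBlocks || (lookupLevel < searchTreeLevel);
    }

    // Same if/else chain as in the quoted code path.
    static BlockType expectedBlockType(int lookupLevel, int searchTreeLevel) {
        if (lookupLevel < searchTreeLevel - 1) {
            return BlockType.INTERMEDIATE_INDEX;
        } else if (lookupLevel == searchTreeLevel - 1) {
            return BlockType.LEAF_INDEX;
        } else {
            return BlockType.DATA;
        }
    }

    public static void main(String[] args) {
        int searchTreeLevel = 3;      // hypothetical index depth
        boolean cacheBlocks = false;  // as if the Get had setCacheBlocks(false)
        // Assumption: the lookup starts at level 1 and ends at searchTreeLevel.
        for (int lookupLevel = 1; lookupLevel <= searchTreeLevel; lookupLevel++) {
            System.out.println("level " + lookupLevel
                + " type=" + expectedBlockType(lookupLevel, searchTreeLevel)
                + " shouldCache=" + shouldCache(cacheBlocks, lookupLevel, searchTreeLevel));
        }
    }
}
```

Under these assumptions both index levels are cached and only the final DATA
level honors cacheBlocks = false.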

Thanks
Tian-Ying

Re: Will BloomFilter still be cached if setCacheBlocks(false) per Get()?

Posted by Tianying Chang <ty...@gmail.com>.
Ted, thanks. I am convinced that BLOOM blocks are cached even when the block
cache is turned off per-family or per-query, because of the code below from
CompoundBloomFilter.java. The highlighted "true" (the cacheBlock argument)
ensures that caching is on for BLOOM blocks:

reader.readBlock(index.getRootBlockOffset(block),
            index.getRootBlockDataSize(block), true, true, false,
            BlockType.BLOOM_CHUNK);
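
[Editor's sketch] The reasoning in this thread combines two checks: the
cacheBlock flag the caller passes to readBlock(), and the category-based
check Ted points to in CacheConfig#shouldCacheBlockOnRead(). The model below
is a simplified reconstruction and not a copy of the HBase source. The class,
the method cachedAfterRead, the parameter names, and the exact shape of the
category check are assumptions, so verify them against CacheConfig and
HFileReaderV2 in the HBase version you run.

```java
// Standalone model of the caching decision discussed in this thread; not HBase code.
class BloomCacheDecision {
    enum BlockCategory { DATA, INDEX, BLOOM, META }

    // Assumed shape of the config-level check: index and Bloom blocks are
    // cached on read even when data blocks are not.
    static boolean shouldCacheBlockOnRead(boolean blockCacheEnabled,
                                          boolean cacheDataOnRead,
                                          BlockCategory category) {
        return blockCacheEnabled
            && (cacheDataOnRead
                || category == BlockCategory.INDEX
                || category == BlockCategory.BLOOM);
    }

    // Assumed reader behavior: a block is cached only if the caller asked for
    // it AND the config-level check agrees. This helper is hypothetical.
    static boolean cachedAfterRead(boolean callerCacheBlock,
                                   boolean blockCacheEnabled,
                                   boolean cacheDataOnRead,
                                   BlockCategory category) {
        return callerCacheBlock
            && shouldCacheBlockOnRead(blockCacheEnabled, cacheDataOnRead, category);
    }

    public static void main(String[] args) {
        // Data block read for a Get with setCacheBlocks(false): caller flag is false.
        boolean data = cachedAfterRead(false, true, true, BlockCategory.DATA);
        // Bloom chunk read: the call quoted above passes a hard-coded true,
        // and here the family-level data caching is modeled as turned off.
        boolean bloom = cachedAfterRead(true, true, false, BlockCategory.BLOOM);
        System.out.println("data cached=" + data + ", bloom cached=" + bloom);
    }
}
```

If the model is right, a Bloom chunk stays out of the cache only when the
block cache as a whole is disabled, which matches the conclusion above.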


On Thu, Apr 17, 2014 at 3:39 PM, Ted Yu <yu...@gmail.com> wrote:

> Tianying:
> Please take a look at CacheConfig#shouldCacheBlockOnRead() which is called
> by HFileReaderV2#readBlock()
>
> Cheers

Re: Will BloomFilter still be cached if setCacheBlocks(false) per Get()?

Posted by Ted Yu <yu...@gmail.com>.
Tianying:
Please take a look at CacheConfig#shouldCacheBlockOnRead() which is called
by HFileReaderV2#readBlock()

Cheers


On Wed, Apr 16, 2014 at 5:39 PM, Tianying Chang <ty...@gmail.com> wrote:

> Cool. Thanks!
>
> Just to dig deeper: is this because the BloomFilter is part of Meta, and Meta
> blocks are always cached no matter what?
>
> Or is it because the BloomFilter is in an upper level of the search tree in
> the code path I pasted? I guess that code path is actually for data blocks,
> not meta blocks?
>
>           // Call HFile's caching block reader API. We always cache index
>           // blocks, otherwise we might get terrible performance.
>           boolean shouldCache = cacheBlocks || (lookupLevel < searchTreeLevel);
>           BlockType expectedBlockType;
>           if (lookupLevel < searchTreeLevel - 1) {
>             expectedBlockType = BlockType.INTERMEDIATE_INDEX;
>           } else if (lookupLevel == searchTreeLevel - 1) {
>             expectedBlockType = BlockType.LEAF_INDEX;
>           } else {
>             // this also accounts for ENCODED_DATA
>             expectedBlockType = BlockType.DATA;
>           }

Re: Will BloomFilter still be cached if setCacheBlocks(false) per Get()?

Posted by Tianying Chang <ty...@gmail.com>.
Cool. Thanks!

Just to dig deeper: is this because the BloomFilter is part of Meta, and Meta
blocks are always cached no matter what?

Or is it because the BloomFilter is in an upper level of the search tree in
the code path I pasted? I guess that code path is actually for data blocks,
not meta blocks?

          // Call HFile's caching block reader API. We always cache index
          // blocks, otherwise we might get terrible performance.
          boolean shouldCache = cacheBlocks || (lookupLevel < searchTreeLevel);
          BlockType expectedBlockType;
          if (lookupLevel < searchTreeLevel - 1) {
            expectedBlockType = BlockType.INTERMEDIATE_INDEX;
          } else if (lookupLevel == searchTreeLevel - 1) {
            expectedBlockType = BlockType.LEAF_INDEX;
          } else {
            // this also accounts for ENCODED_DATA
            expectedBlockType = BlockType.DATA;
          }


On Wed, Apr 16, 2014 at 4:59 PM, Ted Yu <yu...@gmail.com> wrote:

> bq. it is always cached on read even when per-family/per-query cacheBlocks
> is turned off.
>
> True.

Re: Will BloomFilter still be cached if setCacheBlocks(false) per Get()?

Posted by Ted Yu <yu...@gmail.com>.
bq. it is always cached on read even when per-family/per-query cacheBlocks
is turned off.

True.

