You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2014/07/17 01:47:06 UTC

[jira] [Commented] (HBASE-8894) Forward port compressed l2 cache from 0.89fb

    [ https://issues.apache.org/jira/browse/HBASE-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064352#comment-14064352 ] 

stack commented on HBASE-8894:
------------------------------

Went through this patch and JIRA.  Reviewing to ensure we've picked-over all the good stuff contained herein.

The bulk of this patch is the bucket cache patch HBASE-7404 with some difference in formatting/style, cleanup and some extra tests.  Other differences are features most of which we now have in trunk (will doc those we don't below) and a hard-coded behavior policy for L1 and L2 tiers (the eviction/cache behavior between tiers).

If we were to bring this in, it'd be a new caching implementation altogether.  At a minimum, we'd need to refactor to remove dup of base BucketCache classes, classes we already have in trunk.  Besides, in trunk, the tendency is to encapsulate caching policy in a BlockCache implementation (e.g. CombinedBlockCache) so to keep in mode w/ trunk, we'd refactor this code to have a policy that implemented this particular L1 to L2 relationship.  A caching policy would also manage how cache-on-write or cache-on-read works.  This patch pollutes base classes HFileReaderV2, etc., with the details of this particular caching policy's cache-on-write/cache-on-read details which would have to be redone too.

But I think we have the bulk of what this patch does in trunk now caveat a few diffs listed below.

bq. ..compressed and encoded second-level cache.

Inspired by this patch, HBASE-11331 adds this.  It is yet to be committed. A report shows compression makes for much less throughput, more CPU (more GC), with slightly less i/o on a smallish heap (TODO: test on larger heap).

bq. "Only handle byte arrays (i.e., no more serialization/deserialization within)"

The bucketcache serialization in trunk does ByteBuffer reuse/slicing when reading (new object, no 'serializing').  On or offheap cache, some allocation of HFileBlock header info is done. So some 'serialization' cost but not too much if I am reading correctly.  I may have missed something here?

bq.  Remove persistence support for the time being

We kept this support in trunk when we brought over HBASE-7404

bq. Keep an index of hfilename to blocks for efficient eviction on close

Trunk has this. It does not allow different config for L1 and L2; it just evicts blocks from all caches when a file closes by default ('Cache policy' can change this behavior).

This patch adds cache-on-read but it has a particular semantic in that it is whether to cache-on-read the block into L1 when fetched from L2. In trunk we  just 'cache-on-read' always (letting the cache 'policy' decide where the block should go).

The builder for CacheConfig in this patch is sweet but will pass on it for now since it would ripple through code base.  May be back for it later.

bq. Flush: 1. Write the already compressed and serialized data to L2 cache along with disk.

If I am reading correctly, we do this in trunk (To be confirmed).

bq. ...there's ability to keep very hot blocks (both data and meta blocks) in the regular BlockCache, which becomes the L1 cache. 

This has been added to master.  You have to enable it but it is possible. An improvement would be HBASE-11331 moving blocks to L1 if hot.

bq. Sending blocks evicted from L1 directly to L2
bq. Evicting blocks from the L2 cache upon promotion to the L1 cache

Still needs to be done in this patch.

There is an option for setting up L1+L2 in trunk such that blocks evicted from L1 go to L2  -- 'victimHandler' in LruBlockCache (The second item is not done in trunk nor in this patch if I am reading correctly)

bq. Issues evicting/draining...

Needs work still in this patch (and in master)

This patch adds clearCache to BlockCache for testing.  I added this to trunk.  Useful.

We have getOffset in BockCacheKey now in trunk.

We have the cacheconfig changes minus L2_ prefix in master.

Pulls in ByteBufferArray from hadoop.  We don't need to do that.

This patch adds trace logging around bucketcache main ops.  We could do with that.

I tried to bring over some of the tests but differences in config names and our missing some of the features made it tough, e.g. TestL2BucketCache, so punted.

Here are the lists of TODOs as I see it:

* Cache blocks to L2 on flush?  Check we do this in master.  We are missing this config in trunk (it is in this patch).
* Adds explicity L2 metrics.  Currently our metrics are combined.  I think this is pretty critical. HBASE-11532
* Add trace logging around eviction and caching to watch better the cache in action.


> Forward port compressed l2 cache from 0.89fb
> --------------------------------------------
>
>                 Key: HBASE-8894
>                 URL: https://issues.apache.org/jira/browse/HBASE-8894
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Priority: Critical
>         Attachments: HBASE-8894-0.94-v1.txt, HBASE-8894-0.94-v2.txt
>
>
> Forward port Alex's improvement on hbase-7407 from 0.89-fb branch:
> {code}
>   1 r1492797 | liyin | 2013-06-13 11:18:20 -0700 (Thu, 13 Jun 2013) | 43 lines
>   2
>   3 [master] Implements a secondary compressed cache (L2 cache)
>   4
>   5 Author: avf
>   6
>   7 Summary:
>   8 This revision implements compressed and encoded second-level cache with off-heap
>   9 (and optionally on-heap) storage and a bucket-allocator based on HBASE-7404.
>  10
>  11 BucketCache from HBASE-7404 is extensively modified to:
>  12
>  13 * Only handle byte arrays (i.e., no more serialization/deserialization within)
>  14 * Remove persistence support for the time being
>  15 * Keep an  index of hfilename to blocks for efficient eviction on close
>  16
>  17 A new interface (L2Cache) is introduced in order to separate it from the current
>  18 implementation. The L2 cache is then integrated into the classes that handle
>  19 reading from and writing to HFiles to allow cache-on-write as well as
>  20 cache-on-read. Metrics for the L2 cache are integrated into RegionServerMetrics
>  21 much in the same fashion as metrics for the existing (L2) BlockCache.
>  22
>  23 Additionally, CacheConfig class is re-refactored to configure the L2 cache,
>  24 replace multile constructors with a Builder, as well as replace static methods
>  25 for instantiating the caches with abstract factories (with singleton
>  26 implementations for both the existing LruBlockCache and the newly introduced
>  27 BucketCache based L2 cache)
>  28
>  29 Test Plan:
>  30 1) Additional unit tests
>  31 2) Stress test on a single devserver
>  32 3) Test on a single-node in shadow cluster
>  33 4) Test on a whole shadow cluster
>  34
>  35 Revert Plan:
>  36
>  37 Reviewers: liyintang, aaiyer, rshroff, manukranthk, adela
>  38
>  39 Reviewed By: liyintang
>  40
>  41 CC: gqchen, hbase-eng@
>  42
>  43 Differential Revision: https://phabricator.fb.com/D837264
>  44
>  45 Task ID: 2325295
>  7 ------------------------------------------------------------------------
>   6 r1492340 | liyin | 2013-06-12 11:36:03 -0700 (Wed, 12 Jun 2013) | 21 lines
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)