Posted to issues@hbase.apache.org by "Jonathan Hsieh (JIRA)" <ji...@apache.org> on 2013/08/06 18:04:48 UTC

[jira] [Commented] (HBASE-9131) Add admin-level documention about configuration and usage of the Bucket Cache

    [ https://issues.apache.org/jira/browse/HBASE-9131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730883#comment-13730883 ] 

Jonathan Hsieh commented on HBASE-9131:
---------------------------------------

[~zjushch] Thanks.  I think we are somewhere between too little detail and too much detail.

First, can we add the config variables to hbase-default.xml (with full descriptions and units)?

Now to the meat:

The patch doesn't tell the admin why or when they'd want to consider using this.  The link/pdf makes the reader search for the bucket cache sections on the 2nd page and then goes into too much design detail for an average admin.  (It also lacks the config variables / instructions.)

My suggestion: let's take the high-level parts from section 3 of the pdf, polish them, and add them to the official docs.

Here's a stab at the sections that I think would be good for the ref guide with the prose improved a little bit: 

{quote}
*Design and Motivation* 

The Bucket Cache is an alternate block cache implementation that is designed to take advantage of large amounts of memory or low-latency storage.   (something about how big would be useful).   It is implemented off the JVM heap, which has the secondary benefit of reducing the JVM heap fragmentation that eventually causes stop-the-world JVM garbage collection operations. If one were to rely upon the standard JVM memory allocation and GC policies with large heaps (>16GB RAM), one would periodically incur instability in HBase due to long stop-the-world GC pauses (tens of seconds to minutes) that can be misinterpreted as region server failures.

The storage of cached blocks is not constrained to RAM-only use; one could cache blocks in memory and also use a high-speed disk, such as SSDs, Fusion-IO devices, or ram-disks, as a massive secondary cache.  (probably need something about the persistence properties not being required, but having the massive capacity as a huge benefit.)

Internally, the bucket cache divides storage into many *buckets*, each of which contains blocks of a particular range of sizes.  (this is a little fuzzy, needs some clarification).  Insertions and evictions of blocks backed by physical storage just overwrite blocks on the device or read data from the storage device.  Managing these larger blocks prevents the external fragmentation that causes GC pauses, at the cost of some minor wasted space (internal fragmentation).

*Configuration and Usage*

To configure the bucket cache... (something along the line of what the current patch has)....

{quote}
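
To make the "buckets of a particular range of sizes" part of the draft a bit less fuzzy, here is a minimal sketch of size-class allocation and the internal fragmentation it costs. This is an illustration only, not the actual BucketCache code: the size classes, function names, and block lengths are all made up for the example.

```python
from bisect import bisect_left

# Hypothetical bucket size classes in bytes (sorted ascending).
# These are illustrative values, not HBase's real defaults.
SIZE_CLASSES = [4 * 1024, 8 * 1024, 16 * 1024, 32 * 1024, 64 * 1024]


def pick_size_class(block_len, size_classes=SIZE_CLASSES):
    """Return the smallest size class that can hold block_len bytes."""
    i = bisect_left(size_classes, block_len)
    if i == len(size_classes):
        raise ValueError("block of %d bytes exceeds the largest size class" % block_len)
    return size_classes[i]


def internal_fragmentation(block_lens, size_classes=SIZE_CLASSES):
    """Return (bytes_used, bytes_reserved, wasted_fraction) for the given blocks.

    Each block occupies one fixed-size slot of its size class, so the
    difference between slot size and block length is wasted space.
    """
    used = sum(block_lens)
    reserved = sum(pick_size_class(n, size_classes) for n in block_lens)
    wasted = (reserved - used) / reserved if reserved else 0.0
    return used, reserved, wasted
```

With these made-up classes, a 5000-byte block lands in an 8 KB slot and a 60000-byte block in a 64 KB slot. Because every slot in a bucket has the same size, an evicted block's slot can be reused by any later block of that class, which is why the waste shows up inside slots rather than as holes between them.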

Let me know what you think, and feel free to update/correct the draft.
                
> Add admin-level documention about configuration and usage of the Bucket Cache
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-9131
>                 URL: https://issues.apache.org/jira/browse/HBASE-9131
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Jonathan Hsieh
>         Attachments: hbase-9131.patch
>
>
> HBASE-7404 added the bucket cache but its configuration settings are currently undocumented.  Without documentation developers would be the only ones aware of the feature.
> Specifically documentation about slide 23 from http://www.slideshare.net/cloudera/operations-session-4 would be great to add!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira