Posted to dev@hbase.apache.org by "Jonathan Gray (JIRA)" <ji...@apache.org> on 2009/02/09 19:48:59 UTC

[jira] Commented: (HBASE-1192) LRU-style map for the block cache

    [ https://issues.apache.org/jira/browse/HBASE-1192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671957#action_12671957 ] 

Jonathan Gray commented on HBASE-1192:
--------------------------------------

My proposal is to build upon the work being done in HBASE-1186 and HBASE-1188 to create our own LRU-style Map specialized for the block cache.

A few points as to why I think we should move away from SoftReferences and manage everything ourselves:

- The loosely defined constraints and observed non-uniform behavior of SoftReferences
- We're already "managing" heap usage for Memcache.  If we use softrefs for the block cache, we'll have something that's almost a black box and tries to use all available memory.  This could cause the memcache to flush itself out because the RS is under heap pressure.  We won't have much control over fairness between memcaches, indexes, and the block cache if we use softrefs.  I propose we build something very similar to the MemcacheFlusher thread to deal with fairness between the different elements of the RS that use significant heap (memcaches, indexes, block cache, cell cache, in-memory families, blooms, etc...).  As with the new file format, there are going to be more parameters in hbase 0.20 in order to optimize for your use case.  Like the file format, we'll have to come up with reasonable defaults and write more documentation about the effects of the different settings.  Do we want to divide up the total available heap on startup between the different memory consumers, or do we want to leave it wide open for memcaches/indexes/blocks until we're under heap pressure and then decide how to flush or evict fairly?
- Ability to implement in-memory families as described in the bigtable paper very easily by adding priority into the eviction algorithm
- Full table scans can thrash the cache (at Streamy, we do full scans only for MR jobs, not user-facing stuff).  With our own structure, we can use a modified LRU algorithm that is resistant to table scans (I'm a fan of ARC, but there are license issues; it's fairly simple to implement something similar if you configure it manually... ARC is cool because it self-tunes).
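To make the last two points concrete, here's a minimal sketch (all names are hypothetical, not a proposed API) of how a single structure could give us both priority-based eviction for in-memory families and scan resistance: keep one access-ordered LRU segment per priority level, admit new blocks at the lowest priority, promote on re-access, and evict from the lowest-priority segment first.  A full scan then only churns the low-priority segment while hot blocks and in-memory families survive:

```java
import java.util.EnumMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a priority-aware, scan-resistant block cache. */
public class PriorityBlockCache<K, V> {

    /** Eviction order: SINGLE is evicted first, IN_MEMORY last. */
    public enum Priority { SINGLE, MULTI, IN_MEMORY }

    // One access-ordered LinkedHashMap (a ready-made LRU) per priority level.
    private final Map<Priority, LinkedHashMap<K, V>> segments =
        new EnumMap<>(Priority.class);
    private final int capacity;   // measured in blocks here; would be bytes in practice
    private int size = 0;

    public PriorityBlockCache(int capacity) {
        this.capacity = capacity;
        for (Priority p : Priority.values()) {
            // accessOrder=true makes iteration order least-recently-used first.
            segments.put(p, new LinkedHashMap<>(16, 0.75f, true));
        }
    }

    /** New blocks enter the low-priority segment; a table scan only churns this segment. */
    public synchronized void cacheBlock(K key, V block) {
        if (segments.get(Priority.SINGLE).put(key, block) == null) {
            size++;
        }
        evictIfNeeded();
    }

    /** Blocks from in-memory families are evicted only as a last resort. */
    public synchronized void cacheInMemory(K key, V block) {
        if (segments.get(Priority.IN_MEMORY).put(key, block) == null) {
            size++;
        }
        evictIfNeeded();
    }

    public synchronized V getBlock(K key) {
        V hit = segments.get(Priority.SINGLE).remove(key);
        if (hit != null) {
            // Promote on re-access: repeatedly used blocks survive a scan.
            segments.get(Priority.MULTI).put(key, hit);
            return hit;
        }
        hit = segments.get(Priority.MULTI).get(key);   // get() refreshes LRU order
        if (hit != null) {
            return hit;
        }
        return segments.get(Priority.IN_MEMORY).get(key);
    }

    /** Evict least-recently-used blocks, lowest priority segment first. */
    private void evictIfNeeded() {
        for (Priority p : Priority.values()) {
            Iterator<K> it = segments.get(p).keySet().iterator();
            while (size > capacity && it.hasNext()) {
                it.next();
                it.remove();
                size--;
            }
        }
    }
}
```

This is only meant to show the shape of the idea; a real version would track heap size in bytes rather than block counts, and the promotion/eviction thresholds are exactly the kind of tunable parameter (with sane defaults) discussed above.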

Those are my main points.  The primary argument against going in this direction is simplicity.  However, given what we've learned over the past couple of releases from OOME hell, I think we must be (and already are) in the business of heap management.  The Streamy guys have done the research and development to do memory management in Java about as well as it seems it can be done (based on other open source Java caching apps), so I'm confident we can be correct, efficient, and accurate enough to prevent OOME issues and get optimal performance.

Erik will post his findings from his work experimenting with softref behavior.

> LRU-style map for the block cache
> ---------------------------------
>
>                 Key: HBASE-1192
>                 URL: https://issues.apache.org/jira/browse/HBASE-1192
>             Project: Hadoop HBase
>          Issue Type: New Feature
>          Components: regionserver
>            Reporter: Jonathan Gray
>            Priority: Blocker
>             Fix For: 0.20.0
>
>
> We need to decide what structure to use to back the block cache.  The primary decision is whether to continue using SoftReferences or to build our own structure.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.