Posted to dev@hbase.apache.org by "Jim Kellerman (JIRA)" <ji...@apache.org> on 2008/02/06 00:13:09 UTC

[jira] Commented: (HBASE-70) [hbase] memory management

    [ https://issues.apache.org/jira/browse/HBASE-70?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565944#action_12565944 ] 

Jim Kellerman commented on HBASE-70:
------------------------------------

Stack and I discussed the possibility of moving the memcache back to the region level instead of the store level, because it would make accounting easier. However, this approach has some serious drawbacks:
- some of the methods that access both the memcache and the store files (such as getKeys) are more efficient when everything is at the store level.
- we experienced a great deal of pain moving the memcache from the region level to the store level in the first place, as it forced us to rewrite a lot of the scanner code.
- we moved the memcache from the region level to the store level because doing so greatly simplified things and reduced contention. Before, when the cache filled, we had no idea how much of it belonged to which family.

So what I would suggest, instead of moving the memcache back up to the region level, is to move the cache size management back up to the region level. Let the region keep track of the total cache space in use, which store(s) have the largest caches, etc. Contention is then confined to the small data structures that manage the accounting, rather than to the big structures such as the caches themselves.

This way we achieve:
- Better control of the overall cache space in use
- No need for radical modifications (moving the cache back to the region level now would be much harder than moving it to the store level was in the first place, since so much more has been added since then)

Basically, we gain what we need (better memory management), with less contention on the caches themselves, via a less risky (and less radical) change.
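A minimal Java sketch of the region-level accounting idea, assuming each store still owns its memcache and only reports size changes to its region. The class and method names (`RegionMemcacheAccounting`, `recordAdd`, `recordFlush`, `largestStore`, `needsFlush`) and the single byte threshold are hypothetical illustrations, not HBase APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: stores keep their own memcaches; the region only keeps
// small shared counters. None of these names are real HBase classes or methods.
final class RegionMemcacheAccounting {
    // Total bytes held across all memcaches in this region.
    private final AtomicLong totalBytes = new AtomicLong();
    // Bytes held per store, keyed by column family name.
    private final Map<String, AtomicLong> perStoreBytes = new ConcurrentHashMap<>();
    // Assumed region-wide flush threshold in bytes.
    private final long flushThreshold;

    RegionMemcacheAccounting(long flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    // Called by a store after it adds an edit to its own memcache.
    void recordAdd(String family, long bytes) {
        perStoreBytes.computeIfAbsent(family, f -> new AtomicLong()).addAndGet(bytes);
        totalBytes.addAndGet(bytes);
    }

    // Called by a store after it has flushed its whole memcache to a store file.
    void recordFlush(String family) {
        AtomicLong counter = perStoreBytes.get(family);
        if (counter != null) {
            long flushed = counter.getAndSet(0);
            totalBytes.addAndGet(-flushed);
        }
    }

    long totalBytes() {
        return totalBytes.get();
    }

    // Returns the family with the largest memcache, or null if nothing is cached.
    String largestStore() {
        String largest = null;
        long max = 0;
        for (Map.Entry<String, AtomicLong> e : perStoreBytes.entrySet()) {
            long size = e.getValue().get();
            if (size > max) {
                max = size;
                largest = e.getKey();
            }
        }
        return largest;
    }

    // True when the region as a whole, not any single store, is over budget.
    boolean needsFlush() {
        return totalBytes.get() >= flushThreshold;
    }

    public static void main(String[] args) {
        RegionMemcacheAccounting acct = new RegionMemcacheAccounting(1000);
        acct.recordAdd("info", 300);
        acct.recordAdd("anchor", 800);
        System.out.println(acct.needsFlush());   // true
        System.out.println(acct.largestStore()); // anchor
        acct.recordFlush("anchor");
        System.out.println(acct.totalBytes());   // 300
    }
}
```

Only the counters are shared between stores, so the region can decide when to flush and pick the largest store to flush without locking any store's cache.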


> [hbase] memory management
> -------------------------
>
>                 Key: HBASE-70
>                 URL: https://issues.apache.org/jira/browse/HBASE-70
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: stack
>
> Each Store has a Memcache of edits that is flushed on a fixed period (it used to be flushed when it grew beyond a limit). A Region can be made up of N Stores.  A regionserver currently has no upper bound on the number of regions that can be deployed to it.  In addition, the index of every mapfile is read into memory.  We're also talking about adding caching of blocks and cells.
> We need a means of keeping account of memory usage and adjusting cache sizes and flush rates (or sizes) dynamically -- using References where possible -- to accommodate deployment of added regions.  If memory is strained, we should reject regions proffered by the master with a resource-constrained, or some such, message.
> The manual sizing we currently do ain't going to cut it for clusters of any decent size.
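A hedged sketch of the admission idea in the issue description: the regionserver keeps an estimate of memory committed per region and declines a region offered by the master once a global budget would be exceeded. The per-region estimate (memcache allowance per store plus mapfile index size), the budget, and every name here (`RegionServerMemoryBudget`, `offerRegion`, `REJECTED_RESOURCE_CONSTRAINED`) are assumptions for illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical regionserver-side admission control; not an HBase API.
final class RegionServerMemoryBudget {
    enum OpenResult { ACCEPTED, REJECTED_RESOURCE_CONSTRAINED }

    private final long budgetBytes;
    private final Map<String, Long> committedByRegion = new HashMap<>();
    private long committedTotal = 0;

    RegionServerMemoryBudget(long budgetBytes) {
        this.budgetBytes = budgetBytes;
    }

    // Assumed footprint model: memcache allowance per store plus in-memory mapfile indexes.
    static long estimate(int stores, long memcachePerStore, long indexBytes) {
        return stores * memcachePerStore + indexBytes;
    }

    // Accept the region only if its estimated footprint fits in the remaining budget.
    synchronized OpenResult offerRegion(String regionName, long estimatedBytes) {
        if (committedByRegion.containsKey(regionName)) {
            return OpenResult.ACCEPTED; // already deployed; do not count it twice
        }
        if (committedTotal + estimatedBytes > budgetBytes) {
            return OpenResult.REJECTED_RESOURCE_CONSTRAINED;
        }
        committedByRegion.put(regionName, estimatedBytes);
        committedTotal += estimatedBytes;
        return OpenResult.ACCEPTED;
    }

    // Release the memory committed to a region when it is closed.
    synchronized void closeRegion(String regionName) {
        Long bytes = committedByRegion.remove(regionName);
        if (bytes != null) {
            committedTotal -= bytes;
        }
    }

    synchronized long committedBytes() {
        return committedTotal;
    }

    public static void main(String[] args) {
        final long mb = 1024L * 1024L;
        RegionServerMemoryBudget server = new RegionServerMemoryBudget(100 * mb);
        long perRegion = estimate(2, 16 * mb, 4 * mb); // 36 MB per region
        System.out.println(server.offerRegion("r1", perRegion)); // ACCEPTED
        System.out.println(server.offerRegion("r2", perRegion)); // ACCEPTED
        System.out.println(server.offerRegion("r3", perRegion)); // REJECTED_RESOURCE_CONSTRAINED
        server.closeRegion("r1");
        System.out.println(server.offerRegion("r3", perRegion)); // ACCEPTED
    }
}
```

A fixed estimate is the crudest possible model; the dynamic adjustment of cache and flush sizes the issue asks for would refine these numbers as regions come and go.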

-- 
This message is automatically generated by JIRA.