Posted to issues@hbase.apache.org by "Lars Hofhansl (JIRA)" <ji...@apache.org> on 2013/12/17 07:34:07 UTC

[jira] [Comment Edited] (HBASE-3484) Replace memstore's ConcurrentSkipListMap with our own implementation

    [ https://issues.apache.org/jira/browse/HBASE-3484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13850170#comment-13850170 ] 

Lars Hofhansl edited comment on HBASE-3484 at 12/17/13 6:33 AM:
----------------------------------------------------------------

From [~mcorgan]...
bq. I've been pondering how to better compact the data in the memstore. Sometimes we see a 100MB memstore flush that is really 10MB of KeyValues, which gzips to like 2MB, meaning there is a ton of pointer overhead.

This should be better now. In various patches I removed:
* caching of the row key (HBASE-7279)
* caching of the timestamp (HBASE-7279)
* caching of the KV length (HBASE-9956)

That saves 36 bytes + sizeOf(rowKey) for each KeyValue in the memstore.
The in-memory overhead per KV is now 56 bytes (the memstoreTS is also stored in the HFiles).
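
To put the figures above in perspective, here is a rough back-of-the-envelope sketch. It only multiplies the two numbers quoted in the comment (36 bytes saved plus the row key, 56 bytes remaining per KV). The class name, the method names, the KV count, and the row key length are hypothetical, and sizeOf(rowKey) is assumed to be the raw row key length in bytes; any skip-list node overhead is not modeled.

```java
public class KvOverheadSketch {
    // Figures quoted in the comment; everything else here is illustrative.
    static final long OVERHEAD_PER_KV = 56;      // remaining per-KV overhead, in bytes
    static final long FIXED_SAVING_PER_KV = 36;  // fixed part of the per-KV saving, in bytes

    // Remaining object overhead for kvCount KeyValues.
    static long estimatedOverheadBytes(long kvCount) {
        return kvCount * OVERHEAD_PER_KV;
    }

    // Bytes saved by the patches, assuming sizeOf(rowKey) is just the
    // row key length (array header ignored; that is an assumption).
    static long estimatedSavingBytes(long kvCount, long rowKeyBytes) {
        return kvCount * (FIXED_SAVING_PER_KV + rowKeyBytes);
    }

    public static void main(String[] args) {
        long kvCount = 1_000_000L;  // hypothetical memstore population
        long rowKeyBytes = 16L;     // hypothetical row key length
        System.out.println("overhead bytes: " + estimatedOverheadBytes(kvCount));
        System.out.println("saved bytes:    " + estimatedSavingBytes(kvCount, rowKeyBytes));
    }
}
```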



was (Author: lhofhansl):
From [~mcorgan]...
bq. I've been pondering how to better compact the data in the memstore. Sometimes we see a 100MB memstore flush that is really 10MB of KeyValues, which gzips to like 2MB, meaning there is a ton of pointer overhead.

This should be better now. In various patches I removed:
* caching of the row key (HBASE-7279)
* caching of the timestamp (HBASE-7279)
* caching of the KV length (HBASE-9956)

That saves 12 bytes + sizeOf(rowKey) for each KeyValue in the memstore.
The in-memory overhead per KV is now 56 bytes (the memstoreTS is also stored in the HFiles).


> Replace memstore's ConcurrentSkipListMap with our own implementation
> --------------------------------------------------------------------
>
>                 Key: HBASE-3484
>                 URL: https://issues.apache.org/jira/browse/HBASE-3484
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance
>    Affects Versions: 0.92.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>         Attachments: WIP_HBASE-3484.patch, hierarchical-map.txt, memstore_drag.png
>
>
> By copy-pasting ConcurrentSkipListMap into HBase we can make two improvements to it for our use case in MemStore:
> - add an iterator.replace() method which should allow us to do upsert much more cheaply
> - implement a Set directly without having to do Map<KeyValue,KeyValue> to save one reference per entry
> It turns out CSLM is in public domain from its development as part of JSR 166, so we should be OK with licenses.
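
The two points in the issue description can be illustrated with the stock java.util.concurrent.ConcurrentSkipListMap. This is a minimal sketch, not HBase code: Cell is a hypothetical stand-in for KeyValue that compares on the row only, and upsert is an illustrative helper. It shows that a set emulated as a map from each element to itself holds two references per entry, and that put() on an equal key swaps only the value, so replacing the stored element takes a remove followed by a put (two traversals, with a brief window in which the entry is absent).

```java
import java.util.concurrent.ConcurrentSkipListMap;

public class CslmUpsertSketch {
    // Hypothetical stand-in for KeyValue: ordering looks at the row only.
    static final class Cell implements Comparable<Cell> {
        final String row;
        final long value;

        Cell(String row, long value) {
            this.row = row;
            this.value = value;
        }

        @Override
        public int compareTo(Cell other) {
            return row.compareTo(other.row);
        }
    }

    // Upsert using only the stock API: remove the old entry, then insert the
    // new one. Each call walks the skip list, and the pair is not atomic.
    static void upsert(ConcurrentSkipListMap<Cell, Cell> map, Cell fresh) {
        map.remove(fresh);
        map.put(fresh, fresh);
    }

    public static void main(String[] args) {
        // Set emulated as Map<Cell, Cell>: key and value reference the same object.
        ConcurrentSkipListMap<Cell, Cell> map = new ConcurrentSkipListMap<>();
        Cell old = new Cell("row1", 1L);
        map.put(old, old);

        // A plain put() on an equal key replaces the value but keeps the old key object.
        Cell fresh = new Cell("row1", 2L);
        map.put(fresh, fresh);
        System.out.println("key after put:    " + map.firstKey().value);
        System.out.println("value after put:  " + map.firstEntry().getValue().value);

        // remove + put swaps both references.
        upsert(map, fresh);
        System.out.println("key after upsert: " + map.firstKey().value);
    }
}
```

An in-place iterator.replace() on a dedicated set implementation, as the description proposes, would presumably avoid both the second traversal and the extra reference per entry.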


