Posted to issues@hbase.apache.org by "Eshcar Hillel (JIRA)" <ji...@apache.org> on 2015/10/26 15:44:28 UTC

[jira] [Updated] (HBASE-13408) HBase In-Memory Memstore Compaction

     [ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eshcar Hillel updated HBASE-13408:
----------------------------------
    Attachment: InMemoryMemstoreCompactionMasterEvaluationResults.pdf
                HBASE-13408-trunk-v07.patch

Attaching a new patch after rebase and code review changes.
One of the code changes aligns the initialization of the memstore with the memstore class name configuration setting. To create a compacted memstore, configure HBase with
<code>
hbase.regionserver.memstore.class=org.apache.hadoop.hbase.regionserver.CompactedMemStore
</code>
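
As a rough illustration of what tying memstore initialization to a class name setting can look like, here is a minimal, self-contained sketch. Only the property name comes from the comment above. The interface, both implementation classes, the factory, and the use of a plain Map in place of the HBase configuration object are hypothetical stand-ins, not the code in the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a memstore abstraction; NOT the HBase interface.
interface MemStoreSketch {
    String describe();
}

// Hypothetical default implementation.
class DefaultMemStoreSketch implements MemStoreSketch {
    public String describe() { return "default"; }
}

// Hypothetical compacting implementation.
class CompactedMemStoreSketch implements MemStoreSketch {
    public String describe() { return "compacted"; }
}

class MemStoreFactorySketch {
    // Property name as given in the comment above.
    static final String MEMSTORE_CLASS_KEY = "hbase.regionserver.memstore.class";

    // Instantiate whichever class the setting names; fall back to the default
    // sketch class when the setting is absent.
    static MemStoreSketch create(Map<String, String> conf) {
        String className =
            conf.getOrDefault(MEMSTORE_CLASS_KEY, DefaultMemStoreSketch.class.getName());
        try {
            Class<? extends MemStoreSketch> clazz =
                Class.forName(className).asSubclass(MemStoreSketch.class);
            return clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot create memstore " + className, e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(MEMSTORE_CLASS_KEY, CompactedMemStoreSketch.class.getName());
        System.out.println(create(conf).describe());
    }
}
```

With this shape, switching memstore implementations is purely a configuration change, and a misspelled class name fails at creation time rather than silently falling back.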

In addition, we reproduced the benchmark results on the master code (both new and original), measured under different settings and workloads. The report is attached.

> HBase In-Memory Memstore Compaction
> -----------------------------------
>
>                 Key: HBASE-13408
>                 URL: https://issues.apache.org/jira/browse/HBASE-13408
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>             Fix For: 2.0.0
>
>         Attachments: HBASE-13408-trunk-v01.patch, HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, HBASE-13408-trunk-v06.patch, HBASE-13408-trunk-v07.patch, HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, InMemoryMemstoreCompactionEvaluationResults.pdf, InMemoryMemstoreCompactionMasterEvaluationResults.pdf, InMemoryMemstoreCompactionScansEvaluationResults.pdf, StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its in-memory component. The memstore absorbs all updates to the store; from time to time these updates are flushed to a file on disk, where they are compacted. Unlike disk components, the memstore is not compacted until it is written to the filesystem and optionally to block-cache. This may result in underutilization of the memory due to duplicate entries per row, for example, when hot data is continuously updated. 
> Generally, the faster data accumulates in memory, the more flushes are triggered and the more frequently data sinks to disk, which slows down retrieval even of very recent data.
> In high-churn workloads, compacting the memstore can help maintain the data in memory, and thereby speed up data retrieval. 
> We suggest a new compacted memstore with the following principles:
> 1.	The data is kept in memory for as long as possible
> 2.	Memstore data is either compacted or in the process of being compacted
> 3.	Allow a panic mode, which may interrupt an in-progress compaction and force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.
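
To make the duplicate-entries point in the description concrete, here is a small sketch of what compacting an in-memory segment could mean in the simplest case: keeping only the newest version of each row. The Cell type, the one-version-per-row policy, and the method names are assumptions made for illustration; the actual design (segments, scanners, panic mode) is in the attached design document and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class InMemoryCompactionSketch {
    // Hypothetical, simplified cell: a row key, a timestamp, and a value.
    static final class Cell {
        final String row;
        final long timestamp;
        final String value;

        Cell(String row, long timestamp, String value) {
            this.row = row;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Keep only the newest cell per row. On equal timestamps the cell that
    // appears later in the input wins. The result is sorted by row key.
    static List<Cell> compact(List<Cell> segment) {
        Map<String, Cell> newest = new TreeMap<>();
        for (Cell c : segment) {
            Cell current = newest.get(c.row);
            if (current == null || c.timestamp >= current.timestamp) {
                newest.put(c.row, c);
            }
        }
        return new ArrayList<>(newest.values());
    }

    public static void main(String[] args) {
        List<Cell> segment = new ArrayList<>();
        segment.add(new Cell("hot-row", 1L, "v1"));
        segment.add(new Cell("hot-row", 2L, "v2"));
        segment.add(new Cell("hot-row", 3L, "v3"));
        segment.add(new Cell("cold-row", 1L, "c1"));

        List<Cell> compacted = compact(segment);
        // Three updates to the hot row collapse into one entry.
        System.out.println(segment.size() + " -> " + compacted.size()); // prints "4 -> 2"
    }
}
```

In a high-churn workload like this one, the compacted segment occupies a fraction of the original memory, which is what lets the data stay in memory longer before a flush is needed.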


