Posted to issues@hbase.apache.org by "Anastasia Braginsky (JIRA)" <ji...@apache.org> on 2015/10/01 11:23:28 UTC
[jira] [Commented] (HBASE-13408) HBase In-Memory Memstore Compaction
[ https://issues.apache.org/jira/browse/HBASE-13408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939562#comment-14939562 ]
Anastasia Braginsky commented on HBASE-13408:
---------------------------------------------
Eshcar is out of office now, and since she started the review board, we can only post the new patch to a new board. We'll post it early next week.
There is no big difference between v5 and v6; v6 was made to catch up with trunk.
> HBase In-Memory Memstore Compaction
> -----------------------------------
>
> Key: HBASE-13408
> URL: https://issues.apache.org/jira/browse/HBASE-13408
> Project: HBase
> Issue Type: New Feature
> Reporter: Eshcar Hillel
> Assignee: Eshcar Hillel
> Fix For: 2.0.0
>
> Attachments: HBASE-13408-trunk-v01.patch, HBASE-13408-trunk-v02.patch, HBASE-13408-trunk-v03.patch, HBASE-13408-trunk-v04.patch, HBASE-13408-trunk-v05.patch, HBASE-13408-trunk-v06.patch, HBaseIn-MemoryMemstoreCompactionDesignDocument-ver02.pdf, HBaseIn-MemoryMemstoreCompactionDesignDocument.pdf, InMemoryMemstoreCompactionEvaluationResults.pdf, InMemoryMemstoreCompactionScansEvaluationResults.pdf, StoreSegmentandStoreSegmentScannerClassHierarchies.pdf
>
>
> A store unit holds a column family in a region, where the memstore is its in-memory component. The memstore absorbs all updates to the store; from time to time these updates are flushed to a file on disk, where they are compacted. Unlike disk components, the memstore is not compacted until it is written to the filesystem and optionally to the block cache. This may result in underutilization of memory due to duplicate entries per row, for example, when hot data is continuously updated.
> Generally, the faster data accumulates in memory, the more flushes are triggered and the more frequently data sinks to disk, which slows down retrieval even of very recent data.
> In high-churn workloads, compacting the memstore can help maintain the data in memory, and thereby speed up data retrieval.
> We suggest a new compacted memstore with the following principles:
> 1. The data is kept in memory for as long as possible
> 2. Memstore data is either compacted or in the process of being compacted
> 3. Allow a panic mode, which may interrupt an in-progress compaction and force a flush of part of the memstore.
> We suggest applying this optimization only to in-memory column families.
> A design document is attached.
> This feature was previously discussed in HBASE-5311.
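To make the duplicate-entry point in the description above concrete, here is a minimal Java sketch of in-memory compaction. It is not taken from the attached patches or from HBase's API: the class `MemstoreCompactionSketch`, the simplified `Entry` type, and the `compact` method are all hypothetical names, and the sketch assumes that only the newest version of each (row, column) pair needs to be retained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MemstoreCompactionSketch {

    /** Simplified, hypothetical stand-in for a memstore cell (not the HBase Cell type). */
    public static final class Entry {
        public final String row;
        public final String column;
        public final long timestamp;
        public final String value;

        public Entry(String row, String column, long timestamp, String value) {
            this.row = row;
            this.column = column;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /**
     * Returns a compacted copy of the input that keeps only the newest
     * version of each (row, column) pair, ordered by row and then column.
     * Assumption: a single retained version per cell is sufficient.
     */
    public static List<Entry> compact(List<Entry> entries) {
        // row -> (column -> newest entry seen so far)
        Map<String, Map<String, Entry>> newest = new TreeMap<>();
        for (Entry e : entries) {
            Map<String, Entry> columns =
                newest.computeIfAbsent(e.row, r -> new TreeMap<>());
            Entry current = columns.get(e.column);
            if (current == null || e.timestamp > current.timestamp) {
                columns.put(e.column, e);
            }
        }
        // Flatten the nested maps back into a single sorted list.
        List<Entry> result = new ArrayList<>();
        for (Map<String, Entry> columns : newest.values()) {
            result.addAll(columns.values());
        }
        return result;
    }

    public static void main(String[] args) {
        // A hot row updated three times, plus one cold row.
        List<Entry> updates = new ArrayList<>();
        updates.add(new Entry("hot", "cf:q", 1L, "a"));
        updates.add(new Entry("hot", "cf:q", 2L, "b"));
        updates.add(new Entry("hot", "cf:q", 3L, "c"));
        updates.add(new Entry("cold", "cf:q", 1L, "x"));

        List<Entry> compacted = compact(updates);
        System.out.println(updates.size() + " entries before, "
            + compacted.size() + " after compaction");
    }
}
```

In this toy workload the repeated updates to the hot row collapse into a single entry, which is the memory saving the description attributes to compacting before a flush. A real memstore would also have to handle multiple retained versions, deletes, and concurrent writers, none of which this sketch attempts.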
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)