Posted to issues@hbase.apache.org by "Eshcar Hillel (JIRA)" <ji...@apache.org> on 2018/06/18 18:56:00 UTC
[jira] [Commented] (HBASE-20542) Better heap utilization for IMC with MSLABs

    [ https://issues.apache.org/jira/browse/HBASE-20542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16516156#comment-16516156 ] 

Eshcar Hillel commented on HBASE-20542:
---------------------------------------

Patch is attached.
To reduce internal fragmentation the size of the active segment is set to be the size of one MSLAB chunk (by default 2MB).
An add operation is supplemented with pre-update and post-update procedures.
The pre-update procedure atomically increases the size of the segment if this increment does not exceed the segment size threshold, and then continues with the normal path of updating the memstore.
If the increment would exceed the segment size threshold, the size is not increased and instead:
(1) the segment is flushed into the compaction pipeline,
(2) a new active segment is created, 
(3) an IMC task is scheduled in the background,
(4) the operation re-runs the pre-update procedure, this time with the new active segment.
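
The pre-update path described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in the attached patch: the class names SketchSegment and SketchStore, the pipeline queue, and the scheduledCompactions counter are all hypothetical stand-ins, and the sketch simply rejects a cell larger than the threshold rather than showing how such a cell would be handled.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for an active segment; not an HBase class. */
class SketchSegment {
    final long threshold;                       // e.g. one MSLAB chunk (2MB by default)
    final AtomicLong size = new AtomicLong();

    SketchSegment(long threshold) { this.threshold = threshold; }

    /** Atomically adds cellSize unless that would exceed the threshold. */
    boolean tryReserve(long cellSize) {
        while (true) {
            long cur = size.get();
            long next = cur + cellSize;
            if (next > threshold) return false;              // size left unchanged
            if (size.compareAndSet(cur, next)) return true;  // room reserved
        }
    }
}

/** Hypothetical stand-in for the store that owns the active segment. */
class SketchStore {
    final long threshold;
    final AtomicReference<SketchSegment> active;
    final Queue<SketchSegment> pipeline = new ConcurrentLinkedQueue<>();
    final AtomicInteger scheduledCompactions = new AtomicInteger();

    SketchStore(long threshold) {
        this.threshold = threshold;
        this.active = new AtomicReference<>(new SketchSegment(threshold));
    }

    /** Returns the segment in which room for the cell was reserved. */
    SketchSegment preUpdate(long cellSize) {
        if (cellSize > threshold) {
            // Assumption of this sketch: oversized cells are out of scope.
            throw new IllegalArgumentException("cell larger than segment threshold");
        }
        while (true) {
            SketchSegment seg = active.get();
            if (seg.tryReserve(cellSize)) return seg;        // normal path
            SketchSegment fresh = new SketchSegment(threshold);
            // Only the thread that wins the swap flushes the old segment.
            if (active.compareAndSet(seg, fresh)) {          // (2) new active segment
                pipeline.add(seg);                           // (1) flush into the pipeline
                scheduledCompactions.incrementAndGet();      // (3) stand-in for scheduling IMC
            }
            // (4) loop: re-run pre-update against the current active segment
        }
    }
}
```

Because the size is reserved before the write, a segment pushed to the pipeline never exceeds the threshold; threads that lose the swap race simply retry against whichever segment is active by then.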

This change calls for an additional optimization.
The IMC no longer needs to acquire the region-level updates lock. Instead we use a segment-level read-write lock to synchronize IMC with concurrent update operations. This is better since with the new solution IMC needs to wait only for those few operations that already updated the size of the segment in the pre-update procedure but are still updating the segment skip list, and does not need to wait for operations of other stores. Moreover, update operations do not wait for in-memory flush to complete as before.
To synchronize, update operations take the read lock of the segment they are updating in the pre-update procedure, and release it in the post-update procedure. The IMC thread takes the write lock of each segment it is compacting. This ensures all updates that started before the in-memory flush have completed.
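
A minimal sketch of this locking scheme, using java.util.concurrent.locks.ReentrantReadWriteLock, might look like the following. The class LockedSegmentSketch and its method names are hypothetical and chosen for illustration; the actual patch may structure this differently.

```java
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical segment with a per-segment read-write lock; not an HBase class. */
class LockedSegmentSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    final ConcurrentSkipListMap<String, String> cells = new ConcurrentSkipListMap<>();

    /** Updaters share the read lock, so they never block one another. */
    void preUpdate()  { lock.readLock().lock(); }
    void postUpdate() { lock.readLock().unlock(); }

    /** A complete update: pre-update, skip list write, post-update. */
    void add(String key, String value) {
        preUpdate();
        try {
            cells.put(key, value);
        } finally {
            postUpdate();
        }
    }

    /** IMC side: blocks until every in-flight update has released the read lock. */
    int awaitUpdatesAndCount() {
        lock.writeLock().lock();
        try {
            return cells.size();   // no update is mid-flight at this point
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Non-blocking probe: true only if no update currently holds the read lock. */
    boolean tryBeginCompaction() { return lock.writeLock().tryLock(); }
    void endCompaction()         { lock.writeLock().unlock(); }
}
```

The write lock here acts as a barrier for one segment only, so updates to other segments and other stores are not involved in the wait.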

I will also upload the patch to RB.
Feel free to ask questions and comment.


> Better heap utilization for IMC with MSLABs
> -------------------------------------------
>
>                 Key: HBASE-20542
>                 URL: https://issues.apache.org/jira/browse/HBASE-20542
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Eshcar Hillel
>            Assignee: Eshcar Hillel
>            Priority: Major
>         Attachments: HBASE-20542.branch-2.001.patch
>
>
> Following HBASE-20188 we realized in-memory compaction combined with MSLABs may suffer from heap under-utilization due to internal fragmentation. This jira presents a solution to circumvent this problem. The main idea is to have each update operation check whether it will cause overflow in the active segment *before* it writes the new value (instead of checking the size after the write is completed). If it will, the active segment is atomically swapped with a new empty segment, and is pushed (full-yet-not-overflowed) to the compaction pipeline. Later on the IMC daemon will run its compaction operation (flatten index/merge indices/data compaction) in the background. Some subtle concurrency issues should be handled with care. We next elaborate on them.


