Posted to issues@hbase.apache.org by "Duo Zhang (JIRA)" <ji...@apache.org> on 2016/07/13 07:50:20 UTC

[jira] [Commented] (HBASE-16223) Drop duplicated delete markers in minor compaction

    [ https://issues.apache.org/jira/browse/HBASE-16223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374548#comment-15374548 ] 

Duo Zhang commented on HBASE-16223:
-----------------------------------

This requires changing the logic of {{ScanQueryMatcher}}, but that logic is really complicated. I think we should refactor {{ScanQueryMatcher}} first.

> Drop duplicated delete markers in minor compaction
> --------------------------------------------------
>
>                 Key: HBASE-16223
>                 URL: https://issues.apache.org/jira/browse/HBASE-16223
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
>
> We have recently suffered from this. One of our customers may delete the same row many times (the record is about 100,000 times), which causes scans to time out.
> Currently we trigger a major compaction every day to drop the duplicated delete markers. But this is not a good solution, since the cost of major compaction grows as the data grows.
> In fact, I think only the newest delete marker is useful (if maxversions = 1), so we could retain only this delete marker when doing a minor compaction.
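
The idea in the description (keep only the newest delete marker per row) can be illustrated with a rough, self-contained sketch. This is not HBase code and does not touch {{ScanQueryMatcher}}: the {{Marker}} class and the {{keepNewestPerRow}} method are hypothetical names made up for illustration, and the model assumes every marker is a row-level delete that masks all cells at or below its timestamp. Real delete markers come in several types, so an actual change would need to handle more cases.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: a simplified model, not the HBase compaction path.
class DeleteMarkerDedup {

    // Hypothetical stand-in for a row-level delete marker.
    static final class Marker {
        final String row;
        final long timestamp;

        Marker(String row, long timestamp) {
            this.row = row;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return row + "@" + timestamp;
        }
    }

    // Keep one marker per row: the one with the largest timestamp.
    // Assumption: a newer row-level marker masks everything an older one does.
    static List<Marker> keepNewestPerRow(List<Marker> markers) {
        Map<String, Marker> newest = new LinkedHashMap<>();
        for (Marker m : markers) {
            newest.merge(m.row, m, (a, b) -> a.timestamp >= b.timestamp ? a : b);
        }
        return new ArrayList<>(newest.values());
    }

    public static void main(String[] args) {
        List<Marker> markers = new ArrayList<>();
        markers.add(new Marker("r1", 100L));
        markers.add(new Marker("r1", 300L));
        markers.add(new Marker("r1", 200L));
        markers.add(new Marker("r2", 50L));

        // Three markers for r1 collapse into the newest one; r2 is untouched.
        System.out.println(keepNewestPerRow(markers));
    }
}
```

In this toy model, 100,000 repeated deletes of one row would collapse to a single marker, which is the effect the issue asks minor compaction to achieve.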



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)