Posted to issues@hbase.apache.org by "Jonathan Gray (Commented) (JIRA)" <ji...@apache.org> on 2011/11/01 19:35:32 UTC

[jira] [Commented] (HBASE-4717) More efficient age-off of old data during major compaction

    [ https://issues.apache.org/jira/browse/HBASE-4717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141445#comment-13141445 ] 

Jonathan Gray commented on HBASE-4717:
--------------------------------------

+1 on this general direction.

We've long talked about special compaction heuristics that would bucketize by time in some way (and you could really take advantage of the TimeRangeTracker file selection stuff for read perf).  We did what you describe and set a small max.size, so once a file reached a certain size, it would never be compacted again.  This allowed us to "age out" the data by keeping old stuff separate from new stuff in files.
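
For reference, that cap is the same `hbase.hstore.compaction.max.size` property the issue description mentions, set in hbase-site.xml; the 256 MB value here is only an illustrative choice, not a recommendation:

```xml
<property>
  <!-- Store files larger than this are excluded from compaction selection,
       so once a file grows past the cap it settles into a stable,
       time-bounded file that is never rewritten. -->
  <name>hbase.hstore.compaction.max.size</name>
  <value>268435456</value> <!-- 256 MB; illustrative, tune per workload -->
</property>
```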

We were not trying to actually wipe out the data, only separate it, because this was mostly a read-modify-write workload that needed access to recent data but the old data still needed to be available for user read queries.  It would probably be simple to add a check at compaction time on each file's time range: if even the file's maximum timestamp has expired, just drop the whole file.
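
A minimal sketch of that check, assuming each store file exposes the min/max timestamps a TimeRangeTracker would have recorded for it (the `StoreFile` record and `isWhollyExpired` helper here are hypothetical illustrations, not HBase's actual API):

```java
public class ExpiryCheck {
    // Hypothetical stand-in for an HBase store file plus the min/max
    // timestamps its TimeRangeTracker would have recorded.
    record StoreFile(String name, long minTimestamp, long maxTimestamp) {}

    // The whole file can be dropped when even its newest cell is past the TTL.
    static boolean isWhollyExpired(StoreFile f, long ttlMillis, long nowMillis) {
        return f.maxTimestamp() < nowMillis - ttlMillis;
    }

    public static void main(String[] args) {
        long now = 1_000_000L, ttl = 100_000L; // expiry cutoff = 900_000
        StoreFile old = new StoreFile("old-file", 0L, 850_000L);
        StoreFile fresh = new StoreFile("fresh-file", 900_000L, 950_000L);
        System.out.println(old.name() + " expired? " + isWhollyExpired(old, ttl, now));
        System.out.println(fresh.name() + " expired? " + isWhollyExpired(fresh, ttl, now));
    }
}
```

The point of the check is that no KVs need to be scanned at all: the file-level time range alone proves every cell inside is expired.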
                
> More efficient age-off of old data during major compaction
> ----------------------------------------------------------
>
>                 Key: HBASE-4717
>                 URL: https://issues.apache.org/jira/browse/HBASE-4717
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.94.0
>            Reporter: Todd Lipcon
>
> Many applications need to implement efficient age-off of old data. We currently only perform age-off during major compaction by scanning through all of the KVs. Instead, we could implement the following:
> - Set hbase.hstore.compaction.max.size reasonably small. Thus, older store files contain only smaller finite ranges of time.
> - Periodically run an "age-off compaction". This compaction would scan the current list of storefiles. Any store file that falls entirely out of the TTL time range would be dropped. Store files completely within the time range would be un-altered. Those crossing the time-range boundary could either be left alone or compacted using the existing compaction code.
> I don't have a design in mind for how exactly this would be implemented, but hope to generate some discussion.
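
The selection step described above can be expressed as a pure partition over the store files' time ranges; the `StoreFile` and `Plan` types below are illustrative stand-ins, not HBase code:

```java
import java.util.ArrayList;
import java.util.List;

public class AgeOffSelection {
    // Illustrative types only; not HBase's actual StoreFile API.
    record StoreFile(String name, long minTs, long maxTs) {}
    record Plan(List<StoreFile> drop, List<StoreFile> keep, List<StoreFile> compact) {}

    // Partition store files by where their time range sits relative to the TTL cutoff.
    static Plan plan(List<StoreFile> files, long ttlMillis, long nowMillis) {
        long cutoff = nowMillis - ttlMillis;
        List<StoreFile> drop = new ArrayList<>();
        List<StoreFile> keep = new ArrayList<>();
        List<StoreFile> compact = new ArrayList<>();
        for (StoreFile f : files) {
            if (f.maxTs() < cutoff) {
                drop.add(f);        // falls entirely out of the TTL range: delete the file
            } else if (f.minTs() >= cutoff) {
                keep.add(f);        // entirely within the TTL range: leave unaltered
            } else {
                compact.add(f);     // crosses the boundary: leave alone or compact normally
            }
        }
        return new Plan(drop, keep, compact);
    }

    public static void main(String[] args) {
        Plan p = plan(List.of(
                new StoreFile("oldest", 0, 50),
                new StoreFile("middle", 40, 150),
                new StoreFile("newest", 120, 200)),
                100, 200); // cutoff = 100
        System.out.println("drop=" + p.drop());
        System.out.println("keep=" + p.keep());
        System.out.println("compact=" + p.compact());
    }
}
```

Keeping `hbase.hstore.compaction.max.size` small is what makes this work: it bounds each old file's time range, so most files fall cleanly on one side of the cutoff.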

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira