Posted to issues@hbase.apache.org by "Dave Latham (JIRA)" <ji...@apache.org> on 2016/02/26 16:49:18 UTC

[jira] [Commented] (HBASE-15339) Add archive tiers for date based tiered compaction

    [ https://issues.apache.org/jira/browse/HBASE-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169220#comment-15169220 ] 

Dave Latham commented on HBASE-15339:
-------------------------------------

Duo, I'd love to understand this a little better.  The tiered compaction in HBASE-15181 has a max tier, so once data reaches that tier it never need be compacted again unless you force a major compaction.  The windows in that tier are fixed, based on epoch time, and their boundaries won't move.  They are not, however, aligned with the calendar, so if that is what you need, then you definitely need an enhancement.  I could imagine a config to use days/weeks/months/quarters/years for example instead of the simple epoch exponential tier schedule of HBASE-15181.  Can you elaborate on your needs and your proposal?
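
To illustrate the point about alignment, here is a minimal sketch, not HBase's actual implementation. It assumes a single hypothetical window size of 365 days (HBASE-15181 uses an exponential schedule of window sizes), and the function names are made up for this example. It shows that a window whose boundaries are fixed relative to the epoch does not line up with a calendar year:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_window(ts, window):
    """Return [start, end) of the fixed, epoch-aligned window containing ts.

    Hypothetical helper: boundaries are multiples of `window` since the epoch,
    so they never move, but they ignore leap years and calendar units.
    """
    index = (ts - EPOCH) // window  # timedelta // timedelta yields an int
    start = EPOCH + index * window
    return start, start + window

def calendar_year_window(ts):
    """Return [start, end) of the calendar year (UTC) containing ts."""
    start = datetime(ts.year, 1, 1, tzinfo=timezone.utc)
    return start, datetime(ts.year + 1, 1, 1, tzinfo=timezone.utc)

ts = datetime(2016, 2, 26, 16, 49, tzinfo=timezone.utc)
print(epoch_window(ts, timedelta(days=365)))  # starts on 2015-12-21, not on 1 January
print(calendar_year_window(ts))               # starts on 2016-01-01
```

Both kinds of window are stable once computed; the difference is only in where the boundaries fall.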

> Add archive tiers for date based tiered compaction
> --------------------------------------------------
>
>                 Key: HBASE-15339
>                 URL: https://issues.apache.org/jira/browse/HBASE-15339
>             Project: HBase
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Duo Zhang
>
> For our MiCloud service, old data is rarely touched but we still need to keep it, so we want to put that data on inexpensive devices and reduce redundancy using erasure coding (EC) to cut costs.
> With the date based tiered compaction introduced in HBASE-15181, new data and old data can be placed in different tiers. But the tier boundaries move as time passes, so it is still possible that we compact files in an old tier, which breaks our block moving and EC work.
> So here we want to introduce an "archive tier" to better fit our scenario. Add a configuration called "archive unit", for example, year. That means that if we find a tier boundary is already in a previous year, we reset the boundaries to the start and end of that year, and if we want to compact in this tier, we just compact all files into one file. That file will never change unless we force a major compaction, so it is safe to apply EC and other cost-reducing approaches to it. And we add more tiers before this tier, year by year.
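
The proposal above could be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: the archive unit is fixed to a calendar year in UTC, store files are represented as hypothetical `(name, max_timestamp)` pairs, and `year_bounds`, `archive_window`, and `plan_archive_compactions` are names invented for this example.

```python
from collections import defaultdict
from datetime import datetime, timezone

def year_bounds(year):
    """Return [start, end) of the given calendar year in UTC."""
    return (datetime(year, 1, 1, tzinfo=timezone.utc),
            datetime(year + 1, 1, 1, tzinfo=timezone.utc))

def archive_window(boundary, now):
    """Return the archive-tier window for a tier boundary, or None.

    If the boundary already lies in a year before the current one, the window
    is reset to that whole calendar year; otherwise the data is not archived.
    """
    if boundary.year < now.year:
        return year_bounds(boundary.year)
    return None

def plan_archive_compactions(files, now):
    """Group files by archive year; each group would be compacted into one file.

    `files` is a list of hypothetical (name, max_timestamp) pairs.
    """
    plan = defaultdict(list)
    for name, max_ts in files:
        window = archive_window(max_ts, now)
        if window is not None:
            plan[window[0].year].append(name)
    return dict(plan)

now = datetime(2016, 2, 26, tzinfo=timezone.utc)
files = [
    ("f1", datetime(2014, 3, 1, tzinfo=timezone.utc)),
    ("f2", datetime(2015, 7, 15, tzinfo=timezone.utc)),
    ("f3", datetime(2015, 12, 31, tzinfo=timezone.utc)),
    ("f4", datetime(2016, 1, 10, tzinfo=timezone.utc)),
]
print(plan_archive_compactions(files, now))
```

In this sketch, files from the current year are left to the regular tiers, and each earlier year forms its own archive tier whose boundaries no longer depend on the current time.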



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)