Posted to issues@hbase.apache.org by "Bryan Beaudreault (Jira)" <ji...@apache.org> on 2023/12/13 22:44:00 UTC

[jira] [Updated] (HBASE-28216) HDFS erasure coding support for table data dirs

     [ https://issues.apache.org/jira/browse/HBASE-28216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Beaudreault updated HBASE-28216:
--------------------------------------
    Labels: patch-available  (was: )

> HDFS erasure coding support for table data dirs
> -----------------------------------------------
>
>                 Key: HBASE-28216
>                 URL: https://issues.apache.org/jira/browse/HBASE-28216
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Bryan Beaudreault
>            Assignee: Bryan Beaudreault
>            Priority: Major
>              Labels: patch-available
>
> [Erasure coding|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html] (EC) is a Hadoop 3 feature that can drastically reduce storage requirements, at the expense of locality. At my company we have a few HBase clusters that are extremely data dense and take mostly write traffic with few reads (cold data). We'd like to reduce the cost of these clusters, and EC is a great way to do that, since it can cut replication-related storage costs by 50%.
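As a rough check on the 50% figure, compare raw bytes stored per logical byte under the HDFS default of 3x replication and under the built-in RS-6-3-1024k policy (6 data blocks plus 3 parity blocks per stripe). The choice of RS-6-3 is an assumption; other policies give different ratios.

```latex
\underbrace{3}_{\text{3x replication}}
\quad\text{vs.}\quad
\underbrace{\tfrac{6+3}{6} = 1.5}_{\text{RS-6-3}}
\qquad\Rightarrow\qquad
1 - \frac{1.5}{3} = 0.5 = 50\%
```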
> It's possible to enable EC policies on subdirectories in HDFS. One can set this manually with {{hdfs ec -setPolicy -path /hbase/data/default/usertable -policy xxxx}}. This works without any HBase support.
> One problem with that approach is that operators have no visibility into which tables have EC enabled. I think this is where HBase can help. Here's my proposal:
>  * Add a new TableDescriptor and ColumnDescriptor field ERASURE_CODING_POLICY
>  * In ModifyTableProcedure preflightChecks, if ERASURE_CODING_POLICY is set, verify that the requested policy is available and enabled via DistributedFileSystem.getErasureCodingPolicies().
>  * During ModifyTableProcedure, add a new state for MODIFY_TABLE_SYNC_ERASURE_CODING_POLICY.
>  ** When adding or changing a policy, use DistributedFileSystem.setErasureCodingPolicy to apply it to the data and archive dirs of that table (or of a single column family in the table).
>  ** When removing the property or setting it to empty, use DistributedFileSystem.unsetErasureCodingPolicy to remove it from the data and archive dirs.
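The preflight check and the sync-state decision in the list above can be sketched in plain Java. This is a hypothetical sketch, not HBase code: EcPolicySync and EcSyncAction are illustrative names, and the set of enabled policy names is assumed to have already been fetched from HDFS by the caller.

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical: which filesystem call the sync state would make for one directory.
enum EcSyncAction { NONE, SET, UNSET }

// Hypothetical helper, not an HBase class.
final class EcPolicySync {
  private EcPolicySync() {}

  // Treat null and "" the same: both mean "no policy configured".
  static String normalize(String policy) {
    return (policy == null || policy.isEmpty()) ? null : policy;
  }

  // Preflight: a requested policy must be in the cluster's enabled set.
  static void preflight(String requested, Set<String> enabledPolicies) {
    String p = normalize(requested);
    if (p != null && !enabledPolicies.contains(p)) {
      throw new IllegalArgumentException("Erasure coding policy not enabled: " + p);
    }
  }

  // Compare old and new descriptor values to decide set, unset, or no-op.
  static EcSyncAction decide(String oldPolicy, String newPolicy) {
    String o = normalize(oldPolicy);
    String n = normalize(newPolicy);
    if (Objects.equals(o, n)) {
      return EcSyncAction.NONE;
    }
    return n == null ? EcSyncAction.UNSET : EcSyncAction.SET;
  }
}
```

For example, changing a table from no policy to RS-6-3-1024k yields SET, so the sync state would call setErasureCodingPolicy on its data and archive dirs; clearing the property yields UNSET.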
> Since these APIs exist only in Hadoop 3, we'll need to add a reflection wrapper class that manages the calls and verifies that the API is available. We'll run the same availability check in preflightChecks.
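A reflection wrapper of the kind proposed might look like the minimal sketch below. ReflectiveApi and ErasureCodingApi are hypothetical names. The sketch assumes the Hadoop 3 signatures setErasureCodingPolicy(Path, String) and unsetErasureCodingPolicy(Path) on DistributedFileSystem; verify them against the Hadoop version in use. On a Hadoop 2 classpath the lookups come back empty and isAvailable() returns false, which preflightChecks could use to reject the modification.

```java
import java.lang.reflect.Method;
import java.util.Optional;

// Hypothetical helper: resolve a public method without a compile-time dependency.
final class ReflectiveApi {
  private ReflectiveApi() {}

  // Parameter types are given as class names (reference types only, not primitives).
  // Returns empty if the class, a parameter type, or the method is missing.
  static Optional<Method> find(String className, String methodName, String... paramTypeNames) {
    try {
      ClassLoader loader = ReflectiveApi.class.getClassLoader();
      Class<?> owner = Class.forName(className, false, loader);
      Class<?>[] params = new Class<?>[paramTypeNames.length];
      for (int i = 0; i < paramTypeNames.length; i++) {
        params[i] = Class.forName(paramTypeNames[i], false, loader);
      }
      return Optional.of(owner.getMethod(methodName, params));
    } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
      return Optional.empty();
    }
  }
}

// Hypothetical wrapper around the Hadoop 3 erasure coding calls.
final class ErasureCodingApi {
  static final String DFS = "org.apache.hadoop.hdfs.DistributedFileSystem";
  static final String PATH = "org.apache.hadoop.fs.Path";

  // Resolved once, so each call costs only a Method.invoke.
  static final Optional<Method> SET_POLICY =
      ReflectiveApi.find(DFS, "setErasureCodingPolicy", PATH, "java.lang.String");
  static final Optional<Method> UNSET_POLICY =
      ReflectiveApi.find(DFS, "unsetErasureCodingPolicy", PATH);

  private ErasureCodingApi() {}

  static boolean isAvailable() {
    return SET_POLICY.isPresent() && UNSET_POLICY.isPresent();
  }

  // dfs and path are typed as Object so this compiles without Hadoop on the classpath.
  static void setPolicy(Object dfs, Object path, String policyName)
      throws ReflectiveOperationException {
    Method m = SET_POLICY.orElseThrow(
        () -> new UnsupportedOperationException("setErasureCodingPolicy requires Hadoop 3"));
    m.invoke(dfs, path, policyName);
  }

  static void unsetPolicy(Object dfs, Object path) throws ReflectiveOperationException {
    Method m = UNSET_POLICY.orElseThrow(
        () -> new UnsupportedOperationException("unsetErasureCodingPolicy requires Hadoop 3"));
    m.invoke(dfs, path);
  }
}
```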



--
This message was sent by Atlassian Jira
(v8.20.10#820010)