Posted to issues@hbase.apache.org by "Yechao Chen (JIRA)" <ji...@apache.org> on 2019/02/22 06:22:00 UTC

[jira] [Updated] (HBASE-21810) bulkload support set hfile compression on client

     [ https://issues.apache.org/jira/browse/HBASE-21810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yechao Chen updated HBASE-21810:
--------------------------------
    Description: 
HBase bulk load (HFileOutputFormat2) generates HFiles using the compression configured on the table's column family (CF).

It would sometimes be useful to be able to set the compression on the client instead.

Some cases from our production:

1. HFile bulk load replication between data centers with limited bandwidth: we can set the compression of the bulk-loaded HFiles without changing the table compression.

2. Bulk load HFiles with no compression while the table compression is gz/zstd/snappy...: this reduces HFile creation time, and compaction will eventually compress the HFiles.

3. Sometimes the YARN nodes (where the reducers create the HFiles) or the dobulkload client have no compression library, but the HBase cluster does; a client-side setting is useful in this case.

  was:
hbase bulkload (HFileOutputFormat2) generate hfile ,the compression from the table(cf) compression,

if the compression can be set on client ,somethings it's useful,

some case in our production:

1、hfile bulkload replication between the data center with bandwidth limit, we can set the compression of the bulkload hfile not changing the table compression

2、bulkload hfile not set  compression ,but the table compression is gz/zstd/snappy... ,can reduce the hfile created time and compaction will make the hfile to compression finally

3、somethings the yarn nodes (hfile created by reduce) /dobulkload client has no compression lib,but the hbase cluster has,it's useful for this case


> bulkload  support set hfile compression on client 
> --------------------------------------------------
>
>                 Key: HBASE-21810
>                 URL: https://issues.apache.org/jira/browse/HBASE-21810
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>    Affects Versions: 1.3.3, 1.4.9, 2.1.2, 1.2.10, 2.0.4
>            Reporter: Yechao Chen
>            Assignee: Yechao Chen
>            Priority: Major
>         Attachments: HBASE-21810.branch-1.001.patch, HBASE-21810.branch-1.2.001.patch, HBASE-21810.branch-2.001.patch, HBASE-21810.master.001.patch
>
>
> HBase bulk load (HFileOutputFormat2) generates HFiles using the compression configured on the table's column family (CF).
> It would sometimes be useful to be able to set the compression on the client instead.
> Some cases from our production:
> 1. HFile bulk load replication between data centers with limited bandwidth: we can set the compression of the bulk-loaded HFiles without changing the table compression.
> 2. Bulk load HFiles with no compression while the table compression is gz/zstd/snappy...: this reduces HFile creation time, and compaction will eventually compress the HFiles.
> 3. Sometimes the YARN nodes (where the reducers create the HFiles) or the dobulkload client have no compression library, but the HBase cluster does; a client-side setting is useful in this case.
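
The request in the description amounts to a precedence rule: a client-side setting, when present, wins over the compression configured on the column family. The following is a minimal, self-contained Java sketch of that rule only; it is not the attached patch. The key name example.bulkload.hfile.compression, the class, and the method are all hypothetical, and compression is modelled as a plain string rather than with HBase's own types.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CompressionOverrideSketch {
    // Hypothetical client-side key; an assumption for illustration,
    // not taken from the patch or from HBase.
    static final String OVERRIDE_KEY = "example.bulkload.hfile.compression";

    /**
     * Returns the compression to use when writing bulk-load HFiles:
     * the client override if one is set, otherwise the column family's value.
     */
    static String resolveCompression(Map<String, String> clientConf,
                                     String familyCompression) {
        String override = clientConf.get(OVERRIDE_KEY);
        if (override == null || override.trim().isEmpty()) {
            // No client setting: keep today's behaviour (follow the table).
            return familyCompression;
        }
        // Normalise so that "none", "None" and "NONE" are treated alike.
        return override.trim().toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        // Without an override the family's compression is used.
        System.out.println(resolveCompression(conf, "GZ"));

        // With an override the client's choice is used instead.
        conf.put(OVERRIDE_KEY, "none");
        System.out.println(resolveCompression(conf, "GZ"));
    }
}
```

Case 2 above corresponds to setting the override to "none" while the table keeps gz/zstd/snappy: the HFiles are written uncompressed, and a later compaction rewrites them with the table's compression.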



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)