Posted to dev@hbase.apache.org by "Ishan Chhabra (JIRA)" <ji...@apache.org> on 2014/01/12 08:59:50 UTC

[jira] [Created] (HBASE-10323) Auto detect data block encoding in HFileOutputFormat

Ishan Chhabra created HBASE-10323:
-------------------------------------

             Summary: Auto detect data block encoding in HFileOutputFormat
                 Key: HBASE-10323
                 URL: https://issues.apache.org/jira/browse/HBASE-10323
             Project: HBase
          Issue Type: Improvement
            Reporter: Ishan Chhabra
            Assignee: Ishan Chhabra


Currently, one has to specify the data block encoding of the table explicitly using the config parameter "hbase.mapreduce.hfileoutputformat.datablock.encoding" when doing a bulk load. This option is easily missed, is undocumented, and works differently from compression, block size and bloom filter type, which are auto-detected.

The proposed solution is to auto-detect the data block encoding, as is already done for the other parameters.

The current patch does the following:
1. Automatically detects the data block encoding in HFileOutputFormat.
2. Keeps the legacy option of manually specifying the data block encoding
as a way to override auto-detection.
3. Moves string conf parsing to the start of the program so that it fails
fast at startup instead of failing during record writes. It also
makes the internals of the program type safe.
4. Adds missing doc strings and unit tests for the code that serializes and
deserializes the config parameters for bloom filter type, block size and
data block encoding.
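
To make points 2-4 concrete, here is a minimal, self-contained sketch of the idea. It is not the code from the patch. The class EncodingConfSketch, its Encoding enum (a stand-in for HBase's DataBlockEncoding) and all method names are hypothetical, and the "family=value" pairs joined by '&' are only an assumed wire format for carrying per-family settings in a single config string.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingConfSketch {
    /** Hypothetical stand-in for HBase's DataBlockEncoding enum. */
    public enum Encoding { NONE, PREFIX, DIFF, FAST_DIFF }

    /** Serializes family -> encoding as URL-encoded "family=ENCODING" pairs joined by '&'. */
    public static String serialize(Map<String, Encoding> perFamily) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Encoding> e : perFamily.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue().name()));
        }
        return sb.toString();
    }

    /**
     * Parses the string back into a typed map. Intended to run once at startup:
     * a malformed pair or unknown encoding name throws IllegalArgumentException
     * here rather than later, in the middle of record writes.
     */
    public static Map<String, Encoding> deserialize(String conf) {
        Map<String, Encoding> result = new LinkedHashMap<>();
        if (conf == null || conf.isEmpty()) {
            return result;
        }
        for (String pair : conf.split("&")) {
            String[] kv = pair.split("=", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("Malformed entry: " + pair);
            }
            result.put(dec(kv[0]), Encoding.valueOf(dec(kv[1])));
        }
        return result;
    }

    /** Parses the optional manual override; null or empty means "not set". */
    public static Encoding parseOverride(String value) {
        return (value == null || value.isEmpty()) ? null : Encoding.valueOf(value);
    }

    /** A manual override wins; otherwise use the detected per-family value, else NONE. */
    public static Encoding resolve(Encoding override, Map<String, Encoding> detected, String family) {
        if (override != null) {
            return override;
        }
        Encoding e = detected.get(family);
        return e != null ? e : Encoding.NONE;
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always supported", ex);
        }
    }

    private static String dec(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            throw new IllegalStateException("UTF-8 is always supported", ex);
        }
    }

    public static void main(String[] args) {
        Map<String, Encoding> detected = new LinkedHashMap<>();
        detected.put("cf1", Encoding.FAST_DIFF);
        detected.put("cf 2", Encoding.PREFIX);
        String conf = serialize(detected);
        System.out.println(conf);
        System.out.println(resolve(parseOverride(null), deserialize(conf), "cf1"));
        System.out.println(resolve(parseOverride("DIFF"), deserialize(conf), "cf1"));
    }
}
```

Parsing into an enum up front means the record writer only ever sees typed values, and a typo in the override surfaces before any HFile is written.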


