Posted to issues@hbase.apache.org by "Lars George (JIRA)" <ji...@apache.org> on 2010/04/27 09:05:49 UTC

[jira] Updated: (HBASE-2225) Enable compression in HBase Export

     [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars George updated HBASE-2225:
-------------------------------

    Attachment: HBASE-2225-v2-trunk.patch

Patch v2 uses JOpt-Simple to gather the various command line parameters for Export. It also pulls in part of HBASE-2434, namely the setCaching() option, and adds a secondary option to specify time ranges.

What I need here is an OK that JOpt-Simple is the way to go and that I used it properly. Note that I did not go down the "double braces" initializer route, mainly because even if you specify "ofType(Class)", JOpt will still return only an "Object". Furthermore, unlike other packages of the same kind, it does *not* treat the long and short forms of an option as one option when used that way. Only if you use an OptionSpec instance do you get the proper types back, since it uses generics, and it also combines the long and short names of an option.

Output is now:

{noformat}
$ java org.apache.hadoop.hbase.mapreduce.Export

Option                                  Description                            
------                                  -----------                            
-?, -h, --help                          Show this help                         
-c, --caching <Integer>                 Number of rows for caching             
-e, --endtime <Long>                    End time as long value                 
--enddate <yyyyMMddHHmm>                End date (alternative to --endtime)    
-n, --versions <Integer>                Maximum versions                       
-o, --outputdir                         Output directory                       
-s, --starttime <Long>                  Start time as long value               
--startdate <yyyyMMddHHmm>              Start date (alternative to --starttime)
-t, --tablename                         Table name                             
-z, --compress                          Enable compression of output files     
{noformat}
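
For illustration only, here is a minimal sketch of how a --startdate/--enddate value in the yyyyMMddHHmm format shown above could be turned into the long value that --starttime/--endtime take. It is not taken from the patch: the class name ExportDateArgs, the helper toMillis() and the choice of UTC are assumptions made for the example, and the patch may well resolve the time zone differently.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Hypothetical helper, not part of the HBASE-2225 patch.
final class ExportDateArgs {

    // Pattern taken from the --startdate/--enddate help text.
    static final String DATE_PATTERN = "yyyyMMddHHmm";

    // Converts a yyyyMMddHHmm string into milliseconds since the epoch.
    // The time zone is passed in explicitly because it is an assumption here.
    public static long toMillis(String date, TimeZone zone) throws ParseException {
        SimpleDateFormat format = new SimpleDateFormat(DATE_PATTERN);
        format.setLenient(false); // reject out-of-range fields such as month 13
        format.setTimeZone(zone);
        return format.parse(date).getTime();
    }

    public static void main(String[] args) throws ParseException {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        long start = toMillis("201004270000", utc);
        long end = toMillis("201004280000", utc);
        // One full day between the two dates: prints 86400000
        System.out.println(end - start);
    }
}
```

The resulting long values could then be used wherever the --starttime and --endtime values are used, so only one code path has to deal with the time range.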

Note: Another gripe I have is that the sort order of the options shown with "-h", for example, is hardcoded. That is also the case for commons-cli. Not sure what that is about, but I would rather list the options in the order I add them in the code, since related options belong together. Ah well.

> Enable compression in HBase Export
> ----------------------------------
>
>                 Key: HBASE-2225
>                 URL: https://issues.apache.org/jira/browse/HBASE-2225
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: util
>    Affects Versions: 0.20.1
>         Environment: OS agnostic
>            Reporter: Ted Yu
>            Assignee: Lars George
>            Priority: Minor
>         Attachments: HBASE-2225-trunk.patch, HBASE-2225-v2-trunk.patch
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> org.apache.hadoop.hbase.mapreduce.Export should set compression codec
> In createSubmittableJob(), the following should be added:
>     FileOutputFormat.setCompressOutput(job, true);
>     FileOutputFormat.setOutputCompressorClass(job, org.apache.hadoop.io.compress.GzipCodec.class);
> From my experiment, 10% to 50% reduction in Export output has been observed.
> SequenceFileInputFormat used by the Import tool is able to detect GzipCodec - there is no change for Import class.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.