
Posted to dev@hbase.apache.org by "Ted Yu (JIRA)" <ji...@apache.org> on 2010/02/13 16:46:28 UTC

[jira] Created: (HBASE-2225) Enable compression in HBase Export

Enable compression in HBase Export
----------------------------------

                 Key: HBASE-2225
                 URL: https://issues.apache.org/jira/browse/HBASE-2225
             Project: Hadoop HBase
          Issue Type: Improvement
          Components: util
    Affects Versions: 0.20.1
         Environment: OS agnostic
            Reporter: Ted Yu
            Priority: Minor


org.apache.hadoop.hbase.mapreduce.Export should set compression codec

In createSubmittableJob(), the following should be added:
    FileOutputFormat.setCompressOutput(job, true);
    FileOutputFormat.setOutputCompressorClass(job, org.apache.hadoop.io.compress.GzipCodec.class);

In my experiments, I observed a 10% to 50% reduction in the size of the Export output.

The SequenceFileInputFormat used by the Import tool detects GzipCodec on its own, so the Import class needs no change.
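
To give a feel for the size reduction, here is a minimal sketch that uses only the JDK. It is not the Hadoop GzipCodec path: it compresses some made-up, repetitive key/value text with java.util.zip.GZIPOutputStream, which produces the same gzip format. The class name GzipSizeSketch and the sample row layout are assumptions for illustration, and the actual ratio will depend heavily on the data in the table.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class GzipSizeSketch {
    // Compress a byte array with the JDK's gzip stream and return the compressed bytes.
    public static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed at this point, so the trailer has been written.
        return buffer.toByteArray();
    }

    // Hypothetical stand-in for exported rows: repetitive key/value text.
    public static byte[] sampleRows(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append("row-").append(i)
              .append("\tfamily:qualifier\tvalue-").append(i % 10)
              .append('\n');
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = sampleRows(1000);
        byte[] packed = gzip(raw);
        System.out.println("raw=" + raw.length + " bytes, gzip=" + packed.length + " bytes");
    }
}
```

Running main prints the raw and compressed sizes side by side; swapping in a sample of real exported values would give a more honest estimate for a given table.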

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HBASE-2225) Enable compression in HBase Export

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833576#action_12833576 ] 

Ted Yu commented on HBASE-2225:
-------------------------------

Here is the code I use:
	// Use GzipCodec for the export unless every column family is already compressed
	HBaseAdmin admin = new HBaseAdmin((HBaseConfiguration)conf);
	HTableDescriptor tableDesc = admin.getTableDescriptor(tableName.getBytes());
	Collection<HColumnDescriptor> families = tableDesc.getFamilies();
	boolean compressed = true;
	for (HColumnDescriptor col : families)
	{
		Compression.Algorithm algo = col.getCompressionType();
		if (algo == Compression.Algorithm.NONE)
		{
			// at least one family is stored uncompressed, no need to look further
			compressed = false;
			break;
		}
	}
	if (!compressed)
	{
		FileOutputFormat.setCompressOutput(job, true);
		FileOutputFormat.setOutputCompressorClass(job, org.apache.hadoop.io.compress.GzipCodec.class);
	}



[jira] Commented: (HBASE-2225) Enable compression in HBase Export

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838494#action_12838494 ] 

Ted Yu commented on HBASE-2225:
-------------------------------

Would it be useful to add the ability to filter records by selected row key values?

Thanks


[jira] Commented: (HBASE-2225) Enable compression in HBase Export

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835013#action_12835013 ] 

Ted Yu commented on HBASE-2225:
-------------------------------

We use hadoop-0.20.1 and hbase-0.20.1.
Would this combination count?


[jira] Commented: (HBASE-2225) Enable compression in HBase Export

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833421#action_12833421 ] 

stack commented on HBASE-2225:
------------------------------

I think this should be an option.  How about adding it as a command-line flag or something to the export job?


[jira] Commented: (HBASE-2225) Enable compression in HBase Export

Posted by "Lars George (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834680#action_12834680 ] 

Lars George commented on HBASE-2225:
------------------------------------

Agreed. I think what Ted meant (and Andrew also touched on) is that if a table has compression enabled, then it would make sense to use it for backups too, so that small tables, for example, are stored as is. Ted, since the backup reads the KeyValue records, compression is not part of the equation anymore, i.e. the MapReduce job doing the backup does not know whether the table was compressed or not. I'll implement the command line switch and attach a patch today.


[jira] Commented: (HBASE-2225) Enable compression in HBase Export

Posted by "Lars George (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834958#action_12834958 ] 

Lars George commented on HBASE-2225:
------------------------------------

Trying to test this now and getting

{code}
java.lang.IllegalArgumentException: SequenceFile doesn't work with GzipCodec without native-hadoop code!
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:379)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:347)
        at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:420)
        at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:60)
        at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:71)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:509)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:619)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:323)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:257)
{code}

Is that a change in Hadoop? I am sure this used to work. Comments? I will look into it.


[jira] Commented: (HBASE-2225) Enable compression in HBase Export

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833685#action_12833685 ] 

Ted Yu commented on HBASE-2225:
-------------------------------

Let's implement my initial suggestion, with which Lars and Stack concurred.


[jira] Commented: (HBASE-2225) Enable compression in HBase Export

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833458#action_12833458 ] 

Ted Yu commented on HBASE-2225:
-------------------------------

A little more detail: we iterate through the table's HColumnDescriptors (HTableDescriptor.getFamilies()). If all column families are compressed, we don't use GzipCodec for Export.


[jira] Commented: (HBASE-2225) Enable compression in HBase Export

Posted by "Lars George (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834967#action_12834967 ] 

Lars George commented on HBASE-2225:
------------------------------------

Looks like that is the same check in Hadoop 0.20.1, so it must be due to my local setup. Ted, care to test the change? Otherwise I tested it with various command line arguments and it works as expected.


[jira] Updated: (HBASE-2225) Enable compression in HBase Export

Posted by "Lars George (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars George updated HBASE-2225:
-------------------------------

    Attachment: HBASE-2225-trunk.patch

The patch adds a command line switch to enable compression and fixes a small typo in the info line logged at startup.
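
For readers without the patch at hand, an opt-in switch of this kind could be sketched roughly as follows. This is not the contents of HBASE-2225-trunk.patch: the flag name "--compress", the class ExportArgsSketch and its fields are all hypothetical, and the sketch only shows the argument handling, with compression off unless the switch is given.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ExportArgsSketch {
    // Hypothetical flag name; the switch in the attached patch may be spelled differently.
    public static final String COMPRESS_FLAG = "--compress";

    public final boolean compress;          // true only when the switch was given
    public final List<String> positional;   // remaining arguments, e.g. table name and output dir

    private ExportArgsSketch(boolean compress, List<String> positional) {
        this.compress = compress;
        this.positional = Collections.unmodifiableList(positional);
    }

    // Separate the optional compression switch from the positional arguments.
    public static ExportArgsSketch parse(String[] args) {
        boolean compress = false;  // no compression by default, as before
        List<String> positional = new ArrayList<>();
        for (String arg : args) {
            if (COMPRESS_FLAG.equals(arg)) {
                compress = true;
            } else {
                positional.add(arg);
            }
        }
        return new ExportArgsSketch(compress, positional);
    }

    public static void main(String[] args) {
        ExportArgsSketch parsed = parse(args);
        System.out.println("compress=" + parsed.compress + " args=" + parsed.positional);
    }
}
```

In a real job, the compress field would then guard the two FileOutputFormat calls from the issue description.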


[jira] Commented: (HBASE-2225) Enable compression in HBase Export

Posted by "Lars George (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833434#action_12833434 ] 

Lars George commented on HBASE-2225:
------------------------------------

+1

I have done the same in the past and can confirm the implicit support by the InputFormat. I added it like this:

{code}
    // set output stream compression
    if (params.get(CONF_COMPRESS) != null) {
      job.set("mapred.output.compress", "true");
      job.set("mapred.output.compression.codec", "org.apache.hadoop.io.compress.GzipCodec");
    }
{code}

where CONF_COMPRESS is a simple command line switch. This uses the older mapred API, so Ted's code is more current and can be used as is.

Ted, do you want to make a patch? If not, I can add it as well. Let me know.


[jira] Commented: (HBASE-2225) Enable compression in HBase Export

Posted by "stack (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833678#action_12833678 ] 

stack commented on HBASE-2225:
------------------------------

bq. We detect compression mode of the table first. If the table is compressed, we don't apply GzipCodec. Otherwise we apply GzipCodec unless no_compression_export is specified.

Isn't the fact that the table is compressed orthogonal to whether or not the export should be compressed?

I'd say no compression should be the default.  That's how it's been working up to now.

I'm good with the compression being gzip since, as Lars and Ted say, it's native to SequenceFiles.


[jira] Commented: (HBASE-2225) Enable compression in HBase Export

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833617#action_12833617 ] 

Ted Yu commented on HBASE-2225:
-------------------------------

If LZO-compressed output is 10% bigger than GZ-compressed output, it may be fine not to compress again with GZ for export.
I think the command line switch comes into play when the table is LZO compressed: it's up to the user of Export to decide.


[jira] Commented: (HBASE-2225) Enable compression in HBase Export

Posted by "Andrew Purtell (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833602#action_12833602 ] 

Andrew Purtell commented on HBASE-2225:
---------------------------------------

Need a test for LZO also then?


[jira] Assigned: (HBASE-2225) Enable compression in HBase Export

Posted by "Lars George (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars George reassigned HBASE-2225:
----------------------------------

    Assignee: Lars George


[jira] Commented: (HBASE-2225) Enable compression in HBase Export

Posted by "Ted Yu (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HBASE-2225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833453#action_12833453 ] 

Ted Yu commented on HBASE-2225:
-------------------------------

Using a command line switch is fine.
I think we can make this feature more versatile by naming the switch no_compression_export. That is, by default, GzipCodec is used for Export.

We detect the compression mode of the table first. If the table is compressed, we don't apply GzipCodec. Otherwise we apply GzipCodec unless no_compression_export is specified.

Since SequenceFileInputFormat is able to handle GzipCodec, this won't cause a regression for the Import class.
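
The rule proposed above can be written down as a small pure function, sketched here without any HBase dependency. The enum Algo is a hypothetical stand-in for HBase's Compression.Algorithm, and the class and method names are made up for illustration. The sketch treats "the table is compressed" as "every column family uses some compression", which is an assumption about the intended meaning.

```java
import java.util.Arrays;
import java.util.List;

public class ExportCompressionDecision {
    // Hypothetical stand-in for HBase's Compression.Algorithm values.
    public enum Algo { NONE, GZ, LZO }

    // True only when no column family is stored uncompressed.
    public static boolean allFamiliesCompressed(List<Algo> families) {
        return families.stream().noneMatch(a -> a == Algo.NONE);
    }

    // Proposed rule: skip gzip when the table is already compressed or when
    // no_compression_export was given; otherwise gzip the export.
    public static boolean useGzipForExport(List<Algo> families, boolean noCompressionExport) {
        if (noCompressionExport) {
            return false;
        }
        return !allFamiliesCompressed(families);
    }

    public static void main(String[] args) {
        List<Algo> mixed = Arrays.asList(Algo.LZO, Algo.NONE);
        System.out.println("mixed table, default: " + useGzipForExport(mixed, false));
        System.out.println("mixed table, switch:  " + useGzipForExport(mixed, true));
    }
}
```

Keeping the decision separate from the job setup makes it easy to check each combination of table compression and switch in isolation.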

