Posted to dev@pig.apache.org by "Harsh J (JIRA)" <ji...@apache.org> on 2012/07/06 09:13:34 UTC

[jira] [Commented] (PIG-2746) Pig doesn't detect all forms of compression extensions properly

    [ https://issues.apache.org/jira/browse/PIG-2746?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13407751#comment-13407751 ] 

Harsh J commented on PIG-2746:
------------------------------

Thanks for reverting this, Daniel. I'm sorry I didn't see that test.

I went over that JIRA, and it looks like the problem was that '.bz' files were not detected by the factory approach. The factory itself is not at fault: BZip2Codec simply doesn't claim the '.bz' extension, only '.bz2'.

I've opened HADOOP-8570 to tackle this upstream. In the meantime, Pig can use a temporary workaround (with a note explaining why it exists); I'll re-attach my patch with that change.
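
To make the failure mode and the workaround concrete, here is a Hadoop-free sketch. It assumes a factory that only knows each codec's default extension ('.bz2' for BZip2Codec, '.gz' for GzipCodec), which is why 'output.bz' falls through, plus an explicit '.bz' alias as the stopgap. The class and method names (CodecLookup, codecFor) and the alias table are hypothetical; this is not the attached patch and not Hadoop's CompressionCodecFactory code.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free model of extension-to-codec detection.
// Codec classes are represented by plain names; this is an illustration,
// not a reproduction of Hadoop's CompressionCodecFactory.
final class CodecLookup {
    // What a factory would learn from each codec's default extension.
    private static final Map<String, String> BY_EXTENSION = new TreeMap<>();
    static {
        BY_EXTENSION.put(".bz2", "BZip2Codec");
        BY_EXTENSION.put(".gz", "GzipCodec");
    }

    // Assumed temporary workaround: treat ".bz" as an alias for BZip2Codec,
    // because the codec itself only advertises ".bz2".
    private static final Map<String, String> ALIASES = Map.of(".bz", "BZip2Codec");

    /** Returns the codec name for a file name, or null if none matches. */
    static String codecFor(String fileName) {
        String found = matchSuffix(BY_EXTENSION, fileName);
        if (found == null) {
            // Factory lookup failed (e.g. "output.bz"); fall back to aliases.
            found = matchSuffix(ALIASES, fileName);
        }
        return found;
    }

    // Picks the codec whose extension is the longest suffix of fileName.
    private static String matchSuffix(Map<String, String> table, String fileName) {
        String best = null;
        int bestLen = -1;
        for (Map.Entry<String, String> e : table.entrySet()) {
            String ext = e.getKey();
            if (fileName.endsWith(ext) && ext.length() > bestLen) {
                best = e.getValue();
                bestLen = ext.length();
            }
        }
        return best;
    }
}
```

With only the BY_EXTENSION table, codecFor("output.bz") would return null; the alias table is what makes it resolve to BZip2Codec until the codec itself recognizes '.bz'.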
                
> Pig doesn't detect all forms of compression extensions properly
> ---------------------------------------------------------------
>
>                 Key: PIG-2746
>                 URL: https://issues.apache.org/jira/browse/PIG-2746
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Harsh J
>            Assignee: Harsh J
>             Fix For: 0.11
>
>         Attachments: PIG-2746.patch, PIG-2746.patch, PIG-2746.patch
>
>
> The PigStorage has the following snippet.
> {code}
> private void setCompression(Path path, Job job) {
>     String location = path.getName();
>     if (location.endsWith(".bz2") || location.endsWith(".bz")) {
>         FileOutputFormat.setCompressOutput(job, true);
>         FileOutputFormat.setOutputCompressorClass(job, BZip2Codec.class);
>     } else if (location.endsWith(".gz")) {
>         FileOutputFormat.setCompressOutput(job, true);
>         FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
>     } else {
>         FileOutputFormat.setCompressOutput(job, false);
>     }
> }
> {code}
> This means STORE only enables compression automatically when the output location ends in '.gz', '.bz2' or '.bz'. For any other codec (such as LZO), users have to specify the codec and enable output compression manually.
> Ideally, Pig should rely on Hadoop's extension-to-codec detection (CompressionCodecFactory) instead of maintaining this if/else ladder.
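
A rough sketch of the shape such a change could take, with the ladder replaced by a registry keyed on file extension. This is a Hadoop-free model under stated assumptions: the "job" is represented by a small hypothetical CompressionSettings holder instead of calls to FileOutputFormat.setCompressOutput and FileOutputFormat.setOutputCompressorClass, codec classes are plain names, and the ExtensionCodecRegistry class and the "LzoCodec" name are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the job's output-compression settings.
final class CompressionSettings {
    boolean compressOutput;
    String codecClass; // null when output is left uncompressed
}

// Hypothetical registry: codecs register their extension once, so adding
// a codec (e.g. LZO) needs no change to setCompression.
final class ExtensionCodecRegistry {
    private final Map<String, String> byExtension = new LinkedHashMap<>();

    void register(String extension, String codecClass) {
        byExtension.put(extension, codecClass);
    }

    // Longest registered extension that is a suffix of the file name wins.
    String codecFor(String fileName) {
        String best = null;
        int bestLen = -1;
        for (Map.Entry<String, String> e : byExtension.entrySet()) {
            String ext = e.getKey();
            if (fileName.endsWith(ext) && ext.length() > bestLen) {
                best = e.getValue();
                bestLen = ext.length();
            }
        }
        return best;
    }

    // Same decision as the quoted setCompression, driven by the table
    // instead of a hard-coded if/else chain.
    CompressionSettings setCompression(String location) {
        CompressionSettings s = new CompressionSettings();
        s.codecClass = codecFor(location);
        s.compressOutput = s.codecClass != null;
        return s;
    }
}
```

For example, after registering ".gz", ".bz2" and a hypothetical ".lzo" entry, setCompression("output.lzo") turns compression on with the LZO codec, while a location with no registered extension leaves compression off.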

--
This message is automatically generated by JIRA.