Posted to dev@pig.apache.org by "Rohini Palaniswamy (JIRA)" <ji...@apache.org> on 2015/06/12 21:59:00 UTC

[jira] [Resolved] (PIG-4591) Drop use of the internal Bzip2TextInputFormat

     [ https://issues.apache.org/jira/browse/PIG-4591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy resolved PIG-4591.
-------------------------------------
       Resolution: Fixed
    Fix Version/s:     (was: 0.16.0)

Closing this as duplicate of PIG-3251. That jira has more context and patches.

> Drop use of the internal Bzip2TextInputFormat
> ---------------------------------------------
>
>                 Key: PIG-4591
>                 URL: https://issues.apache.org/jira/browse/PIG-4591
>             Project: Pig
>          Issue Type: Wish
>          Components: data, tools
>    Affects Versions: 0.14.0
>         Environment: Set pig.noSplitCombination to false and pig.maxCombinedSplitSize high enough that input files are actually combined.
>            Reporter: Remi Catherinot
>            Priority: Minor
>              Labels: easyfix
>
> When loading multiple files that do not all share the same compression (for example, gz + bz2 + raw data files), the input format PigStorage picks depends on the last file listed. If the last file ends with .bz2, PigStorage uses Bzip2TextInputFormat and the load fails; in any other case it uses TextInputFormat and succeeds in reading all types of files (including the bz2 one).
> A = LOAD 'file1.gz,file2.bz2' USING PigStorage(); <-- this will fail
> B = LOAD 'file2.bz2,file1.gz' USING PigStorage(); <-- this will succeed
> I think someone else suggested on the dev mailing list dropping Pig's internal Bzip2TextInputFormat because Hadoop now handles these cases (bz2 compression & co) better. I have not pushed a patch yet because I don't have a fully compliant Pig test environment, so I can't be sure it won't introduce a regression with the minimum Hadoop version supported by Pig 0.14/0.15. I also need to know whether you agree to drop the internal Bzip2 code and rely on the Hadoop implementation.
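
A minimal sketch of the selection behavior described in the report above may help show why the order of files matters. It is only a simplified model built from the report's wording, not Pig's actual code: the function name choose_input_format is hypothetical, and it assumes the decision depends solely on whether the last path in the comma-separated location ends with .bz2.

```python
def choose_input_format(location: str) -> str:
    """Return the input format name picked for a comma-separated load location.

    Hypothetical model: only the last path in the list is inspected,
    as the report describes. This is not Pig's real implementation.
    """
    last_path = location.split(",")[-1].strip()
    if last_path.endswith(".bz2"):
        return "Bzip2TextInputFormat"
    return "TextInputFormat"


# The two load locations from the report, in both orders.
for location in ("file1.gz,file2.bz2", "file2.bz2,file1.gz"):
    print(location, "->", choose_input_format(location))
```

Under this model, both LOAD statements read the same two files, yet only the first one selects the bz2-specific format and has it applied to the .gz file as well, which is consistent with the failure reported for A and the success reported for B.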



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)