Posted to dev@pig.apache.org by "Laukik Chitnis (JIRA)" <ji...@apache.org> on 2011/02/10 01:01:04 UTC

[jira] Commented: (PIG-1304) Fail underlying M/R jobs when concatenated gzip and bz2 files are provided as input

    [ https://issues.apache.org/jira/browse/PIG-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992803#comment-12992803 ] 

Laukik Chitnis commented on PIG-1304:
-------------------------------------

Results from test-patch:

{noformat}

     [exec] +1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
     [exec]
     [exec]


{noformat}


> Fail underlying M/R jobs when concatenated gzip and bz2 files are provided as input
> -----------------------------------------------------------------------------------
>
>                 Key: PIG-1304
>                 URL: https://issues.apache.org/jira/browse/PIG-1304
>             Project: Pig
>          Issue Type: New Feature
>    Affects Versions: 0.6.0
>            Reporter: Viraj Bhat
>            Assignee: Laukik Chitnis
>             Fix For: 0.9.0
>
>         Attachments: patch-PIG-1304-1
>
>
> I have the following bzip2-compressed text files (\t denotes a TAB):
> {code}
> $ bzcat A.txt.bz2
> 1\ta
> 2\taa
> $ bzcat B.txt.bz2
> 1\tb
> 2\tbb
> $ cat *.bz2 > test/mymerge.bz2
> $ bzcat test/mymerge.bz2
> 1\ta
> 2\taa
> 1\tb
> 2\tbb
> $ hadoop fs -put test/mymerge.bz2 /user/viraj
> {code}
> I then wrote a Pig script to dump the contents of the merged bz2 file:
> {code}
> A = load '/user/viraj/bzipgetmerge/mymerge.bz2' using PigStorage();
> dump A;
> {code}
> I get only the records from the first of the concatenated bz2 files:
> (1,a)
> (2,aa)
> My M/R jobs neither fail nor log a warning; the records from the second file are silently dropped. Is there a way to emit a warning or fail the underlying map job? Could this be done in Pig's Bzip2TextInputFormat class?
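The symptom above can be reproduced outside Hadoop. What follows is a minimal Python sketch, and Python is used only because its standard library ships bz2 and gzip decoders. Pig's record readers are Java, and this is not Pig's Bzip2TextInputFormat or the attached patch.

The sketch assumes that the dropped records come from a reader that stops at the first end-of-stream marker. `read_single_stream` and `ConcatenatedInputError` are hypothetical names. The sketch does three things:

- It rebuilds mymerge.bz2 in memory.
- It shows that a single-stream decoder returns only A's records.
- It shows one way to turn that silent loss into a failure: check for bytes left over after the first bz2 stream or gzip member.

```python
import bz2
import zlib


class ConcatenatedInputError(Exception):
    """Hypothetical error: the file holds more than one compressed stream."""


def read_single_stream(data: bytes, codec: str) -> bytes:
    """Decode one bz2 stream or gzip member; fail instead of dropping the rest.

    Hypothetical helper for illustration only.
    """
    if codec == "bz2":
        dec = bz2.BZ2Decompressor()
    elif codec == "gzip":
        # MAX_WBITS | 16 makes zlib expect a gzip header and trailer.
        dec = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)
    else:
        raise ValueError(f"unsupported codec: {codec}")
    out = dec.decompress(data)
    if not dec.eof:
        raise EOFError(f"{codec}: input ends before the end-of-stream marker")
    if dec.unused_data:
        # Bytes after the first end-of-stream marker belong to a further
        # stream (or are garbage); a reader that ignores them loses records.
        raise ConcatenatedInputError(
            f"{codec}: {len(dec.unused_data)} bytes follow the first stream; "
            "input looks like concatenated files"
        )
    return out


# The two files from the report (\t is a TAB), concatenated as `cat *.bz2` does.
a_txt = b"1\ta\n2\taa\n"
b_txt = b"1\tb\n2\tbb\n"
mymerge = bz2.compress(a_txt) + bz2.compress(b_txt)

# A multi-stream-aware decoder, like bzcat, returns all four records.
print(len(bz2.decompress(mymerge).splitlines()))  # 4

# A single-stream decoder sees only A's records and raises no error by itself.
print(bz2.BZ2Decompressor().decompress(mymerge).splitlines())  # [b'1\ta', b'2\taa']

# The check turns the silent loss into a hard failure.
try:
    read_single_stream(mymerge, "bz2")
except ConcatenatedInputError as err:
    print("fail the task:", err)
```

A reader could instead keep going and recover all four records. It would start a fresh decompressor on `unused_data` until nothing is left, which is in effect what `bz2.decompress` does.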

-- 
This message is automatically generated by JIRA.