Posted to dev@pig.apache.org by "Pradeep Kamath (JIRA)" <ji...@apache.org> on 2010/02/24 00:50:28 UTC

[jira] Created: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

PigStorage per the new load-store redesign should support splitting of bzip files
---------------------------------------------------------------------------------

                 Key: PIG-1257
                 URL: https://issues.apache.org/jira/browse/PIG-1257
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.7.0
            Reporter: Pradeep Kamath
            Assignee: Pradeep Kamath
             Fix For: 0.7.0


PigStorage, as implemented under the new load-store redesign (PIG-966), reads data through TextInputFormat. TextInputFormat can read bzip data but does not support splitting bzip files. Splitting of bzip files was enabled in Pig 0.6, so we should attempt to restore that feature.
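
For context on why splitting is possible at all: a bzip2 stream is a sequence of independently compressed blocks, each opened by the 48-bit marker 0x314159265359, and blocks are not byte-aligned. A reader assigned a split can therefore, in principle, scan forward bit by bit to the next marker and start decompressing there. The sketch below illustrates only that scanning step on hand-built bytes. The class and method names are hypothetical, and the code is not taken from Pig's CBZip2InputStream or from any patch on this issue.

```java
import java.util.ArrayList;
import java.util.List;

final class BlockMarkerScan {
    // 48-bit marker that opens every bzip2 compressed block.
    static final long BLOCK_MAGIC = 0x314159265359L;
    static final long MASK_48 = 0xFFFFFFFFFFFFL;

    /** Returns the bit offset at which each block marker starts. */
    static List<Long> findBlockStarts(byte[] data) {
        List<Long> starts = new ArrayList<>();
        long window = 0;   // the most recent 48 bits seen
        long bitsRead = 0;
        for (byte b : data) {
            for (int i = 7; i >= 0; i--) {
                // Shift in one bit at a time, most significant bit first.
                window = ((window << 1) | ((b >> i) & 1)) & MASK_48;
                bitsRead++;
                if (bitsRead >= 48 && window == BLOCK_MAGIC) {
                    starts.add(bitsRead - 48);
                }
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // Hand-built bytes, NOT a valid bzip2 file: the "BZh9" stream header,
        // one byte-aligned marker, then 4 filler bits and a second marker
        // that starts in the middle of a byte.
        byte[] demo = {
            0x42, 0x5A, 0x68, 0x39,
            0x31, 0x41, 0x59, 0x26, 0x53, 0x59,
            0x03, 0x14, 0x15, (byte) 0x92, 0x65, 0x35, (byte) 0x90
        };
        System.out.println(findBlockStarts(demo)); // prints [32, 84]
    }
}
```

A real reader would also have to cope with the marker pattern occurring by chance inside compressed data and with the end-of-stream marker; the sketch ignores both.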

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1257:
--------------------------------

    Attachment: PIG-1257-2.patch

Attached new patch to address unit test failures




[jira] Commented: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838261#action_12838261 ] 

Hadoop QA commented on PIG-1257:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12436937/PIG-1257.patch
  against trunk revision 916065.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 10 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/215/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/215/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/215/console

This message is automatically generated.




[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1257:
--------------------------------

    Status: Open  (was: Patch Available)




[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1257:
--------------------------------

    Attachment: blockHeaderEndsAt136500.txt.bz2
                blockEndingInCR.txt.bz2
                PIG-1257-3.patch

Since the last patch, I uncovered some issues in the code while testing boundary conditions. I have fixed them in the new patch, PIG-1257-3.patch, and added those boundary conditions as test cases in TestBZip.
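
One boundary condition that the attachment name blockEndingInCR.txt.bz2 hints at (this reading is an assumption, not something stated in the patch) is a compressed block whose last decompressed byte is \r while the next block begins with \n. A reader that processes blocks one at a time must not emit an extra empty record for that \n. A minimal sketch of such a reader follows; ChunkedLineReader and its methods are hypothetical names, and each chunk stands in for one decompressed block.

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkedLineReader {
    private final StringBuilder current = new StringBuilder();
    private final List<String> records = new ArrayList<>();
    private boolean lastWasCR = false;

    /** Feeds one decompressed chunk (hypothetically, one bzip2 block). */
    void feed(byte[] chunk) {
        for (byte b : chunk) {
            char c = (char) (b & 0xFF); // bytes treated as Latin-1 for simplicity
            if (c == '\n') {
                // The \n of a \r\n pair is swallowed, even across chunks,
                // because the preceding \r already ended the record.
                if (!lastWasCR) {
                    emit();
                }
                lastWasCR = false;
            } else if (c == '\r') {
                emit();
                lastWasCR = true;
            } else {
                current.append(c);
                lastWasCR = false;
            }
        }
    }

    /** Flushes a trailing record that has no terminator and returns all records. */
    List<String> finish() {
        if (current.length() > 0) {
            emit();
        }
        return records;
    }

    private void emit() {
        records.add(current.toString());
        current.setLength(0);
    }

    public static void main(String[] args) {
        ChunkedLineReader reader = new ChunkedLineReader();
        reader.feed("a\tb\r".getBytes());    // first chunk ends in \r
        reader.feed("\nc\td\n".getBytes());  // second chunk starts with \n
        System.out.println(reader.finish().size()); // prints 2
    }
}
```

The lastWasCR flag is kept across feed() calls, which is what lets the \r\n pair be recognised even when it straddles two chunks.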




[jira] Commented: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846080#action_12846080 ] 

Pradeep Kamath commented on PIG-1257:
-------------------------------------

I ran all unit tests on my local machines and also the "test-patch" ant target:
     [exec] +1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     +1 tests included.  The patch appears to include 12 new or modified tests.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
     [exec] 
     [exec] 





[jira] Commented: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845627#action_12845627 ] 

Hadoop QA commented on PIG-1257:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12438883/recordLossblockHeaderEndsAt136500.txt.bz2
  against trunk revision 923043.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/239/console

This message is automatically generated.




[jira] Commented: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846038#action_12846038 ] 

Pradeep Kamath commented on PIG-1257:
-------------------------------------

In the following inputData entry, the record will end with \r, won't it? (Notice the \r in the middle, after the 2.)
{code}
          "1\t2\r3\t4", // '\r' case - this will be split into two tuples
{code}
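
To make the point concrete, here is a small sketch, separate from the actual TestBZip code, that treats \r, \n and \r\n as record terminators and then splits each record on tabs. The class and method names are hypothetical. It shows the quoted string yielding two tuples, the first of which is terminated by a bare \r.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CarriageReturnSplit {
    /** Splits input into records on \r\n, \r or \n, then into fields on tabs. */
    static List<List<String>> toTuples(String input) {
        List<List<String>> tuples = new ArrayList<>();
        // "\r\n" is listed first so that a CRLF pair counts as one terminator.
        for (String record : input.split("\r\n|\r|\n")) {
            // Limit -1 keeps trailing empty fields instead of dropping them.
            tuples.add(Arrays.asList(record.split("\t", -1)));
        }
        return tuples;
    }

    public static void main(String[] args) {
        System.out.println(toTuples("1\t2\r3\t4")); // prints [[1, 2], [3, 4]]
    }
}
```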




[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1257:
--------------------------------

    Attachment: recordLossblockHeaderEndsAt136500.txt.bz2

The .bz2 files attached to this issue should be put in test/org/apache/pig/test/data for this patch to pass unit tests.




[jira] Commented: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12838695#action_12838695 ] 

Hadoop QA commented on PIG-1257:
--------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12437067/PIG-1257-2.patch
  against trunk revision 916429.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 9 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/224/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/224/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/224/console

This message is automatically generated.




[jira] Commented: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846027#action_12846027 ] 

Benjamin Reed commented on PIG-1257:
------------------------------------

excellent work pradeep. just one minor thing: you always append a \n before inputData in your test case, so you never test the case when you end with just \r.





[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1257:
--------------------------------

    Status: Open  (was: Patch Available)




[jira] Commented: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

Posted by "Benjamin Reed (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846083#action_12846083 ] 

Benjamin Reed commented on PIG-1257:
------------------------------------

+1 you are right. thanx pradeep. i think it is ready to commit.




[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1257:
--------------------------------

    Status: Patch Available  (was: Open)




[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1257:
--------------------------------

    Status: Patch Available  (was: Open)




[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1257:
--------------------------------

    Status: Patch Available  (was: Open)




[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1257:
--------------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Patch committed




[jira] Updated: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

Posted by "Pradeep Kamath (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-1257:
--------------------------------

    Attachment: PIG-1257.patch

Attached patch builds an InputFormat (Bzip2TextInputFormat) on top of the existing CBZip2InputStream.




[jira] Closed: (PIG-1257) PigStorage per the new load-store redesign should support splitting of bzip files

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai closed PIG-1257.
---------------------------


