Posted to dev@pig.apache.org by "Richard Ding (JIRA)" <ji...@apache.org> on 2009/11/10 22:35:27 UTC

[jira] Created: (PIG-1080) PigStorage may miss records when loading a file

PigStorage may miss records when loading a file
-----------------------------------------------

                 Key: PIG-1080
                 URL: https://issues.apache.org/jira/browse/PIG-1080
             Project: Pig
          Issue Type: Bug
            Reporter: Richard Ding
            Assignee: Richard Ding


When a file is assigned to multiple mappers (one block per mapper), a block may not end exactly at a record boundary. Special care is taken to ensure that every record is loaded by some mapper, and exactly once, even when a record crosses a block boundary.

PigStorage, however, does not correctly handle the case where a block ends exactly at a record boundary, which results in missing records.
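
To illustrate the kind of boundary condition described above, here is a minimal sketch in Python. It is not the PigStorage code and not the attached patch; the function names (read_split, load) and the inclusive_end flag are hypothetical. It assumes one common convention for line-oriented splits: every split except the first skips up to and including the first newline it sees, so each split has to keep reading as long as a record starts at or before its end offset. With a strict "less than" comparison, the record that starts exactly on the boundary is skipped by one split and never read by the previous one.

```python
def read_split(data: bytes, start: int, end: int, inclusive_end: bool) -> list:
    """Return the newline-terminated records owned by the byte range [start, end)."""
    pos = start
    if start != 0:
        # Not the first split: skip through the first newline, because the
        # previous split is expected to have read that record.
        nl = data.find(b"\n", start)
        pos = len(data) if nl == -1 else nl + 1

    records = []
    while pos < len(data) and (pos <= end if inclusive_end else pos < end):
        nl = data.find(b"\n", pos)
        stop = len(data) if nl == -1 else nl + 1
        records.append(data[pos:stop].rstrip(b"\n"))
        pos = stop
    return records


def load(data: bytes, split_size: int, inclusive_end: bool) -> list:
    """Read the whole input as consecutive fixed-size splits, one reader per split."""
    out = []
    for start in range(0, len(data), split_size):
        end = min(start + split_size, len(data))
        out.extend(read_split(data, start, end, inclusive_end))
    return out


# Four 3-byte records; a split size of 6 puts a boundary exactly between "bb" and "cc".
DATA = b"aa\nbb\ncc\ndd\n"
print(load(DATA, 6, inclusive_end=True))
print(load(DATA, 6, inclusive_end=False))
```

In this sketch the strict comparison only loses data when a split boundary coincides with a record boundary; with a split size of 4, where no boundary falls at the start of a record, both variants return all four records.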

 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1080:
------------------------------

    Attachment: PIG-1080.patch

This patch excludes the bzip and gzip files from the change.



[jira] Commented: (PIG-1080) PigStorage may miss records when loading a file

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776205#action_12776205 ] 

Hadoop QA commented on PIG-1080:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12424510/PIG-1080.patch
  against trunk revision 834285.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/146/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/146/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/146/console

This message is automatically generated.



[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1080:
------------------------------

    Attachment: PIG-1080.patch

This patch fixes the problem.



[jira] Commented: (PIG-1080) PigStorage may miss records when loading a file

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776165#action_12776165 ] 

Olga Natkovich commented on PIG-1080:
-------------------------------------

+1. The last patch looks good. Will commit once the automated test comes back.



[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1080:
--------------------------------

    Fix Version/s: 0.6.0



[jira] Commented: (PIG-1080) PigStorage may miss records when loading a file

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776218#action_12776218 ] 

Hadoop QA commented on PIG-1080:
--------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12424515/PIG-1080.patch
  against trunk revision 834285.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/42/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/42/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/42/console

This message is automatically generated.



[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1080:
------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1080:
------------------------------

    Attachment: PIG-1080.patch



[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1080:
------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-1080:
----------------------------

    Affects Version/s: 0.6.0

To be clear, this bug affects only trunk code, not any released version of Pig. It is a result of the switch to using LineRecordReader (PIG-960).



[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1080:
--------------------------------

    Status: Open  (was: Patch Available)



[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1080:
--------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

patch committed, thanks Richard!



[jira] Commented: (PIG-1080) PigStorage may miss records when loading a file

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776267#action_12776267 ] 

Hadoop QA commented on PIG-1080:
--------------------------------

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12424531/PIG-1080.patch
  against trunk revision 834285.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/147/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/147/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/147/console

This message is automatically generated.



[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1080:
--------------------------------

    Status: Patch Available  (was: Open)



[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file

Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1080:
------------------------------

    Status: Open  (was: Patch Available)
