Posted to dev@pig.apache.org by "Richard Ding (JIRA)" <ji...@apache.org> on 2009/11/10 22:35:27 UTC
[jira] Created: (PIG-1080) PigStorage may miss records when loading a file
PigStorage may miss records when loading a file
-----------------------------------------------
Key: PIG-1080
URL: https://issues.apache.org/jira/browse/PIG-1080
Project: Pig
Issue Type: Bug
Reporter: Richard Ding
Assignee: Richard Ding
When a file is assigned to multiple mappers (one block per mapper), a block may not end exactly at a record boundary. Special care is taken to ensure that every record is loaded by some mapper, and exactly once, even when a record crosses a block boundary.
PigStorage, however, does not correctly handle the case where a block ends exactly at a record boundary, and records go missing as a result.
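To make the boundary case concrete, here is a minimal sketch of how a record can be lost when a split ends exactly at a record boundary. It is an illustrative model only, not the PigStorage or LineRecordReader source and not the attached patch. The class name SplitBoundaryDemo, its methods, and the flag readRecordStartingAtEnd are hypothetical. The model assumes that a reader whose split does not start at offset 0 discards everything up to and including the first newline, trusting the previous reader to have consumed that record.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: not the PigStorage or LineRecordReader source.
final class SplitBoundaryDemo {

    // Reads the newline-terminated records that belong to the split [start, end).
    // A reader whose split does not begin at offset 0 discards everything up to
    // and including the first newline, assuming the previous split's reader
    // consumed that record.
    static List<String> readSplit(byte[] data, int start, int end,
                                  boolean readRecordStartingAtEnd) {
        int pos = start;
        if (start != 0) {
            while (pos < data.length && data[pos] != '\n') {
                pos++;
            }
            pos++; // step past the newline
        }
        List<String> records = new ArrayList<>();
        // The only difference between the two conventions is whether a record
        // that starts exactly at 'end' is still read by this split.
        while (pos < data.length
                && (readRecordStartingAtEnd ? pos <= end : pos < end)) {
            int lineEnd = pos;
            while (lineEnd < data.length && data[lineEnd] != '\n') {
                lineEnd++;
            }
            records.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
            pos = lineEnd + 1;
        }
        return records;
    }

    // Reads the whole input as two splits that meet at splitPoint.
    static List<String> readAll(String text, int splitPoint,
                                boolean readRecordStartingAtEnd) {
        byte[] data = text.getBytes(StandardCharsets.UTF_8);
        List<String> all =
                new ArrayList<>(readSplit(data, 0, splitPoint, readRecordStartingAtEnd));
        all.addAll(readSplit(data, splitPoint, data.length, readRecordStartingAtEnd));
        return all;
    }

    public static void main(String[] args) {
        String text = "aa\nbb\ncc\n"; // offset 3 is exactly the start of "bb"
        System.out.println(readAll(text, 3, false)); // prints [aa, cc]
        System.out.println(readAll(text, 3, true));  // prints [aa, bb, cc]
    }
}
```

In this model, a split point that falls in the middle of a record (for example offset 4) loses nothing under either convention. Only a split point that coincides with the start of a record exposes the mismatch: the first reader stops before that record and the second reader skips it.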
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file
Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Richard Ding updated PIG-1080:
------------------------------
Attachment: PIG-1080.patch
This patch excludes the bzip and gzip files from the change.
[jira] Commented: (PIG-1080) PigStorage may miss records when loading a file
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776205#action_12776205 ]
Hadoop QA commented on PIG-1080:
--------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12424510/PIG-1080.patch
against trunk revision 834285.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/146/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/146/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/146/console
This message is automatically generated.
[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file
Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Richard Ding updated PIG-1080:
------------------------------
Attachment: PIG-1080.patch
This patch fixes the problem.
[jira] Commented: (PIG-1080) PigStorage may miss records when loading a file
Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776165#action_12776165 ]
Olga Natkovich commented on PIG-1080:
-------------------------------------
+1. The last patch looks good. Will commit once the automated test results come back.
[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file
Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich updated PIG-1080:
--------------------------------
Fix Version/s: 0.6.0
[jira] Commented: (PIG-1080) PigStorage may miss records when loading a file
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776218#action_12776218 ]
Hadoop QA commented on PIG-1080:
--------------------------------
-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12424515/PIG-1080.patch
against trunk revision 834285.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/42/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/42/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/42/console
This message is automatically generated.
[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file
Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Richard Ding updated PIG-1080:
------------------------------
Status: Patch Available (was: Open)
[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file
Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Richard Ding updated PIG-1080:
------------------------------
Attachment: PIG-1080.patch
[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file
Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Richard Ding updated PIG-1080:
------------------------------
Status: Patch Available (was: Open)
[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file
Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alan Gates updated PIG-1080:
----------------------------
Affects Version/s: 0.6.0
To be clear, this bug affects only trunk code, not any released version of Pig. It is a result of the switch to LineRecordReader (PIG-960).
[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file
Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich updated PIG-1080:
--------------------------------
Status: Open (was: Patch Available)
[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file
Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich updated PIG-1080:
--------------------------------
Resolution: Fixed
Status: Resolved (was: Patch Available)
Patch committed. Thanks, Richard!
[jira] Commented: (PIG-1080) PigStorage may miss records when loading a file
Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776267#action_12776267 ]
Hadoop QA commented on PIG-1080:
--------------------------------
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12424531/PIG-1080.patch
against trunk revision 834285.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/147/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/147/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/147/console
This message is automatically generated.
[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file
Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Olga Natkovich updated PIG-1080:
--------------------------------
Status: Patch Available (was: Open)
[jira] Updated: (PIG-1080) PigStorage may miss records when loading a file
Posted by "Richard Ding (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/PIG-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Richard Ding updated PIG-1080:
------------------------------
Status: Open (was: Patch Available)