Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2009/06/23 09:48:07 UTC

[jira] Created: (PIG-861) POJoinPackage lose tuple in large dataset

POJoinPackage lose tuple in large dataset
-----------------------------------------

                 Key: PIG-861
                 URL: https://issues.apache.org/jira/browse/PIG-861
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: 0.2.0
            Reporter: Daniel Dai
            Assignee: Daniel Dai
             Fix For: 0.4.0


Some scripts using POJoinPackage lose records when processing a large amount of input data. We do not see this problem with smaller inputs. We can reproduce the problem; however, the dataset for the test case is too big to be included here. We suspect that POJoinPackage causes the problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (PIG-861) POJoinPackage lose tuple in large dataset

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-861:
---------------------------

    Status: In Progress  (was: Patch Available)



[jira] Updated: (PIG-861) POJoinPackage lose tuple in large dataset

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-861:
---------------------------

    Affects Version/s:     (was: 0.2.0)
                       0.3.0
               Status: Patch Available  (was: Open)



[jira] Updated: (PIG-861) POJoinPackage lose tuple in large dataset

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-861:
---------------------------

    Attachment: PIG-861-1.patch

The problem is caused by a bug in BinStorage.java, which erroneously interprets the character \255 in the binary stream as EOF. I tested the patch against the original queries, and it fixes the problem. No unit test is included since this patch does not introduce any new feature.
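
For readers unfamiliar with this class of bug, here is a minimal sketch of how a byte in a binary stream can be mistaken for EOF in Java. It is not the BinStorage code or the attached patch; the class and method names (EofByteSketch, countUntilEofBuggy, countUntilEofFixed) are hypothetical, and it assumes that "\255" refers to the byte value 255 (0xFF). InputStream.read() returns an int in the range 0..255, or -1 at end of stream, so narrowing the result to a byte before the EOF check makes 0xFF indistinguishable from -1.

```java
import java.io.ByteArrayInputStream;

// Illustrative only: hypothetical names, not taken from BinStorage.java.
class EofByteSketch {

    // Buggy pattern: the int returned by read() is narrowed to a byte
    // before the EOF check, so the data byte 0xFF becomes -1 and the
    // loop stops early, silently dropping the rest of the stream.
    static int countUntilEofBuggy(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int count = 0;
        while (true) {
            byte b = (byte) in.read();
            if (b == -1) {
                break; // also triggers on a real 0xFF data byte
            }
            count++;
        }
        return count;
    }

    // Safer pattern: keep the value as an int until EOF has been ruled out.
    // A data byte 0xFF arrives as 255 here, which is distinct from -1.
    static int countUntilEofFixed(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int count = 0;
        while (true) {
            int b = in.read();
            if (b == -1) {
                break; // genuine end of stream
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, (byte) 0xFF, 3, 4};
        System.out.println("buggy reader saw " + countUntilEofBuggy(data) + " bytes");
        System.out.println("fixed reader saw " + countUntilEofFixed(data) + " bytes");
    }
}
```

With the five-byte input above, the buggy reader stops at the third byte and reports 2, while the fixed reader reports all 5. This would also be consistent with the symptom showing up only on large inputs, since a larger binary stream is more likely to contain such a byte at a position where the reader checks for EOF.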



[jira] Commented: (PIG-861) POJoinPackage lose tuple in large dataset

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727030#action_12727030 ] 

Hadoop QA commented on PIG-861:
-------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12412218/PIG-861-1.patch
  against trunk revision 790735.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    -1 patch.  The patch command could not apply the patch.

Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/110/console

This message is automatically generated.



[jira] Updated: (PIG-861) POJoinPackage lose tuple in large dataset

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-861:
---------------------------

    Attachment: PIG-861-2.patch

Resynced the patch to the latest trunk.



[jira] Updated: (PIG-861) POJoinPackage lose tuple in large dataset

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-861:
---------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch committed



[jira] Updated: (PIG-861) POJoinPackage lose tuple in large dataset

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-861:
---------------------------

    Status: Patch Available  (was: In Progress)



[jira] Updated: (PIG-861) POJoinPackage lose tuple in large dataset

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-861:
---------------------------

    Status: Patch Available  (was: In Progress)

Submitting again for Hudson to pick up.



[jira] Commented: (PIG-861) POJoinPackage lose tuple in large dataset

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727223#action_12727223 ] 

Hudson commented on PIG-861:
----------------------------

Integrated in Pig-trunk #494 (See [http://hudson.zones.apache.org/hudson/job/Pig-trunk/494/])
    : POJoinPackage lose tuple in large dataset




[jira] Commented: (PIG-861) POJoinPackage lose tuple in large dataset

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12727172#action_12727172 ] 

Hadoop QA commented on PIG-861:
-------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12412527/PIG-861-2.patch
  against trunk revision 790735.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/113/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/113/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/113/console

This message is automatically generated.



[jira] Commented: (PIG-861) POJoinPackage lose tuple in large dataset

Posted by "Olga Natkovich (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725880#action_12725880 ] 

Olga Natkovich commented on PIG-861:
------------------------------------

+1, changes look good. Great catch! 

We need to make sure all tests pass before committing.



[jira] Updated: (PIG-861) POJoinPackage lose tuple in large dataset

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-861:
---------------------------

    Status: In Progress  (was: Patch Available)
