Posted to dev@pig.apache.org by "Daniel Dai (JIRA)" <ji...@apache.org> on 2009/07/01 02:10:47 UTC

[jira] Updated: (PIG-861) POJoinPackage lose tuple in large dataset

     [ https://issues.apache.org/jira/browse/PIG-861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-861:
---------------------------

    Attachment: PIG-861-1.patch

The problem is caused by a bug in BinStorage.java, which erroneously interprets the character \255 in the binary stream as EOF. I tested the patch on the original queries and it fixes the problem. No unit test is included, since this patch does not introduce any new feature.
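
For readers unfamiliar with this class of bug, here is a minimal sketch of one common way a data byte of 255 can be mistaken for EOF in Java. It is an illustration only and is not taken from BinStorage.java or from the patch; the class name EofSketch and both method names are hypothetical. The sketch assumes the usual InputStream.read() contract: it returns an int in the range 0..255 for data and -1 at end of stream, so narrowing the result to byte before the EOF check makes 0xFF indistinguishable from -1.

```java
import java.io.ByteArrayInputStream;

// Hypothetical illustration; not the actual BinStorage code.
public class EofSketch {

    // Buggy pattern: the result of read() is narrowed to byte before the
    // EOF check, so the data byte 0xFF (255) becomes -1 and ends the loop.
    static int countBytesBuggy(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int count = 0;
        while (true) {
            byte b = (byte) in.read();
            if (b == -1) {
                break; // true at real EOF, but also for a data byte of 0xFF
            }
            count++;
        }
        return count;
    }

    // Safer pattern: keep the int returned by read() for the EOF check and
    // narrow to byte only after EOF has been ruled out.
    static int countBytesFixed(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        int count = 0;
        while (true) {
            int r = in.read();
            if (r == -1) {
                break; // only real end of stream reaches this branch
            }
            byte b = (byte) r; // safe to narrow now; unused in this sketch
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, (byte) 0xFF, 3, 4};
        System.out.println("buggy: " + countBytesBuggy(data)); // stops at 0xFF
        System.out.println("fixed: " + countBytesFixed(data)); // reads all bytes
    }
}
```

With the five-byte input above, the buggy loop stops after two bytes while the fixed loop reads all five. A failure of this kind depends on a particular byte value appearing at a particular read position, which would be consistent with records being lost only on large inputs.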

> POJoinPackage lose tuple in large dataset
> -----------------------------------------
>
>                 Key: PIG-861
>                 URL: https://issues.apache.org/jira/browse/PIG-861
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.4.0
>
>         Attachments: PIG-861-1.patch
>
>
> Some scripts using POJoinPackage lose records when processing a large amount of input data. We do not see this problem with smaller inputs. We can reproduce the problem; however, the dataset for the test case is too big to be included here. We suspect that POJoinPackage causes the problem.
