Posted to dev@pig.apache.org by "Olga Natkovich (JIRA)" <ji...@apache.org> on 2008/10/30 19:12:44 UTC
[jira] Commented: (PIG-511) DIFF does not work in types branch
[ https://issues.apache.org/jira/browse/PIG-511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12644060#action_12644060 ]
Olga Natkovich commented on PIG-511:
------------------------------------
+1; patch looks good
> DIFF does not work in types branch
> ----------------------------------
>
> Key: PIG-511
> URL: https://issues.apache.org/jira/browse/PIG-511
> Project: Pig
> Issue Type: Bug
> Components: data
> Affects Versions: types_branch
> Environment: CentOS 5, hadoop 0.18.0, pig built from types branch
> Reporter: Cristian Ivascu
> Assignee: Alan Gates
> Fix For: types_branch
>
> Attachments: PIG-511.patch
>
>
> Using DIFF(bag1, bag2) always returns an empty bag.
> Reason: in compute_diff, the input bags are discarded, and the actual operations are done against two newly created, empty bags.
> Fix: make sure compute_diff(bag1, bag2, output) takes its iterators from bag1 and bag2, instead of from the empty d1 and d2.
> Currently:
> DataBag d1 = mBagFactory.newDistinctBag();
> DataBag d2 = mBagFactory.newDistinctBag();
> Iterator<Tuple> i1 = d1.iterator();
> Iterator<Tuple> i2 = d2.iterator();
> while (i1.hasNext()) d1.add(i1.next());
> while (i2.hasNext()) d2.add(i2.next());
> Should be:
> DataBag d1 = mBagFactory.newDistinctBag();
> DataBag d2 = mBagFactory.newDistinctBag();
> Iterator<Tuple> i1 = bag1.iterator();
> Iterator<Tuple> i2 = bag2.iterator();
> while (i1.hasNext()) d1.add(i1.next());
> while (i2.hasNext()) d2.add(i2.next());
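
For readers without a Pig checkout, the effect of the two snippets quoted above can be reproduced with plain java.util collections. This is only a sketch under stated assumptions: LinkedHashSet stands in for the distinct bags returned by mBagFactory.newDistinctBag(), String stands in for Tuple, and the final step is assumed to be a symmetric difference (elements present in one input but not the other). The class and method names (DiffSketch, buggyDiff, fixedDiff, symmetricDifference) are hypothetical and are not taken from the Pig source or from PIG-511.patch.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class DiffSketch {

    // Mirrors the "Currently" snippet: the iterators come from the new,
    // empty sets, so neither loop body ever runs and the inputs are ignored.
    public static <T> Set<T> buggyDiff(Collection<T> bag1, Collection<T> bag2) {
        Set<T> d1 = new LinkedHashSet<>();
        Set<T> d2 = new LinkedHashSet<>();
        Iterator<T> i1 = d1.iterator();
        Iterator<T> i2 = d2.iterator();
        while (i1.hasNext()) d1.add(i1.next());
        while (i2.hasNext()) d2.add(i2.next());
        return symmetricDifference(d1, d2);
    }

    // Mirrors the "Should be" snippet: the iterators come from the inputs,
    // so d1 and d2 end up holding the distinct elements of bag1 and bag2.
    public static <T> Set<T> fixedDiff(Collection<T> bag1, Collection<T> bag2) {
        Set<T> d1 = new LinkedHashSet<>();
        Set<T> d2 = new LinkedHashSet<>();
        Iterator<T> i1 = bag1.iterator();
        Iterator<T> i2 = bag2.iterator();
        while (i1.hasNext()) d1.add(i1.next());
        while (i2.hasNext()) d2.add(i2.next());
        return symmetricDifference(d1, d2);
    }

    // Assumed comparison step: keep elements found in exactly one of the sets.
    public static <T> Set<T> symmetricDifference(Set<T> d1, Set<T> d2) {
        Set<T> out = new LinkedHashSet<>();
        for (T t : d1) if (!d2.contains(t)) out.add(t);
        for (T t : d2) if (!d1.contains(t)) out.add(t);
        return out;
    }

    public static void main(String[] args) {
        List<String> bag1 = Arrays.asList("a", "b", "b", "c");
        List<String> bag2 = Arrays.asList("b", "c", "d");
        System.out.println("buggy: " + buggyDiff(bag1, bag2)); // prints "buggy: []"
        System.out.println("fixed: " + fixedDiff(bag1, bag2)); // prints "fixed: [a, d]"
    }
}
```

In the buggy variant the result is empty for any pair of inputs, which matches the reported symptom that DIFF always returns an empty bag.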