Posted to dev@pig.apache.org by "Alan Gates (JIRA)" <ji...@apache.org> on 2008/10/30 19:02:44 UTC

[jira] Updated: (PIG-511) DIFF does not work in types branch

     [ https://issues.apache.org/jira/browse/PIG-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-511:
---------------------------

    Attachment: PIG-511.patch

Made the suggested change.  I also added unit tests, which revealed that the algorithm for computing the diff between the two bags was flawed.  This patch uses two hash tables instead of sorting both bags and walking them in unison.
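
For readers without the patch at hand, the two-hash-table approach can be sketched roughly as follows. This is a minimal illustration using plain java.util collections, not the code in PIG-511.patch: the class name BagDiffSketch, the method diff, and the use of a generic element type in place of Pig's Tuple and DataBag are all assumptions made for the example. It also assumes DIFF is meant to return the elements found in one bag but not the other.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the DIFF computation; not Pig's actual classes.
class BagDiffSketch {
    // Returns the elements present in exactly one of the two inputs.
    static <T> List<T> diff(Iterable<T> bag1, Iterable<T> bag2) {
        // Load each input into its own hash table; duplicates collapse here.
        Set<T> s1 = new HashSet<>();
        Set<T> s2 = new HashSet<>();
        for (T t : bag1) s1.add(t);
        for (T t : bag2) s2.add(t);

        List<T> out = new ArrayList<>();
        // Elements only in the first input.
        for (T t : s1) {
            if (!s2.contains(t)) out.add(t);
        }
        // Elements only in the second input.
        for (T t : s2) {
            if (!s1.contains(t)) out.add(t);
        }
        return out;
    }
}
```

Because each membership check is a hash lookup, the result does not depend on the two inputs sharing a sort order, which is the property the sort-and-walk approach relies on.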

> DIFF does not work in types branch
> ----------------------------------
>
>                 Key: PIG-511
>                 URL: https://issues.apache.org/jira/browse/PIG-511
>             Project: Pig
>          Issue Type: Bug
>          Components: data
>    Affects Versions: types_branch
>         Environment: CentOS 5, hadoop 0.18.0, pig built from types branch
>            Reporter: Cristian Ivascu
>         Attachments: PIG-511.patch
>
>
> Using DIFF(bag1, bag2) always returns an empty bag.
> Reason: in compute_diff, the input bags are discarded, and the actual operations are done against two newly created, empty bags.
> Fix: make sure compute_diff(bag1, bag2, output) iterates over bag1 and bag2, instead of d1 and d2.
> Currently:
>         DataBag d1 = mBagFactory.newDistinctBag();
>         DataBag d2 = mBagFactory.newDistinctBag();
>         Iterator<Tuple> i1 = d1.iterator();
>         Iterator<Tuple> i2 = d2.iterator();
>         while (i1.hasNext()) d1.add(i1.next());
>         while (i2.hasNext()) d2.add(i2.next());
> Should be:
>         DataBag d1 = mBagFactory.newDistinctBag();
>         DataBag d2 = mBagFactory.newDistinctBag();
>         Iterator<Tuple> i1 = bag1.iterator();
>         Iterator<Tuple> i2 = bag2.iterator();
>         while (i1.hasNext()) d1.add(i1.next());
>         while (i2.hasNext()) d2.add(i2.next());
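
The effect of the one-line difference between the two snippets above can be reproduced outside Pig. The sketch below is only an illustration: it uses java.util.HashSet as a stand-in for the distinct bag, and the names DistinctCopySketch, buggyCopy and fixedCopy are hypothetical.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical stand-in for the copy-into-distinct-bag step; not Pig code.
class DistinctCopySketch {
    // Mirrors the "Currently" snippet: the iterator comes from the new, empty
    // collection, so the loop body never runs and the input is ignored.
    static <T> Set<T> buggyCopy(Iterable<T> bag) {
        Set<T> d = new HashSet<>();
        Iterator<T> i = d.iterator();
        while (i.hasNext()) d.add(i.next());
        return d;
    }

    // Mirrors the "Should be" snippet: the iterator comes from the input,
    // so every input element is copied into the distinct collection.
    static <T> Set<T> fixedCopy(Iterable<T> bag) {
        Set<T> d = new HashSet<>();
        Iterator<T> i = bag.iterator();
        while (i.hasNext()) d.add(i.next());
        return d;
    }
}
```

With any non-empty input, buggyCopy returns an empty set, which matches the reported symptom: both sides of the comparison are empty, so the diff is empty too.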
