Posted to user@pig.apache.org by James Newhaven <ja...@gmail.com> on 2013/01/22 13:46:07 UTC

Subtracting contents of two bags

Hi,

I have two relations - A and B.  Both just contain user ids.

I want to get a list of users who are in A but not in B.

I am running Pig 0.9.1 and think this might be possible with the DIFF
function. I can see that DIFF requires one relation that contains the two
bags.

How can I create a relation that contains two bags so it can be supplied to
the DIFF function?

Any suggestions would be appreciated.

Thanks,
James

Re: Subtracting contents of two bags

Posted by Timothy Potter <th...@gmail.com>.
Bill's suggestion is good, but here is another approach that I think is
cleaner to read:

find_not_in_b = cogroup A by key OUTER, B by key;
not_in_b = foreach (filter find_not_in_b by IsEmpty(B)) generate flatten(A);


On Tue, Jan 22, 2013 at 8:53 AM, Bill Graham <bi...@gmail.com> wrote:

> You can do a left outer join of A and B and then filter by B is null.
>
> http://pig.apache.org/docs/r0.10.0/basic.html#join-outer

Re: Subtracting contents of two bags

Posted by Bill Graham <bi...@gmail.com>.
You can do a left outer join of A and B and then filter for rows where B's
key is null.

http://pig.apache.org/docs/r0.10.0/basic.html#join-outer
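
A minimal sketch of the join-and-filter approach, for illustration only. The
file names ('a.txt', 'b.txt') and the field name user_id are assumptions and
do not come from the thread; adjust them to the real data. Pig needs a
declared schema on the relations to perform an outer join, which is why both
LOAD statements declare one.

```pig
-- Hypothetical inputs: one user id per line in each file.
A = LOAD 'a.txt' AS (user_id:chararray);
B = LOAD 'b.txt' AS (user_id:chararray);

-- The left outer join keeps every row of A; rows of A with no match in B
-- get null in B's fields.
joined = JOIN A BY user_id LEFT OUTER, B BY user_id;

-- Keep only the rows of A that found no match in B.
unmatched = FILTER joined BY B::user_id IS NULL;

-- Project back to A's single column.
only_in_a = FOREACH unmatched GENERATE A::user_id AS user_id;

DUMP only_in_a;
```

If A can contain the same id more than once, the result will too; adding a
DISTINCT over only_in_a is one way to collapse the repeats.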




-- 
*Note that I'm no longer using my Yahoo! email address. Please email me at
billgraham@gmail.com going forward.*