Posted to user@pig.apache.org by Swaroop Patra <sw...@gmail.com> on 2013/11/14 08:18:55 UTC
Lookup in a dataset
Hi All,
I need a little help scripting the condition below.
I have two tab-separated input files. Let's call them input1 and input2.
input1
---------
col1 col2 col3
input2
--------
col4
I have to fetch the records from input1 whose col3 value is not present in
input2.col4.
e.g.
input1
----------
11 12 13
21 22 23
31 32 33
41 42 43
input2
---------
12
23
33
45
output
---------
11 12 13
41 42 43
Because 13 (input1.row1.col3) and 43 (input1.row4.col3) are not present in input2.col4.
Thanks & Regards,
Swaroop
Re: Lookup in a dataset
Posted by Swaroop Kumar Patra <sw...@gmail.com>.
Thanks, Aaron, for the reply.
I will try this out.
Thanks,
Swaroop
On 14-Nov-2013, at 5:37 pm, Aaron Zimmerman <az...@sproutsocial.com> wrote:
> You’ll want to use COGROUP.
>
> Something like
>
> x = COGROUP input1 by col3, input2 by col4;
>
> needed = FILTER x by IsEmpty(input2);
>
>
> Thanks,
>
> Aaron Zimmerman
> Platform Engineer
> Sprout Social
> 773.227.7528
> @apzimmerman
> sproutsocial.com
Re: Lookup in a dataset
Posted by Aaron Zimmerman <az...@sproutsocial.com>.
You’ll want to use COGROUP.
Something like
x = COGROUP input1 by col3, input2 by col4;
needed = FILTER x by IsEmpty(input2);
Thanks,
Aaron Zimmerman
Platform Engineer
Sprout Social
773.227.7528
@apzimmerman
sproutsocial.com
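For anyone following along, here is a minimal end-to-end sketch of the COGROUP approach suggested above. The file paths ('input1', 'input2') and the int column types are assumptions made for illustration. The final FLATTEN step is also an addition, not part of the original suggestion: after the FILTER, each record is still a group of the form (group, {input1 rows}, {}), so the input1 rows have to be unnested to get back the original three-column records.

```pig
-- Assumed paths and int types; adjust to the real data.
-- PigStorage splits on tab by default, shown explicitly here for clarity.
input1 = LOAD 'input1' USING PigStorage('\t') AS (col1:int, col2:int, col3:int);
input2 = LOAD 'input2' USING PigStorage('\t') AS (col4:int);

-- One record per distinct key, carrying a bag of matching rows from each input.
x = COGROUP input1 BY col3, input2 BY col4;

-- Keep only the keys that have no matching row in input2.
needed = FILTER x BY IsEmpty(input2);

-- Unnest the surviving input1 rows back into (col1, col2, col3) records.
-- Keys that exist only in input2 were already dropped by the filter above.
result = FOREACH needed GENERATE FLATTEN(input1);

DUMP result;
```

With the sample data from the question, only the keys 13 and 43 have an empty input2 bag, so result should contain the rows (11,12,13) and (41,42,43). The order of the rows is not guaranteed.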