Posted to user@pig.apache.org by Swaroop Patra <sw...@gmail.com> on 2013/11/14 08:18:55 UTC

Lookup in a dataset

Hi All,

I need a little help scripting the condition below.

I have two tab-separated input files; let's call them input1 and input2.
input1
---------
col1    col2    col3
input2
--------
col4

I need to fetch the records from input1 whose col3 value is not present in
input2.col4.

For example:
input1
----------
11    12    13
21    22    23
31    32    33
41    42    43
input2
---------
12
23
33
45


output
---------
11    12    13
41    42    43

This is because 13 (input1.row1.col3) and 43 (input1.row4.col3) are not present in input2.col4.

Thanks & Regards,
Swaroop
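For reference, the requested result is a left anti-join: keep each input1 row whose col3 has no match anywhere in input2.col4. The Python sketch below is not Pig. It hard-codes the sample rows from the example instead of reading the tab-separated files, and the `anti_join` helper is a hypothetical name. It only reproduces the expected output, so it can serve as a check against whatever the Pig script produces.

```python
# Minimal sketch of the requested lookup: keep the input1 rows whose col3
# value does not appear anywhere in input2.col4 (a left anti-join).
# The sample rows are copied from the example; file reading is left out.

input1 = [
    (11, 12, 13),
    (21, 22, 23),
    (31, 32, 33),
    (41, 42, 43),
]
input2 = [12, 23, 33, 45]  # the single col4 column


def anti_join(rows, lookup_values):
    """Return the rows whose third column (col3) is not in lookup_values."""
    lookup = set(lookup_values)  # set gives fast membership tests
    return [row for row in rows if row[2] not in lookup]


if __name__ == "__main__":
    # Print the surviving rows tab-separated, like the input files.
    for row in anti_join(input1, input2):
        print("\t".join(str(v) for v in row))
```

A set works here because input2 has only one column; in Pig, where input2 may not fit in memory, a grouping or join operator does the same job.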

Re: Lookup in a dataset

Posted by Swaroop Kumar Patra <sw...@gmail.com>.
Thanks, Aaron, for the reply.

I will try this out.

Thanks,
Swaroop

On 14-Nov-2013, at 5:37 pm, Aaron Zimmerman <az...@sproutsocial.com> wrote:

> You’ll want to use COGROUP.
> 
> Something like
> 
> x = COGROUP input1 by col3, input2 by col4;
> 
> needed = FILTER x by IsEmpty(input2);
> 
> 
> Thanks,
> 
> Aaron Zimmerman
> Platform Engineer
> Sprout Social
> 773.227.7528
> @apzimmerman
> sproutsocial.com


Re: Lookup in a dataset

Posted by Aaron Zimmerman <az...@sproutsocial.com>.
You’ll want to use COGROUP.

Something like

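input1 = LOAD 'input1' USING PigStorage('\t') AS (col1:int, col2:int, col3:int); -- placeholder path
input2 = LOAD 'input2' USING PigStorage('\t') AS (col4:int); -- placeholder path
x = COGROUP input1 by col3, input2 by col4;
needed = FILTER x by IsEmpty(input2);
-- unnest the surviving input1 bags back into plain rows
records = FOREACH needed GENERATE FLATTEN(input1);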


Thanks,

Aaron Zimmerman
Platform Engineer
Sprout Social
773.227.7528
@apzimmerman
sproutsocial.com
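To see why COGROUP plus IsEmpty gives the anti-join, here is a rough in-memory Python model of the pipeline on the sample data. It assumes everything fits in memory and is only an illustration of the records each step produces, not how Pig executes the job. The `cogroup` helper is hypothetical. The model ends with the equivalent of `FOREACH needed GENERATE FLATTEN(input1)`, which unnests the surviving bags back into plain input1 rows.

```python
from collections import defaultdict

# Hypothetical in-memory model of COGROUP -> FILTER IsEmpty -> FLATTEN.
# Not Pig's implementation; it only mirrors the shape of each step's output.

input1 = [(11, 12, 13), (21, 22, 23), (31, 32, 33), (41, 42, 43)]
input2 = [(12,), (23,), (33,), (45,)]


def cogroup(left, left_key, right, right_key):
    """Mimic COGROUP: one record per key, holding a bag from each relation."""
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[left_key(row)][0].append(row)
    for row in right:
        groups[right_key(row)][1].append(row)
    # Each record is (group, input1_bag, input2_bag).
    return [(key, bags[0], bags[1]) for key, bags in groups.items()]


# COGROUP input1 by col3, input2 by col4
x = cogroup(input1, lambda r: r[2], input2, lambda r: r[0])

# FILTER x by IsEmpty(input2): keep keys that have no match in input2.
needed = [rec for rec in x if not rec[2]]

# FOREACH needed GENERATE FLATTEN(input1): unnest the input1 bags.
result = [row for _, input1_bag, _ in needed for row in input1_bag]
```

Keys 12 and 45 also get groups, but with an empty input1 bag and a non-empty input2 bag, so the filter drops them; only 13 and 43 survive. Pig does not guarantee output order, so compare the rows as a set.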
