Posted to user@pig.apache.org by "Kochis, Allan" <Al...@schwab.com> on 2010/08/02 16:13:25 UTC

question on pig join

Hi,
 
 
I have a Pig question.
I have two HDFS files: a smaller file
that has
|field1|field2|field3|
 
 
and a larger file that has
 
|..|.. |...|field2|....|field3|.....|field1|...| ..|
 
I would like to replace field2 and field3 in my larger file when they
are null match on field1.
 
I am currently doing this by caching my smaller file and using a Perl
hash lookup in a UDF to populate the larger file's records.
 
Can this be done with a Pig join?
 
 
Thanks,
 
Allan

Re: question on pig join

Posted by Thejas M Nair <te...@yahoo-inc.com>.

I am not sure what you meant by "null match".

Would this work?

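-- field lists are abbreviated; the large file has other fields as well
F1 = load 'largefile' as (.., field2, .., field3, .., field1, ..);
F2 = load 'smallfile' as (field1, field2, field3);

-- as the small file is very small, use a replicated join
J = join F1 by field1 LEFT, F2 by field1 using 'replicated';
-- keep the large file's value unless it is null, then take the small file's
FE = foreach J generate F1::field1,
    (F1::field2 is null ? F2::field2 : F1::field2) as field2,
    (F1::field3 is null ? F2::field3 : F1::field3) as field3;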
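
For comparison with the hash-lookup approach in the question: a replicated join holds the small relation in memory and probes it for each record of the large one, so it does much the same work as the cached lookup, without a custom UDF. The Python sketch below is only a toy illustration of that semantics, reading "null match" as "fill field2 and field3 from the small file when they are null in the large file". The function name and the sample rows are made up for the example and are not from the thread.

```python
# Toy illustration of a left join on field1 followed by a null check.
# fill_missing and the sample rows are hypothetical, for illustration only.

def fill_missing(large_rows, small_rows):
    """Fill null field2/field3 in large_rows from small_rows, matching on field1."""
    # In-memory lookup keyed on field1, playing the role of the replicated side.
    lookup = {row["field1"]: row for row in small_rows}
    filled = []
    for row in large_rows:
        match = lookup.get(row["field1"])  # None when no match (left join keeps the row)
        out = dict(row)
        for name in ("field2", "field3"):
            # Only overwrite values that are null in the large record.
            if out[name] is None and match is not None:
                out[name] = match[name]
        filled.append(out)
    return filled

small = [{"field1": "a", "field2": "x2", "field3": "x3"}]
large = [
    {"field1": "a", "field2": None, "field3": "keep"},
    {"field1": "b", "field2": None, "field3": None},
]
result = fill_missing(large, small)
```

Rows of the large file with no match in the small file pass through unchanged, which is why the join needs to be a left outer join rather than an inner one.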