Posted to user@pig.apache.org by "Kochis, Allan" <Al...@schwab.com> on 2010/08/02 16:13:25 UTC
question on pig join
Hi,
Have a pig question.
I have two HDFS files, a smaller file
that has
|field1|field2|field3|
and a larger file that has
|..|.. |...|field2|....|field3|.....|field1|...| ..|
I would like to replace field2 and field3 in my larger file when they
are null match on field1.
I am currently doing this by caching my smaller file and using a perl
hash lookup to populate the larger records in a UDF.
Can this be done in a pig join?
Thanks,
Allan
Re: question on pig join
Posted by Thejas M Nair <te...@yahoo-inc.com>.
I am not sure about what you meant by "null match".
Would this work?
F1 = load 'largefile' as (field1,..);
F2 = load 'smallfile' as (field1, field2, field3);
-- as the file is very small, use replicated join.
J = join F1 by field1 LEFT, F2 by field1 using 'replicated';
-- after a join, fields are referenced as alias::field;
-- F2::field1 is null when the left join found no match in the small file
FE = foreach J generate F1::field1,
(F2::field1 is null ? F1::field2 : F2::field2),
(F2::field1 is null ? F1::field3 : F2::field3)
;
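If "null match" means that field2 and field3 should be filled in only where the larger file has them as null, a variant along these lines might work. This is only a sketch under assumptions: the three-column schema for 'largefile', the '|' delimiter, the chararray types, the output path 'filled_output' and the alias names are placeholders, not details of the actual files.

```pig
-- Sketch only: schema, types, delimiter and paths are assumed placeholders.
large = LOAD 'largefile' USING PigStorage('|')
        AS (field2:chararray, field3:chararray, field1:chararray);
small = LOAD 'smallfile' USING PigStorage('|')
        AS (field1:chararray, field2:chararray, field3:chararray);

-- The replicated (small) relation is listed last; it is held in memory
-- on each map task, much like the cached hash lookup in the UDF.
joined = JOIN large BY field1 LEFT OUTER, small BY field1 USING 'replicated';

-- Keep the large file's value unless it is null. A row with no match in
-- the small file gets null from small::..., so its field stays null.
filled = FOREACH joined GENERATE
    (large::field2 IS NULL ? small::field2 : large::field2) AS field2,
    (large::field3 IS NULL ? small::field3 : large::field3) AS field3,
    large::field1 AS field1;

STORE filled INTO 'filled_output' USING PigStorage('|');
```

The left outer join keeps every record of the larger file, so records whose field1 has no counterpart in the smaller file pass through unchanged.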