Posted to user@pig.apache.org by Matt Tanquary <ma...@gmail.com> on 2010/11/29 23:51:16 UTC

Mapping a reference table

I have a problem that I solved easily with MapReduce, but I'm trying to solve
it in Pig instead:

Given the following bags, perform a lookup in a special table to retrieve 4
additional variations of the data:
{(10), (15)}
{(5)}
{(5), (10), (15)}

Lookup table:
5 15 30 8 2
10 125 135 13 3
15 4 90 10 1
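One way to hold this lookup table is a dictionary keyed by the level-1 value in the first column, so each level-1 value maps to its full five-level row. The Python sketch below does that. The names `LOOKUP_TEXT` and `parse_lookup` are hypothetical, and it assumes whitespace-separated integers, based on the sample above.

```python
# Lookup table copied from the example: column 1 is level 1, column 5 is level 5.
LOOKUP_TEXT = """\
5 15 30 8 2
10 125 135 13 3
15 4 90 10 1
"""


def parse_lookup(text):
    """Map each level-1 value (first column) to the tuple of all five levels."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = tuple(int(f) for f in fields)
        table[row[0]] = row
    return table


LOOKUP = parse_lookup(LOOKUP_TEXT)
print(LOOKUP[10])  # (10, 125, 135, 13, 3)
```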

Note that the lookup table has 5 columns, one for each level. The bags are
given as level-1 data, so each value is found in the first column of the
lookup table. Now for the fun part: I need to create a new bag for each level
based on the given level-1 data. For instance:

{(10), (15)} IN would yield the additional bags:
{(125), (4)}
{(135), (90)}
{(13), (10)}
{(3), (1)}

Additionally:
{(5)} IN would yield:
{(15)}
{(30)}
{(8)}
{(2)}
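The expansion amounts to two steps: look up every item of the bag, then transpose the resulting rows. The level-k bag collects column k of each looked-up row, in the order the items appear. The Python sketch below models a bag as a list of 1-tuples, which is an assumption for illustration, and `expand_levels` is a hypothetical name. It returns the level-1 bag first, followed by the four additional bags. Because `zip` transposes the rows whatever their count, the same code handles one, two, or three items.

```python
# Level-1 value -> values for levels 1..5 (from the lookup table above).
LOOKUP = {
    5: (5, 15, 30, 8, 2),
    10: (10, 125, 135, 13, 3),
    15: (15, 4, 90, 10, 1),
}


def expand_levels(bag, lookup):
    """Return one bag per level (levels 1..5) for a bag of level-1 1-tuples.

    Each bag is a list of 1-tuples, e.g. [(10,), (15,)].
    """
    rows = [lookup[value] for (value,) in bag]  # KeyError if a value is missing
    # zip(*rows) transposes: the k-th result holds column k of every row.
    return [[(v,) for v in column] for column in zip(*rows)]


for level_bag in expand_levels([(10,), (15,)], LOOKUP):
    print(level_bag)
```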

So, this is the final big picture:
Records IN:
{(10), (15)}
{(5)}

Records OUT:
{(10), (15)}
{(125), (4)}
{(135), (90)}
{(13), (10)}
{(3), (1)}
{(5)}
{(15)}
{(30)}
{(8)}
{(2)}

The case where there is only one item in a bag is simple, but when more than
one item is introduced I can't find an efficient way to tackle this. As a
side note, I will probably only need to process up to 3 items per bag in
this manner.

I hope this makes sense. Any assistance is much appreciated.
Regards,
-M@

Re: Mapping a reference table

Posted by Thejas M Nair <te...@yahoo-inc.com>.
The mapping will have to be done in a UDF. The UDF would return a bag of
tuples.

The Pig query would look like this:

mapped_tuples = foreach input generate FLATTEN(mapudf(bagcol));
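FLATTEN produces the "Records OUT" layout if the UDF returns a bag with one tuple per level, where each tuple holds that level's bag. FLATTEN then turns every tuple in the returned bag into its own output row. The Python simulation below shows that shape, with bags as lists and tuples as Python tuples. `mapudf` mirrors the placeholder name in the query. `flatten` is a hypothetical helper that only imitates FLATTEN for this one-field case; neither is a Pig API.

```python
LOOKUP = {
    5: (5, 15, 30, 8, 2),
    10: (10, 125, 135, 13, 3),
    15: (15, 4, 90, 10, 1),
}


def mapudf(bag):
    """Return a bag (list) of 1-tuples; tuple k holds the level-k bag."""
    rows = [LOOKUP[value] for (value,) in bag]
    return [([(v,) for v in column],) for column in zip(*rows)]


def flatten(records):
    """Imitate FOREACH ... GENERATE FLATTEN(mapudf(bagcol)) for 1-field tuples."""
    out = []
    for bag in records:
        for (level_bag,) in mapudf(bag):  # one output row per tuple in the bag
            out.append(level_bag)
    return out


records_in = [[(10,), (15,)], [(5,)]]
for row in flatten(records_in):
    print(row)
```

The ten printed rows match the "Records OUT" listing in the original question, in the same order.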

In Pig 0.8 (to be released in a few days), you can also write your UDFs in
Python: http://wiki.apache.org/pig/UDFsUsingScriptingLanguages
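Below is a rough sketch of what a Python (Jython) version of `mapudf` might look like. It makes several assumptions:

- The lookup table is small enough to hard-code; a real job might load it from a file instead.
- The file name `levels.py` and the schema string are hypothetical and untested.
- The fallback `outputSchema` definition exists only so the file also runs under plain Python. Under Pig, the decorator is supplied by the Jython runtime when the script is registered, e.g. `register 'levels.py' using jython as levels;`.

```python
# levels.py (hypothetical file name)
try:
    outputSchema  # injected by Pig's Jython runtime when the file is registered
except NameError:
    def outputSchema(schema):
        """No-op stand-in so the module also runs under plain Python."""
        def decorate(func):
            return func
        return decorate

# Hard-coded lookup table (assumption): level-1 value -> levels 1..5.
LOOKUP = {
    5: (5, 15, 30, 8, 2),
    10: (10, 125, 135, 13, 3),
    15: (15, 4, 90, 10, 1),
}


# Schema string is an assumption: a bag of tuples, each holding one level's bag.
@outputSchema("levels:bag{t:tuple(vals:bag{v:tuple(val:int)})}")
def mapudf(bag):
    # Return null for a null input bag.
    if bag is None:
        return None
    # Drop values that are missing from the lookup instead of failing.
    rows = [LOOKUP[t[0]] for t in bag if t[0] in LOOKUP]
    # Transpose rows into one tuple per level, each wrapping that level's bag.
    return [([(v,) for v in column],) for column in zip(*rows)]
```

With this file registered, the query would call it as `levels.mapudf(bagcol)` inside FLATTEN.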

Thanks,
Thejas
