Posted to user@pig.apache.org by Sarath <sa...@algofusiontech.com> on 2012/03/29 16:00:58 UTC

Help needed on a solution

Hi All,

I'm new to Pig and working on a project that deals with huge data. We are 
using Hadoop and Pig for our project.
I need help writing a Pig script for the requirement below.

We are loading 2 sets of data as below -
A = load 'a.txt' using PigStorage('|') as (id: chararray, date: long, 
amount: float);
B = load 'b.txt' using PigStorage('|') as (id: chararray, date: long, 
amount: float);

Now the requirement is, for each record in A, to find the record in B that has -

  * amount = A.amount + x (x will be passed as a parameter)
  * date = A.date + d (d will be passed as a parameter)

I tried it the way shown below, but I get the error "expression is not a 
project expression".

C = FOREACH A {
     C1 = FILTER B BY (B.amount == A.amount+0.01);
     GENERATE C1;
}

Please suggest the best approach to writing a Pig script for the above 
requirement.

Regards,
Sarath.


Re: Help needed on a solution

Posted by Bill Graham <bi...@gmail.com>.
You can do a FOREACH on A where you add x and d to their respective values,
then do an inner join on amount and date.
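
A minimal sketch of that approach, not a tested script. It assumes x and d are
passed with Pig's -param option, and the aliases A_shifted, target_date and
target_amount are made-up names for illustration. The original attempt most
likely fails because a nested FOREACH block can only work on bags that are
fields of the relation being iterated, and B is a separate relation.

```pig
-- x and d are supplied on the command line, for example:
--   pig -param x=0.01 -param d=1 script.pig
A = LOAD 'a.txt' USING PigStorage('|') AS (id: chararray, date: long, amount: float);
B = LOAD 'b.txt' USING PigStorage('|') AS (id: chararray, date: long, amount: float);

-- Add the offsets to each record of A; the casts keep the join keys
-- the same types as the matching fields in B.
A_shifted = FOREACH A GENERATE id, date, amount,
                (long)(date + $d)    AS target_date,
                (float)(amount + $x) AS target_amount;

-- Inner join: keeps only the records of A that have a matching record in B.
C = JOIN A_shifted BY (target_date, target_amount), B BY (date, amount);
```

One caveat: this joins on exact equality of float values, so rounding can make
a match fail. If that turns out to be a problem, joining on an integer form of
the amount (for example, the amount in cents) may be more reliable.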
