Posted to user@pig.apache.org by Marek Miglinski <mm...@seven.com> on 2011/09/13 19:54:41 UTC

Dumb question guys

Hey all, 4 hours of true torture, hope you will help me (the task is easy)

up = LOAD '/up.log' USING PigStorage(',') AS (upEpoch:long, upInstance:chararray,  upKeyword:chararray);
tx = LOAD '/tx.log' USING PigStorage(',') AS (txEpoch:long, txInstance:chararray, txKeyword:chararray);
recordGroup = COGROUP up BY (upInstance), tx BY (txInstance);

recordExtract = FOREACH recordGroup {
                recordFiltered = FILTER up BY upEpoch < tx.txEpoch;
                recordLimited = LIMIT recordFiltered 1;
                GENERATE
                                recordLimited
                ;
}

How do I point Pig at the txEpoch field of the tx bag inside recordGroup? None of tx::txEpoch, tx.txEpoch, txEpoch or recordGroup::tx.txEpoch works...

With tx::txEpoch I always get the same thing - "ERROR 1000: Error during parsing. Invalid alias: tx::txEpoch in {upEpoch: long,upInstance: chararray,upKeyword: chararray}"

Or with tx.txEpoch (I know it resolves to the relation from tx = LOAD, but I need the tx bag from recordGroup, i.e. recordGroup::tx.txEpoch!) - "ERROR 2997: Unable to recreate exception from backed error: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Scalar has more than one row in the output. 1st : (1314835200050,99,sam), 2nd :(1314835200079,99,flin)"

RE: Dumb question guys

Posted by Marek Miglinski <mm...@seven.com>.
Ok... I've done it :P Thanks for your help. I did it through a JOIN, with the help of a new key field (it consists of txUser and txEpoch) that I use later to identify unique records for GROUPing.
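
For readers who find this thread later, here is a minimal sketch of what a JOIN-based approach along those lines might look like. It is a reconstruction under assumptions, not the script Marek actually ran: the relation names joined, candidates, byTx and closest are made up for illustration, the grouping key uses txInstance where the message says txUser, and the comparison is <= rather than < so that the 9,user2 row from the sample data quoted further down would match.

```pig
-- Sketch only: same inputs and schemas as in the original post.
up = LOAD '/up.log' USING PigStorage(',') AS (upEpoch:long, upInstance:chararray, upKeyword:chararray);
tx = LOAD '/tx.log' USING PigStorage(',') AS (txEpoch:long, txInstance:chararray, txKeyword:chararray);

-- Pair every tx row with every up row of the same instance (tx fields come first).
joined = JOIN tx BY txInstance, up BY upInstance;

-- Keep only up rows that are not later than the tx row.
-- Field names are unique across both inputs, so short names are used, as in Shawn's reply.
candidates = FILTER joined BY upEpoch <= txEpoch;

-- Regroup on a key that identifies one tx row (assumed here: instance plus epoch).
byTx = GROUP candidates BY (txInstance, txEpoch);

-- Within each group, keep the single row with the largest upEpoch.
closest = FOREACH byTx {
    ordered = ORDER candidates BY upEpoch DESC;
    latest = LIMIT ordered 1;
    GENERATE FLATTEN(latest);
};
```

If two tx rows can share the same instance and epoch, this key would merge them into one group, so a wider key (for example one that also includes txKeyword) would be needed.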


Sincerely,
Marek M.
________________________________________
From: Marek Miglinski [mmiglinski@seven.com]
Sent: Wednesday, September 14, 2011 9:52 AM
To: user@pig.apache.org
Subject: RE: Dumb question guys

Thanks for your reply,

I can't use JOIN, and I will explain why. Here is my data...
UP:
9,user1,sam1
5,user1,sam2
3,user1,sam3
9,user2,flin

TX:
7,user1,wow
9,user2,pop

I need to join tx with up by user and closest epoch (first field). If I do JOIN I will get (JOIN BY user):
7,user1,wow,9,user1,sam1
7,user1,wow,5,user1,sam2
7,user1,wow,3,user1,sam3
9,user2,pop,9,user2,flin

Now, I can't filter the records properly in FOREACH, because looking at a single joined row I can't tell whether it is the closest one for its tx record, ok?

So I do COGROUP and get:
{(7,user1,wow)}, {(9,user1,sam1), (5,user1,sam2), (3,user1,sam3)}
{(9,user2,pop)}, {(9,user2,flin)}

Now I can FILTER, ORDER and LIMIT through FOREACH because I have all data in one row:

recordExtract = FOREACH recordGroup {
                recordFiltered = FILTER up BY upEpoch < tx.txEpoch;
                recordOrdered = ORDER recordFiltered by upEpoch DESC;
                recordLimited = LIMIT recordOrdered 1;
                GENERATE
                                recordLimited
                ;
}

So if I can reference tx.txEpoch properly, I will get the desired output:
7,user1,wow,5,user1,sam2 (upEpoch 5 is closest to txEpoch 7)
9,user2,pop,9,user2,flin (upEpoch 9 is closest to txEpoch 9)


Do you have any clues?

________________________________________
From: Xiaomeng Wan [shawnwan@gmail.com]
Sent: Tuesday, September 13, 2011 11:26 PM
To: user@pig.apache.org
Subject: Re: Dumb question guys

tx is a bag; you cannot use it that way unless it is a scalar. I'm not
sure about the logic here, but it looks like you should use a join rather
than a cogroup:

recordGroup = join up BY upInstance, tx BY txInstance;
recordFiltered = FILTER recordGroup BY upEpoch < txEpoch;

Shawn

