Posted to user@pig.apache.org by Leo Alekseyev <dn...@gmail.com> on 2009/08/31 10:19:10 UTC
Is it possible to convert bag into a tuple?
Context: I am trying to group data like so:
grunt> cat test.dat
1 2 3
1 2 4
1 2 5
2 3 0
2 3 8
A = load 'test.dat' as (f1:int, f2:int, f3:int);
B = group A by (f1, f2);
C = foreach B generate group, A.f3;
store C into 'testc.dat' using PigStorage('\t');
grunt> cat testc.dat
(1,2) {(3),(4),(5)}
(2,3) {(0),(8)}
This form of the output makes sense because the second field of C is a bag
of (singleton) tuples:
grunt> describe C
C: {group: (f1: int,f2: int),f3: {f3: int}}
However, for further processing it would be more convenient to have each
group's values output as a list of comma-separated values, which is how
they would be written if the values from A.f3 were put into a tuple
rather than a bag. Is there a way to make the "foreach B generate group,
A.f3;" statement generate a tuple instead of a bag? Alternatively, is
there a way to convert the bag of singleton tuples to a tuple? I tried
playing around with the FLATTEN operator, but it seemed to undo the
results of GROUP for me.
Thanks!
--Leo
Re: Is it possible to convert bag into a tuple?
Posted by zhang jianfeng <zj...@gmail.com>.
Hi Leo,
You can write a UDF to convert a bag to a tuple.
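A minimal sketch of the core logic such a UDF might contain, written in plain Java so it runs without the Pig jars. The class name BagToTupleSketch and the methods bagToTuple and render are hypothetical, and the List-based "bag" and "tuple" are stand-ins, not Pig's own types. In a real UDF the same loop would presumably sit inside the exec method of a class extending org.apache.pig.EvalFunc, reading from a DataBag and building a Tuple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-in types (assumption): a "tuple" is a List<Object> and a "bag" is a
// List of such tuples. A real UDF would use Pig's Tuple and DataBag instead.
class BagToTupleSketch {

    // Concatenate the fields of every tuple in the bag into one flat tuple.
    static List<Object> bagToTuple(List<List<Object>> bag) {
        List<Object> result = new ArrayList<>();
        if (bag == null) {
            return result; // assumption: treat a missing bag as an empty tuple
        }
        for (List<Object> tuple : bag) {
            result.addAll(tuple);
        }
        return result;
    }

    // Render the tuple as comma-separated values in parentheses, mirroring
    // the tuple notation seen in testc.dat.
    static String render(List<Object> tuple) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < tuple.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(tuple.get(i));
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        // The bag {(3),(4),(5)} from the first group in the example.
        List<List<Object>> bag = Arrays.asList(
            Arrays.<Object>asList(3),
            Arrays.<Object>asList(4),
            Arrays.<Object>asList(5));
        System.out.println(render(bagToTuple(bag))); // prints (3,4,5)
    }
}
```

Note that bags in Pig are unordered, so the order of the fields in the resulting tuple should not be relied on unless the bag is sorted first.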