Posted to user@pig.apache.org by Leo Alekseyev <dn...@gmail.com> on 2009/08/31 10:19:10 UTC

Is it possible to convert bag into a tuple?

Context: I am trying to group data like so:
grunt> cat test.dat
1       2       3
1       2       4
1       2       5
2       3       0
2       3       8

A = load 'test.dat' as (f1:int, f2:int, f3:int);
B = group A by (f1, f2);
C = foreach B generate group, A.f3;
store C into 'testc.dat' using PigStorage('\t');

grunt> cat testc.dat
(1,2)   {(3),(4),(5)}
(2,3)   {(0),(8)}

This form of the output makes sense because the second field of C is a bag
of (singleton) tuples:
grunt> describe C
C: {group: (f1: int,f2: int),f3: {f3: int}}

However, for further processing it would be more convenient for me to
have the groups output as a list of comma-separated values, which is
how it would be written if the values from A.f3 were put into a tuple
rather than a bag.  Is there a way to make the "foreach B generate group,
A.f3;" statement generate a tuple, not a bag?  Alternatively, is there
a way to convert the bag of singleton tuples to a tuple?  I tried
playing around with the FLATTEN statement, but it seemed to undo the
results of GROUP for me...
Thanks!
--Leo

Re: Is it possible to convert bag into a tuple?

Posted by zhang jianfeng <zj...@gmail.com>.
Hi Leo,

You can write a UDF to convert the bag to a tuple.
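
As a rough illustration of what such a UDF could do, here is a minimal
sketch in Python. It assumes a Pig release that can run Python UDFs
through Jython (this may not be available in the version you are using),
and it assumes the bag reaches the function as a list of tuples. The
name bag_to_tuple is made up for this example and is not part of Pig.

```python
# Hypothetical UDF sketch: flatten a bag of tuples into one tuple.
# Assumption: Pig hands the bag to the function as a list of tuples,
# e.g. {(3),(4),(5)} arrives as [(3,), (4,), (5,)].

def bag_to_tuple(bag):
    """Return a single tuple holding every field of every tuple in the bag.

    A null bag (None) is passed through as None; an empty bag gives ().
    The fields keep the order in which the bag yields its tuples, which
    Pig does not guarantee unless the bag was explicitly ordered.
    """
    if bag is None:
        return None
    fields = []
    for t in bag:
        # Each element is itself a tuple; copy its fields in order.
        fields.extend(t)
    return tuple(fields)


if __name__ == "__main__":
    # Local check with the two groups from the example data.
    print(bag_to_tuple([(3,), (4,), (5,)]))
    print(bag_to_tuple([(0,), (8,)]))
```

If the function were saved in a file (say bag_to_tuple.py, again a
hypothetical name) and registered with something like
"register 'bag_to_tuple.py' using jython as myfuncs;", the foreach could
become "C = foreach B generate group, myfuncs.bag_to_tuple(A.f3);".
Without a declared output schema Pig would probably not know the type of
the result, so treat this as a starting point, not a finished UDF.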


