Posted to user@pig.apache.org by Mix Nin <pi...@gmail.com> on 2013/03/07 01:41:38 UTC

FLATTEN is not working

I have a file with the following data:

xxxxx   11,22,33        44,55,66        77,88,99

I wrote the following Pig script:

X = LOAD '/user/lnindrakrishna/tmp/ExpTag.txt' AS (id:chararray, qc:chararray, qt:chararray, qe:chararray);

Y = foreach X generate id, STRSPLIT(qc,',') AS split_qc, STRSPLIT(qt,',') AS split_qt, STRSPLIT(qe,',') AS split_qe;

Z = foreach Y generate id, FLATTEN(TOBAG(split_qc));

I expected output as follows:

xxxxx 11
xxxxx 22
xxxxx 33

But the script above produces the following output:

(xxxxx,11,22,33)

FLATTEN is not actually flattening the bag of tuples. Any suggestions?

- Thanks

Re: FLATTEN is not working

Posted by Mix Nin <pi...@gmail.com>.
I used the script below and got the desired output. Thanks for the reply.

A = foreach Z generate $0 as id, FLATTEN(TOBAG(*)) as value;

I have another question.

Currently the input is as below:
xxxxx   11,22,33        44,55,66        77,88,99

Suppose the input is as below:

xxxxx   11,22,33        44,55,66        77,88,99
yyyyy   12,23           34,45           56,67
zzzzz   1,2,3,4         5,6,7,8,9       66,77,88,99

And the output needs to be as follows:

xxxxx    11   44   77
xxxxx    22   55   88
xxxxx    33   66   99
yyyyy    12   34   56
yyyyy    23   45   67
zzzzz    1    5    66
zzzzz    2    6    77
zzzzz    3    7    88
zzzzz    4    8    99

So basically, each field in the input can hold a variable number of values. How
should the script be changed to handle that?
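
One way to think about the variable-length case is as a zip over the three comma-separated fields. The Python sketch below only models that logic outside Pig. The function name `transpose_fields` is made up for illustration, and it assumes rows should stop at the shortest field, which is what the expected output above does for zzzzz (the trailing 9 in the second field is dropped). Something along these lines could presumably be wrapped as a Pig UDF, but that wiring is not shown here.

```python
def transpose_fields(record_id, *fields, sep=","):
    """Zip delimited fields into one row per position.

    Hypothetical helper for illustration only, not part of Pig.
    Assumption: rows stop at the shortest field.
    """
    # Split every field into its list of values, e.g. "11,22,33" -> ["11", "22", "33"].
    columns = [field.split(sep) for field in fields]
    # zip(*columns) pairs up the first values, then the second values, and so on.
    return [(record_id, *values) for values in zip(*columns)]


if __name__ == "__main__":
    records = [
        ("xxxxx", "11,22,33", "44,55,66", "77,88,99"),
        ("yyyyy", "12,23", "34,45", "56,67"),
        ("zzzzz", "1,2,3,4", "5,6,7,8,9", "66,77,88,99"),
    ]
    for record_id, qc, qt, qe in records:
        for row in transpose_fields(record_id, qc, qt, qe):
            print("\t".join(row))
```

If padding short fields is preferable to truncating, `itertools.zip_longest` could take the place of `zip` in this sketch.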





Re: FLATTEN is not working

Posted by Mix Nin <pi...@gmail.com>.
Hi Harsha,

I am getting the output below with the new script. It is not transposed:

(xxxxx,(11,44,77),(22,55,88),(33,66,99))


Also, there is no guarantee that each comma-separated field in the input holds
exactly 3 values. The number of values can vary.

Thanks




Re: FLATTEN is not working

Posted by Harsha <ha...@defun.org>.
I can think of doing something along these lines, but there might be a better way:

Z = foreach Y generate id,
        TOTUPLE(split_qc.$0, split_qt.$0, split_qe.$0),
        TOTUPLE(split_qc.$1, split_qt.$1, split_qe.$1),
        TOTUPLE(split_qc.$2, split_qt.$2, split_qe.$2);
A = foreach Z generate $0, flatten(TOBAG($1,$2,$3));

-- 
Harsha



Re: FLATTEN is not working

Posted by Mix Nin <pi...@gmail.com>.
Harsha, thanks for the reply. Suppose I want to see the output as follows:
xxxxx 11 44 77
xxxxx 22 55 88
xxxxx 33 66 99

How would the script be written?



Re: FLATTEN is not working

Posted by Harsha <ha...@defun.org>.
Hi Mix,
You are calling TOBAG on a tuple, which puts it in the bag as {((11,22,33))}.
Flatten the tuple before calling TOBAG:

Z = foreach Y generate id, flatten(split_qc);
A = foreach Z generate $0, flatten(TOBAG($1,$2,$3));
--
Harsha
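
As a rough way to see why the order matters, the shapes can be modelled in plain Python, with tuples as Python tuples and bags as lists of tuples. The helpers `to_bag` and `flatten_bag` below are simplified stand-ins written for this note, not Pig's implementation. They are only meant to reproduce the two outputs reported in this thread, and the rule that a tuple argument is kept whole is an assumption made to match the single row (xxxxx,11,22,33) from the original script.

```python
def to_bag(*items):
    """Model TOBAG: each argument becomes one tuple in the bag.

    Assumption: an argument that is already a tuple is kept whole.
    """
    return [item if isinstance(item, tuple) else (item,) for item in items]


def flatten_bag(prefix, bag):
    """Model FLATTEN on a bag: one output row per tuple in the bag."""
    return [prefix + member for member in bag]


# What STRSPLIT(qc, ',') yields for the sample row, modelled as a tuple.
split_qc = ("11", "22", "33")

# TOBAG on the whole tuple: the bag holds a single entry, so one row comes out.
one_row = flatten_bag(("xxxxx",), to_bag(split_qc))

# Spread the tuple into separate fields first, then TOBAG: three entries, three rows.
three_rows = flatten_bag(("xxxxx",), to_bag(*split_qc))

print(one_row)     # [('xxxxx', '11', '22', '33')]
print(three_rows)  # [('xxxxx', '11'), ('xxxxx', '22'), ('xxxxx', '33')]
```

In this model the number of output rows equals the number of entries in the bag, so the fix amounts to getting each value into the bag as its own entry.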

