Posted to user@pig.apache.org by Haitao Yao <ya...@gmail.com> on 2013/04/03 05:34:06 UTC

optimization for data cube

Hi, all 
I have a tuple like this: 
(group_a,group_b,group_c,value)

and I want to aggregate the values data-cube style, which means generating these new tuples from each original one:

(all,all,all,value)
(group_a,all,all,value)
(all,group_b,all,value)
(group_a,group_b,all,value)
(all,all,group_c,value)
(group_a,all,group_c,value)
(all,group_b,group_c,value)

and then group by ($0, $1, $2).
How can I do this? I've written an Eval function, but it cannot generate multiple tuples from one tuple.


thanks.


Haitao Yao
yao.erix@gmail.com
weibo: @haitao_yao
Skype:  haitao.yao.final

Re: optimization for data cube

Posted by Haitao Yao <ya...@gmail.com>.

Thank you very much. 
We're using Pig 0.9.2. I tried upgrading to 0.11, but it took an unacceptably long time to compile my large Pig script; with 0.9.2 it compiles fine. I still haven't found the reason.

So I think I'll need to port the CUBE operation to 0.9.2 myself.
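
For anyone in the same position on an older Pig release, the expansion step itself is small. Below is a minimal sketch in Python (not Pig, and not a Pig UDF) of turning one tuple into its 2^n cube variants. The function name cube_expand and the "all" placeholder string are made up for illustration. In Pig, one common way to get several rows out of one is an EvalFunc that returns a bag, with FLATTEN in the FOREACH turning that bag into separate rows.

```python
from itertools import product

# Hypothetical placeholder for a rolled-up dimension (the thread writes it as "all").
ALL = "all"

def cube_expand(row, n_dims):
    """Return every variant of `row` in which each of the first
    `n_dims` fields is either kept or replaced by ALL (2**n_dims rows)."""
    dims, rest = row[:n_dims], row[n_dims:]
    variants = []
    # Each mask says, per dimension, whether to hide it behind ALL.
    for mask in product((False, True), repeat=n_dims):
        key = tuple(ALL if hide else d for d, hide in zip(dims, mask))
        variants.append(key + rest)
    return variants

rows = cube_expand(("a1", "b1", "c1", 10), 3)
for r in rows:
    print(r)
```

With three dimensions this yields eight tuples: the seven listed in the original question plus the unchanged input tuple. Grouping the expanded rows by the first three fields then gives the cube.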


Haitao Yao
yao.erix@gmail.com
weibo: @haitao_yao
Skype:  haitao.yao.final

On 2013-4-3, at 1:19 PM, Prasanth J <bu...@gmail.com> wrote:

> From the 0.11 release onwards, Pig natively supports the CUBE operator.
> 
> Here is the documentation for the CUBE operator: http://pig.apache.org/docs/r0.11.1/basic.html#cube
> 
> For your case, the query can be written as
> 
> -- 'inp' is your loaded relation (input and output are reserved words in Pig)
> cubed = CUBE inp BY CUBE(group_a,group_b,group_c);
> result = FOREACH cubed GENERATE FLATTEN(group) as (group_a,group_b,group_c), FLATTEN(cube.value) as value;
> 
> More examples can be found in the documentation.
> 
> Thanks
> -- Prasanth

Re: optimization for data cube

Posted by Prasanth J <bu...@gmail.com>.

From the 0.11 release onwards, Pig natively supports the CUBE operator.

Here is the documentation for the CUBE operator: http://pig.apache.org/docs/r0.11.1/basic.html#cube

For your case, the query can be written as

-- 'inp' is your loaded relation (input and output are reserved words in Pig)
cubed = CUBE inp BY CUBE(group_a,group_b,group_c);
result = FOREACH cubed GENERATE FLATTEN(group) as (group_a,group_b,group_c), FLATTEN(cube.value) as value;

More examples can be found in the documentation.
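
As a rough illustration of what a cubed result contains, here is a minimal Python sketch (not Pig) that sums value over every combination of the three dimensions. SUM is an assumption, since the thread does not say which aggregate is needed. The function name cube_sum and the sample rows are made up. None marks a rolled-up dimension, similar in spirit to the null that Pig's CUBE places in the group tuple.

```python
from collections import defaultdict
from itertools import product

def cube_sum(rows, n_dims):
    """Sum the field after the first `n_dims` fields for every
    kept/rolled-up combination of those dimensions."""
    totals = defaultdict(int)
    for row in rows:
        dims, value = row[:n_dims], row[n_dims]
        # Each input row contributes to 2**n_dims group keys.
        for mask in product((False, True), repeat=n_dims):
            key = tuple(None if hide else d for d, hide in zip(dims, mask))
            totals[key] += value
    return dict(totals)

# Hypothetical sample data in the (group_a, group_b, group_c, value) layout.
data = [
    ("a1", "b1", "c1", 10),
    ("a1", "b2", "c1", 5),
    ("a2", "b1", "c1", 1),
]
totals = cube_sum(data, 3)
for key, total in sorted(totals.items(), key=lambda kv: str(kv[0])):
    print(key, total)
```

The key (None, None, None) holds the grand total, and a key such as ("a1", None, None) holds the total for group_a = a1 across all values of the other two dimensions.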

Thanks
-- Prasanth
