Posted to user@pig.apache.org by Haitao Yao <ya...@gmail.com> on 2013/04/03 05:34:06 UTC
optimization for data cube
Hi, all
I have a tuple like this:
(group_a,group_b,group_c,value)
and I want to compute the values as a data cube, which means generating new tuples from each original one:
(all,all,all,value)
(group_a,all,all,value)
(all,group_b,all,value)
(group_a,group_b,all,value)
(all,all,group_c,value)
(group_a,all,group_c,value)
(all,group_b,group_c,value)
and then group by ($0, $1, $2).
How can I do this? I've written an Eval function, but it cannot generate more than one tuple from a single input tuple.
Thanks.
Haitao Yao
yao.erix@gmail.com
weibo: @haitao_yao
Skype: haitao.yao.final
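The per-tuple expansion asked about here can be sketched in plain Java. In Pig, an `EvalFunc` can return a `DataBag`, and applying `FLATTEN` to that bag inside `FOREACH ... GENERATE` emits one output row per tuple in the bag; that is the usual way to turn one input tuple into several. The sketch below covers only the combination logic such a UDF might run, under these assumptions: the class and method names (`CubeExpand`, `expand`) are hypothetical, the measure is taken to be a `long`, and the string "all" is used as in the post (Pig 0.11's own CUBE operator marks a rolled-up dimension with null instead). It has no Pig dependency, so wrapping it in a real `EvalFunc<DataBag>` is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of Pig.
final class CubeExpand {
    // Marker for a rolled-up dimension, matching the post. Pig 0.11's CUBE uses null instead.
    static final String ALL = "all";

    /**
     * Returns all 2^n grouping combinations of the n dimensions, each followed by the measure.
     * Bit i of the mask keeps dimension i when set and replaces it with ALL when clear.
     */
    static List<List<Object>> expand(String[] dims, long value) {
        int n = dims.length;
        List<List<Object>> rows = new ArrayList<>(1 << n);
        for (int mask = 0; mask < (1 << n); mask++) {
            List<Object> row = new ArrayList<>(n + 1);
            for (int i = 0; i < n; i++) {
                row.add((mask & (1 << i)) != 0 ? dims[i] : ALL);
            }
            row.add(value); // autoboxed to Long
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String[] dims = {"group_a", "group_b", "group_c"};
        for (List<Object> row : expand(dims, 5L)) {
            System.out.println(row);
        }
    }
}
```

Because mask bit i selects dimension i, the first seven rows come out in the same order as the list in the post, and the eighth (mask 7) is the original tuple itself.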
Re: optimization for data cube
Posted by Haitao Yao <ya...@gmail.com>.
Thank you very much.
We're using Pig 0.9.2. I tried upgrading to 0.11, but compiling my large Pig script took an unacceptably long time, while Pig 0.9.2 handles it fine. I still haven't found the reason.
So I think I need to port the cube operation to 0.9.2 myself.
On 2013-4-3, at 1:19 PM, Prasanth J <bu...@gmail.com> wrote:
> From the 0.11 release onwards, Pig natively supports the CUBE operator.
>
> Here is the documentation for the CUBE operator: http://pig.apache.org/docs/r0.11.1/basic.html#cube
>
> For your case, the query can be written as:
>
> -- 'input' and 'output' are Pig reserved words; the path and field types below are hypothetical
> raw = LOAD 'data.txt' AS (group_a:chararray, group_b:chararray, group_c:chararray, value:long);
> cubed = CUBE raw BY CUBE(group_a, group_b, group_c);
> result = FOREACH cubed GENERATE FLATTEN(group) AS (group_a, group_b, group_c), FLATTEN(cube.value) AS value;
>
> More examples can be found in the documentation.
>
> Thanks
> -- Prasanth
Re: optimization for data cube
Posted by Prasanth J <bu...@gmail.com>.
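From the 0.11 release onwards, Pig natively supports the CUBE operator.
Here is the documentation for the CUBE operator: http://pig.apache.org/docs/r0.11.1/basic.html#cube
For your case, the query can be written as:
-- 'input' and 'output' are Pig reserved words; the path and field types below are hypothetical
raw = LOAD 'data.txt' AS (group_a:chararray, group_b:chararray, group_c:chararray, value:long);
cubed = CUBE raw BY CUBE(group_a, group_b, group_c);
result = FOREACH cubed GENERATE FLATTEN(group) AS (group_a, group_b, group_c), FLATTEN(cube.value) AS value;
More examples can be found in the documentation.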
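To make the semantics of the CUBE query concrete, here is a minimal plain-Java sketch that emulates `CUBE ... BY CUBE(...)` followed by a sum over each group's values. It is not Pig's implementation, and these points are assumptions: the names `CubeSum`, `Fact`, and `cubeSum` are hypothetical; a rolled-up ("all") dimension is represented by null, which is assumed to match how Pig 0.11's CUBE output marks it (the linked documentation is the authority); and SUM stands in for whatever aggregate is actually needed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of CUBE BY CUBE(...) followed by SUM; not Pig's implementation.
final class CubeSum {
    // One input record: its dimension values plus a long measure.
    static final class Fact {
        final String[] dims;
        final long value;

        Fact(long value, String... dims) {
            this.value = value;
            this.dims = dims;
        }
    }

    /**
     * Adds each fact's value to all 2^n grouping keys it belongs to.
     * A rolled-up ("all") dimension is represented by null in the key.
     */
    static Map<List<String>, Long> cubeSum(List<Fact> facts) {
        Map<List<String>, Long> totals = new LinkedHashMap<>();
        for (Fact f : facts) {
            int n = f.dims.length;
            for (int mask = 0; mask < (1 << n); mask++) {
                List<String> key = new ArrayList<>(n);
                for (int i = 0; i < n; i++) {
                    key.add((mask & (1 << i)) != 0 ? f.dims[i] : null);
                }
                totals.merge(key, f.value, Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Fact> facts = Arrays.asList(
                new Fact(3, "a1", "b1", "c1"),
                new Fact(4, "a1", "b2", "c1"));
        cubeSum(facts).forEach((key, total) -> System.out.println(key + " -> " + total));
    }
}
```

The query in this message keeps the raw values with `FLATTEN(cube.value)`; replacing that with an aggregate such as `SUM(cube.value)` would correspond to the per-group totals this sketch computes.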
Thanks
-- Prasanth