Posted to user@pig.apache.org by zhang jianfeng <zj...@gmail.com> on 2009/04/02 07:18:07 UTC

What operators can I use in foreach nested block?

I tried to use the GROUP operator inside a nested FOREACH block, but it seems
Pig does not support this. The inner group conflicts with the outer group. Is
there any way to resolve this issue? Does Pig support aliasing keywords?

My script:

uidGroup = GROUP raw BY uid;

result = FOREACH uidGroup {
    B = GROUP raw BY mfd_id;
    GENERATE group;
}

Re: What operators can I use in foreach nested block?

Posted by Kevin Weil <ke...@gmail.com>.
Perhaps you can get what you're after by grouping by both columns at once:

b = GROUP raw BY (uid, mfd_id);

Kevin
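
A minimal sketch of how the composite-key grouping might stand in for a
two-level grouping, assuming `raw` carries `uid` and `mfd_id` fields. The LOAD
path, the schema, and the aliases `pairs`, `perPair`, and `perUid` are
hypothetical and not taken from the original script:

```pig
-- Hypothetical input and schema; adjust to the real data.
raw = LOAD 'input.tsv' AS (uid:chararray, mfd_id:chararray);

-- One group per distinct (uid, mfd_id) pair.
pairs = GROUP raw BY (uid, mfd_id);

-- Flatten the composite key back into two columns and aggregate per pair.
perPair = FOREACH pairs GENERATE FLATTEN(group) AS (uid, mfd_id), COUNT(raw) AS n;

-- Regroup by uid so each uid carries a bag of its per-mfd_id results.
perUid = GROUP perPair BY uid;
```

The second GROUP is a separate top-level statement rather than a nested one,
so it does not rely on GROUP being available inside a FOREACH block.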


Re: What operators can I use in foreach nested block?

Posted by Alan Gates <ga...@yahoo-inc.com>.
Our goal is to eventually support all Pig operators in a nested block,
but we haven't gotten there yet.

Alan.



Re: What operators can I use in foreach nested block?

Posted by zhang jianfeng <zj...@gmail.com>.
I'd like to do two levels of grouping: group by field f1 first, and then
group by field f2 within each of those groups.

I've found a way to solve this problem with a UDF; UDFs are a really powerful
feature. But I think it would be better if Pig supported the GROUP operation
in a nested block.




Re: What operators can I use in foreach nested block?

Posted by Kevin Weil <ke...@gmail.com>.
"The specific operations that we can do on the nested relations are
FILTER<http://wiki.apache.org/pig/PigLatin#FILTER:_Getting_rid_of_data_you_are_not_interested_in_>,
ORDER<http://wiki.apache.org/pig/PigLatin#ORDER:_Sorting_data_according_to_some_fields>,
and DISTINCT<http://wiki.apache.org/pig/PigLatin#DISTINCT:_Eliminating_duplicates_in_data>.
Note that we do not allow FOREACH...GENERATE on the nested relation, since
that leads to the possibility of arbitrary number of nesting levels."
See the section on nested operations at
http://wiki.apache.org/pig/PigLatin for more. You can also call UDFs
like COUNT inside a nested foreach.

What exactly are you trying to do with your data?  If you give a short
example, I'm sure there's an appropriate way to do it in Pig.

Kevin
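
A short sketch of the nested operations named in the quote, applied to the
script from the question. It assumes `raw` has `uid` and `mfd_id` fields; the
aliases `kept`, `ids`, and `uniq` and the output names are made up for
illustration, and the exact syntax accepted may differ between Pig versions:

```pig
uidGroup = GROUP raw BY uid;

result = FOREACH uidGroup {
    -- Nested FILTER: drop tuples whose mfd_id is null.
    kept = FILTER raw BY mfd_id IS NOT NULL;
    -- Project the column, then apply nested DISTINCT to deduplicate it.
    ids = kept.mfd_id;
    uniq = DISTINCT ids;
    -- COUNT is called on the nested relation, one result per uid.
    GENERATE group AS uid, COUNT(uniq) AS n_mfd;
}
```

This yields one row per uid with the number of distinct mfd_id values, without
a GROUP inside the nested block.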


RE: What operators can I use in foreach nested block?

Posted by zjffdu <zj...@gmail.com>.
Yes. Actually, I want to perform other operations on B as well, such as FILTER and FOREACH.




Re: What operators can I use in foreach nested block?

Posted by Mridul Muralidharan <mr...@yahoo-inc.com>.
zhang jianfeng wrote:
> I use the group operator in the foreach nested block, it seems like pig do
> not support this. The first group is conflicted with the second group, is
> there any way can resolve this issue? Does pig support keyword alias?
> 
> My script:
> 
> uidGroup = GROUP raw BY uid;
> 
> result = FOREACH uidGroup {
>     B = GROUP raw BY mfd_id;
>     GENERATE group;
> }


Isn't this similar to:

uidGroup = GROUP raw BY uid;

result = FOREACH uidGroup {
   B  = DISTINCT raw.mfd_id;
   GENERATE B;
}

?

But perhaps the above is a simplified form of the actual query you
want to run?