Posted to user@pig.apache.org by guoyun <mi...@gmail.com> on 2012/03/06 04:19:45 UTC

about distinct

Dear All:
	This is the wiki's description of DISTINCT:

	grunt> A = load 'mydata' using PigStorage() as (a, b, c);
	grunt> B = group A by a;
	grunt> C = foreach B {
		D = distinct A.b;
		generate flatten(group), COUNT(D);
	}
	
	But what if field b has subfields? For example:
	A = load 'mydata' using PigStorage() as (a, b:tuple(b1,b2,b3), c);
	
	If I want D to hold the distinct values of b.b1 in each group, how can I
do it? Pig does not allow D = distinct A.b.b1;
	
	Thank you!




Re: about distinct

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Thejas, I don't think nested foreaches (a foreach inside a foreach block) are in 0.8. They are only in trunk, IIRC.

On Thu, Mar 15, 2012 at 3:46 PM, Thejas Nair <th...@hortonworks.com> wrote:
> On 3/13/12 9:02 PM, guoyun wrote:
>> Thanks, but is this not supported in Pig 0.8.0?
>
> It should work in 0.8. Do you get an error?
>
> Thanks,
> Thejas

Re: about distinct

Posted by Thejas Nair <th...@hortonworks.com>.
On 3/13/12 9:02 PM, guoyun wrote:

> Thanks, but is this not supported in Pig 0.8.0?

It should work in 0.8. Do you get an error?

Thanks,
Thejas

Re: about distinct

Posted by guoyun <mi...@gmail.com>.
> On 3/5/12 7:19 PM, guoyun wrote:
> > 	If I want D to hold the distinct values of b.b1 in each group, how can I
> > do it? Pig does not allow D = distinct A.b.b1;
> 
> You need to use another nested foreach statement:
> 
>   C = foreach B {
>       B1BAG = foreach A generate b.b1;
>       D = distinct B1BAG;
>       generate flatten(group), COUNT(D);
>   }
> 
> -Thejas
> 

Thanks, but is this not supported in Pig 0.8.0?



Re: about distinct

Posted by Thejas Nair <th...@hortonworks.com>.
On 3/5/12 7:19 PM, guoyun wrote:
> 	If I want D to hold the distinct values of b.b1 in each group, how can I
> do it? Pig does not allow D = distinct A.b.b1;


You need to use another nested foreach statement:

  C = foreach B {
      B1BAG = foreach A generate b.b1;
      D = distinct B1BAG;
      generate flatten(group), COUNT(D);
  }

-Thejas
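
For Pig 0.8, where (as Dmitriy notes above) a foreach nested inside a foreach block may not be available, one possible workaround, not proposed in this thread, is to project b.b1 before the GROUP so that the nested block only needs DISTINCT on a simple projection, just as in the wiki example. This is an untested sketch: it assumes the data is tab-delimited with column b stored as a tuple literal such as (x,y,z), and the alias A1 is a hypothetical name.

	-- Untested sketch for Pig 0.8: avoids a foreach nested inside a foreach block.
	-- Assumes column b is stored as a tuple literal such as (x,y,z).
	A  = load 'mydata' using PigStorage() as (a, b:tuple(b1, b2, b3), c);

	-- Project b.b1 to a top-level field before grouping (A1 is a hypothetical alias).
	A1 = foreach A generate a, b.b1 as b1;
	B  = group A1 by a;

	-- Inside the nested block, DISTINCT only needs a plain projection of the bag.
	C  = foreach B {
		D = distinct A1.b1;
		generate flatten(group), COUNT(D);
	}
	dump C;

A side effect of projecting first is that only a and b1 are carried through the GROUP, rather than the full rows.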