Posted to user@pig.apache.org by Scott Wine <sc...@whitepages.com> on 2010/06/16 22:14:16 UTC

DISTINCT BAG

Hi,

I am trying to find a way to return only DISTINCT values within a bag.  Any ideas?

Thanks
Scott

A = LOAD 'file.tsv' USING PigStorage() AS (id:chararray, mname:chararray);
B = FILTER A BY mname IS NOT NULL;
C = FOREACH B GENERATE id, mname;
D = GROUP C BY id;
DUMP D;
(1,{(1,M),(1,M),(1,N)})
(2,{(2,I),(2,I)})
(3,{(3,T),(3,T),(3,T)})
(4,{(4,R),(4,I)})

E =   **NEED TO DISTINCT BAG***

DESIRED E OUTPUT
(1,{(1,M),(1,N)})
(2,{(2,I)})
(3,{(3,T)})
(4,{(4,R),(4,I)})





RE: DISTINCT BAG

Posted by Scott Wine <sc...@whitepages.com>.
Thanks!

A = LOAD 'file.tsv' USING PigStorage() AS (id:chararray, mname:chararray);
B = FILTER A BY mname IS NOT NULL;
C = FOREACH B GENERATE id, mname;
D = DISTINCT C;
E = GROUP D BY id;
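
For anyone who needs to keep the grouped relation from the original post and de-duplicate inside each bag instead, a nested FOREACH block may also work. This is only a sketch: it reuses the relation names from the original question (where D is the GROUP, not the DISTINCT), the alias `uniq` is made up for illustration, and it has not been run against the actual file.

```pig
-- Same load and filter steps as in the original post.
A = LOAD 'file.tsv' USING PigStorage() AS (id:chararray, mname:chararray);
B = FILTER A BY mname IS NOT NULL;
C = FOREACH B GENERATE id, mname;

-- Group first, so each record is (group, {bag of C tuples}).
D = GROUP C BY id;

-- Apply DISTINCT to the bag inside each group.
-- 'uniq' is a hypothetical alias chosen for this example.
E = FOREACH D {
    uniq = DISTINCT C;
    GENERATE group, uniq;
};

DUMP E;
```

The DISTINCT-then-GROUP version above is simpler when only the de-duplicated result is needed; the nested form is mainly useful when other per-group work on the same bag happens in the same FOREACH. The order of tuples within each bag is not guaranteed either way.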

-----Original Message-----
From: Brian Adams [mailto:brian.adams@chacha.com] 
Sent: Wednesday, June 16, 2010 1:18 PM
To: pig-user@hadoop.apache.org
Subject: Re: DISTINCT BAG

http://hadoop.apache.org/pig/docs/r0.5.0/piglatin_reference.html#DISTINCT

?


Re: DISTINCT BAG

Posted by Brian Adams <br...@chacha.com>.
http://hadoop.apache.org/pig/docs/r0.5.0/piglatin_reference.html#DISTINCT

?
