Posted to user@pig.apache.org by Scott Wine <sc...@whitepages.com> on 2010/06/16 22:14:16 UTC
DISTINCT BAG
Hi,
I am trying to find a way to return only DISTINCT values within a bag. Any ideas?
Thanks
Scott
A = LOAD 'file.tsv' USING PigStorage() AS (id:chararray, mname:chararray);
B = FILTER A BY mname IS NOT NULL;
C = FOREACH B GENERATE id, mname;
D = GROUP C BY id;
DUMP D;
(1,{(1,M),(1,M),(1,N)})
(2,{(2,I),(2,I)})
(3,{(3,T),(3,T),(3,T)})
(4,{(4,R),(4,I)})
E = **NEED TO DISTINCT BAG***
DESIRED E OUTPUT
(1,{(1,M),(1,N)})
(2,{(2,I)})
(3,{(3,T)})
(4,{(4,R),(4,I)})
RE: DISTINCT BAG
Posted by Scott Wine <sc...@whitepages.com>.
Thanks!
A = LOAD 'file.tsv' USING PigStorage() AS (id:chararray, mname:chararray);
B = FILTER A BY mname IS NOT NULL;
C = FOREACH B GENERATE id, mname;
D = DISTINCT C;
E = GROUP D BY id;
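For anyone who needs to keep the original grouping and de-duplicate inside each bag instead, a nested FOREACH block with DISTINCT should also work. This is only a minimal sketch, assuming the same file.tsv and schema as above; the alias uniq is a made-up name for illustration, and the order of tuples within each bag is not guaranteed.

```pig
A = LOAD 'file.tsv' USING PigStorage() AS (id:chararray, mname:chararray);
B = FILTER A BY mname IS NOT NULL;
C = FOREACH B GENERATE id, mname;
D = GROUP C BY id;
-- For each group, apply DISTINCT to the bag C before generating the output.
-- 'uniq' is a hypothetical alias, not something from the original script.
E = FOREACH D {
    uniq = DISTINCT C;
    GENERATE group, uniq;
};
DUMP E;
```

Running DISTINCT before GROUP, as in the script above, is the simpler choice when duplicates are not needed anywhere else; the nested form is useful when D is also used for something that needs the full bags.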
-----Original Message-----
From: Brian Adams [mailto:brian.adams@chacha.com]
Sent: Wednesday, June 16, 2010 1:18 PM
To: pig-user@hadoop.apache.org
Subject: Re: DISTINCT BAG
http://hadoop.apache.org/pig/docs/r0.5.0/piglatin_reference.html#DISTINCT
?
Re: DISTINCT BAG
Posted by Brian Adams <br...@chacha.com>.
http://hadoop.apache.org/pig/docs/r0.5.0/piglatin_reference.html#DISTINCT
?