Posted to user@pig.apache.org by "Malviya, Saurabh" <sm...@comscore.com> on 2013/11/21 04:15:32 UTC

Compare values within a bag

Hi,

I have a similar kind of problem: I want to compare values within a bag.

Example

(1, {(2,4),(1,5)}) --> I want to check whether the tuples overlap: if (2 is between 1 and 5) or (4 is between 1 and 5), return true, else false
(2, {(1,10),(5,8)})


A = GROUP collection BY (x);
B = FOREACH A {
        Compareflag = How to compare values within the bag for each group?
        GENERATE x, Compareflag;
};
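
One possible way to sketch this, offered as an assumption and not as an answer from the thread: flatten the grouped bag twice in one GENERATE, which pairs every tuple in the bag with every other tuple, and then test each pair. The file name intervals.txt and the field names lo and hi are hypothetical; only collection and x come from the snippet above.

```pig
-- Hypothetical input: tab-separated rows of (x, lo, hi), e.g. 1<TAB>2<TAB>4
collection = LOAD 'intervals.txt' AS (x:int, lo:int, hi:int);
A = GROUP collection BY x;

-- Two FLATTENs in one GENERATE yield the cross product of the bag with itself.
pairs = FOREACH A GENERATE group AS x,
            FLATTEN(collection.(lo, hi)) AS (lo1:int, hi1:int),
            FLATTEN(collection.(lo, hi)) AS (lo2:int, hi2:int);

-- Drop pairs of identical tuples (this also drops true duplicates in the bag).
others = FILTER pairs BY NOT (lo1 == lo2 AND hi1 == hi2);

-- 1 if either endpoint of the first tuple lies within the second tuple.
flagged = FOREACH others GENERATE x,
            (((lo1 >= lo2 AND lo1 <= hi2) OR (hi1 >= lo2 AND hi1 <= hi2)) ? 1 : 0) AS hit;

-- A group is flagged true if any of its pairs overlaps.
B = FOREACH (GROUP flagged BY x) GENERATE
        group AS x,
        (MAX(flagged.hit) == 1 ? 'true' : 'false') AS Compareflag;
```

Because every ordered pair is tested, a tuple that fully contains another, such as (1,10) and (5,8), is still caught. Groups whose bag holds a single tuple produce no pairs and so do not appear in B; the cross product also grows quadratically with bag size, so this suits small bags only.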

Saurabh

-----Original Message-----
From: Jacob Perkins [mailto:jacob.a.perkins@gmail.com]
Sent: Wednesday, November 20, 2013 7:55 AM
To: user@pig.apache.org
Subject: Re: Simple word count in pig..

Jamal,

You're going to want to use a FLATTEN and another group by. Consider:

flattened = foreach processed generate id, flatten(tokens) as token;
frequency = foreach (group flattened by (id, token)) generate
                flatten(group)   as (id, token),
                COUNT(flattened) as freq;

Of course, this will spawn another map-reduce job. However, since COUNT is algebraic, Pig will make use of combiners, which drastically reduces the amount of data sent to the reducers.
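
For reference, the same approach can be sketched as a self-contained script. The file name processed.txt, its tab-separated layout, and the LOAD schema are assumptions made for illustration; they are not from the original post.

```pig
-- Assumed input file processed.txt (hypothetical), one record per line,
-- tab-separated, with the bag written in Pig's text notation:
--   foobar<TAB>{(foo),(foo),(foobar),(bar)}
--   foo<TAB>{(bar),(bar)}
processed = LOAD 'processed.txt'
            AS (id:chararray, tokens:bag{tuple_of_tokens:(token:chararray)});

-- One row per (id, token) occurrence.
flattened = FOREACH processed GENERATE id, FLATTEN(tokens) AS token;

-- Group identical (id, token) pairs and count the rows in each group.
grouped   = GROUP flattened BY (id, token);
frequency = FOREACH grouped GENERATE
                FLATTEN(group)   AS (id, token),
                COUNT(flattened) AS freq;

DUMP frequency;
```

With the two sample records from the question, this should give the four rows asked for, (foobar,foo,2), (foobar,foobar,1), (foobar,bar,1) and (foo,bar,2), in no guaranteed order.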

--jacob
@thedatachef

On Nov 19, 2013, at 5:45 PM, jamal sasha <ja...@gmail.com> wrote:

> Hi,
>
> I have data already processed in the following form:
>
>
> ( id ,{ bag of words})
> So for example:
>
> (foobar, {(foo), (foo),(foobar),(bar)})
> (foo,{(bar),(bar)})
>
> and so on..
> describe processed gives me:
> processed: {id: chararray,tokens: {tuple_of_tokens: (token: chararray)}}
>
>
> Now what I want is to also count the number of times a word appears in
> this data and output it as:
> foobar,foo,2
> foobar,foobar,1
> foobar,bar,1
> foo,bar,2
>
> and so on...
>
> How do I do this in pig?