Posted to user@pig.apache.org by jamal sasha <ja...@gmail.com> on 2012/09/28 18:52:08 UTC

Finding records greater than a value

Hi,
I have two files.
File 1 contains the following data:
Id, amount
1234, 22.7
1158, 88
1234, 280

File 2 contains the following data:
Id, min, max
1234, 8, 150

Now I want to calculate the mean (avg) of the amounts, ignoring any value
that is less than min or greater than max for its Id.

So in the mean calculation here, I don't want to include
1234, 280, because 280 > 150.

Any suggestions?

Re: Finding records greater than a value

Posted by Cheolsoo Park <ch...@cloudera.com>.
Hi,

Please try this:

value = load '1.txt' using PigStorage(',') as (id:int,amount:float);
minAndMax = load '2.txt' using PigStorage(',') as (id:int,min:float,max:float);
joined = join value by id, minAndMax by id;
filtered = filter joined by (value::amount > minAndMax::min and value::amount < minAndMax::max);
grouped = group filtered by value::id;
result = foreach grouped generate group, AVG(filtered.value::amount);
dump result;

Given that "1.txt" and "2.txt" are as follows:

cheolsoo@localhost:~/workspace/pig-2778-matches $ cat 2.txt
1234,8,150
1158,0,200
cheolsoo@localhost:~/workspace/pig-2778-matches $ cat 1.txt
1234,22.7
1158,88
1234,280
1158,100

The result is:

(1158,94.0)
(1234,22.700000762939453)
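A possible variant, offered as a sketch rather than a tested script: the filter above uses strict comparisons, so an amount exactly equal to min or max is dropped. If the intent is to exclude only values below min or above max, the comparisons could be made inclusive. Loading the numeric fields as double instead of float is a further assumption here; it should avoid the widening artifact visible in 22.700000762939453. The file names and aliases are the same as in the script above.

```pig
-- Same pipeline, with double-typed fields (assumption) and inclusive bounds (assumption).
value = load '1.txt' using PigStorage(',') as (id:int, amount:double);
minAndMax = load '2.txt' using PigStorage(',') as (id:int, min:double, max:double);

-- Inner join: ids that have no row in 2.txt are dropped here.
joined = join value by id, minAndMax by id;

-- Keep amounts inside [min, max]; values equal to a bound are retained.
filtered = filter joined by (value::amount >= minAndMax::min and value::amount <= minAndMax::max);

-- Average the surviving amounts per id.
grouped = group filtered by value::id;
result = foreach grouped generate group, AVG(filtered.value::amount);
dump result;
```

Because the join is an inner join, any id in 1.txt without a matching row in 2.txt will not appear in the result. That is why 2.txt below includes a row for 1158.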

Thanks,
Cheolsoo
