Posted to common-user@hadoop.apache.org by tigertail <ty...@yahoo.com> on 2009/08/17 17:22:19 UTC

Percentage calculation?

Hi Hadoop/MapReduce experts,

My question might be naive, but I am really stuck here and would
appreciate any help or advice.

I have an input file like
key1, 2
key2, 1
key1, 1
key3, 1

It is easy to write an M/R job that calculates the count for each key and
outputs something like
key1, 3
key2, 1
key3, 1

But how can I calculate the percentage of each key over all keys? With the
above input, I would expect to get the output
key1, 0.60
key2, 0.20
key3, 0.20

One naive method is to calculate the total count (5 with the above input)
in a first pass and save it in a file, which is then read in before the
percentage M/R job starts. But that is obviously ugly and slow.
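
For reference, the two-pass idea can be sketched in plain Java with no Hadoop
classes at all. This is only an illustration of the arithmetic, not a
MapReduce job: the class and method names (TwoPassPercentage, countPerKey,
total, percentages) are made up for this sketch, and the input is assumed to
be "key, value" lines as in the example above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Plain-Java sketch of the two-pass idea; hypothetical names, no Hadoop API. */
final class TwoPassPercentage {

    /** Pass 1: sum the values per key, as the counting job's reducer would. */
    static Map<String, Long> countPerKey(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.split(",");
            String key = parts[0].trim();
            long value = Long.parseLong(parts[1].trim());
            counts.merge(key, value, Long::sum);
        }
        return counts;
    }

    /** The grand total that the second pass needs as a side input. */
    static long total(Map<String, Long> counts) {
        long sum = 0;
        for (long c : counts.values()) {
            sum += c;
        }
        return sum;
    }

    /** Pass 2: divide each per-key count by the grand total. */
    static Map<String, Double> percentages(Map<String, Long> counts, long total) {
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            result.put(e.getKey(), (double) e.getValue() / total);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("key1, 2", "key2, 1", "key1, 1", "key3, 1");
        Map<String, Long> counts = countPerKey(input);
        Map<String, Double> pct = percentages(counts, total(counts));
        // Locale.ROOT keeps the decimal separator a dot regardless of the machine's locale.
        pct.forEach((k, v) -> System.out.println(String.format(Locale.ROOT, "%s, %.2f", k, v)));
    }
}
```

The point of the sketch is that the percentage step cannot start until the
total is known, which is why a single pass over the data is not enough
without some side channel for the total.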

I also tried defining a static enum Counters { INPUT_WORDS }.
In the mapper I call context.getCounter(Counters.INPUT_WORDS).increment(1);
In the reducer I call context.getCounter(Counters.INPUT_WORDS).getCounter();
But it does not work.

Is there a more elegant way?
-- 
View this message in context: http://www.nabble.com/Percentage-calculation--tp25008761p25008761.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.

Re: Percentage calculation?

Posted by Jingkei Ly <jl...@googlemail.com>.

If the counter method doesn't work, I've used a slightly hacky way to do
something like this in the past with the 0.19 API.

In the Mapper I kept an instance variable keeping the count, and in the
close() method I wrote out a file unique to each mapper task containing the
final value of the instance variable.

Then each Reducer would read in all of those files and add the values
together to get the total count across all mappers. It relies on the
fact that the Reducers don't start before all the Mappers have finished.

In pseudo-code:

class Mapper {
    int inputWords = 0;

    map(key, value){
         inputWords += value;
    }

    close() {
         // write out inputWords to a file unique to this mapper task
    }
}

class Reducer {
    int totalInputWords = 0;
    boolean firstTime = true;

    reduce() {
        if (firstTime)  {
            // runs once per reducer task, on the first call to reduce()
            for all inputWordFiles, f {
                 int mapperInputWord = f.readInt();
                 totalInputWords += mapperInputWord;
            }
            firstTime = false;
        }
        // use totalInputWords to calculate percentage
    }

}

Hope that makes sense.
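
A rough, self-contained version of the same idea is sketched below in plain
Java. It is not Hadoop code: a local temporary directory stands in for the
shared filesystem, and every name in it (SideFileTotals, newScratchDir,
writePartialTotal, readGrandTotal, the "inputWords-" file prefix, the task
ids) is an assumption made for illustration. In a real job the files would
have to live somewhere all reducers can read.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Plain-Java stand-in for the side-file trick; local files play the role of shared storage. */
final class SideFileTotals {

    /** Creates a scratch directory (hypothetical stand-in for a job-specific shared directory). */
    static Path newScratchDir() {
        try {
            return Files.createTempDirectory("side-totals");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** What a mapper's close() would do: persist its partial sum under a task-unique name. */
    static void writePartialTotal(Path dir, String taskId, long partialTotal) {
        try {
            Path file = dir.resolve("inputWords-" + taskId);
            Files.write(file, Long.toString(partialTotal).getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** What a reducer would do once, on its first reduce() call: sum every partial file. */
    static long readGrandTotal(Path dir) {
        long total = 0;
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "inputWords-*")) {
            for (Path f : files) {
                String text = new String(Files.readAllBytes(f), StandardCharsets.UTF_8).trim();
                total += Long.parseLong(text);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        Path dir = newScratchDir();
        writePartialTotal(dir, "m_000000", 3); // this mapper saw "key1, 2" and "key2, 1"
        writePartialTotal(dir, "m_000001", 2); // this mapper saw "key1, 1" and "key3, 1"
        long total = readGrandTotal(dir);
        System.out.println("total = " + total);
        System.out.println("key1 share = " + (3.0 / total));
    }
}
```

Writing one file per mapper task avoids concurrent writes to a single file;
the reducer only has to sum whatever files it finds once all mappers are done.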

Re: Percentage calculation?

Posted by tigertail <ty...@yahoo.com>.

Can somebody help, please? I would expect there to be some easy way to do this.

One correction:
In the reducer I call context.getCounter(Counters.INPUT_WORDS).getValue();
But it does not work. It always returns 0.


Re: Percentage calculation?

Posted by Jingkei Ly <jl...@googlemail.com>.

Does using
context.getCounter(Counters.INPUT_WORDS).getCounter().getValue();
make a difference?
