Posted to common-user@hadoop.apache.org by Andy Liu <an...@gmail.com> on 2009/04/14 18:19:31 UTC

Total number of records processed in mapper

Is there a way for all the reducers to have access to the total number of
records that were processed in the Map phase?

For example, I'm trying to perform a simple document frequency calculation.
During the map phase, I emit <word, 1> pairs for every unique word in every
document.  During the reduce phase, I sum the values for each word group.
Then I want to divide that value by the total number of documents.
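
To make the calculation concrete, here is a minimal plain-Java sketch of it
with no Hadoop involved. The class and method names are hypothetical and exist
only for illustration. It assumes whitespace-separated, case-insensitive
words, and it already knows the total number of documents, which is exactly
the value the reducers are missing in the real job.

```java
import java.util.*;

class DocFrequencySketch {
    // "Map" step: each document contributes every unique word exactly once.
    static Set<String> uniqueWords(String document) {
        Set<String> words = new TreeSet<>();
        for (String token : document.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // "Reduce" step: sum the 1s per word, then divide by the document total.
    static Map<String, Double> documentFrequency(List<String> documents) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String doc : documents) {
            for (String word : uniqueWords(doc)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        int totalDocuments = documents.size();
        Map<String, Double> frequencies = new TreeMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            frequencies.put(e.getKey(), e.getValue() / (double) totalDocuments);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("the cat sat", "the dog sat", "a bird");
        System.out.println(documentFrequency(docs));
    }
}
```

With these three sample documents, "the" and "sat" each appear in two of
them, so their document frequency is 2/3. Every other word appears in one
document and gets 1/3.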

I suppose I can create a whole separate m/r job whose sole purpose is to
count all the records, then pass that number on.  Is there a more
straightforward way of doing this?

Andy

Re: Total number of records processed in mapper

Posted by Jim Twensky <ji...@gmail.com>.
Hi Andy,

Take a look at this piece of code:

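// Assumes job is a completed RunningJob, e.g. from JobClient.runJob(conf).
Counters counters = job.getCounters();
long reduceInputRecords = counters.findCounter(
    "org.apache.hadoop.mapred.Task$Counter",
    "REDUCE_INPUT_RECORDS").getCounter();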

This reads the reduce input record count, but I believe there is also a
counter for reduce output records. You will need to dig into the source code
to find its name because, unfortunately, the default counters associated with
map/reduce jobs are not public yet.

-Jim

