
Posted to common-user@hadoop.apache.org by Jürgen Broß <ju...@fu-berlin.de> on 2008/11/26 13:35:51 UTC

How to let Reducer know on which partition it is working

Hi all,

My Reducers need to load a huge HashMap from data stored in HDFS.
This data has been partitioned by a previous map/reduce job. The
complete data would not fit into the main memory of a Reducer machine;
it would suffice to load only the matching partition of the data. The
problem is that the "correct" partition is determined by the
Partitioner that feeds the current Reducers. I'm not sure how to let a
Reducer know, in its configure() method, which partition it will receive
from the Partitioner, i.e. which partition to load from HDFS into the HashMap.

Maybe someone has a good idea.

Regards,
Jürgen

Re: How to let Reducer know on which partition it is working

Posted by Owen O'Malley <om...@apache.org>.

On Nov 26, 2008, at 4:35 AM, Jürgen Broß wrote:
>  I'm not sure how to let a Reducer know in its configure() method  
> which partition it will get from the Partitioner,

From:

http://hadoop.apache.org/core/docs/r0.19.0/mapred_tutorial.html#Task+JVM+Reuse

Look for mapred.task.partition, which is a number from 0 to
(number of reduces - 1).

-- Owen
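A minimal sketch of how that property could be used, assuming the side data was written by the earlier job with the same key type, the same Partitioner and the same number of reduces, so that its output file part-NNNNN (old-API naming) holds exactly the keys that reduce task NNNNN of the current job will see. To keep it runnable without Hadoop on the classpath, a plain Map stands in for the JobConf; inside a real configure(JobConf job) the lookup would be job.getInt("mapred.task.partition", -1). The class and helper names are hypothetical, and partitionFor() simply copies the formula used by the default HashPartitioner.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain Map stands in for Hadoop's JobConf so this runs standalone.
class PartitionAwareReducerSketch {

    // Same formula as the default HashPartitioner: non-negative hash modulo the reduce count.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical helper: old-API name of the file written by reduce task `partition`.
    static String sideFileFor(int partition) {
        return String.format("part-%05d", partition);
    }

    // Mirrors configure(JobConf): read this task's partition and pick the side file to load.
    static String configure(Map<String, String> conf) {
        String value = conf.get("mapred.task.partition");
        if (value == null) {
            throw new IllegalStateException("mapred.task.partition is not set");
        }
        return sideFileFor(Integer.parseInt(value));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapred.task.partition", "3"); // the framework sets this in a real task
        System.out.println("load " + configure(conf)); // prints "load part-00003"

        // Each key is routed to the reducer (and side file) chosen by partitionFor.
        int numReduces = 4;
        for (String key : new String[] {"apple", "banana", "cherry"}) {
            System.out.println(key + " -> " + sideFileFor(partitionFor(key, numReduces)));
        }
    }
}
```

Because both jobs send a key to (hashCode & Integer.MAX_VALUE) % numReduces, a reducer that loads only the side file matching its own partition never misses a key it will be asked about; changing the number of reduces (or the Partitioner) between the two jobs breaks that guarantee.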