Posted to common-user@hadoop.apache.org by Jürgen Broß <ju...@fu-berlin.de> on 2008/11/26 13:35:51 UTC
How to let Reducer know on which partition it is working
Hi all,
My reducers need to load a huge HashMap from data stored in HDFS.
This data has been partitioned by a previous map/reduce job. The
complete data set would not fit into the main memory of a reducer
machine, but it would suffice to load only the correct partition of the
data. The problem is that the "correct" partition is determined by the
Partitioner that feeds the current reducers. I'm not sure how to let a
Reducer know in its configure() method which partition it will get from
the Partitioner, i.e. which partition to load from HDFS into the HashMap.
Maybe someone has a good idea.
Regards,
Jürgen
Re: How to let Reducer know on which partition it is working
Posted by Owen O'Malley <om...@apache.org>.
On Nov 26, 2008, at 4:35 AM, Jürgen Broß wrote:
> I'm not sure how to let a Reducer know in its configure() method
> which partition it will get from the Partitioner,
See the tutorial section at
http://hadoop.apache.org/core/docs/r0.19.0/mapred_tutorial.html#Task+JVM+Reuse
and look for mapred.task.partition, which is a number from 0 to
(number of reduces - 1).
-- Owen
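
A minimal sketch of how the suggestion above might be applied, written in plain Java so it runs without Hadoop on the classpath. The class PartitionFiles, its two methods, and the directory /data/step1 are hypothetical names used for illustration only. It also assumes that the previous job wrote its output as part-NNNNN files and used the same Partitioner and the same number of reduces as the current job; if either differs, the partition numbers will not line up.

```java
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical helper (not part of Hadoop): maps a reduce task's
 * partition number to the matching output file of an earlier job.
 */
final class PartitionFiles {

    private PartitionFiles() {}

    /**
     * Reads mapred.task.partition from a key/value view of the job
     * configuration. Returns -1 if the property is not set.
     */
    static int partitionOf(Map<String, String> conf) {
        String value = conf.get("mapred.task.partition");
        return value == null ? -1 : Integer.parseInt(value.trim());
    }

    /**
     * Builds the path of the previous job's output file for the given
     * partition, assuming the part-NNNNN naming scheme.
     */
    static String partitionPath(String previousOutputDir, int partition) {
        if (partition < 0) {
            throw new IllegalArgumentException("unknown partition: " + partition);
        }
        String dir = previousOutputDir.endsWith("/")
                ? previousOutputDir
                : previousOutputDir + "/";
        return dir + String.format(Locale.ROOT, "part-%05d", partition);
    }
}

// Inside a Reducer written against the org.apache.hadoop.mapred API,
// the lookup would go in configure(), roughly like this:
//
//   public void configure(JobConf job) {
//       int partition = job.getInt("mapred.task.partition", -1);
//       String path = PartitionFiles.partitionPath("/data/step1", partition);
//       // open 'path' through the HDFS FileSystem and fill the HashMap
//   }
```

With this layout each reducer reads only one file of the earlier output, so the HashMap holds a single partition rather than the complete data set.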