Posted to common-user@hadoop.apache.org by Zhengguo 'Mike' SUN <zh...@yahoo.com> on 2009/06/12 20:25:20 UTC

The behavior of HashPartitioner

Hi,

The intermediate key generated by my Mappers is IntWritable. I tested with different numbers of Reducers. When the number of Reducers is the same as the number of distinct intermediate keys, it partitions perfectly: each Reducer receives exactly one input group. When the two numbers differ, the partitioning becomes hard to understand. For example, when the number of keys is less than the number of Reducers, I expected each Reducer to receive at most one input group, but it turns out that many Reducers receive more than one. Conversely, when the number of keys is larger than the number of Reducers, I expected each Reducer to receive at least one input group, but it turns out that some Reducers receive nothing to process. My expectation comes from the implementation of the HashPartitioner class, which just applies the modulo operator with the number of Reducers to generate partitions.
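
To make my expectation concrete, here is a rough sketch of the partitioning as I understand it (assuming HashPartitioner's logic of hash modulo the number of Reducers, and that IntWritable.hashCode() is the wrapped int value; the class name and key values below are made up for illustration):

    import org.apache.hadoop.io.IntWritable;

    public class PartitionDemo {
        // Mirrors HashPartitioner: mask the hash to non-negative,
        // then take it modulo the number of reduce tasks.
        static int partition(IntWritable key, int numReduceTasks) {
            return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
        }

        public static void main(String[] args) {
            int numReduceTasks = 4;
            // IntWritable.hashCode() is the int value itself, so keys
            // 10, 14, and 18 all land on partition 2 (each is 2 mod 4),
            // while partitions 0 and 1 receive nothing from these keys.
            for (int k : new int[] {10, 14, 18, 3}) {
                System.out.println(k + " -> partition "
                        + partition(new IntWritable(k), numReduceTasks));
            }
        }
    }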

Does anyone have any insights into this?

Re: The behavior of HashPartitioner

Posted by jason hadoop <ja...@gmail.com>.
You can always write something simple to call the HashPartitioner by hand;
Jython works for quick tests.
But the code in HashPartitioner is essentially
(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks.
Since nothing else is in play, I suspect there is an incorrect assumption
somewhere.
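
For example, something like this would do it (a rough sketch against the
mapred API; the driver class name and the choice of four reduces are just
for illustration):

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.lib.HashPartitioner;

    public class HandCallPartitioner {
        public static void main(String[] args) {
            HashPartitioner<IntWritable, Text> partitioner =
                    new HashPartitioner<IntWritable, Text>();
            int numReduceTasks = 4;
            // Print the partition assigned to each key; HashPartitioner
            // ignores the value argument, so null is fine here.
            for (int k = 0; k < 10; k++) {
                System.out.println(k + " -> partition "
                        + partitioner.getPartition(new IntWritable(k), null,
                                numReduceTasks));
            }
        }
    }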


On Fri, Jun 12, 2009 at 11:25 AM, Zhengguo 'Mike' SUN <zhengguosun@yahoo.com> wrote:

> [original message quoted above]



-- 
Pro Hadoop, a book to guide you from beginner to Hadoop mastery,
http://www.apress.com/book/view/9781430219422
www.prohadoopbook.com a community for Hadoop Professionals