Posted to common-user@hadoop.apache.org by Stuart White <st...@gmail.com> on 2009/06/27 17:25:15 UTC

Confused about partitioning and reducers

If I call HashPartitioner.getPartition(), passing a key of 4 and a
numPartitions of 5, it returns a partition of 4.  (Which is what I would
expect.)
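
For reference, here is a minimal standalone sketch of that calculation. It
reproduces the arithmetic HashPartitioner performs (clear the sign bit of
the key's hash code, then take the remainder modulo the partition count)
without depending on Hadoop. The class name PartitionSketch is
hypothetical, and the first call assumes a key type whose hashCode() equals
its numeric value, as java.lang.Integer's does.

```java
public class PartitionSketch {
    // Same arithmetic as HashPartitioner: mask off the sign bit so the
    // result is never negative, then reduce modulo the partition count.
    public static int getPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        // Assumption: the key hashes to its own value (true for Integer).
        System.out.println(getPartition(Integer.valueOf(4), 5)); // prints 4

        // The result depends on hashCode(), not on the printed value:
        // the String "4" hashes to 52, and 52 % 5 is 2.
        System.out.println(getPartition("4", 5)); // prints 2
    }
}
```

The second call shows only that the partition follows from the key type's
hashCode(); it says nothing about which key type any particular job uses.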

However, suppose I have a mapred job configured to use the HashPartitioner,
with 5 reducers and the IdentityReducer.  If my mapper emits a record with
key 4, that record gets handled by Reducer #0 (because it gets written out
to part-00000).

I would have expected a record with key 4 to be handled by reducer #4 (and
therefore written to part-00004) because the HashPartitioner returns 4 for a
key of 4 and a numPartitions of 5.
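
Spelled out as a sketch, the expectation is that partition i goes to
reducer i, whose output file is "part-" followed by the partition number
zero-padded to five digits. The naming pattern is inferred from the
part-00000 and part-00004 names above; partFileName is a hypothetical
helper for illustration, not a Hadoop API.

```java
import java.util.Locale;

public class PartFileName {
    // Hypothetical helper: builds the output file name expected for a
    // given partition number, e.g. 4 -> "part-00004".
    public static String partFileName(int partition) {
        return String.format(Locale.ROOT, "part-%05d", partition);
    }

    public static void main(String[] args) {
        System.out.println(partFileName(4)); // prints part-00004
        System.out.println(partFileName(0)); // prints part-00000
    }
}
```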

Obviously I'm missing something here.  What is the logic for deciding which
partition of records is handled by which reducer instance?

It can't be random; otherwise a map-side join wouldn't work.

Thanks.

Re: Confused about partitioning and reducers

Posted by Stuart White <st...@gmail.com>.
Please disregard this question.  I think I'm mistaken.
