
Posted to common-user@hadoop.apache.org by Mori Bellamy <mb...@apple.com> on 2008/07/22 02:08:45 UTC

more than one reducer?

Hey all,
I was wondering if it's possible to split the reduce task across
more than one machine. I figured the map output could be copied to
multiple machines; each reducer could then sort its keys, and the
results could be combined into one big sorted output (a la
mergesort). Does anybody know if there is an existing mechanism for
this?

Re: more than one reducer?

Posted by Taeho Kang <tk...@gmail.com>.

I don't know whether there is an existing mechanism for what you're looking
for.

However, you could write a partitioner that distributes the data so that
lower keys go to lower-numbered reducers and higher keys go to
higher-numbered reducers (e.g. keys starting with 'A' through 'D' go to
part-00000, 'E' through 'H' go to part-00001, and so on). If you know the key
distribution beforehand, you can also spread the data fairly evenly across
the reducers.
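
A minimal sketch of the range-partitioning idea described above, written as
plain Java with no Hadoop dependency so it can be run on its own. The class
name, the split points, and the String key type are all assumptions made for
illustration. In a real job the same logic would sit inside the getPartition
method of a class implementing Hadoop's Partitioner interface, and the split
points would be chosen from what you know about the key distribution.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Hadoop: maps a key to a reducer index
// so that lower keys always land on lower-numbered reducers.
class RangePartitionSketch {
    // Assumed split points (must be sorted). Keys below "E" go to reducer 0,
    // keys from "E" up to (not including) "I" go to reducer 1, and so on.
    static final String[] SPLIT_POINTS = {"E", "I", "M"};

    static int getPartition(String key, int numPartitions) {
        // Case-sensitive comparison, the same order String.compareTo uses.
        int idx = Arrays.binarySearch(SPLIT_POINTS, key);
        // Exact match on a split point: the key starts the next range.
        // Otherwise binarySearch returns -(insertionPoint) - 1.
        int partition = idx >= 0 ? idx + 1 : -(idx + 1);
        // Clamp in case fewer reducers are configured than ranges.
        return Math.min(partition, numPartitions - 1);
    }

    public static void main(String[] args) {
        for (String key : new String[] {"Apple", "Hadoop", "Ice", "Zebra"}) {
            System.out.println(key + " -> reducer " + getPartition(key, 4));
        }
    }
}
```

Because the mapping is monotonic in the key, every key on reducer N sorts
before every key on reducer N+1, which is what makes the final merge trivial.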

When the job is done, simply download the result files and concatenate them
in part-number order, and you have sorted output.
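
To illustrate why no real merge step is needed, here is a small sketch in
plain Java. It assumes each reducer's output is already sorted and that the
key ranges do not overlap, as with a range partitioner; the class and method
names are hypothetical, and in-memory lists stand in for the downloaded part
files.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: joins per-reducer outputs in reducer order.
class ConcatPartsSketch {
    // parts.get(i) holds the sorted keys written by reducer i.
    static List<String> concatInOrder(List<List<String>> parts) {
        List<String> combined = new ArrayList<>();
        for (List<String> part : parts) {
            // Ranges are disjoint and ordered, so appending keeps the order.
            combined.addAll(part);
        }
        return combined;
    }
}
```

If the key ranges did overlap (for example with a hash-based partitioner),
plain concatenation would not give sorted output and a true k-way merge
would be needed instead.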
