Posted to common-user@hadoop.apache.org by Saptarshi Guha <sa...@gmail.com> on 2010/06/19 18:16:01 UTC

Is the sort(in sort and shuffle) always required

Hello,
My question: is the sort (in the sort and shuffle) absolutely required?
If I wanted MapReduce to partition (using the map) and then aggregate (using
the reduce) without needing the keys to be sorted, is it possible to turn off
the sorting? Or is the fact that keys arrive at the reducer in sorted order
just a side effect, with the sorting itself being vital for the efficient
operation of MapReduce?


Thanks
Saptarshi

Re: Is the sort(in sort and shuffle) always required

Posted by Owen O'Malley <om...@apache.org>.
On Sat, Jun 19, 2010 at 9:16 AM, Saptarshi Guha
<sa...@gmail.com> wrote:
> My question: is the sort (in the sort and shuffle) absolutely required?
> If I wanted MapReduce to partition (using the map) and then aggregate (using
> the reduce) without needing the keys to be sorted, is it possible to turn off
> the sorting? Or is the fact that keys arrive at the reducer in sorted order
> just a side effect, with the sorting itself being vital for the efficient
> operation of MapReduce?

If you have 0 reduces, you don't get any sorting or aggregation. It
isn't possible to turn off the sorting while keeping the aggregation. In
practice, the sort doesn't cost as much as the data transfer between
the map and reduce.

-- Owen
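
To illustrate why the aggregation depends on the sort, here is a minimal
sketch in plain Java. It is a simplified illustration, not Hadoop's
implementation. The class name SortThenGroup and the method sortAndSum are
made up for this example. It assumes that grouping works by sorting the map
output by key so that equal keys become adjacent, and then collecting each run
of equal keys in a single pass.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified illustration only; this is not Hadoop's implementation.
// SortThenGroup and sortAndSum are hypothetical names for this sketch.
class SortThenGroup {

    // Takes {key, value} pairs, sorts them by key, then sums the values of
    // each run of adjacent equal keys, one key at a time.
    static List<String> sortAndSum(String[][] pairs) {
        String[][] sorted = pairs.clone();
        // Sorting makes all records with the same key adjacent.
        Arrays.sort(sorted, (a, b) -> a[0].compareTo(b[0]));

        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < sorted.length) {
            String key = sorted[i][0];
            int sum = 0;
            // Consume the run of records that share this key.
            while (i < sorted.length && sorted[i][0].equals(key)) {
                sum += Integer.parseInt(sorted[i][1]);
                i++;
            }
            out.add(key + "=" + sum);
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] mapOutput = {
            {"b", "1"}, {"a", "2"}, {"b", "3"}, {"a", "4"}, {"c", "5"}
        };
        System.out.println(sortAndSum(mapOutput)); // prints [a=6, b=4, c=5]
    }
}
```

Without the sort step, the single pass over runs of equal keys would see "b"
twice as separate groups, so in this sketch the grouping cannot be kept while
the sorting is dropped.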