Posted to common-user@hadoop.apache.org by Jay Vyas <ja...@gmail.com> on 2012/10/20 05:19:32 UTC

Broad question on sorting of mapper outputs.

Is there any documentation on the internals of the shuffle and sort phase?
The elephant book seems to be the best source, but it appears to only
lightly touch upon the "magic" part (i.e. the distributed merge sorting and
mapper spilling).

Also... What is the rationale behind the sortedness of mapper outputs?  Is
the reason to optimize the streaming of mapper values to reducers?  In
simple scenarios, i.e. when there is no reducing to be done, it seems that
we may not care to have sorted mapper outputs: a random merge of all
spilled records would be sufficient.
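
To make the question concrete, here is a toy Python sketch (not Hadoop
code; the spill contents and function names are invented for illustration)
of what sortedness buys on the reduce side. If each spill is already sorted
by key, a k-way merge can stream records in key order, so all values for a
key arrive as one contiguous group and never need to be held in memory
together with the rest of the data.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def merge_sorted_spills(spills):
    """Lazily k-way merge spills that are each already sorted by key."""
    return heapq.merge(*spills, key=itemgetter(0))

def reduce_groups(merged, reduce_fn):
    """Call reduce_fn once per key; relies on equal keys being adjacent."""
    for key, records in groupby(merged, key=itemgetter(0)):
        yield key, reduce_fn(value for _, value in records)

# Hypothetical spills from two mappers, each sorted by key.
spill_a = [("apple", 1), ("cherry", 1), ("cherry", 1)]
spill_b = [("apple", 1), ("banana", 1)]

word_counts = list(reduce_groups(merge_sorted_spills([spill_a, spill_b]), sum))
print(word_counts)
```

With an unsorted "random merge", the grouping step above would see the same
key in several separate runs, so the reducer would have to buffer or re-sort
everything before it could emit a result.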

I've noticed that the Shuffle and Sort classes in Hadoop have almost no
comments and appear to simply wrap other classes.

-- 
Jay Vyas
http://jayunit100.blogspot.com

Re: Broad question on sorting of mapper outputs.

Posted by anil gupta <an...@gmail.com>.
Hi Jay,

AFAIK, when an MR job has no reduce phase (i.e. the number of reducers is
0), the output from the Mapper is not sorted.
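
As a rough illustration of that behaviour, here is a toy Python model (an
assumption-laden sketch, not Hadoop's implementation; run_map_phase and
tokenize are made-up names). In a real job the reducer count is set with
Job.setNumReduceTasks(0); the sketch only mimics the effect: with zero
reducers the mapper output is kept in emission order, otherwise it is
partitioned by key and each partition is sorted.

```python
def run_map_phase(records, map_fn, num_reducers):
    """Toy model of map-side output handling (not Hadoop's implementation)."""
    emitted = [pair for record in records for pair in map_fn(record)]
    if num_reducers == 0:
        # Map-only job: output is written in emission order, with no sort.
        return [emitted]
    # Otherwise: partition by key hash, then sort each partition by key.
    partitions = [[] for _ in range(num_reducers)]
    for key, value in emitted:
        partitions[hash(key) % num_reducers].append((key, value))
    return [sorted(part, key=lambda kv: kv[0]) for part in partitions]

def tokenize(line):
    """Hypothetical mapper: emit (word, 1) for every word in the line."""
    return [(word, 1) for word in line.split()]

lines = ["pear apple", "fig apple"]
map_only = run_map_phase(lines, tokenize, num_reducers=0)
with_reduce = run_map_phase(lines, tokenize, num_reducers=2)
print(map_only)
print(with_reduce)
```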

HTH,
Anil

-- 
Thanks & Regards,
Anil Gupta