Posted to mapreduce-dev@hadoop.apache.org by Knowledge gatherer <kn...@gmail.com> on 2014/05/24 16:10:34 UTC

Sorting in Mapper to Reducer

Hi,

I want to know how the keys emitted by the mappers get sorted in ascending
order before they are sent to the reducer.

What algorithm is used?

Any links or guidelines would be a real help.

Thanks in advance.

Re: Sorting in Mapper to Reducer

Posted by Knowledge gatherer <kn...@gmail.com>.

Thanks a lot. It was really helpful.


On Sat, May 24, 2014 at 8:30 PM, Pedro Dusso <pm...@gmail.com> wrote:


Re: Sorting in Mapper to Reducer

Posted by Pedro Dusso <pm...@gmail.com>.

I believe some good web resources are:

   - http://www.slideshare.net/cloudera/mr-perf
   - http://gbif.blogspot.de/2011/01/setting-up-hadoop-cluster-part-1-manual.html
     (look at the section "The Map Side")
   - This chapter of T. White's Hadoop book:
     https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-6/shuffle-and-sort
   - An explanation of the map task internals:
     http://codrspace.com/b441berith/hadoop-maptask-inside/


Basically, the keys emitted by the map function are accumulated in an
in-memory buffer (the MapOutputBuffer class). When the buffer fills up (by
default once it is 80% full; see io.sort.spill.percent), the records are
sorted first by partition and, within each partition, by key, and then
written to a temporary file called a spill. The in-memory sorting algorithm
used is quicksort. By the time the map task has finished processing its
input split there may be many spills, which must be merged into a single
sorted file before the output is available to the reduce tasks.
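A minimal Java sketch of the map-side steps just described, assuming records fit in memory as (partition, key, value) objects. The class MapSideSortSketch, its Rec type and its methods are hypothetical illustrations, not Hadoop's API: the real MapOutputBuffer works on serialized bytes in a circular buffer, spills from a background thread, and sorts with org.apache.hadoop.util.QuickSort. Here the buffer counts as "full" after a fixed number of records (the hypothetical capacity parameter), each spill is quicksorted by partition and then key, and the spills are combined with a heap-based k-way merge.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Toy model of the map-side sort: buffer, sort by (partition, key), spill, merge.
 * All names are hypothetical; this is NOT Hadoop's MapOutputBuffer.
 */
public class MapSideSortSketch {

    /** One buffered map output record (hypothetical type). */
    public static final class Rec {
        final int partition;
        final String key, value;
        Rec(int p, String k, String v) { partition = p; key = k; value = v; }
        @Override public String toString() { return partition + ":" + key + "=" + value; }
    }

    /** Same formula as Hadoop's HashPartitioner: non-negative hash modulo reducers. */
    public static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Spill order: partition first, then key in ascending order. */
    public static int compare(Rec a, Rec b) {
        int c = Integer.compare(a.partition, b.partition);
        return c != 0 ? c : a.key.compareTo(b.key);
    }

    /** Plain in-place quicksort (Lomuto partition scheme). */
    static void quickSort(List<Rec> buf, int lo, int hi) {
        if (lo >= hi) return;
        Rec pivot = buf.get(hi);
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (compare(buf.get(j), pivot) < 0) {
                Collections.swap(buf, i++, j); // move smaller record to the left side
            }
        }
        Collections.swap(buf, i, hi); // put the pivot in its final slot
        quickSort(buf, lo, i - 1);
        quickSort(buf, i + 1, hi);
    }

    /** k-way merge of already-sorted spills into one sorted list, using a min-heap. */
    static List<Rec> mergeSpills(List<List<Rec>> spills) {
        // Heap entries are {spill index, position within that spill}.
        Comparator<int[]> byHead =
            (x, y) -> compare(spills.get(x[0]).get(x[1]), spills.get(y[0]).get(y[1]));
        PriorityQueue<int[]> heap = new PriorityQueue<>(byHead);
        for (int s = 0; s < spills.size(); s++) {
            if (!spills.get(s).isEmpty()) heap.add(new int[] {s, 0});
        }
        List<Rec> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<Rec> src = spills.get(top[0]);
            out.add(src.get(top[1]));
            if (top[1] + 1 < src.size()) heap.add(new int[] {top[0], top[1] + 1});
        }
        return out;
    }

    /**
     * One simulated map task. Each sorted spill is appended to `spills`;
     * the merged, fully sorted output is returned.
     */
    public static List<Rec> runMapTask(String[][] kvs, int numReducers, int capacity,
                                       List<List<Rec>> spills) {
        List<Rec> buffer = new ArrayList<>();
        for (String[] kv : kvs) {
            buffer.add(new Rec(partitionFor(kv[0], numReducers), kv[0], kv[1]));
            if (buffer.size() >= capacity) { // buffer "full": sort it and spill
                quickSort(buffer, 0, buffer.size() - 1);
                spills.add(new ArrayList<>(buffer));
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) { // leftover records form the last spill
            quickSort(buffer, 0, buffer.size() - 1);
            spills.add(new ArrayList<>(buffer));
        }
        return mergeSpills(spills);
    }

    public static void main(String[] args) {
        String[][] kvs = {{"pear", "1"}, {"apple", "1"}, {"fig", "1"}, {"apple", "1"}, {"kiwi", "1"}};
        List<List<Rec>> spills = new ArrayList<>();
        List<Rec> out = runMapTask(kvs, 2, 2, spills);
        System.out.println(spills.size() + " spills"); // 5 records, capacity 2 -> 3 spills
        System.out.println(out);
    }
}
```

Running main prints the spill count (3 for five records with a capacity of two) followed by the merged records, grouped by partition with keys ascending inside each partition. The ascending order comes entirely from the comparator: in Hadoop that is the key type's compareTo (or a RawComparator registered for it), and a job can override it with Job.setSortComparatorClass.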

Best,

Dusso

