Posted to dev@hive.apache.org by indrani gorti <in...@gmail.com> on 2012/03/16 19:05:37 UTC

Sorting algorithm

Hi

Which sorting algorithm does map-reduce use to sort the data set in the
shuffle stage, i.e. after the map phase has run on each split of the entire
dataset? I understand that it is merge sort for sure. But is the algorithm
modified in any specific way? By this I mean: what are the other details of
the algorithm?

Thanks in advance.

Indrani

Re: Sorting algorithm

Posted by indrani gorti <in...@gmail.com>.

Thanks a lot Owen! :-)

On Fri, Mar 16, 2012 at 2:49 PM, Owen O'Malley <om...@apache.org> wrote:

> On Fri, Mar 16, 2012 at 6:05 PM, indrani gorti <indrani.gorti@gmail.com> wrote:
>
> > Hi
> >
> > Which sorting algorithm does map-reduce use to sort the data set in the
> > shuffle stage, i.e. after the map phase has run on each split of the entire
> > dataset?
>
>
> Take a look at Chris Douglas' presentation on the sort.
>
> Slides:
> http://www.slideshare.net/hadoopusergroup/ordered-record-collection
> Video:
>
> http://developer.yahoo.com/blogs/hadoop/posts/2010/01/hadoop_bay_area_january_2010_u/
>
>
> The original in-memory sort is a quicksort. After that, it is a merge sort.
>
> -- Owen
>



-- 
Indrani Gorti

Re: Sorting algorithm

Posted by Owen O'Malley <om...@apache.org>.

On Fri, Mar 16, 2012 at 6:05 PM, indrani gorti <in...@gmail.com> wrote:

> Hi
>
> Which sorting algorithm does map-reduce use to sort the data set in the
> shuffle stage, i.e. after the map phase has run on each split of the entire
> dataset?

Take a look at Chris Douglas' presentation on the sort.

Slides: http://www.slideshare.net/hadoopusergroup/ordered-record-collection
Video:
http://developer.yahoo.com/blogs/hadoop/posts/2010/01/hadoop_bay_area_january_2010_u/

The original in-memory sort is a quicksort. After that, it is a merge sort.

-- Owen
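
To make the two phases described above concrete, here is a minimal sketch in Python. It is not Hadoop's actual code: the function names (quicksort, sort_and_spill, merge_spills) and the buffer_size parameter are hypothetical, records are assumed to be (key, value) pairs, and the sorted runs stay in memory instead of being written to disk. Each buffer-sized chunk is quicksorted on its own, and the resulting sorted runs are then combined with a k-way merge.

```python
import heapq
import random


def quicksort(records):
    """Sort a list of (key, value) pairs in place by key (randomized quicksort)."""
    def sort(lo, hi):
        while lo < hi:
            pivot = records[random.randint(lo, hi)][0]
            i, j = lo, hi
            # Hoare-style partition around the pivot key.
            while i <= j:
                while records[i][0] < pivot:
                    i += 1
                while records[j][0] > pivot:
                    j -= 1
                if i <= j:
                    records[i], records[j] = records[j], records[i]
                    i += 1
                    j -= 1
            # Recurse into the smaller side, keep looping on the larger one.
            if j - lo < hi - i:
                sort(lo, j)
                lo = i
            else:
                sort(i, hi)
                hi = j

    sort(0, len(records) - 1)
    return records


def sort_and_spill(records, buffer_size):
    """Phase 1 (assumed model): quicksort each buffer-sized chunk into a sorted run."""
    spills = []
    for start in range(0, len(records), buffer_size):
        chunk = list(records[start:start + buffer_size])
        spills.append(quicksort(chunk))
    return spills


def merge_spills(spills):
    """Phase 2: k-way merge of the already-sorted runs, ordered by key."""
    return list(heapq.merge(*spills, key=lambda kv: kv[0]))


if __name__ == "__main__":
    data = [(5, "e"), (1, "a"), (4, "d"), (2, "b"), (3, "c"), (0, "z")]
    runs = sort_and_spill(data, buffer_size=2)
    print(runs)                 # three sorted runs of two records each
    print(merge_spills(runs))   # one fully sorted list
```

The split mirrors the reply: quicksort works well on data that fits in a memory buffer, while merging only needs sequential reads of runs that are already sorted, which is why it suits the later stage.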