Posted to mapreduce-dev@hadoop.apache.org by Samaneh Shokuhi <sa...@gmail.com> on 2012/01/21 18:21:54 UTC

regarding to sort and reducer

Hi All,
I am very new to Hadoop and am going to do some research on it for my
master's thesis. First of all, I want to understand the functionality
of sort and shuffle, and to run an application on Hadoop both with and
without the sort phase.
Which class in Hadoop takes care of the sort?

Another thing I need to understand is the functionality of the reducer,
and whether it is possible to send messages from one reducer to another
and do a kind of work stealing between reducers.

Since I am very new to Hadoop and it has a lot of modules, I need to know
which project I should look at.
I would also appreciate any comments you have on this idea.

Samaneh

Re: regarding to sort and reducer

Posted by Harsh J <ha...@cloudera.com>.
Samaneh,

Sorry for the late response. Some answers inline, as far as I can offer.

On Sat, Jan 21, 2012 at 10:51 PM, Samaneh Shokuhi
<sa...@gmail.com> wrote:
> Hi All,

Welcome!

> I am very new to Hadoop and am going to do some research on it for my
> master's thesis. First of all, I want to understand the functionality
> of sort and shuffle, and to run an application on Hadoop both with and
> without the sort phase.
> Which class in Hadoop takes care of the sort?

Are you looking for the sort mechanism or the algorithm?

This is an excellent presentation on the MR sort/shuffle/merge layers
that I recommend reading:
http://www.slideshare.net/hadoopusergroup/ordered-record-collection
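
As a rough illustration of what those layers do, here is a minimal
plain-Java sketch: map output is split into one partition per reducer,
keys are sorted within each partition, and each reducer then sees every
key once together with all of its values. This is not Hadoop's actual
implementation (on the map side, the real code in MapTask's
MapOutputBuffer sorts serialized records in a memory buffer, spills
them to disk, and merges the spills). The names SortShuffleSketch,
partitionFor, shuffle, and reduce are made up for this example; only
the partition formula mirrors the default HashPartitioner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Hadoop.
class SortShuffleSketch {

    // Same formula as Hadoop's default HashPartitioner:
    // a non-negative hash of the key, modulo the number of reducers.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Assigns each (key, value) record to a partition and keeps the keys of
    // each partition in sorted order (TreeMap), grouping values by key.
    static List<TreeMap<String, List<Integer>>> shuffle(
            List<Map.Entry<String, Integer>> mapOutput, int numReducers) {
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>());
        }
        for (Map.Entry<String, Integer> record : mapOutput) {
            int p = partitionFor(record.getKey(), numReducers);
            partitions.get(p)
                    .computeIfAbsent(record.getKey(), k -> new ArrayList<>())
                    .add(record.getValue());
        }
        return partitions;
    }

    // A word-count style reduce: sums all values seen for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("pear", 1), Map.entry("apple", 1),
                Map.entry("pear", 1), Map.entry("fig", 1));
        List<TreeMap<String, List<Integer>>> partitions = shuffle(mapOutput, 2);
        for (int p = 0; p < partitions.size(); p++) {
            for (Map.Entry<String, List<Integer>> e : partitions.get(p).entrySet()) {
                System.out.println(
                        "reducer " + p + ": " + e.getKey() + " -> " + reduce(e.getValue()));
            }
        }
    }
}
```

To compare against a run with no sort at all, note that a map-only job
(number of reduce tasks set to 0) skips the sort and shuffle entirely
and writes map output directly.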

> Another thing I need to understand is the functionality of the reducer,
> and whether it is possible to send messages from one reducer to another
> and do a kind of work stealing between reducers.

You probably want to read the ReduceTask class, but this functionality
(messaging or work stealing between reducers) is not present today. It
would perhaps be easier to do with the new MR2 framework, detailed in
http://developer.yahoo.com/blogs/hadoop/posts/2011/03/mapreduce-nextgen-scheduler/

> Since I am very new to Hadoop and it has a lot of modules, I need to know
> which project I should look at.
> I would also appreciate any comments you have on this idea.

You need to look at the hadoop-mapreduce-project in trunk for all
things MR today. It also uses some generic components from the
hadoop-common project. See
http://wiki.apache.org/hadoop/HowToContribute for more details.

Please feel free to mail the lists with any specific questions you
have as you go along!

-- 
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about