Posted to common-user@hadoop.apache.org by Shailesh Samudrala <sh...@gmail.com> on 2012/04/23 20:02:54 UTC

Distributing MapReduce on a computer cluster

Hello,

I am trying to design my own MapReduce implementation and I want to know
how Hadoop is able to distribute its workload across multiple computers.
Can anyone shed more light on this? Thanks!

Re: Distributing MapReduce on a computer cluster

Posted by Merto Mertek <ma...@gmail.com>.
For load distribution, you can start by reading some chapters on the different
types of Hadoop schedulers. I have not yet studied implementations other than
Hadoop, but a very simplified version of the distribution concept is the
following:

a) A TaskTracker asks for work (its heartbeat includes the status of the worker
node, e.g. the number of free slots)
b) The JobTracker picks a job from a list sorted according to the configured
policy (fair scheduling, FIFO, LIFO, other SLAs)
c) The TaskTracker executes the assigned map/reduce tasks (a rough sketch of
this loop is below)
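
To make a)-c) a bit more concrete, here is a very rough sketch in Java of that
heartbeat/assignment loop. All the names (SimpleJobTracker, Heartbeat, Task) are
made up for illustration; this is not Hadoop's real JobTracker/TaskTracker code
or RPC interface, just the plain FIFO case of the idea above:

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

public class SimpleJobTracker {

    // A heartbeat reports which worker is asking and how many free slots it has.
    record Heartbeat(String trackerName, int freeSlots) {}

    // A unit of work belonging to some job.
    record Task(String jobId, int taskId) {
        @Override public String toString() { return jobId + "/m_" + taskId; }
    }

    // FIFO job queue: each job is represented here as a queue of pending tasks.
    private final Queue<Queue<Task>> jobQueue = new ArrayDeque<>();

    void submitJob(String jobId, int numTasks) {
        Queue<Task> tasks = new ArrayDeque<>();
        for (int i = 0; i < numTasks; i++) tasks.add(new Task(jobId, i));
        jobQueue.add(tasks);
    }

    // (a) a worker heartbeats in with its free slots,
    // (b) the "JobTracker" hands it tasks from the head-of-queue job (plain FIFO),
    // (c) the worker would then run the returned tasks and report back later.
    List<Task> heartbeat(Heartbeat hb) {
        List<Task> assigned = new ArrayList<>();
        while (assigned.size() < hb.freeSlots() && !jobQueue.isEmpty()) {
            Queue<Task> job = jobQueue.peek();
            Task t = job.poll();
            if (t != null) assigned.add(t);
            if (job.isEmpty()) jobQueue.poll();   // all of this job's tasks handed out
        }
        return assigned;
    }

    public static void main(String[] args) {
        SimpleJobTracker jt = new SimpleJobTracker();
        jt.submitJob("job_1", 5);
        // Worker "tt1" reports 2 free slots and gets at most 2 tasks back.
        System.out.println(jt.heartbeat(new Heartbeat("tt1", 2)));
    }
}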

As mentioned before, there are a lot more details. In b), for example, there is
an implementation of delay scheduling, which improves throughput by taking the
location of a job's input data into account when picking tasks for a node. There
is also a preemption mechanism that regulates fairness between pools, etc. (a
toy illustration of the delay-scheduling idea follows below).
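
To show just the delay-scheduling idea from b) in isolation, here is a toy
sketch (again with made-up names, not the Fair Scheduler's actual code): if the
heartbeating node holds no local input data for the job at the head of the
sorted list, that job is skipped for a bounded number of heartbeats before being
launched non-locally anyway:

import java.util.List;
import java.util.Set;

public class DelaySchedulingSketch {

    static class Job {
        final String id;
        final Set<String> nodesWithLocalData;   // nodes holding this job's input blocks
        int skipCount = 0;                      // heartbeats skipped waiting for locality
        Job(String id, Set<String> nodes) { this.id = id; this.nodesWithLocalData = nodes; }
    }

    static final int MAX_SKIPS = 3;             // the "delay" before giving up on locality

    // Pick a job to run on 'node', preferring jobs with data local to that node.
    static Job pickJob(List<Job> sortedJobs, String node) {
        for (Job job : sortedJobs) {
            if (job.nodesWithLocalData.contains(node)) {
                job.skipCount = 0;
                return job;                      // data-local: launch immediately
            }
            if (job.skipCount++ >= MAX_SKIPS) {
                return job;                      // waited long enough: launch non-locally
            }
            // otherwise skip this job for now and look further down the list
        }
        return null;                             // nothing schedulable this heartbeat
    }

    public static void main(String[] args) {
        List<Job> jobs = List.of(
            new Job("job_1", Set.of("nodeA")),
            new Job("job_2", Set.of("nodeB")));
        // nodeB heartbeats: job_1 is skipped (no local data), job_2 runs locally.
        System.out.println(pickJob(jobs, "nodeB").id);   // prints job_2
    }
}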

A good start is the book that Prashant mentioned...

On 23 April 2012 23:49, Prashant Kommireddi <pr...@gmail.com> wrote:

> Shailesh, there's a lot that goes into distributing work across
> tasks/nodes. It's not just distributing work but also fault tolerance,
> data locality, etc. that come into play. It might be good to refer to the
> Hadoop Apache docs or Tom White's Definitive Guide.
>
> Sent from my iPhone
>
> On Apr 23, 2012, at 11:03 AM, Shailesh Samudrala <sh...@gmail.com>
> wrote:
>
> > Hello,
> >
> > I am trying to design my own MapReduce implementation and I want to know
> > how Hadoop is able to distribute its workload across multiple computers.
> > Can anyone shed more light on this? Thanks!
>

Re: Distributing MapReduce on a computer cluster

Posted by Prashant Kommireddi <pr...@gmail.com>.
Shailesh, there's a lot that goes into distributing work across
tasks/nodes. It's not just distributing work but also fault tolerance,
data locality, etc. that come into play. It might be good to refer to the
Hadoop Apache docs or Tom White's Definitive Guide.

Sent from my iPhone

On Apr 23, 2012, at 11:03 AM, Shailesh Samudrala <sh...@gmail.com> wrote:

> Hello,
>
> I am trying to design my own MapReduce implementation and I want to know
> how Hadoop is able to distribute its workload across multiple computers.
> Can anyone shed more light on this? Thanks!