Posted to mapreduce-user@hadoop.apache.org by Han Dong <ha...@gmail.com> on 2010/10/26 01:04:42 UTC

Question about running hadoop on multiple nodes/cores

Hi,

When running Hadoop, is it possible to specify the number of compute nodes
to use? Or does Hadoop automatically configure itself to run on different nodes?

For example, if I specify 12 map tasks to run, and there is a cluster of 12
compute nodes, will Hadoop automatically run one map task per node, or
would it run 2 maps per node on 6 of the nodes if it detects that each node
has a 2-core processor?

Thank You,
Han Dong
handong@buffalo.edu

Re: Question about running hadoop on multiple nodes/cores

Posted by Yin Lou <yi...@gmail.com>.
You can use Hadoop on Demand (HOD), which provisions a private Hadoop
cluster of the size you request on a shared pool of nodes.
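
For example (a sketch; the exact options depend on your HOD installation
and resource manager), you could request a 12-node cluster at allocation
time. The -d argument, where HOD keeps the cluster state, is a
hypothetical path here:

    hod allocate -d ~/hod-clusters/test -n 12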

Yin

Re: Question about running hadoop on multiple nodes/cores

Posted by Harsh J <qw...@gmail.com>.
On Tue, Oct 26, 2010 at 4:34 AM, Han Dong <ha...@gmail.com> wrote:
> Hi,
> When running Hadoop, is it possible to specify the number of compute nodes
> to use? Or does Hadoop automatically configure itself to run on different nodes?
> For example, if I specify 12 map tasks to run, and there is a cluster of 12
> compute nodes, will Hadoop automatically run one map task per node, or
> would it run 2 maps per node on 6 of the nodes if it detects that each node
> has a 2-core processor?

Hadoop MapReduce does not schedule tasks this way; scheduling is
data-driven instead.

If you are using HDFS, each task is run on a node that holds its input
data (an input split, typically one HDFS block, or one file among many)
locally. Otherwise, a rack-local node is chosen based on availability
(number of free task slots, etc.) and the task is assigned to it.

For your question: Hadoop will *try* to utilize ALL the nodes
(TaskTrackers) available to it. It does not let you directly specify which
TaskTracker nodes a job should use. So yes, Hadoop will "automatically
configure" the tasks to run on different nodes.

About that example: it depends on where the input data blocks for the 12
maps reside in the cluster. Say all 12 blocks sit on a single machine, and
that machine has a capacity of 12 map slots; only then would all 12
mappers be executed on that one node.
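
On the 2-core part of the question: Hadoop does not detect cores and split
work accordingly. The per-node map capacity ("slots") is whatever the
administrator sets in each TaskTracker's mapred-site.xml; the default is 2
concurrent maps per node. A sketch of the relevant property:

    <property>
      <name>mapred.tasktracker.map.tasks.maximum</name>
      <value>2</value>
    </property>

A common rule of thumb is to set this near the node's core count; there is
a matching mapred.tasktracker.reduce.tasks.maximum for reduces.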

[Data locality: Hadoop is pretty good at it; its methodology is to bring
the computation to the data, not the other way round.]

-- 
Harsh J
www.harshj.com