Posted to common-user@hadoop.apache.org by "Zhang, Guibin" <gz...@freewheel.tv> on 2008/02/28 04:29:53 UTC

How to configure the hadoop to distribute less tasks to weak nodes?

Hi,all

         I find that when the sub-nodes' hardware configurations are
different (some nodes are strong, with more CPUs and more memory, and
others are weak, with fewer CPUs and less memory), the tasks are still
distributed almost evenly across all the sub-nodes when I run a job.
This makes the weak nodes pretty slow, and a lot of tasks on the weak
nodes are killed. I am sure this slows down the whole job, because a
lot of tasks (more than 10) end up being processed twice.

         Question: How can I configure Hadoop to distribute fewer
tasks to weak nodes and more tasks to strong nodes?

 

I configure the strong nodes with

  mapred.tasktracker.map.tasks.maximum=75
  mapred.map.tasks=60
  mapred.tasktracker.reduce.tasks.maximum=18
  mapred.reduce.tasks=15

and the weak nodes with

  mapred.tasktracker.map.tasks.maximum=60
  mapred.map.tasks=45
  mapred.tasktracker.reduce.tasks.maximum=15
  mapred.reduce.tasks=12

 

I have 4 nodes in total: one for the name node and job tracker, and the
others are sub-nodes.
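
In case it helps: the per-tasktracker settings above go in conf/hadoop-site.xml on each node (file name as used by this era of Hadoop; values taken from the listing above). A sketch for one of the strong nodes:

```xml
<!-- conf/hadoop-site.xml on a strong node (sketch; values from above) -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>75</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>18</value>
  </property>
</configuration>
```

A weak node would carry the same properties with its own lower values, since each tasktracker reads only its local file.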

 

Thanks.

 

Guibin zhang


Re: How to configure the hadoop to distribute less tasks to weak nodes?

Posted by Owen O'Malley <oo...@yahoo-inc.com>.
On Feb 27, 2008, at 7:29 PM, Zhang, Guibin wrote:

>          Question: How can I configure the hadoop to distribute less
> tasks to weak nodes and distribute more tasks to strong nodes?

mapred.tasktracker.map.tasks.maximum is the number of tasks to run on  
that tasktracker simultaneously; 75 is almost certainly too high.  
mapred.map.tasks is only relevant on the submitting node, because  
that is where the planning takes place. More reasonable values  
(depending on your hardware) are:

strong:
mapred.tasktracker.map.tasks.maximum = 8
mapred.tasktracker.reduce.tasks.maximum = 4

weak:
mapred.tasktracker.map.tasks.maximum = 4
mapred.tasktracker.reduce.tasks.maximum = 2

client:
mapred.map.tasks = nodes * average(mapred.tasktracker.map.tasks.maximum)
mapred.reduce.tasks = 95% * nodes * average(mapred.tasktracker.reduce.tasks.maximum)
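
Worked through for this cluster's 3 workers (assuming, hypothetically, one strong and two weak nodes with the maxima suggested above), those formulas give:

```python
# Client-side task counts from the formulas above, assuming
# (hypothetically) one strong and two weak workers.
map_maxima = [8, 4, 4]     # mapred.tasktracker.map.tasks.maximum per node
reduce_maxima = [4, 2, 2]  # mapred.tasktracker.reduce.tasks.maximum per node

nodes = len(map_maxima)
avg_map = sum(map_maxima) / nodes
avg_reduce = sum(reduce_maxima) / nodes

# nodes * average of the per-node map maxima
mapred_map_tasks = round(nodes * avg_map)
# 95% * nodes * average of the per-node reduce maxima, rounded down
mapred_reduce_tasks = int(0.95 * nodes * avg_reduce)

print(mapred_map_tasks, mapred_reduce_tasks)  # 16 7
```

So a job submitted from the client would be planned with 16 maps and 7 reduces, which the scheduler can then spread unevenly according to each node's per-tasktracker maximum.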

-- Owen

Re: How to configure the hadoop to distribute less tasks to weak nodes?

Posted by Amar Kamat <am...@yahoo-inc.com>.
This can easily be done through HoD, since it requires separate 
configuration files for each tasktracker, i.e. node. As of now I don't think this can 
be done in Hadoop itself. Anyway, I have never seen such high values for max tasks. :)
Amar.
On Wed, 27 Feb 2008, Zhang, Guibin wrote:

> Hi,all
>
>         Question: How can I configure the hadoop to distribute less
> tasks to weak nodes and distribute more tasks to strong nodes?