Posted to mapreduce-user@hadoop.apache.org by Fang Xin <nu...@gmail.com> on 2012/04/03 10:15:59 UTC

What determines the map task / reduce task capacity? average task per node?

Hi all,

Of course it makes sense that the number of nodes in the cluster will
influence the map/reduce task capacity, but what determines the average
number of tasks per node?
Can the number be set manually? Are there any hardware constraints on setting the number?

Thank you!
Xin

Re: What determines the map task / reduce task capacity? average task per node?

Posted by Bejoy Ks <be...@gmail.com>.
hi Xin

            To add on, the factors you primarily need to consider
when deciding the slot counts are:

- Memory
        If each of your tasks needs 1 GB and you have 12 GB of
available memory, you can host 12 slots. Divide these between mapper and
reducer slots in proportion to the jobs in your cluster.
        By available memory, I mean the memory that is left over
after allocating to the OS, the DataNode and TaskTracker daemons, the
HBase RegionServer, other services, etc.

- CPU
      It is good to have one core available per slot. If the processor is
hyper-threaded, you can estimate the number of cores as 1.5 x the physical
cores (an approximation).
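
The arithmetic above can be sketched as a small helper. This is only an
illustration of the reasoning in this thread, not part of Hadoop; the
node figures in the example (16 GB RAM, 4 GB reserved, 8 hyper-threaded
cores) are assumptions chosen to match the 12-slot example.

```python
def estimate_slots(total_ram_gb, reserved_ram_gb, task_ram_gb,
                   physical_cores, hyperthreaded=False):
    """Estimate total task slots per node from memory and CPU limits."""
    # Memory left after the OS, DN/TT daemons, RegionServer, etc.
    available_ram = total_ram_gb - reserved_ram_gb
    mem_slots = int(available_ram // task_ram_gb)

    # Roughly one core per slot; ~1.5x effective cores when hyper-threaded.
    effective_cores = physical_cores * 1.5 if hyperthreaded else physical_cores
    cpu_slots = int(effective_cores)

    # The tighter of the two constraints wins.
    return min(mem_slots, cpu_slots)

# 16 GB node, 4 GB reserved, 1 GB per task, 8 hyper-threaded cores
print(estimate_slots(16, 4, 1, 8, hyperthreaded=True))  # -> 12
```

Whichever resource is scarcer (memory or cores) caps the slot count, which
is why both checks appear in the advice above.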

Hope it helps!

Regards
Bejoy KS

On Tue, Apr 3, 2012 at 1:58 PM, Bejoy Ks <be...@gmail.com> wrote:

> Hi Xin
>       Yes, the number of worker nodes does count toward the map and reduce
> capacity of the cluster. The map and reduce task capacity/slots depends on
> the hardware of each node and, of course, on the requirements of the
> applications that use the cluster. Based on the available memory, number of
> cores, etc., you need to configure the slots so that there won't be any
> resource crunch while running your tasks. You can set the slots on each
> node in the corresponding mapred-site.xml:
>
> <property>
> <name>mapred.tasktracker.map.tasks.maximum</name>
> <value>12</value>
> </property>
> <property>
> <name>mapred.tasktracker.reduce.tasks.maximum</name>
> <value>4</value>
> </property>
>
> Regards
> Bejoy KS
>
> On Tue, Apr 3, 2012 at 1:45 PM, Fang Xin <nu...@gmail.com> wrote:
>
>> Hi all,
>>
>> Of course it makes sense that the number of nodes in the cluster will
>> influence the map/reduce task capacity, but what determines the average
>> number of tasks per node?
>> Can the number be set manually? Are there any hardware constraints on
>> setting the number?
>>
>> Thank you!
>> Xin
>>
>
>

Re: What determines the map task / reduce task capacity? average task per node?

Posted by Bejoy Ks <be...@gmail.com>.
Hi Xin
      Yes, the number of worker nodes does count toward the map and reduce
capacity of the cluster. The map and reduce task capacity/slots depends on
the hardware of each node and, of course, on the requirements of the
applications that use the cluster. Based on the available memory, number of
cores, etc., you need to configure the slots so that there won't be any
resource crunch while running your tasks. You can set the slots on each
node in the corresponding mapred-site.xml:

<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>12</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>4</value>
</property>
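
As a small illustration of how a total slot count might be divided into the
12 map / 4 reduce split shown above: assuming 16 total slots and a map-heavy
workload (both figures hypothetical), the split is just a proportional round.

```python
def split_slots(total_slots, map_fraction=0.75):
    """Split a node's total slots between map and reduce proportionally.

    map_fraction is an assumption about your workload mix; map-heavy
    clusters commonly give mappers the larger share.
    """
    map_slots = round(total_slots * map_fraction)
    reduce_slots = total_slots - map_slots
    return map_slots, reduce_slots

print(split_slots(16))  # -> (12, 4)
```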

Regards
Bejoy KS

On Tue, Apr 3, 2012 at 1:45 PM, Fang Xin <nu...@gmail.com> wrote:

> Hi all,
>
> Of course it makes sense that the number of nodes in the cluster will
> influence the map/reduce task capacity, but what determines the average
> number of tasks per node?
> Can the number be set manually? Are there any hardware constraints on
> setting the number?
>
> Thank you!
> Xin
>