Posted to hdfs-user@hadoop.apache.org by Shashidhar Rao <ra...@gmail.com> on 2014/04/14 19:18:00 UTC

Map and reduce slots

Hi,

Can somebody clarify what map and reduce slots are, and how Hadoop
calculates them? Are these slots calculated based on the number of
splits?

I am getting different answers; please help.

Regards
Shashidhar

Re: Map and reduce slots

Posted by João Paulo Forny <jp...@gmail.com>.
You need to differentiate slots from tasks.
Tasks are scheduled by the JobTracker onto TaskTrackers (TTs) that have
free slots; each TT then spawns the task in one of its slots.
The number of map tasks for a Hadoop job is typically controlled by the
input data size and the split size.
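As an illustration (this is a sketch modeled on FileInputFormat's split logic, not Hadoop's actual source; the 128 MB block size is an assumption, not a universal default):

```python
import math

def compute_split_size(block_size, min_size=1, max_size=None):
    # FileInputFormat-style rule: max(minSize, min(maxSize, blockSize))
    if max_size is None:
        max_size = float("inf")
    return max(min_size, min(max_size, block_size))

def estimate_map_tasks(file_size, block_size=128 * 1024 * 1024):
    # Roughly one map task per input split: ceil(fileSize / splitSize).
    split_size = compute_split_size(block_size)
    return math.ceil(file_size / split_size)

# A 1 GB file with 128 MB blocks yields about 8 map tasks.
print(estimate_map_tasks(1024 * 1024 * 1024))  # 8
```

So the split count (and hence the map task count) depends on input size and split size, while slots are a fixed per-node resource limit.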
The number of reduce tasks for a Hadoop job is controlled by the
*mapreduce.job.reduces* parameter. If this parameter is not set, jobs have
one reduce task.
The number of map and reduce slots on each TaskTracker node is controlled
by the *mapreduce.tasktracker.map.tasks.maximum* and
*mapreduce.tasktracker.reduce.tasks.maximum* Hadoop properties in the
mapred-site.xml file. These parameters define the maximum number of
concurrently occupied slots on a TaskTracker node and determine the degree
of concurrency on each TaskTracker.
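For example, a mapred-site.xml on a TaskTracker node might cap slots like this (the values 8 and 4 are illustrative choices, not shipped defaults):

```xml
<configuration>
  <property>
    <name>mapreduce.tasktracker.map.tasks.maximum</name>
    <value>8</value> <!-- at most 8 map tasks run concurrently on this node -->
  </property>
  <property>
    <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
    <value>4</value> <!-- at most 4 reduce tasks run concurrently on this node -->
  </property>
</configuration>
```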
Finally, it is important to consider the memory usage of each map and/or
reduce task. Task heap sizes are usually controlled by the
*mapred.child.java.opts* Hadoop parameter. If your Hadoop jobs are
memory-intensive and have large JVM heaps, then reduce the number of slots.
If your Hadoop jobs have small JVM heaps, you may be able to increase the
number of slots. Keep in mind the maximum amount of memory that the task
JVMs consume if all slots are filled.
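To make that last point concrete, here is a back-of-the-envelope check (the slot counts and heap size below are assumptions for illustration, not defaults):

```python
def worst_case_task_memory_mb(map_slots, reduce_slots, child_heap_mb):
    # If every slot is occupied, each task JVM can grow to its full heap,
    # so the node must accommodate (map_slots + reduce_slots) * heap
    # on top of the TaskTracker/DataNode daemons' own memory.
    return (map_slots + reduce_slots) * child_heap_mb

# 8 map slots + 4 reduce slots with mapred.child.java.opts=-Xmx1024m
print(worst_case_task_memory_mb(8, 4, 1024))  # 12288 MB, i.e. 12 GB
```

If that worst-case figure exceeds the node's physical RAM, either shrink the heaps or reduce the slot counts.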


2014-04-14 14:18 GMT-03:00 Shashidhar Rao <ra...@gmail.com>:

> Hi,
>
> Can somebody clarify what are map and reduce slots and how Hadoop
> calculates these slots. Are these slots calculated based on the number of
> splits?
>
> I am getting different answers please help
>
> Regards
> Shashidhar
>
