Posted to general@hadoop.apache.org by Grandl Robert <rg...@yahoo.com> on 2010/11/22 17:32:12 UTC

Hadoop - how exactly is a slot defined

Hi all,

I'm having trouble understanding what exactly a slot is. We always talk about tasks being assigned to slots, but I could not find anywhere what exactly a slot is. I assume it represents some allocation of RAM as well as some computation power.

However, can somebody explain to me what exactly a slot means (in terms of the resources allocated to a slot), and how this mapping between slots and physical resources is done in Hadoop? Or give me some hints about the files in Hadoop where it might be?

Thanks a lot,
Robert



Re: Hadoop - how exactly is a slot defined

Posted by Xavier Stevens <xs...@mozilla.com>.
Hi Robert,

Task slots are a way of constraining the amount of resources that will
be used on a given physical machine.  I believe the defaults set map
task slots to 2 and reduce task slots to 1.  Many of us use numbers
higher than this depending on our applications.  When picking a number
you want to keep in mind how many CPU cores and how much memory you
have on a physical machine.  If you set the child Java opts to use
512MB of memory, that number is per task.  So let's say you have 4 map
task slots and 2 reduce task slots.  Conceivably, at some point all 6
slots would be running tasks, which means you need at least 6 x 512MB
of memory plus a little extra for other processes (TaskTracker,
DataNode, etc.) and the OS.

A good starting point is to work backwards from how much physical
memory you have and decide how much you would like tasks to have
available.  Or, if you aren't worried about being memory-heavy, pick
the number of task slots to be close to the number of cores.
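To make the arithmetic concrete, here is a small sketch of that worst-case memory budget in Python (the 512MB heap, the slot counts, and the ~1GB headroom are just the example numbers from this mail, not Hadoop defaults):

```python
# Worst-case memory needed on a TaskTracker node: every map and reduce
# slot runs a task at once, each task gets its own child-JVM heap, and
# the daemons (TaskTracker, DataNode) and the OS need headroom too.
def required_memory_mb(map_slots, reduce_slots, task_heap_mb, overhead_mb=1024):
    return (map_slots + reduce_slots) * task_heap_mb + overhead_mb

# 4 map slots + 2 reduce slots at 512MB each: 6 x 512MB = 3072MB,
# plus ~1GB of headroom -> 4096MB total.
print(required_memory_mb(4, 2, 512))  # 4096
```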

Hope this helps!

Cheers,


-Xavier

On 11/22/10 8:32 AM, Grandl Robert wrote:
> Hi all,
>
> I'm having trouble understanding what exactly a slot is. We always talk about tasks being assigned to slots, but I could not find anywhere what exactly a slot is. I assume it represents some allocation of RAM as well as some computation power.
>
> However, can somebody explain to me what exactly a slot means (in terms of the resources allocated to a slot), and how this mapping between slots and physical resources is done in Hadoop? Or give me some hints about the files in Hadoop where it might be?
>
> Thanks a lot,
> Robert
>
>
>

Re: Hadoop - how exactly is a slot defined

Posted by Harsh J <qw...@gmail.com>.
Hi,

Answers inline.

On Mon, Nov 22, 2010 at 11:08 PM, Grandl Robert <rg...@yahoo.com> wrote:
> Thanks all for your comments.
>
> However, I still have some doubts.
>
> Basically I can control the number of map/reduce slots with
> mapred.tasktracker.map.tasks.maximum
> mapred.tasktracker.reduce.tasks.maximum
>
> but is it possible to set a different number of map/reduce slots for different slaves?

Yes, this setting is 'tasktracker' specific, as the property name
goes. Each TaskTracker can have a different config to load from.

>
> For example, if I am running in a heterogeneous environment, where each slave has a different configuration, is it possible to set a different number of slots based on each machine's configuration?

Yes, give each machine a unique value via its local copy of mapred-site.xml.
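For example, a slave with spare capacity could carry something like this in its local conf/mapred-site.xml (the slot counts here are illustrative, not recommendations):

```xml
<!-- conf/mapred-site.xml on one particular slave; other slaves can
     use different values in their own copies. -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
```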

> For the moment I have observed that I can modify these parameters only on the master, so all the nodes run with the same number of map/reduce slots regardless of the resources (CPU, memory) each one offers.

Not really; your slave machine's config file (conf/mapred-site.xml)
needs to reflect the settings you want its TaskTracker to use
(DataNodes have specific configuration as well).

-- 
Harsh J
www.harshj.com

Re: Hadoop - how exactly is a slot defined

Posted by Harsh J <qw...@gmail.com>.
Hi,

On Wed, Nov 24, 2010 at 10:23 PM, Grandl Robert <rg...@yahoo.com> wrote:
> Hi,
> I am sorry to bother you again about this subject, but I am still not convinced I understand what Hadoop assumes a slot is. I understand it represents something in terms of CPU/memory, so you have to allocate a corresponding number of map/reduce slots based on your configuration.
> BUT, I cannot understand yet whether Hadoop does any mapping between the concept of a slot and physical resources itself, or whether slots are just numbers and Hadoop works only with those numbers.

The slot amount is the user's homework for now.

> I looked at the code, but I am not able to figure out whether Hadoop really does any checking between the number of slots and physical resources, or whether it is just limited by the two numbers (the maximum number of map slots and of reduce slots) and works with those numbers only. That would mean the user has to decide what a slot really means (one slot per core, one slot per 512 MB, etc.) when configuring the number of map/reduce slots on his machines.

Yes, Hadoop does not dynamically detect any such thing yet. The setup
is ignorant of a machine's hardware and blindly relies on the
configurations passed at start-up.

I usually set M = no. of CPUs + 1, and R = the prime nearest to the
no. of CPUs. But needs may vary depending on the nature of the jobs
the node is going to perform; sometimes you may need less CPU but more
memory per task, so configure based on your application knowledge.
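As a sketch of that heuristic (my interpretation: the rule does not say how to break ties for "nearest", so the preference for the lower prime below is my assumption):

```python
# Heuristic above: M (map slots) = CPUs + 1, R (reduce slots) = the
# prime nearest the CPU count. Tie-breaking toward the lower prime is
# an assumption, not part of the original rule.
def is_prime(n):
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

def slot_heuristic(cpus):
    maps = cpus + 1
    # Search outwards from the CPU count for the closest prime.
    for delta in range(cpus):
        for candidate in (cpus - delta, cpus + delta):
            if is_prime(candidate):
                return maps, candidate
    return maps, 2  # fallback for tiny CPU counts

print(slot_heuristic(8))  # (9, 7): 9 map slots, 7 reduce slots
```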

-- 
Harsh J
www.harshj.com

Re: Hadoop - how exactly is a slot defined

Posted by Grandl Robert <rg...@yahoo.com>.
Thanks all for your comments.

However, I still have some doubts. 

Basically I can control the number of map/reduce slots with
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum 

but is it possible to set a different number of map/reduce slots for different slaves?

For example, if I am running in a heterogeneous environment, where each slave has a different configuration, is it possible to set a different number of slots based on each machine's configuration?
For the moment I have observed that I can modify these parameters only on the master, so all the nodes run with the same number of map/reduce slots regardless of the resources (CPU, memory) each one offers.

Thanks for any clue.

Robert



--- On Mon, 11/22/10, Harsh J <qw...@gmail.com> wrote:

From: Harsh J <qw...@gmail.com>
Subject: Re: Hadoop - how exactly is a slot defined
To: general@hadoop.apache.org
Date: Monday, November 22, 2010, 6:52 PM

Hi,

On Mon, Nov 22, 2010 at 10:02 PM, Grandl Robert <rg...@yahoo.com> wrote:
> Hi all,
>
> I'm having trouble understanding what exactly a slot is. We always talk about tasks being assigned to slots, but I could not find anywhere what exactly a slot is. I assume it represents some allocation of RAM as well as some computation power.
>
> However, can somebody explain to me what exactly a slot means (in terms of the resources allocated to a slot), and how this mapping between slots and physical resources is done in Hadoop? Or give me some hints about the files in Hadoop where it might be?

A slot is of two types -- Map slot and Reduce slot. A slot represents
an ability to run one of these "Tasks" (map/reduce tasks) individually
at a point of time. Therefore, multiple slots on a TaskTracker means
multiple "Tasks" may execute in parallel.

Right now total slots in a TaskTracker is ==
mapred.tasktracker.map.tasks.maximum for Maps and
mapred.tasktracker.reduce.tasks.maximum for Reduces.

Hadoop is indeed trying to go towards the dynamic slot concept, which
could rely on the current resources available on a system, but work
for this is still in conceptual phases. TaskTrackers emit system
status (like CPU load, utilization, memory available/user, load
averages) in their heartbeats today (and is utilized by certain
schedulers, I think Capacity Scheduler uses it to determine stuff),
but the concept of slots is still fixed as a maximum to the above two
configurations on each TaskTracker.

For code on how slots are checked/utilized, see any Scheduler plugin's
code -- LimitTasksPerJobTaskScheduler, CapacityTaskScheduler for
example.

>
> Thanks a lot,
> Robert
>
>
>



-- 
Harsh J
www.harshj.com




Re: Hadoop - how exactly is a slot defined

Posted by Steve Loughran <st...@apache.org>.
On 25/11/10 12:42, Grandl Robert wrote:
> Thanks to you all for the explanations.
> So, as far as I understand, if I configure 4 map slots per node (say 512 MB of RAM per slot, as my node has 2 GB in total), Hadoop will always try to allocate 4 slots?  Does the node report in its heartbeat that it has 4 free slots?

> But then my question is: what if another workload contends with the Hadoop workload at some moment, so that fewer resources are available for Hadoop? Does Hadoop still report that it has 4 free slots, and implicitly try to allocate tasks to those 4 slots?

Do you mean other system workload? Are your machines accepting work from 
other places?

The JobTracker will push out work to the nodes, and remember which 
machines it has given work to, so it won't overcommit them. If you are 
doing other work on the same machines, it won't know about that, and 
will still push out jobs, which will now take longer.

-steve




Re: Hadoop - how exactly is a slot defined

Posted by Grandl Robert <rg...@yahoo.com>.
Thanks to you all for the explanations.
So, as far as I understand, if I configure 4 map slots per node (say 512 MB of RAM per slot, as my node has 2 GB in total), Hadoop will always try to allocate 4 slots?  Does the node report in its heartbeat that it has 4 free slots?
But then my question is: what if another workload contends with the Hadoop workload at some moment, so that fewer resources are available for Hadoop? Does Hadoop still report that it has 4 free slots, and implicitly try to allocate tasks to those 4 slots?
Thank you again for your prompt answers.
Cheers,
Robert
--- On Wed, 11/24/10, Jonathan Creasy <jo...@Announcemedia.com> wrote:

From: Jonathan Creasy <jo...@Announcemedia.com>
Subject: Re: Hadoop - how exactly is a slot defined
To: "general@hadoop.apache.org" <ge...@hadoop.apache.org>
Date: Wednesday, November 24, 2010, 7:04 PM

Robert, 

Hadoop is not currently doing any dynamic detection of resources to determine the number of slots. If I told Hadoop it could run 3,587 map tasks, it might well try to do it. 

We use standards to determine how many map and reduce tasks a node is allowed:

Each Map/Reduce Task is given:
2GB of Ram
1 Core
50GB of tmp disk space

The formula for map/reduce slots looks something like this in our environment:

G = GB of Ram
D = Disk space in /tmp
C = count of CPU cores

The minimum of: 
(G-2)/2
D/50
C-1

These numbers aren't published anywhere and may completely fly in the face of conventional wisdom, but it's what we are using and, so far, it seems to work for us. 
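In Python, that sizing rule might look like the sketch below (the floor division is my addition, to keep the result a whole slot count; the constants are the ones from this mail):

```python
# Slots per node = min((G - 2) / 2, D / 50, C - 1), where G is GB of
# RAM, D is GB of /tmp disk, and C is the CPU core count -- i.e. 2GB of
# RAM, 50GB of tmp space and one core per task, with some of each held
# back for the rest of the system.
def slots_per_node(ram_gb, tmp_disk_gb, cpu_cores):
    return min((ram_gb - 2) // 2, tmp_disk_gb // 50, cpu_cores - 1)

# A node with 16GB RAM, 500GB of /tmp and 8 cores:
print(slots_per_node(16, 500, 8))  # min(7, 10, 7) = 7
```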

-Jonathan


On Nov 24, 2010, at 10:53 AM, Grandl Robert wrote:

> Hi,
> I am sorry to bother you again about this subject, but I am still not convinced I understand what Hadoop assumes a slot is. I understand it represents something in terms of CPU/memory, so you have to allocate a corresponding number of map/reduce slots based on your configuration.
> BUT, I cannot understand yet whether Hadoop does any mapping between the concept of a slot and physical resources itself, or whether slots are just numbers and Hadoop works only with those numbers.
> I looked at the code, but I am not able to figure out whether Hadoop really does any checking between the number of slots and physical resources, or whether it is just limited by the two numbers (the maximum number of map slots and of reduce slots) and works with those numbers only. That would mean the user has to decide what a slot really means (one slot per core, one slot per 512 MB, etc.) when configuring the number of map/reduce slots on his machines.
> Thanks in advance for any clue.
> Cheers,
> Robert
> 
> --- On Mon, 11/22/10, Harsh J <qw...@gmail.com> wrote:
> 
> From: Harsh J <qw...@gmail.com>
> Subject: Re: Hadoop - how exactly is a slot defined
> To: general@hadoop.apache.org
> Date: Monday, November 22, 2010, 6:52 PM
> 
> Hi,
> 
> On Mon, Nov 22, 2010 at 10:02 PM, Grandl Robert <rg...@yahoo.com> wrote:
>> Hi all,
>> 
>> I'm having trouble understanding what exactly a slot is. We always talk about tasks being assigned to slots, but I could not find anywhere what exactly a slot is. I assume it represents some allocation of RAM as well as some computation power.
>> 
>> However, can somebody explain to me what exactly a slot means (in terms of the resources allocated to a slot), and how this mapping between slots and physical resources is done in Hadoop? Or give me some hints about the files in Hadoop where it might be?
> 
> A slot is of two types -- Map slot and Reduce slot. A slot represents
> an ability to run one of these "Tasks" (map/reduce tasks) individually
> at a point of time. Therefore, multiple slots on a TaskTracker means
> multiple "Tasks" may execute in parallel.
> 
> Right now total slots in a TaskTracker is ==
> mapred.tasktracker.map.tasks.maximum for Maps and
> mapred.tasktracker.reduce.tasks.maximum for Reduces.
> 
> Hadoop is indeed trying to go towards the dynamic slot concept, which
> could rely on the current resources available on a system, but work
> for this is still in conceptual phases. TaskTrackers emit system
> status (like CPU load, utilization, memory available/user, load
> averages) in their heartbeats today (and is utilized by certain
> schedulers, I think Capacity Scheduler uses it to determine stuff),
> but the concept of slots is still fixed as a maximum to the above two
> configurations on each TaskTracker.
> 
> For code on how slots are checked/utilized, see any Scheduler plugin's
> code -- LimitTasksPerJobTaskScheduler, CapacityTaskScheduler for
> example.
> 
>> 
>> Thanks a lot,
>> Robert
>> 
>> 
>> 
> 
> 
> 
> -- 
> Harsh J
> www.harshj.com
> 
> 
> 





Re: Hadoop - how exactly is a slot defined

Posted by Jonathan Creasy <jo...@Announcemedia.com>.
Robert, 

Hadoop is not currently doing any dynamic detection of resources to determine the number of slots. If I told Hadoop it could run 3,587 map tasks, it might well try to do it. 

We use standards to determine how many map and reduce tasks a node is allowed:

Each Map/Reduce Task is given:
2GB of Ram
1 Core
50GB of tmp disk space

The formula for map/reduce slots looks something like this in our environment:

G = GB of Ram
D = Disk space in /tmp
C = count of CPU cores

The minimum of: 
(G-2)/2
D/50
C-1

These numbers aren't published anywhere and may completely fly in the face of conventional wisdom, but it's what we are using and, so far, it seems to work for us. 

-Jonathan


On Nov 24, 2010, at 10:53 AM, Grandl Robert wrote:

> Hi,
> I am sorry to bother you again about this subject, but I am still not convinced I understand what Hadoop assumes a slot is. I understand it represents something in terms of CPU/memory, so you have to allocate a corresponding number of map/reduce slots based on your configuration.
> BUT, I cannot understand yet whether Hadoop does any mapping between the concept of a slot and physical resources itself, or whether slots are just numbers and Hadoop works only with those numbers.
> I looked at the code, but I am not able to figure out whether Hadoop really does any checking between the number of slots and physical resources, or whether it is just limited by the two numbers (the maximum number of map slots and of reduce slots) and works with those numbers only. That would mean the user has to decide what a slot really means (one slot per core, one slot per 512 MB, etc.) when configuring the number of map/reduce slots on his machines.
> Thanks in advance for any clue.
> Cheers,
> Robert
> 
> --- On Mon, 11/22/10, Harsh J <qw...@gmail.com> wrote:
> 
> From: Harsh J <qw...@gmail.com>
> Subject: Re: Hadoop - how exactly is a slot defined
> To: general@hadoop.apache.org
> Date: Monday, November 22, 2010, 6:52 PM
> 
> Hi,
> 
> On Mon, Nov 22, 2010 at 10:02 PM, Grandl Robert <rg...@yahoo.com> wrote:
>> Hi all,
>> 
>> I'm having trouble understanding what exactly a slot is. We always talk about tasks being assigned to slots, but I could not find anywhere what exactly a slot is. I assume it represents some allocation of RAM as well as some computation power.
>> 
>> However, can somebody explain to me what exactly a slot means (in terms of the resources allocated to a slot), and how this mapping between slots and physical resources is done in Hadoop? Or give me some hints about the files in Hadoop where it might be?
> 
> A slot is of two types -- Map slot and Reduce slot. A slot represents
> an ability to run one of these "Tasks" (map/reduce tasks) individually
> at a point of time. Therefore, multiple slots on a TaskTracker means
> multiple "Tasks" may execute in parallel.
> 
> Right now total slots in a TaskTracker is ==
> mapred.tasktracker.map.tasks.maximum for Maps and
> mapred.tasktracker.reduce.tasks.maximum for Reduces.
> 
> Hadoop is indeed trying to go towards the dynamic slot concept, which
> could rely on the current resources available on a system, but work
> for this is still in conceptual phases. TaskTrackers emit system
> status (like CPU load, utilization, memory available/user, load
> averages) in their heartbeats today (and is utilized by certain
> schedulers, I think Capacity Scheduler uses it to determine stuff),
> but the concept of slots is still fixed as a maximum to the above two
> configurations on each TaskTracker.
> 
> For code on how slots are checked/utilized, see any Scheduler plugin's
> code -- LimitTasksPerJobTaskScheduler, CapacityTaskScheduler for
> example.
> 
>> 
>> Thanks a lot,
>> Robert
>> 
>> 
>> 
> 
> 
> 
> -- 
> Harsh J
> www.harshj.com
> 
> 
> 


Re: Hadoop - how exactly is a slot defined

Posted by Grandl Robert <rg...@yahoo.com>.
Hi,
I am sorry to bother you again about this subject, but I am still not convinced I understand what Hadoop assumes a slot is. I understand it represents something in terms of CPU/memory, so you have to allocate a corresponding number of map/reduce slots based on your configuration.
BUT, I cannot understand yet whether Hadoop does any mapping between the concept of a slot and physical resources itself, or whether slots are just numbers and Hadoop works only with those numbers.
I looked at the code, but I am not able to figure out whether Hadoop really does any checking between the number of slots and physical resources, or whether it is just limited by the two numbers (the maximum number of map slots and of reduce slots) and works with those numbers only. That would mean the user has to decide what a slot really means (one slot per core, one slot per 512 MB, etc.) when configuring the number of map/reduce slots on his machines.
Thanks in advance for any clue.
Cheers,
Robert

--- On Mon, 11/22/10, Harsh J <qw...@gmail.com> wrote:

From: Harsh J <qw...@gmail.com>
Subject: Re: Hadoop - how exactly is a slot defined
To: general@hadoop.apache.org
Date: Monday, November 22, 2010, 6:52 PM

Hi,

On Mon, Nov 22, 2010 at 10:02 PM, Grandl Robert <rg...@yahoo.com> wrote:
> Hi all,
>
> I'm having trouble understanding what exactly a slot is. We always talk about tasks being assigned to slots, but I could not find anywhere what exactly a slot is. I assume it represents some allocation of RAM as well as some computation power.
>
> However, can somebody explain to me what exactly a slot means (in terms of the resources allocated to a slot), and how this mapping between slots and physical resources is done in Hadoop? Or give me some hints about the files in Hadoop where it might be?

A slot is of two types -- Map slot and Reduce slot. A slot represents
an ability to run one of these "Tasks" (map/reduce tasks) individually
at a point of time. Therefore, multiple slots on a TaskTracker means
multiple "Tasks" may execute in parallel.

Right now total slots in a TaskTracker is ==
mapred.tasktracker.map.tasks.maximum for Maps and
mapred.tasktracker.reduce.tasks.maximum for Reduces.

Hadoop is indeed trying to go towards the dynamic slot concept, which
could rely on the current resources available on a system, but work
for this is still in conceptual phases. TaskTrackers emit system
status (like CPU load, utilization, memory available/user, load
averages) in their heartbeats today (and is utilized by certain
schedulers, I think Capacity Scheduler uses it to determine stuff),
but the concept of slots is still fixed as a maximum to the above two
configurations on each TaskTracker.

For code on how slots are checked/utilized, see any Scheduler plugin's
code -- LimitTasksPerJobTaskScheduler, CapacityTaskScheduler for
example.

>
> Thanks a lot,
> Robert
>
>
>



-- 
Harsh J
www.harshj.com




Re: Hadoop - how exactly is a slot defined

Posted by Harsh J <qw...@gmail.com>.
Hi,

On Mon, Nov 22, 2010 at 10:02 PM, Grandl Robert <rg...@yahoo.com> wrote:
> Hi all,
>
> I'm having trouble understanding what exactly a slot is. We always talk about tasks being assigned to slots, but I could not find anywhere what exactly a slot is. I assume it represents some allocation of RAM as well as some computation power.
>
> However, can somebody explain to me what exactly a slot means (in terms of the resources allocated to a slot), and how this mapping between slots and physical resources is done in Hadoop? Or give me some hints about the files in Hadoop where it might be?

A slot is of two types -- Map slot and Reduce slot. A slot represents
an ability to run one of these "Tasks" (map/reduce tasks) individually
at a point of time. Therefore, multiple slots on a TaskTracker means
multiple "Tasks" may execute in parallel.

Right now total slots in a TaskTracker is ==
mapred.tasktracker.map.tasks.maximum for Maps and
mapred.tasktracker.reduce.tasks.maximum for Reduces.

Hadoop is indeed trying to go towards the dynamic slot concept, which
could rely on the current resources available on a system, but work
for this is still in conceptual phases. TaskTrackers emit system
status (like CPU load, utilization, memory available/user, load
averages) in their heartbeats today (and is utilized by certain
schedulers, I think Capacity Scheduler uses it to determine stuff),
but the concept of slots is still fixed as a maximum to the above two
configurations on each TaskTracker.

For code on how slots are checked/utilized, see any Scheduler plugin's
code -- LimitTasksPerJobTaskScheduler, CapacityTaskScheduler for
example.

>
> Thanks a lot,
> Robert
>
>
>



-- 
Harsh J
www.harshj.com