Posted to common-user@hadoop.apache.org by Jimmy Wan <ji...@indeed.com> on 2008/03/19 00:41:46 UTC

Limiting Total # of TaskTracker threads

The properties mentioned here: http://wiki.apache.org/hadoop/FAQ#13

have been deprecated in favor of two separate properties:
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum

I'd like to limit the total # of threads on a task tracker (think limited  
resources on a given compute node) to a given number, and there does not  
appear to be a way to do that anymore. Am I correct in my understanding  
that there is no capability to do this?

-- 
Jimmy

Re: Limiting Total # of TaskTracker threads

Posted by Khalil Honsali <k....@gmail.com>.
Hi,

> The map/reduce tasks are not threads; they are run in separate JVMs
> which are forked by the tasktracker.

I don't understand why. Is it a design to support task failures? I think
that, on the other hand, running a thread queue (of tasks) per job per JVM
would greatly improve performance, since there would be fewer JVM start-ups.
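Khalil's alternative, a queue of tasks served by threads inside one long-lived JVM, can be sketched with a fixed-size `java.util.concurrent` thread pool. This is a hypothetical illustration of the proposal, not Hadoop 0.16 code: `ThreadQueueSketch` and `runTasks` are made-up names, and `Thread.sleep` stands in for real task work. The pool size acts as a hard cap on concurrently running tasks of any kind, and no task pays a JVM start-up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: run tasks as threads in one long-lived JVM instead of
// forking a JVM per task. This is NOT how Hadoop 0.16 runs tasks.
public class ThreadQueueSketch {

    // Runs `taskCount` dummy tasks on a pool of `poolSize` threads and
    // returns the highest number of tasks observed running at the same time.
    static int runTasks(int poolSize, int taskCount) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            futures.add(pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max); // remember the high-water mark
                try {
                    Thread.sleep(20); // stand-in for real map/reduce work
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            }));
        }
        for (Future<?> f : futures) {
            f.get(); // rethrows any exception a task threw
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return peak.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("peak concurrent tasks: " + runTasks(2, 10));
    }
}
```

The trade-off is isolation: every task shares one heap and one process, so a task that calls `System.exit` ends all the others, and one that exhausts the heap starves them all.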

K. Honsali

On 21/03/2008, Jimmy Wan <ji...@indeed.com> wrote:
> > On 3/18/08 5:26 PM, "Arun C Murthy" <ar...@yahoo-inc.com> wrote:
> >> The map/reduce tasks are not threads; they are run in separate JVMs
> >> which are forked by the tasktracker.

Re: Limiting Total # of TaskTracker threads

Posted by Jimmy Wan <ji...@indeed.com>.
On Tue, 18 Mar 2008 19:53:04 -0500, Ted Dunning <td...@veoh.com> wrote:

> I think the original request was to limit the sum of maps and reduces  
> rather than limiting the two parameters independently.

Ted, yes, this is exactly what I'm looking for. I just found an issue that
seems to say the old, deprecated property is still there but undocumented:

https://issues.apache.org/jira/browse/HADOOP-2300

I tried setting the old total-tasks maximum in combination with the new
per-type values, but that didn't seem to work. =( My machine labelled
"LIMITED MACHINE" had 2 maps and 1 reduce running at the same time.

The scenario I have is that I want to run multiple concurrent jobs through
my cluster and keep the CPU usage on that node bounded. Should I file a
new issue?

This was all with Hadoop 0.16.0.

LIMITED MACHINE:
	<property>
	  <name>mapred.tasktracker.tasks.maximum</name>
	  <value>2</value>
	  <description>The maximum number of total tasks that will be run
	  simultaneously by a task tracker.
	  </description>
	</property>
	<property>
	  <name>mapred.tasktracker.map.tasks.maximum</name>
	  <value>1</value>
	  <description>The maximum number of map tasks that will be run
	  simultaneously by a task tracker.
	  </description>
	</property>
	<property>
	  <name>mapred.tasktracker.reduce.tasks.maximum</name>
	  <value>1</value>
	  <description>The maximum number of reduce tasks that will be run
	  simultaneously by a task tracker.
	  </description>
	</property>

OTHER CLUSTER MACHINES:
	<property>
	  <name>mapred.tasktracker.tasks.maximum</name>
	  <value>8</value>
	  <description>The maximum number of total tasks that will be run
	  simultaneously by a task tracker.
	  </description>
	</property>
	<property>
	  <name>mapred.tasktracker.map.tasks.maximum</name>
	  <value>4</value>
	  <description>The maximum number of map tasks that will be run
	  simultaneously by a task tracker.
	  </description>
	</property>
	<property>
	  <name>mapred.tasktracker.reduce.tasks.maximum</name>
	  <value>4</value>
	  <description>The maximum number of reduce tasks that will be run
	  simultaneously by a task tracker.
	  </description>
	</property>
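The two fragments above can be checked mechanically. The sketch below is a hypothetical helper (not Hadoop's `Configuration` class; `SlotConfigSketch` and `readIntProperties` are made-up names) that reads `<property>` name/value pairs with the JDK's DOM parser and compares the requested total with the sum of the per-type caps, using the LIMITED MACHINE values with the descriptions omitted.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: read integer <property> values laid out like a
// hadoop-site.xml fragment, using only the JDK's DOM parser.
public class SlotConfigSketch {

    static Map<String, Integer> readIntProperties(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, Integer> props = new HashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            // Each <property> is expected to hold one <name> and one <value>.
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
            props.put(name, Integer.parseInt(value));
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        // The LIMITED MACHINE fragment, wrapped in <configuration>, descriptions omitted.
        String limited = "<configuration>"
                + "<property><name>mapred.tasktracker.tasks.maximum</name><value>2</value></property>"
                + "<property><name>mapred.tasktracker.map.tasks.maximum</name><value>1</value></property>"
                + "<property><name>mapred.tasktracker.reduce.tasks.maximum</name><value>1</value></property>"
                + "</configuration>";
        Map<String, Integer> p = readIntProperties(limited);
        int perTypeSum = p.get("mapred.tasktracker.map.tasks.maximum")
                + p.get("mapred.tasktracker.reduce.tasks.maximum");
        System.out.println("total cap = " + p.get("mapred.tasktracker.tasks.maximum")
                + ", map + reduce caps = " + perTypeSum);
    }
}
```

With these values the requested total equals map + reduce in both fragments (1 + 1 = 2 and 4 + 4 = 8), so, assuming the total were enforced as an extra ceiling on concurrent maps plus reduces, it would be no stricter than the per-type caps on either kind of machine.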

> On 3/18/08 5:26 PM, "Arun C Murthy" <ar...@yahoo-inc.com> wrote:
>
>> The map/reduce tasks are not threads; they are run in separate JVMs
>> which are forked by the tasktracker.

Arun, yes, I did mean tasks, not threads.


-- 
Jimmy

Re: Limiting Total # of TaskTracker threads

Posted by Ted Dunning <td...@veoh.com>.
I think the original request was to limit the sum of maps and reduces rather
than limiting the two parameters independently.

Clearly, with a single job running at a time, this is a non-issue since
reducers don't do much until the maps are done.  With multiple jobs it is a
bit more of an issue.
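The difference Ted describes, capping the sum of maps and reduces rather than each type independently, can be sketched as a small admission check. `SlotPolicySketch` and `tryStart` are hypothetical names for illustration; this is not the TaskTracker's scheduling code in 0.16.0 or any other release.

```java
// Hypothetical slot policy for one tasktracker: per-type caps plus an optional
// combined cap. Illustrates the request in this thread, not Hadoop's code.
public class SlotPolicySketch {

    enum TaskType { MAP, REDUCE }

    private final int maxMaps;
    private final int maxReduces;
    private final int maxTotal; // Integer.MAX_VALUE means "no combined cap"
    private int runningMaps;
    private int runningReduces;

    SlotPolicySketch(int maxMaps, int maxReduces, int maxTotal) {
        this.maxMaps = maxMaps;
        this.maxReduces = maxReduces;
        this.maxTotal = maxTotal;
    }

    // Claims a slot and returns true if a task of this type may start now.
    synchronized boolean tryStart(TaskType type) {
        if (runningMaps + runningReduces >= maxTotal) {
            return false; // combined cap reached, regardless of type
        }
        if (type == TaskType.MAP) {
            if (runningMaps >= maxMaps) return false;
            runningMaps++;
        } else {
            if (runningReduces >= maxReduces) return false;
            runningReduces++;
        }
        return true;
    }

    // Releases the slot held by a finished task.
    synchronized void finish(TaskType type) {
        if (type == TaskType.MAP) {
            runningMaps--;
        } else {
            runningReduces--;
        }
    }

    synchronized int running() {
        return runningMaps + runningReduces;
    }

    public static void main(String[] args) {
        // Per-type caps of 2 each, but at most 2 tasks in total.
        SlotPolicySketch node = new SlotPolicySketch(2, 2, 2);
        System.out.println(node.tryStart(TaskType.MAP));    // true
        System.out.println(node.tryStart(TaskType.MAP));    // true
        System.out.println(node.tryStart(TaskType.REDUCE)); // false: combined cap
        node.finish(TaskType.MAP);
        System.out.println(node.tryStart(TaskType.REDUCE)); // true
    }
}
```

Passing `Integer.MAX_VALUE` as the combined cap leaves only the per-type limits; with two concurrent jobs such a node would accept 2 maps and 2 reduces at once, which is the situation a combined cap is meant to prevent.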


On 3/18/08 5:26 PM, "Arun C Murthy" <ar...@yahoo-inc.com> wrote:

>> I'd like to limit the total # of threads on a task tracker (think
>> limited resources on a given compute node) to a given number, and
>> there does not appear to be a way to do that anymore. Am I correct
>> in my understanding that there is no capability to do this?
>> 
> 
> The map/reduce tasks are not threads; they are run in separate JVMs
> which are forked by the tasktracker.


Re: Limiting Total # of TaskTracker threads

Posted by Arun C Murthy <ar...@yahoo-inc.com>.
On Mar 18, 2008, at 4:41 PM, Jimmy Wan wrote:

> The properties mentioned here: http://wiki.apache.org/hadoop/FAQ#13
>
> have been deprecated in favor of two separate properties:
> mapred.tasktracker.map.tasks.maximum
> mapred.tasktracker.reduce.tasks.maximum
>

I've updated the wiki to reflect those... sorry you were misled.

> I'd like to limit the total # of threads on a task tracker (think  
> limited resources on a given compute node) to a given number, and  
> there does not appear to be a way to do that anymore. Am I correct  
> in my understanding that there is no capability to do this?
>

The map/reduce tasks are not threads; they are run in separate JVMs
which are forked by the tasktracker.
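To make the distinction concrete, here is a minimal sketch of a parent process forking a fresh JVM with `ProcessBuilder` and waiting for it to exit. The child just runs `java -version` as a stand-in for a task; the real tasktracker builds its own command line (classpath, heap options, task id), which this sketch does not attempt to reproduce. `ChildJvmSketch` is a hypothetical name. The elapsed time it prints gives a rough feel for the per-task JVM start-up cost Khalil raises.

```java
import java.io.File;

// Hypothetical sketch: run one unit of work in a separate child JVM and wait
// for it, instead of running it on a thread inside the parent JVM.
public class ChildJvmSketch {

    // Starts a child JVM (here just `java -version`) and returns its exit code.
    static int forkChildJvm() throws Exception {
        // Use the same Java installation that is running this program.
        String javaBin = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        ProcessBuilder pb = new ProcessBuilder(javaBin, "-version");
        pb.inheritIO(); // child's stdout/stderr go to this process's console
        Process child = pb.start();
        return child.waitFor(); // block until the child JVM exits
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        int exitCode = forkChildJvm();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("child JVM exited with " + exitCode + " after " + elapsedMs + " ms");
    }
}
```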

OTOH, there are other threads (RPC etc.). Are you looking at
limiting those?

Arun
