Posted to common-user@hadoop.apache.org by Jimmy Wan <ji...@indeed.com> on 2008/03/19 00:41:46 UTC
Limiting Total # of TaskTracker threads
The properties mentioned here: http://wiki.apache.org/hadoop/FAQ#13
have been deprecated in favor of two separate properties:
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum
I'd like to limit the total # of threads on a task tracker (think limited
resources on a given compute node) to a given number, and there does not
appear to be a way to do that anymore. Am I correct in my understanding
that there is no capability to do this?
--
Jimmy
Re: Limiting Total # of TaskTracker threads
Posted by Khalil Honsali <k....@gmail.com>.
Hi,
> The map/reduce tasks are not threads, they are run in separate JVMs
> which are forked by the tasktracker.
I don't understand why. Is this design meant to support task failures? On
the other hand, I think that running a thread queue (of tasks) per job per
JVM would greatly improve performance, since fewer JVM startups would be
needed.
K. Honsali
On 21/03/2008, Jimmy Wan <ji...@indeed.com> wrote:
> > On 3/18/08 5:26 PM, "Arun C Murthy" <ar...@yahoo-inc.com> wrote:
> >
>
> >> The map/reduce tasks are not threads, they are run in separate JVMs
> >> which are forked by the tasktracker.
>
>
> Arun, yes, I did mean tasks, not threads.
>
>
> --
>
> Jimmy
>
Re: Limiting Total # of TaskTracker threads
Posted by Jimmy Wan <ji...@indeed.com>.
On Tue, 18 Mar 2008 19:53:04 -0500, Ted Dunning <td...@veoh.com> wrote:
> I think the original request was to limit the sum of maps and reduces
> rather than limiting the two parameters independently.
Ted, yes this is exactly what I'm looking for. I just found an issue that
seems to state that the old deprecated property is there, but it is not
documented:
https://issues.apache.org/jira/browse/HADOOP-2300
I tried using the max tasks in combination with setting the new values,
but that didn't seem to work. =( My machine labelled as "LIMITED MACHINE"
had 2 maps and 1 reduce running at the same time.
The scenario I have is that I want to run multiple concurrent jobs through
my cluster and have the CPU usage for that node be bounded. Should I file a
new issue?
This was all with Hadoop 0.16.0
LIMITED MACHINE:
<property>
<name>mapred.tasktracker.tasks.maximum</name>
<value>2</value>
<description>The maximum number of total tasks that will be run
simultaneously by a task tracker.
</description>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>1</value>
<description>The maximum number of map tasks that will be run
simultaneously by a task tracker.
</description>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>1</value>
<description>The maximum number of reduce tasks that will be run
simultaneously by a task tracker.
</description>
</property>
OTHER CLUSTER MACHINES:
<property>
<name>mapred.tasktracker.tasks.maximum</name>
<value>8</value>
<description>The maximum number of total tasks that will be run
simultaneously by a task tracker.
</description>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>4</value>
<description>The maximum number of map tasks that will be run
simultaneously by a task tracker.
</description>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>4</value>
<description>The maximum number of reduce tasks that will be run
simultaneously by a task tracker.
</description>
</property>
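For reference, the per-node ceiling implied by the two per-type settings above can be worked out mechanically. The following is only an illustrative sketch in Python, not anything Hadoop ships: the helper name `task_caps` and the `<configuration>` wrapper element are assumptions made for the example. It just adds the configured values and says nothing about how Hadoop 0.16.0 actually enforces them.

```python
import xml.etree.ElementTree as ET

# Property names as they appear in the thread.
MAP_KEY = "mapred.tasktracker.map.tasks.maximum"
REDUCE_KEY = "mapred.tasktracker.reduce.tasks.maximum"


def task_caps(config_xml):
    """Return (map_cap, reduce_cap, map_cap + reduce_cap) for a config string.

    Hypothetical helper: assumes both properties are present and that the
    <property> elements sit under a single root element.
    """
    root = ET.fromstring(config_xml)
    props = {
        p.findtext("name").strip(): p.findtext("value").strip()
        for p in root.iter("property")
    }
    map_cap = int(props[MAP_KEY])
    reduce_cap = int(props[REDUCE_KEY])
    return map_cap, reduce_cap, map_cap + reduce_cap


# The "LIMITED MACHINE" per-type values from the message, wrapped in an
# assumed <configuration> root so the fragment parses as one document.
LIMITED = """
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>1</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>1</value>
  </property>
</configuration>
"""

print(task_caps(LIMITED))  # (1, 1, 2)
```

If both caps were honored as written, the limited machine would run at most two tasks at once, which is fewer than the 2 maps and 1 reduce reported above.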
> On 3/18/08 5:26 PM, "Arun C Murthy" <ar...@yahoo-inc.com> wrote:
>
>> The map/reduce tasks are not threads, they are run in separate JVMs
>> which are forked by the tasktracker.
Arun, yes, I did mean tasks, not threads.
--
Jimmy
Re: Limiting Total # of TaskTracker threads
Posted by Ted Dunning <td...@veoh.com>.
I think the original request was to limit the sum of maps and reduces rather
than limiting the two parameters independently.
Clearly, with a single job running at a time, this is a non-issue since
reducers don't do much until the maps are done. With multiple jobs it is a
bit more of an issue.
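To make the distinction concrete: with two independent caps, the number of tasks a node may run at once is bounded by the sum of the caps, whereas a single combined cap bounds the total whatever the mix. The sketch below is a toy model in Python, not Hadoop's scheduler; the function names and the "maps first" policy for the combined cap are hypothetical.

```python
def running_with_independent_caps(pending_maps, pending_reduces, map_cap, reduce_cap):
    """Tasks running at once when maps and reduces are capped separately."""
    maps = min(pending_maps, map_cap)
    reduces = min(pending_reduces, reduce_cap)
    return maps, reduces


def running_with_total_cap(pending_maps, pending_reduces, total_cap):
    """Tasks running at once under one shared cap.

    Hypothetical policy: maps are given slots first, reduces take the rest.
    """
    maps = min(pending_maps, total_cap)
    reduces = min(pending_reduces, total_cap - maps)
    return maps, reduces


# Several jobs at once: plenty of maps and reduces are waiting.
print(running_with_independent_caps(10, 10, map_cap=4, reduce_cap=4))  # (4, 4)
print(running_with_total_cap(10, 10, total_cap=4))                     # (4, 0)

# Only reduces are waiting: the shared cap lets them use every slot.
print(running_with_total_cap(0, 10, total_cap=4))                      # (0, 4)
```

In this model, holding the total at N with independent caps alone means choosing caps that sum to N, so neither task type can ever use all N slots by itself.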
On 3/18/08 5:26 PM, "Arun C Murthy" <ar...@yahoo-inc.com> wrote:
>> I'd like to limit the total # of threads on a task tracker (think
>> limited resources on a given compute node) to a given number, and
>> there does not appear to be a way to do that anymore. Am I correct
>> in my understanding that there is no capability to do this?
>>
>
> The map/reduce tasks are not threads, they are run in separate JVMs
> which are forked by the tasktracker.
Re: Limiting Total # of TaskTracker threads
Posted by Arun C Murthy <ar...@yahoo-inc.com>.
On Mar 18, 2008, at 4:41 PM, Jimmy Wan wrote:
> The properties mentioned here: http://wiki.apache.org/hadoop/FAQ#13
>
> have been deprecated in favor of two separate properties:
> mapred.tasktracker.map.tasks.maximum
> mapred.tasktracker.reduce.tasks.maximum
>
I've updated the wiki to reflect those... sorry you got misled.
> I'd like to limit the total # of threads on a task tracker (think
> limited resources on a given compute node) to a given number, and
> there does not appear to be a way to do that anymore. Am I correct
> in my understanding that there is no capability to do this?
>
The map/reduce tasks are not threads, they are run in separate JVMs
which are forked by the tasktracker.
OTOH, there are other threads (RPC etc.) - are you looking at
limiting those?
Arun
> --
> Jimmy