Posted to mapreduce-user@hadoop.apache.org by Markus Jelsma <ma...@openindex.io> on 2011/12/20 00:02:39 UTC

Variable mapreduce.tasktracker.*.tasks.maximum per job

Hi,

We have many different jobs running on a 0.22.0 cluster, each with its own 
memory consumption. Some jobs can easily be run with a large number of *.tasks 
per node, while others require much more memory and can only be run with a small 
number of tasks per node.

Is there any way to reconfigure a running cluster on a per-job basis so we can 
set the heap size and the number of map and reduce tasks per node? If not, we 
have to force all settings to a level that is right for the toughest jobs, 
which will have a negative impact on simpler jobs.
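
For what it's worth, the task JVM heap and the reduce count can at least be set 
per job from the driver; a rough sketch against the 0.22 API (old-style property 
names, made-up values, identity mapper/reducer as placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class PerJobSettingsDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Heap for this job's task JVMs only; does not touch mapred-site.xml.
        conf.set("mapred.child.java.opts", "-Xmx2048m");
        // Reduce count for this job (the map count follows the input splits).
        conf.setInt("mapred.reduce.tasks", 8);

        Job job = new Job(conf, "per-job-settings-example");
        job.setJarByClass(PerJobSettingsDriver.class);
        job.setMapperClass(Mapper.class);    // identity mapper, placeholder only
        job.setReducerClass(Reducer.class);  // identity reducer, placeholder only
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

The tasktracker *.tasks.maximum limits themselves, though, are read by the 
daemons and not by the job, which is the part we cannot vary.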

Thoughts?
Thanks

Re: Variable mapreduce.tasktracker.*.tasks.maximum per job

Posted by Markus Jelsma <ma...@openindex.io>.
Thanks! I'll look into it. 

On Tuesday 20 December 2011 01:31:17 Arun C Murthy wrote:
> Markus,
> 
> The CapacityScheduler in 0.20.205 (in fact since 0.20.203) supports the
> notion of 'high memory jobs', with which you can specify, for each job, the
> number of 'slots' each map/reduce task needs. For example, you can say that
> each map in job1 needs 2 slots, and so on.
> 
> Unfortunately, I don't know how well this works in 0.22 - I might be wrong,
> but I seriously doubt it has been tested in 0.22. YMMV.
> 
> Hope that helps.
> 
> Arun
> 
> On Dec 19, 2011, at 3:02 PM, Markus Jelsma wrote:
> > Hi,
> > We have many different jobs running on a 0.22.0 cluster, each with its
> > own memory consumption. Some jobs can easily be run with a large number
> > of *.tasks per node, while others require much more memory and can only
> > be run with a small number of tasks per node. Is there any way to
> > reconfigure a running cluster on a per-job basis so we can set the heap
> > size and the number of map and reduce tasks per node? If not, we have to
> > force all settings to a level that is right for the toughest jobs, which
> > will have a negative impact on simpler jobs. Thoughts?
> > Thanks

Re: Variable mapreduce.tasktracker.*.tasks.maximum per job

Posted by Arun C Murthy <ac...@hortonworks.com>.
Markus,

The CapacityScheduler in 0.20.205 (in fact since 0.20.203) supports the notion of 'high memory jobs', with which you can specify, for each job, the number of 'slots' each map/reduce task needs. For example, you can say that each map in job1 needs 2 slots, and so on.
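
Roughly, with memory-based scheduling enabled and the per-slot sizes defined cluster-side in mapred-site.xml, a job marks itself as high-memory by requesting a multiple of the slot size in its own configuration; a minimal sketch (0.20.203-era property names, made-up sizes):

    import org.apache.hadoop.conf.Configuration;

    public class HighMemorySlots {
      // Marks a job as 'high memory' under the CapacityScheduler's
      // memory-based scheduling. Assumes mapred-site.xml defines the slot
      // sizes, e.g. mapred.cluster.map.memory.mb = 1536 and
      //             mapred.cluster.reduce.memory.mb = 2048.
      public static void requestTwoSlotsPerTask(Configuration conf) {
        conf.setLong("mapred.job.map.memory.mb", 3072);    // 2 x 1536 MB => 2 map slots
        conf.setLong("mapred.job.reduce.memory.mb", 4096); // 2 x 2048 MB => 2 reduce slots
      }
    }

The scheduler then accounts each task of that job against two slots on whichever tasktracker runs it.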

Unfortunately, I don't know how well this works in 0.22 - I might be wrong, but I seriously doubt it has been tested in 0.22. YMMV.

Hope that helps.

Arun

On Dec 19, 2011, at 3:02 PM, Markus Jelsma wrote:

> Hi,
> We have many different jobs running on a 0.22.0 cluster, each with its own memory consumption. Some jobs can easily be run with a large number of *.tasks per node, while others require much more memory and can only be run with a small number of tasks per node.
> Is there any way to reconfigure a running cluster on a per-job basis so we can set the heap size and the number of map and reduce tasks per node? If not, we have to force all settings to a level that is right for the toughest jobs, which will have a negative impact on simpler jobs.
> Thoughts?
> Thanks