Posted to general@hadoop.apache.org by Michael Moores <mm...@real.com> on 2010/10/21 00:41:31 UTC

Limiting concurrent maps

I have been playing with mapreduce.tasktracker.map.tasks.maximum to reduce the load
on my Cassandra cluster (using the Cassandra ColumnFamilyInputFormat). I'd like to find ways of throttling the map operations
in case I am affecting OLTP activity on the cluster.

What parameters can I use to limit the number of map tasks running concurrently across the whole cluster? mapreduce.tasktracker.map.tasks.maximum
limits the number of concurrent maps per task tracker. But can I do this at the job level?

Should I look at the "fair" scheduler?

regards,
Michael
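
To make the per-tracker versus cluster-wide distinction in the question concrete, here is a minimal sketch of the arithmetic. It assumes every TaskTracker uses the same mapreduce.tasktracker.map.tasks.maximum value; the class name, method name, and the numbers in main are hypothetical and are not taken from the thread.

```java
// Sketch: the per-TaskTracker setting bounds the cluster only indirectly,
// as (number of trackers) x (maps allowed per tracker).
class MapSlotCeiling {

    /** Upper bound on maps running at once if every tracker uses the same limit. */
    static int clusterMapCeiling(int taskTrackers, int mapsPerTracker) {
        if (taskTrackers < 0 || mapsPerTracker < 0) {
            throw new IllegalArgumentException("counts must be non-negative");
        }
        // multiplyExact throws instead of silently overflowing.
        return Math.multiplyExact(taskTrackers, mapsPerTracker);
    }

    public static void main(String[] args) {
        int taskTrackers = 12;   // hypothetical cluster size
        int mapsPerTracker = 2;  // hypothetical per-tracker maximum
        System.out.println("At most "
                + clusterMapCeiling(taskTrackers, mapsPerTracker)
                + " maps can run at once");
    }
}
```

Lowering the per-tracker value lowers this ceiling for every job on the cluster, which is why it cannot serve as a per-job throttle.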

Re: Limiting concurrent maps

Posted by Arun C Murthy <ac...@yahoo-inc.com>.
On Oct 21, 2010, at 5:30 PM, Michael Moores wrote:

> I don't see how the capacity scheduler could limit the number of
> maps running concurrently across the whole cluster,
> even if this is the only job running.
>

Easy, set a maximum limit on the queue.

Arun
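
As a rough illustration of what a queue maximum means in terms of map slots, the sketch below converts a percentage limit into a slot count. The reply does not name a property; that the limit is expressed as a percentage of cluster capacity, and that it rounds down, are both assumptions here, so check the Capacity Scheduler documentation for your Hadoop version. All names and numbers are hypothetical.

```java
// Sketch: translate an assumed queue limit (percent of cluster map slots)
// into the largest number of maps the queue could run at once.
class QueueMapCap {

    static int maxMapsForQueue(int clusterMapSlots, int maximumCapacityPercent) {
        if (clusterMapSlots < 0) {
            throw new IllegalArgumentException("slot count must be non-negative");
        }
        if (maximumCapacityPercent < 0 || maximumCapacityPercent > 100) {
            throw new IllegalArgumentException("percent must be in [0, 100]");
        }
        // Integer division rounds down; the real scheduler's rounding is
        // not something this sketch claims to reproduce.
        return (int) ((long) clusterMapSlots * maximumCapacityPercent / 100);
    }

    public static void main(String[] args) {
        int clusterMapSlots = 24;  // hypothetical: 12 trackers x 2 map slots
        int percent = 25;          // hypothetical queue maximum
        System.out.println("Queue limited to "
                + maxMapsForQueue(clusterMapSlots, percent) + " concurrent maps");
    }
}
```

Under these assumptions, submitting the Cassandra job to a dedicated queue with a low maximum would cap its maps cluster-wide even when it is the only job running.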


Re: Limiting concurrent maps

Posted by Michael Moores <mm...@real.com>.
I don't see how the capacity scheduler could limit the number of maps running concurrently across the whole cluster,
even if this is the only job running.

But maybe the fair scheduler's mapred.fairscheduler.loadmanager extension point would work:


mapred.fairscheduler.loadmanager: An extensibility point that lets you specify a class that determines how many maps and reduces can run on a given TaskTracker. This class should implement the LoadManager interface. By default the task caps in the Hadoop config file are used, but this option could be used to make the load based on available memory and CPU utilization, for example.
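
The idea behind such an extension point can be sketched without Hadoop on the classpath. The MapLoadPolicy interface below is a hypothetical stand-in, not the real LoadManager type, whose method signatures depend on the Hadoop version and are not reproduced here. The sketch only shows the decision a load manager makes: whether a tracker may accept one more map.

```java
// Sketch: a pluggable policy that caps maps per tracker below the
// configured slot count. MapLoadPolicy is a hypothetical stand-in for
// the fair scheduler's LoadManager, not the actual Hadoop API.
class ThrottledMaps {

    interface MapLoadPolicy {
        boolean canAssignMap(int runningMapsOnTracker, int configuredMapSlots);
    }

    /** Allows a map only while the tracker is below a stricter, fixed cap. */
    static final class CapPolicy implements MapLoadPolicy {
        private final int perTrackerCap;

        CapPolicy(int perTrackerCap) {
            if (perTrackerCap < 0) {
                throw new IllegalArgumentException("cap must be non-negative");
            }
            this.perTrackerCap = perTrackerCap;
        }

        @Override
        public boolean canAssignMap(int runningMapsOnTracker, int configuredMapSlots) {
            // Never exceed the configured slots, even if the cap is higher.
            int effectiveLimit = Math.min(perTrackerCap, configuredMapSlots);
            return runningMapsOnTracker < effectiveLimit;
        }
    }

    public static void main(String[] args) {
        MapLoadPolicy policy = new CapPolicy(1);  // hypothetical cap
        System.out.println("idle tracker accepts a map: " + policy.canAssignMap(0, 4));
        System.out.println("busy tracker accepts a map: " + policy.canAssignMap(1, 4));
    }
}
```

A real implementation could replace the fixed cap with a measurement such as available memory or CPU utilization, as the description above suggests. Note that a per-tracker decision like this still limits the cluster only indirectly.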


On Oct 20, 2010, at 4:32 PM, james warren wrote:

Hi Michael,

The tasktracker configs affect only the local tasktracker daemon, not
other servers in your cluster.  Moreover, they can't be overridden by a job
configuration.  Sounds like you're in need of a job scheduler; I personally
prefer to use the Fair Scheduler, but I'm sure the Capacity Scheduler would suit
your needs as well.

cheers,
-James



Re: Limiting concurrent maps

Posted by james warren <ja...@rockyou.com>.
Hi Michael,

The tasktracker configs affect only the local tasktracker daemon, not
other servers in your cluster.  Moreover, they can't be overridden by a job
configuration.  Sounds like you're in need of a job scheduler; I personally
prefer to use the Fair Scheduler, but I'm sure the Capacity Scheduler would suit
your needs as well.

cheers,
-James
