Posted to mapreduce-issues@hadoop.apache.org by "Johannes Zillmann (JIRA)" <ji...@apache.org> on 2010/08/28 23:36:54 UTC

[jira] Commented: (MAPREDUCE-1859) maxConcurrentMapTask & maxConcurrentReduceTask per job

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903891#action_12903891 ] 

Johannes Zillmann commented on MAPREDUCE-1859:
----------------------------------------------

The Capacity Scheduler solution does not seem flexible enough for cases where you have different kinds of input sources, each with different configurations, and these kinds and configurations are not all known at cluster startup.
If you have a system where users can set up an import from a database, the limits they want to put on that import can differ widely: one import reads from a MySQL database, one from Oracle, one from a clustered database, one from a database that is also in use by other applications, and so on.

> maxConcurrentMapTask & maxConcurrentReduceTask per job
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-1859
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1859
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: job submission
>    Affects Versions: 0.20.2
>            Reporter: Johannes Zillmann
>
> It would be valuable if one could specify the maximum number of map/reduce slots that a given job may use. An example would be a map-reduce job importing from a database, where you don't want 50 map tasks querying one db at a time but also don't want to shrink the overall map task count.
> Although this is probably already possible through the Fair/Capacity Scheduler or a custom extension, I think it would be a good addition to the default TaskScheduler, since this seems to be more than a rarely used feature.
> This would also be beneficial in situations where you don't have control/ownership over the cluster.
> And it's more job-centric, whereas the existing scheduler extensions seem to be more job-type-centric.
> Implementing this feature should be relatively straightforward: add something like jobConf.setMaxConcurrentMapTask(int) and respect this configuration in JobQueueTaskScheduler.
> Not sure whether this feature would fit well with the existing Fair/Capacity Schedulers.
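
To illustrate the proposal in the issue description, here is a minimal, self-contained sketch of a FIFO scheduler that respects a per-job cap on concurrently running map tasks. It does not use any Hadoop classes: Job, maxConcurrentMapTasks, and assignMaps are hypothetical names invented for this example, and the real JobQueueTaskScheduler would need to track running tasks in its own way.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch only; none of these types are Hadoop classes. */
public class MaxConcurrentTaskSketch {

    /** Stand-in for a job with an optional cap on concurrently running map tasks. */
    static final class Job {
        final String name;
        final int maxConcurrentMapTasks; // hypothetical setting; <= 0 means "no limit"
        int pendingMaps;                 // map tasks not yet started
        int runningMaps;                 // map tasks currently occupying a slot

        Job(String name, int totalMaps, int maxConcurrentMapTasks) {
            this.name = name;
            this.pendingMaps = totalMaps;
            this.maxConcurrentMapTasks = maxConcurrentMapTasks;
        }

        /** True if the job has work left and is below its own cap. */
        boolean canLaunchMap() {
            if (pendingMaps == 0) {
                return false;
            }
            return maxConcurrentMapTasks <= 0 || runningMaps < maxConcurrentMapTasks;
        }

        void launchMap() {
            pendingMaps--;
            runningMaps++;
        }

        void finishMap() {
            runningMaps--;
        }
    }

    /**
     * Fills free slots from jobs in submission order. A job that has reached
     * its cap is skipped, so the remaining slots go to later jobs.
     * Returns the name of the job chosen for each assigned slot.
     */
    static List<String> assignMaps(List<Job> queue, int freeSlots) {
        List<String> assigned = new ArrayList<>();
        for (Job job : queue) {
            while (freeSlots > 0 && job.canLaunchMap()) {
                job.launchMap();
                assigned.add(job.name);
                freeSlots--;
            }
            if (freeSlots == 0) {
                break;
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        List<Job> queue = new ArrayList<>();
        queue.add(new Job("db-import", 50, 5));     // 50 map tasks, at most 5 at once
        queue.add(new Job("log-analysis", 100, 0)); // no cap
        System.out.println(assignMaps(queue, 20));
    }
}
```

In this sketch the database import keeps all 50 of its map tasks but never has more than 5 running at once, and the slots it leaves unused fall through to the next job in the queue. Whether such a cap would combine cleanly with the Fair or Capacity Scheduler is left open here, as it is in the issue.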

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.