Posted to mapreduce-issues@hadoop.apache.org by "Jeff Bean (JIRA)" <ji...@apache.org> on 2011/08/29 19:34:38 UTC

[jira] [Commented] (MAPREDUCE-2905) Allow mapred.fairscheduler.assignmultple to be set per job

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13092995#comment-13092995 ] 

Jeff Bean commented on MAPREDUCE-2905:
--------------------------------------

Example:

5 node cluster, 5 slots per node.

Set assign multiple to true.

Submit a sleep job with 7 mappers.

All 7 mappers are assigned to nodes A and B.

Set assign multiple to false. Bounce job tracker.

Submit a sleep job with 7 mappers. 

7 mappers now spread evenly across cluster.

I should be able to submit 2 sleep jobs, say job 1 with 7 mappers and job 2 with 700 mappers. I want to set assignmultiple to true on job 2 so that every node saturates quickly, while job 1 has assignmultiple set to false, which means it gets one task assigned at a time and therefore spreads out more evenly.
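
The 5-node example above can be reproduced with a toy model. This is only a sketch, not the fair scheduler's actual code: it assumes nodes heartbeat in a fixed round-robin order and that the scheduler hands out either one task per heartbeat or as many as the node has free slots. The function name assign_tasks and its parameters are made up for illustration.

```python
def assign_tasks(num_tasks, nodes, slots_per_node, assign_multiple):
    """Toy model of task placement (hypothetical, not Hadoop code).

    Nodes heartbeat in round-robin order. On each heartbeat the scheduler
    assigns one task, or fills every free slot if assign_multiple is True.
    Returns a dict mapping node name to the number of tasks it received.
    """
    assigned = {node: 0 for node in nodes}
    # Tasks beyond total cluster capacity would wait; they are ignored here.
    remaining = min(num_tasks, len(nodes) * slots_per_node)
    while remaining > 0:
        for node in nodes:
            if remaining == 0:
                break
            free = slots_per_node - assigned[node]
            if free == 0:
                continue
            batch = min(free if assign_multiple else 1, remaining)
            assigned[node] += batch
            remaining -= batch
    return assigned


if __name__ == "__main__":
    cluster = ["A", "B", "C", "D", "E"]
    print(assign_tasks(7, cluster, 5, assign_multiple=True))
    print(assign_tasks(7, cluster, 5, assign_multiple=False))
```

Under these assumptions the 7-mapper job lands entirely on nodes A and B when assign multiple is on, and is spread over all five nodes when it is off, which matches what I observed on the real cluster.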

> Allow mapred.fairscheduler.assignmultple to be set per job
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-2905
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2905
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/fair-share
>    Affects Versions: 0.20.2
>            Reporter: Jeff Bean
>
> We encountered a situation where in the same cluster, large jobs benefit from mapred.fairscheduler.assignmultiple, but small jobs with small numbers of mappers do not: the mappers all clump to fully occupy just a few nodes, which causes those nodes to saturate and bottleneck. The desired behavior is to spread the job across more nodes so that a relatively small job doesn't saturate any node in the cluster.
> Testing has shown that setting mapred.fairscheduler.assignmultiple to false gives the desired behavior for small jobs, but is unnecessary for large jobs. However, since this is a cluster-wide setting, we can't properly tune.
> It'd be nice if jobs could set a parameter similar to mapred.fairscheduler.assignmultiple on submission to better control the task distribution of a particular job.
