Posted to mapreduce-issues@hadoop.apache.org by "Jason Lowe (JIRA)" <ji...@apache.org> on 2014/11/07 17:49:35 UTC

[jira] [Updated] (MAPREDUCE-5583) Ability to limit running map and reduce tasks

     [ https://issues.apache.org/jira/browse/MAPREDUCE-5583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated MAPREDUCE-5583:
----------------------------------
    Attachment: MAPREDUCE-5583v1.patch

Had an offline discussion about this with Arun, and he suggested using the ANY ask (i.e., host="*") to act as a limit on the request.  YARN only schedules containers for an application as long as the ANY ask is non-zero, so sending a request for 100 hosts and 10 racks but an ANY ask of 1 will only return 1 container.  If the AM carefully modulates the ANY ask, then it can self-limit without having to give up telling the RM about all of its locality desires.
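
To make the idea concrete, here is a rough sketch of how the ANY ask could be capped.  This is not code from the patch: the class name, the method name and the notion of "headroom" are hypothetical, and it assumes the AM knows how many tasks of a given type are pending and how many are already running.

```java
public class AnyAskLimiter {
    /**
     * Hypothetical helper: number of containers to request under ANY ("*").
     *
     * @param pending tasks still waiting for a container
     * @param running tasks of this type that are already running
     * @param limit   maximum concurrent tasks; 0 or less is treated as "no limit"
     */
    public static int cappedAnyAsk(int pending, int running, int limit) {
        if (limit <= 0) {
            return pending; // no limit: ask for everything still outstanding
        }
        // Room left under the limit, never negative.
        int headroom = Math.max(0, limit - running);
        // Never ask for more than is pending or more than the limit allows.
        return Math.min(pending, headroom);
    }

    public static void main(String[] args) {
        // 100 pending tasks, none running, limit of 1.
        System.out.println(cappedAnyAsk(100, 0, 1));
        // Limit already reached: the ANY ask drops to zero.
        System.out.println(cappedAnyAsk(100, 10, 10));
    }
}
```

The host and rack asks would be left untouched in this scheme; only the ANY count changes, so the RM still sees the full set of locality preferences.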

Attaching a patch that implements this approach.  It needs unit tests, but I've manually tested it and maps and reduces are being limited accordingly.  The mapreduce.job.running.maps.limit and mapreduce.job.running.reduces.limit properties control it: 0 (the default) means no limit; any other value specifies the number of maps or reduces, respectively, that will be allowed to run concurrently.
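
As an illustration of the intended property semantics only (again, not code from the patch), the sketch below uses java.util.Properties as a stand-in for the job configuration.  The class and method names are hypothetical, and treating negative values the same as 0 is an assumption.

```java
import java.util.OptionalInt;
import java.util.Properties;

public class RunningLimitConfig {
    // Property names as given in the comment above.
    static final String MAPS_LIMIT = "mapreduce.job.running.maps.limit";
    static final String REDUCES_LIMIT = "mapreduce.job.running.reduces.limit";

    /**
     * Hypothetical helper: returns the concurrency limit for the given key,
     * or empty when the property is unset or 0 (the default, meaning no limit).
     * Negative values are also treated as "no limit" (an assumption).
     */
    public static OptionalInt runningLimit(Properties conf, String key) {
        int value = Integer.parseInt(conf.getProperty(key, "0").trim());
        return value > 0 ? OptionalInt.of(value) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(MAPS_LIMIT, "20"); // at most 20 maps at once
        // The reduces limit is left unset, so reduces are not limited.
        System.out.println("maps limit: " + runningLimit(conf, MAPS_LIMIT));
        System.out.println("reduces limit: " + runningLimit(conf, REDUCES_LIMIT));
    }
}
```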

Feedback appreciated.

> Ability to limit running map and reduce tasks
> ---------------------------------------------
>
>                 Key: MAPREDUCE-5583
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5583
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mr-am, mrv2
>    Affects Versions: 0.23.9, 2.1.1-beta
>            Reporter: Jason Lowe
>         Attachments: MAPREDUCE-5583v1.patch
>
>
> It would be nice if users could specify a limit to the number of map or reduce tasks that are running simultaneously.  Occasionally users are performing operations in tasks that can lead to DDoS scenarios if too many tasks run simultaneously (e.g., accessing a database, web service, etc.).  Having the ability to throttle the number of tasks simultaneously running would provide users a way to mitigate issues with too many tasks on a large cluster attempting to access a service at any one time.
> This is similar to the functionality requested by MAPREDUCE-224 and implemented by HADOOP-3412 but was dropped in mrv2.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)