Posted to mapreduce-issues@hadoop.apache.org by "Allen Wittenauer (JIRA)" <ji...@apache.org> on 2014/07/18 20:00:18 UTC

[jira] [Resolved] (MAPREDUCE-501) Spawning tasks faster

     [ https://issues.apache.org/jira/browse/MAPREDUCE-501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allen Wittenauer resolved MAPREDUCE-501.
----------------------------------------

    Resolution: Incomplete

Closing as stale.

> Spawning tasks faster
> ---------------------
>
>                 Key: MAPREDUCE-501
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-501
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Spyros Blanas
>            Priority: Minor
>         Attachments: dynamic_heartbeat.patch
>
>
> In the current implementation, tasks are assigned to tasktrackers by adding an appropriate action to the heartbeat response list. Each heartbeat response can start one task. Because the minimum interval between heartbeats is 5 seconds by default, on an idle cluster of powerful nodes (say, 10 task "slots" per node) some tasks are spawned only after a considerable delay: in this example, the last task is spawned after 45 seconds.
> Spawning tasks faster can significantly improve end-to-end execution time when most jobs finish in a matter of minutes.
> The attached patch asks each TaskTracker to send its next heartbeat after 1/5th of the regular heartbeat interval if it was assigned a task in this round, which makes spawning multiple tasks much more efficient.
> A better approach would be to have each TaskTracker report the number of free slots it has (instead of only whether it can accept more work) and have the JobTracker push the appropriate number of tasks in one response, but this would require changes to the current communication protocol.
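
The timing argument in the description can be checked with a small model. The sketch below is not Hadoop code and is not taken from dynamic_heartbeat.patch. The class and method names are hypothetical. It assumes one idle TaskTracker whose first task starts at time 0, and one task started per heartbeat response. For the patched case it assumes every follow-up heartbeat arrives after 1/5th of the regular interval, which is one reading of the description.

```java
// Hypothetical model (not Hadoop code): when does the last task start on a
// single idle TaskTracker under different heartbeat policies?
class HeartbeatSpawnModel {

    /**
     * One task per heartbeat response. The first task starts at time 0 and
     * each further task waits one more heartbeat interval.
     */
    static long lastSpawnDelayMs(int slots, long intervalMs) {
        if (slots <= 0) {
            return 0L;
        }
        return (long) (slots - 1) * intervalMs;
    }

    /**
     * Batched assignment, as in the "better approach": the tracker reports
     * its free slots and receives all tasks in the first response.
     */
    static long batchedLastSpawnDelayMs(int slots) {
        return 0L;
    }

    public static void main(String[] args) {
        int slots = 10;               // example from the description
        long regularMs = 5000L;       // default minimum heartbeat interval
        long shortenedMs = regularMs / 5; // assumed interval after an assignment

        System.out.println("regular:   " + lastSpawnDelayMs(slots, regularMs) + " ms");
        System.out.println("shortened: " + lastSpawnDelayMs(slots, shortenedMs) + " ms");
        System.out.println("batched:   " + batchedLastSpawnDelayMs(slots) + " ms");
    }
}
```

Under these assumptions the model reproduces the 45-second figure for the regular interval, gives 9 seconds for the shortened interval, and no added delay when all tasks are pushed in one response.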



--
This message was sent by Atlassian JIRA
(v6.2#6252)