Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2006/04/18 23:56:19 UTC

[jira] Updated: (HADOOP-142) failed tasks should be rescheduled on different hosts after other jobs

     [ http://issues.apache.org/jira/browse/HADOOP-142?page=all ]

Owen O'Malley updated HADOOP-142:
---------------------------------

    Attachment: no-repeat-failures.patch

This patch does three things:
   1. When a task fails, the task after it in the job's task list becomes the first one checked for assignment to a TaskTracker.
   2. Tasks prefer not to run on TaskTrackers where they have failed before.
   3. Speculative tasks will not run on TaskTrackers where they have failed.
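
Items 2 and 3 could be sketched roughly as follows. This is only an illustration, not the code in the attached patch: the class FailureAwareTask, its methods, and the clusterSize parameter are hypothetical names, and the fallback rule (allow a retry once the task has failed on every known TaskTracker) is an assumption about how the "preference" in item 2 might be relaxed.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; not the classes touched by the patch. */
final class FailureAwareTask {
    // Names of the TaskTrackers on which this task has already failed.
    private final Set<String> failedTrackers = new HashSet<>();

    void recordFailure(String trackerName) {
        failedTrackers.add(trackerName);
    }

    boolean hasFailedOn(String trackerName) {
        return failedTrackers.contains(trackerName);
    }

    /**
     * Decide whether this task may be handed to the given tracker.
     *
     * @param trackerName tracker asking for work
     * @param speculative true if this would be a speculative attempt
     * @param clusterSize number of live trackers (assumed to be known)
     */
    boolean canRunOn(String trackerName, boolean speculative, int clusterSize) {
        if (!hasFailedOn(trackerName)) {
            return true;
        }
        if (speculative) {
            // Item 3: a hard rule, never speculate where the task has failed.
            return false;
        }
        // Item 2: only a preference. Assumed fallback: permit the retry once
        // the task has failed on every tracker, so small clusters can proceed.
        return failedTrackers.size() >= clusterSize;
    }
}
```

Under these assumptions, a task that has failed on the only TaskTracker in the cluster can still be retried there, while a speculative attempt on that tracker is always refused.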

> failed tasks should be rescheduled on different hosts after other jobs
> ----------------------------------------------------------------------
>
>          Key: HADOOP-142
>          URL: http://issues.apache.org/jira/browse/HADOOP-142
>      Project: Hadoop
>         Type: Improvement
>   Components: mapred
>     Versions: 0.1.1
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.2
>  Attachments: no-repeat-failures.patch
>
> Currently, when tasks fail, they are usually rerun immediately on the same host. This causes problems in two ways:
>   1. The task is more likely to fail again on the same host.
>   2. If there is cleanup code (such as clearing pendingCreates), it does not always run immediately, leading to cascading failures.
> For a first pass, I propose that when a task fails, we start the scan for new tasks to launch at the following task of the same type (within that job). So if maps[99] fails, when we are looking to assign new map tasks from this job, we scan in the order maps[100]...maps[N], maps[0]...maps[99].
> A more involved change would avoid running a task on nodes where it has failed before. This is a little tricky, because you don't want to prevent re-execution of tasks on single-node clusters, and the job tracker needs to schedule one task tracker at a time.
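
The wrap-around scan proposed for the first pass might look like the sketch below. The class and method names (ScanOrder, scanOrder) are hypothetical and are not taken from the patch; the convention that lastFailedIndex is -1 when nothing has failed yet is also an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating the proposed scan order. */
final class ScanOrder {
    /**
     * Returns the task indices in the order they would be checked:
     * starting just after the most recently failed task and wrapping
     * around to end at the failed task itself.
     *
     * @param taskCount       number of tasks of this type in the job
     * @param lastFailedIndex index of the last failed task, or -1 if none
     */
    static List<Integer> scanOrder(int taskCount, int lastFailedIndex) {
        List<Integer> order = new ArrayList<>(Math.max(taskCount, 0));
        if (taskCount <= 0) {
            return order;
        }
        // floorMod keeps the start index valid even when lastFailedIndex is -1.
        int start = Math.floorMod(lastFailedIndex + 1, taskCount);
        for (int i = 0; i < taskCount; i++) {
            order.add((start + i) % taskCount);
        }
        return order;
    }
}
```

For a job with five map tasks where maps[2] has just failed, scanOrder(5, 2) yields 3, 4, 0, 1, 2, so the failed task is considered last.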

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira