Posted to common-dev@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2006/04/18 23:56:19 UTC
[jira] Updated: (HADOOP-142) failed tasks should be rescheduled on
different hosts after other jobs
[ http://issues.apache.org/jira/browse/HADOOP-142?page=all ]
Owen O'Malley updated HADOOP-142:
---------------------------------
Attachment: no-repeat-failures.patch
This patch does three things:
1. When a task fails, the task that follows it becomes the first one checked for assignment to a TaskTracker.
2. Tasks prefer not to run on TaskTrackers where they have failed before.
3. Speculative tasks will not run on TaskTrackers where they have failed.
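Points 2 and 3 can be illustrated with a minimal sketch. This is not the code from no-repeat-failures.patch; the class TaskPlacement, its methods, and the clusterSize parameter are hypothetical names chosen for illustration. It assumes "prefer" means a regular task may return to a host where it failed only once it has failed on every TaskTracker in the cluster, while a speculative task is never placed on such a host.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; not taken from the Hadoop source or the patch.
class TaskPlacement {
    // Hosts on which this task has already failed.
    private final Set<String> failedHosts = new HashSet<>();

    void recordFailure(String host) {
        failedHosts.add(host);
    }

    /**
     * Decides whether this task may be assigned to the given host.
     * clusterSize is the number of distinct TaskTracker hosts (an assumed input).
     */
    boolean mayRunOn(String host, boolean speculative, int clusterSize) {
        if (!failedHosts.contains(host)) {
            return true; // no earlier failure here, always acceptable
        }
        if (speculative) {
            return false; // hard rule: never speculate where the task has failed
        }
        // Soft preference: retry on a failed host only when no clean host is left,
        // so a single-node cluster can still re-execute the task.
        return failedHosts.size() >= clusterSize;
    }

    public static void main(String[] args) {
        TaskPlacement task = new TaskPlacement();
        task.recordFailure("hostA");
        System.out.println(task.mayRunOn("hostA", false, 2)); // another host is still untried
        System.out.println(task.mayRunOn("hostA", false, 1)); // single-node cluster
    }
}
```

The fallback on clusterSize is one way to keep the preference from blocking re-execution on very small clusters; the patch may handle that case differently.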
> failed tasks should be rescheduled on different hosts after other jobs
> ----------------------------------------------------------------------
>
> Key: HADOOP-142
> URL: http://issues.apache.org/jira/browse/HADOOP-142
> Project: Hadoop
> Type: Improvement
> Components: mapred
> Versions: 0.1.1
> Reporter: Owen O'Malley
> Assignee: Owen O'Malley
> Fix For: 0.2
> Attachments: no-repeat-failures.patch
>
> Currently when tasks fail, they are usually rerun immediately on the same host. This causes problems in a couple of ways.
> 1. The task is more likely to fail on the same host.
> 2. If there is cleanup code (such as clearing pendingCreates) it does not always run immediately, leading to cascading failures.
> For a first pass, I propose that when a task fails, we start the scan for new tasks to launch at the following task of the same type (within that job). So if maps[99] fails, when we are looking to assign new map tasks from this job, we scan in the order maps[100]...maps[N], maps[0]...maps[99].
> A more involved change would avoid running a task on nodes where it has failed before. This is a little tricky, because you don't want to prevent re-execution of tasks on single-node clusters, and the job tracker needs to schedule one task tracker at a time.
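The scan order proposed in the first pass amounts to a rotation of the task array that starts just past the failed index. A minimal sketch follows, assuming tasks are addressed by array index; RotatedScan and scanOrder are hypothetical names and do not come from the Hadoop source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from the Hadoop source or the patch.
class RotatedScan {
    /**
     * Returns the indices 0..n-1 in the order they would be scanned
     * after the task at index `failed` has failed: failed+1 .. n-1, then 0 .. failed.
     */
    static List<Integer> scanOrder(int n, int failed) {
        List<Integer> order = new ArrayList<>();
        if (n <= 0) {
            return order; // nothing to scan
        }
        int start = (failed + 1) % n; // wrap around when the last task fails
        for (int i = 0; i < n; i++) {
            order.add((start + i) % n);
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(scanOrder(5, 2)); // prints [3, 4, 0, 1, 2]
    }
}
```

The failed task ends up last in the scan, so other pending tasks of the same type in the job get a chance to be assigned first.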