Posted to common-dev@hadoop.apache.org by "Amareshwari Sriramadasu (JIRA)" <ji...@apache.org> on 2008/10/21 12:28:44 UTC

[jira] Issue Comment Edited: (HADOOP-4305) repeatedly blacklisted tasktrackers should get declared dead

    [ https://issues.apache.org/jira/browse/HADOOP-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12641330#action_12641330 ] 

amareshwari edited comment on HADOOP-4305 at 10/21/08 3:26 AM:
---------------------------------------------------------------------------

I propose the following for declaring tasktrackers dead:
    * When should a task tracker be declared dead?
         ** If the number of times the task tracker has been blacklisted is greater than or equal to *mapred.max.tasktracker.blacklists*, the job tracker declares the task tracker dead.
    * What should be done with the dead task tracker?
         ** Option 1:
            *** Send DisallowedTaskTrackerException to the task tracker in the heartbeat. The task tracker then shuts down.
            *** Kill the tasks running on the dead tracker and re-execute them (the same behavior as for a lost task tracker).
         ** Option 2:
            *** Honor the tasks already running on the dead tracker, but do not schedule any more jobs or tasks on it (the same behavior as for a blacklisted tracker, but across all jobs).
    * How can the dead tracker be brought back?
         ** With Option 1, restarting the task tracker adds it back to the cluster.
         ** With Option 2, there should be an admin command that removes the tracker from deadList.
   
I think we should go with *Option 1*, because it actually makes the tracker die and come back up. Thoughts?
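
To make the threshold rule and the Option 1 revival path concrete, here is a minimal sketch in Java. It is not the JobTracker's actual code: the class DeadTrackerPolicy and its methods are hypothetical names, the constructor argument stands in for the value of mapred.max.tasktracker.blacklists, and resetting the count when a tracker restarts is an assumption about how "adds it back to the cluster" would work.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the proposed rule; all names are illustrative
 * and none of this is existing Hadoop API.
 */
final class DeadTrackerPolicy {
    // Stands in for the value of mapred.max.tasktracker.blacklists.
    private final int maxBlacklists;

    // Number of times each tracker has been blacklisted, keyed by tracker name.
    private final Map<String, Integer> blacklistCounts = new HashMap<>();

    DeadTrackerPolicy(int maxBlacklists) {
        if (maxBlacklists < 1) {
            throw new IllegalArgumentException("maxBlacklists must be at least 1");
        }
        this.maxBlacklists = maxBlacklists;
    }

    /** Records one more blacklisting; returns true if the tracker is now dead. */
    boolean recordBlacklisting(String trackerName) {
        int count = blacklistCounts.merge(trackerName, 1, Integer::sum);
        return count >= maxBlacklists;
    }

    /** A tracker is dead once its blacklist count reaches the threshold. */
    boolean isDead(String trackerName) {
        return blacklistCounts.getOrDefault(trackerName, 0) >= maxBlacklists;
    }

    /**
     * Option 1 revival (assumed behavior): a restarted tracker rejoins
     * the cluster with a clean record.
     */
    void trackerRestarted(String trackerName) {
        blacklistCounts.remove(trackerName);
    }
}
```

Under Option 1, the heartbeat handler would consult isDead and answer a dead tracker with DisallowedTaskTrackerException. Under Option 2, an admin command would take the place of trackerRestarted.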

Irrespective of all this, Map/Reduce should have a utility similar to *dfsadmin -refreshNodes* for adding trackers to, and removing them from, the cluster at any time.
    

      was (Author: amareshwari):
    I propose the following for declaring tasktrackers dead:
    * When to declare a task tracker dead? :
       If the number of times the task tracker got blacklisted is greater than or equal to *mapred.max.tasktracker.blacklists*, then the job tracker declares the task tracker as dead.
    * What to do with the dead task tracker? :
         ** Option 1:
           *** send DisallowedTaskTrackerException to the task tracker in the heartbeat. Then task tracker shuts down.  
           *** Kill the tasks running on dead tracker and re-execute them.  (Similar behavior as lost task tracker)
        ** Option 2:
           *** Honor the tasks running the dead tracker also. But do not schedule any more jobs or tasks on it. (Same behavior as black listed tracker, but it is across all the jobs).
    * How to bring back the dead tracker? :
         ** With Option 1, restarting the task tracker adds it back to the cluster.
         ** With Option 2, There should be an admin command which removes a tracker from deadList to LiveList.
   
I think we should go with *Option 1*, because it is actually making it die and come up. Thoughts?

Irrespective of all this,  Map/Reduce should have a utility similar to *dfsadmin -refreshNodes* , to add and delete trackers to cluster anytime.
    
  
> repeatedly blacklisted tasktrackers should get declared dead
> ------------------------------------------------------------
>
>                 Key: HADOOP-4305
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4305
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Christian Kunz
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.20.0
>
>
> When running a batch of jobs it often happens that the same tasktrackers are blacklisted again and again. This can slow job execution considerably, in particular, when tasks fail because of timeout.
> It would make sense to no longer assign any tasks to such tasktrackers and to declare them dead.
