Posted to common-dev@hadoop.apache.org by "Devaraj Das (JIRA)" <ji...@apache.org> on 2009/04/19 08:40:47 UTC

[jira] Issue Comment Edited: (HADOOP-5632) Jobtracker leaves tasktrackers underutilized

    [ https://issues.apache.org/jira/browse/HADOOP-5632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700550#action_12700550 ] 

Devaraj Das edited comment on HADOOP-5632 at 4/18/09 11:40 PM:
---------------------------------------------------------------

If we go the route of lightweight/heavyweight heartbeats, I'd suggest that we explicitly call those out as separate RPCs. Tasktrackers make certain assumptions about a successful heartbeat, and since tasktrackers always send a regular (heavyweight) heartbeat, there is a problem with status reporting for KILLED/FAILED tasks. Assume that, at a certain TaskTracker node, some task(s) fail just before the heartbeat is sent. The tasktracker sends the status of those tasks, and the JobTracker processes this heartbeat as a lightweight one (and therefore doesn't process the status updates). The tasktracker removes these tasks from the runningTasks map after getting the heartbeat response, and won't report their statuses again. The JobTracker will be unaware of such task failures.
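
To make the scenario concrete, here is a toy Python model of it. This is only a sketch and not Hadoop code: the class names, the `lightweight_mode` switch, and the `light_heartbeat`/`full_heartbeat` methods are all hypothetical, chosen to illustrate why separate RPCs would let the tasktracker know whether its statuses were actually processed.

```python
# Toy model of the scenario above. Not Hadoop code: every class, method and
# field name here is hypothetical and chosen only for illustration.

DONE_STATES = ("FAILED", "KILLED")


class JobTracker:
    def __init__(self):
        self.lightweight_mode = False  # assumed switch for the cheap path
        self.known_failures = set()

    def heartbeat(self, statuses):
        """Single RPC: in lightweight mode the status updates are skipped."""
        if not self.lightweight_mode:
            self._process(statuses)
        return "OK"  # same response either way

    def full_heartbeat(self, statuses):
        """Explicit heavyweight RPC: statuses are always processed."""
        self._process(statuses)
        return "OK"

    def light_heartbeat(self):
        """Explicit lightweight RPC: carries no task statuses at all."""
        return "OK"

    def _process(self, statuses):
        for task, state in statuses.items():
            if state in DONE_STATES:
                self.known_failures.add(task)


class TaskTracker:
    def __init__(self, running_tasks):
        self.running_tasks = dict(running_tasks)

    def _forget_reported(self):
        # After a successful heartbeat, failed/killed tasks are dropped.
        self.running_tasks = {t: s for t, s in self.running_tasks.items()
                              if s not in DONE_STATES}

    def send_single_rpc(self, jt):
        if jt.heartbeat(dict(self.running_tasks)) == "OK":
            self._forget_reported()  # assumes the statuses were processed

    def send_light(self, jt):
        jt.light_heartbeat()  # nothing was reported, so nothing is dropped

    def send_full(self, jt):
        if jt.full_heartbeat(dict(self.running_tasks)) == "OK":
            self._forget_reported()


def lost_with_single_rpc():
    jt, tt = JobTracker(), TaskTracker({"task_1": "FAILED", "task_2": "RUNNING"})
    jt.lightweight_mode = True
    tt.send_single_rpc(jt)   # status sent, but ignored by the JobTracker
    jt.lightweight_mode = False
    tt.send_single_rpc(jt)   # task_1 is no longer in running_tasks
    return jt.known_failures


def kept_with_separate_rpcs():
    jt, tt = JobTracker(), TaskTracker({"task_1": "FAILED", "task_2": "RUNNING"})
    tt.send_light(jt)        # failure stays queued on the tracker
    tt.send_full(jt)         # failure is delivered and processed
    return jt.known_failures
```

In this model `lost_with_single_rpc()` returns an empty set (the failure of `task_1` never reaches the JobTracker), while `kept_with_separate_rpcs()` returns `{"task_1"}`, because the tracker only drops a status after an RPC that is known to process it.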

Also, maybe we should process the failed/killed tasks' statuses in the lightweight heartbeat as well. The logic is that failed/killed tasks should be given the same treatment as virgin tasks. It actually makes sense to give higher priority to failed tasks during task assignment: if there is a deterministic failure on every attempt, the job would fail fast (after a certain number of attempts of the same task), leading to better cluster utilization.
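
A rough way to see the fail-fast argument is to count how many task attempts a doomed job consumes under each assignment policy. The sketch below is a simplified model under stated assumptions, not the JobTracker's scheduler: it uses a single slot, exactly one deterministically failing task, and a hypothetical `max_attempts` limit after which the job is failed.

```python
from collections import deque


def attempts_until_job_fails(tasks, bad_task, max_attempts, failed_first):
    """Count attempts consumed before the job fails (None if it succeeds).

    Simplified model: one slot, one task attempt per step, and `bad_task`
    fails on every attempt. All names are illustrative assumptions.
    """
    fresh = deque(tasks)   # tasks that have never been run
    failed = deque()       # tasks waiting for a retry
    attempts = {t: 0 for t in tasks}
    total = 0
    while fresh or failed:
        # Retry a failed task first if the policy says so, or if no
        # fresh task is left to run.
        if failed and (failed_first or not fresh):
            task = failed.popleft()
        else:
            task = fresh.popleft()
        attempts[task] += 1
        total += 1
        if task == bad_task:
            if attempts[task] >= max_attempts:
                return total  # attempt limit reached: the job fails here
            failed.append(task)
    return None


tasks = ["t%d" % i for i in range(10)]
failed_first = attempts_until_job_fails(tasks, "t3", 4, failed_first=True)
fresh_first = attempts_until_job_fails(tasks, "t3", 4, failed_first=False)
print(failed_first, fresh_first)  # prints "7 13"
```

With ten tasks and a limit of four attempts, retrying the failed task first fails the job after 7 attempts, whereas running all fresh tasks first spends 13 attempts on a job that was going to fail anyway.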

      was (Author: devaraj):
    If we go the route of lightweight/heavyweight heartbeat, I'd suggest that we explicitly call those out as separate RPCs. Tasktrackers makes certain assumptions about a successful heartbeat, and since tasktrackers always sends a regular (heavyweight) heartbeat, there is a problem to do with status reporting for KILLED/FAILED tasks. Assume, at a certain TaskTracker node, some task(s) fails just before sending the heartbeat. The tasktracker sends the status of those tasks. The tasktracker removes these from the runningTasks map after getting the heartbeat response, and won't report the statuses of those tasks again. The JobTracker will be unaware of such task failures..

Also, maybe, we should process the failed/killed tasks' statuses in the lightweight heartbeat as well. The logic being failed/killed tasks should be given the same treatment as virgin tasks. It actually makes sense to give higher priority to failed tasks during task assignment since if there is a deterministic failure on every attempt, the job would fail fast (after a certain number of attempts of the same task), leading to better cluster utilization..
  
> Jobtracker leaves tasktrackers underutilized
> --------------------------------------------
>
>                 Key: HADOOP-5632
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5632
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.0, 0.18.1, 0.18.2, 0.18.3, 0.19.0, 0.19.1, 0.20.0
>         Environment: 2x HT 2.8GHz Intel Xeon, 3GB RAM, 4x 250GB HD linux boxes, 100 node cluster
>            Reporter: Khaled Elmeleegy
>         Attachments: hadoop-khaled-tasktracker.10s.uncompress.timeline.pdf, hadoop-khaled-tasktracker.150ms.uncompress.timeline.pdf, jobtracker.patch, jobtracker20.patch
>
>
> For some workloads, the jobtracker doesn't keep all the slots utilized even under heavy load.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.