Posted to common-dev@hadoop.apache.org by "Arun C Murthy (JIRA)" <ji...@apache.org> on 2008/09/16 10:45:44 UTC

[jira] Commented: (HADOOP-2141) speculative execution start up condition based on completion time

    [ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12631301#action_12631301 ] 

Arun C Murthy commented on HADOOP-2141:
---------------------------------------

Looking through this patch, a few comments:

# JobInProgress.getSpeculative{Map|Reduce} are both called from synchronized methods i.e. JobInProgress.findNew{Map|Reduce}Task; hence please mark these as synchronized too, just to be future-proof.
# JobInProgress.findSpeculativeTask's 'shouldRemove' parameter is always passed in as 'false' (from getSpeculative{Map|Reduce}) ... do we even need this parameter?
# JobInProgress.isTaskSlowEnoughToSpeculate reads mapred.speculative.execution.slowTaskThreshold from the JobConf on every call - we should just cache that in a private variable. Ditto for JobInProgress.isSlowTracker/mapred.speculative.execution.slowNodeThreshold and JobInProgress.atSpeculativeCap/mapred.speculative.execution.speculativeCap. (Also, please remove the LOG.info for the config variable in JobInProgress.isTaskSlowEnoughToSpeculate.)
# JobInProgress.findSpeculativeTask gets a List of TIPs and then converts it to a TIP[] for JobInProgress.isSlowTracker etc. - we should just get all APIs to work with List<TIP> and do away with that conversion.
# Can we keep a running count of 'progress' of TaskTrackers' tasks rather than recomputing it each time in JobInProgress.isSlowTracker? For large jobs it might be significant...
# JobInProgress.isTaskSlowEnoughToSpeculate really bothers me. It is called from inside a loop (i.e. for each TIP) and it sorts the progress of each TIP. This is potentially very expensive. At the very least we should sort the TIPs once and, even better, we should maintain a PriorityQueue of TIPs based on their progress.
# I'm guessing that sorting 'candidate speculative tasks' in JobInProgress.findSpeculativeTask isn't prohibitively expensive since the number of candidates is fairly small, could you please confirm?
# Minor: Please adhere to the 80 character limit per-line.
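
To make the caching, running-total and PriorityQueue suggestions above concrete, here is a minimal sketch. It is not taken from the patch: SpeculationSketch, Tip, add and slowTasks are hypothetical names, Tip is only a stand-in for TaskInProgress, and the rule "progress trails the mean by more than the threshold" is an assumed reading of the slow-task check. It is written against a current JDK rather than the one Hadoop targets.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative only; none of these names come from the patch. */
class SpeculationSketch {

  /** Hypothetical stand-in for TaskInProgress. */
  static final class Tip {
    final String id;
    final double progress; // fraction complete, 0.0 to 1.0

    Tip(String id, double progress) {
      this.id = id;
      this.progress = progress;
    }
  }

  // Read once at construction instead of from the JobConf per call.
  private final double slowTaskThreshold;

  // Least-progressed TIP at the head, so no per-TIP sort is needed.
  private final PriorityQueue<Tip> byProgress =
      new PriorityQueue<>(Comparator.comparingDouble((Tip t) -> t.progress));

  // Running total, so the mean costs O(1) instead of a full pass.
  private double progressSum = 0.0;

  SpeculationSketch(double slowTaskThreshold) {
    this.slowTaskThreshold = slowTaskThreshold;
  }

  synchronized void add(Tip tip) {
    byProgress.add(tip);
    progressSum += tip.progress;
  }

  /**
   * Assumed rule: a TIP is slow if its progress trails the mean by
   * more than the threshold. Returns slow TIPs, slowest first.
   */
  synchronized List<Tip> slowTasks() {
    List<Tip> slow = new ArrayList<>();
    if (byProgress.isEmpty()) {
      return slow;
    }
    double cutoff = progressSum / byProgress.size() - slowTaskThreshold;
    // Poll from a copy so the live queue is left untouched.
    PriorityQueue<Tip> copy = new PriorityQueue<>(byProgress);
    while (!copy.isEmpty() && copy.peek().progress < cutoff) {
      slow.add(copy.poll());
    }
    return slow;
  }

  public static void main(String[] args) {
    SpeculationSketch job = new SpeculationSketch(0.2);
    job.add(new Tip("r1", 0.90));
    job.add(new Tip("r2", 0.95));
    job.add(new Tip("r3", 1.00));
    job.add(new Tip("r4", 0.30));
    for (Tip t : job.slowTasks()) {
      System.out.println(t.id + " " + t.progress);
    }
  }
}
```

In a real JobInProgress the progress of a TIP changes over time, so each status update would have to remove and re-insert the TIP (and adjust the running total) to keep the queue ordered; the sketch leaves that out.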

> speculative execution start up condition based on completion time
> -----------------------------------------------------------------
>
>                 Key: HADOOP-2141
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2141
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.19.0
>            Reporter: Koji Noguchi
>            Assignee: Andy Konwinski
>         Attachments: HADOOP-2141-v2.patch, HADOOP-2141.patch
>
>
> We had one job with speculative execution hang.
> 4 reduce tasks were stuck with 95% completion because of a bad disk. 
> Devaraj pointed out 
> bq. One of the conditions that must be met for launching a speculative instance of a task is that it must be at least 20% behind the average progress, and this is not true here.
> It would be nice if speculative execution also starts up when tasks stop making progress.
> Devaraj suggested 
> bq. Maybe, we should introduce a condition for average completion time for tasks in the speculative execution check. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.