Posted to common-dev@hadoop.apache.org by "Andy Konwinski (JIRA)" <ji...@apache.org> on 2009/03/27 02:50:51 UTC

[jira] Updated: (HADOOP-2141) speculative execution start up condition based on completion time

     [ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andy Konwinski updated HADOOP-2141:
-----------------------------------

    Affects Version/s:     (was: 0.19.0)
                       0.21.0
               Status: Patch Available  (was: Open)

Responding to Devaraj's comments:

"The field TaskInProgress.mostRecentStartTime is updated with the same value of execStartTime each time (since execStartTime is updated only once in the life of the TIP). Did you mean to do this?"

No, good catch. mostRecentStartTime should be updated with the current time each time getTaskToRun is called. I have made this change.
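
As a rough illustration of the intended behaviour (not the patch itself), the sketch below uses a hypothetical, stripped-down stand-in for TaskInProgress. The class name TipSketch, the accessors, and the simplified getTaskToRun(long) signature are assumptions made for the example; the real method takes different arguments and returns a Task.

```java
// Hypothetical, simplified stand-in for TaskInProgress; not the real Hadoop class.
class TipSketch {
    private long execStartTime = 0;        // set once, when the first attempt is handed out
    private long mostRecentStartTime = 0;  // refreshed every time an attempt is handed out

    // Simplified analogue of getTaskToRun(); 'now' stands in for the current time.
    void getTaskToRun(long now) {
        if (execStartTime == 0) {
            execStartTime = now;           // only the first attempt sets this
        }
        mostRecentStartTime = now;         // every attempt, including speculative ones, updates this
    }

    long getExecStartTime() { return execStartTime; }

    long getMostRecentStartTime() { return mostRecentStartTime; }
}
```

With this shape, a second call at a later time leaves execStartTime unchanged and moves mostRecentStartTime forward.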

"They should be decremented in TIP.incompleteSubTask and TIP.completedTask (basically, places where activeTasks.remove is done). The decrement should happen if activeTasks.size for the TIP is >1. Makes sense?"

Thanks to Devaraj for writing the decrementSpeculativeCount() function, which is called from failedTask() and completedTask(). In atSpeculativeCap(), I have replaced the call to countSpeculating() with the sum speculativeMapTasks + speculativeReduceTasks.
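
The counter bookkeeping described above might look roughly like the following. This is a minimal sketch, not the code in the patch: the class name, the speculativeAttemptLaunched() helper, the method parameters, and the fixed cap are all assumptions made for illustration, and the real cap calculation is not shown here.

```java
// Hypothetical sketch of the speculative-attempt counters; names other than
// decrementSpeculativeCount, atSpeculativeCap, speculativeMapTasks and
// speculativeReduceTasks are invented for this example.
class SpeculativeCounterSketch {
    private int speculativeMapTasks = 0;
    private int speculativeReduceTasks = 0;
    private final int cap;  // assumed fixed cap; the patch may compute this differently

    SpeculativeCounterSketch(int cap) {
        this.cap = cap;
    }

    // Called when a speculative attempt is started for a map or reduce TIP.
    void speculativeAttemptLaunched(boolean isMap) {
        if (isMap) {
            speculativeMapTasks++;
        } else {
            speculativeReduceTasks++;
        }
    }

    // Called when an attempt fails or completes. 'activeAttempts' is the size of
    // the TIP's activeTasks before the finishing attempt is removed; the counter
    // only drops when more than one attempt was active.
    void decrementSpeculativeCount(boolean isMap, int activeAttempts) {
        if (activeAttempts > 1) {
            if (isMap) {
                speculativeMapTasks--;
            } else {
                speculativeReduceTasks--;
            }
        }
    }

    int speculativeCount() {
        return speculativeMapTasks + speculativeReduceTasks;
    }

    // Uses the running counters instead of recounting every TIP.
    boolean atSpeculativeCap() {
        return speculativeCount() >= cap;
    }
}
```

Keeping running counters makes the cap check constant time, at the cost of having to decrement in every place an active attempt is removed.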

"Couldn't it be checked whether TIP.isComplete() returns true before launching a speculative attempt?"

Yes, I think this could be done as an optimization. It would add a little complexity, though, and before making many more changes it would be good to test the current functionality. Again, it would be nice if a few people could test the performance impact of this patch at scale.
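
For reference, the suggested check could be as small as the sketch below. It is not part of the current patch; the Tip interface and the dropCompleted() helper are hypothetical names used only to show the idea of filtering on isComplete() before choosing a TIP to speculate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: skip TIPs that have already completed when picking
// candidates for a speculative attempt.
class SpeculationFilterSketch {
    // Minimal stand-in for the part of TaskInProgress this check needs.
    interface Tip {
        boolean isComplete();
    }

    // Returns the candidates that are still incomplete, preserving their order.
    static <T extends Tip> List<T> dropCompleted(List<T> candidates) {
        List<T> stillRunning = new ArrayList<T>();
        for (T tip : candidates) {
            if (!tip.isComplete()) {
                stillRunning.add(tip);
            }
        }
        return stillRunning;
    }
}
```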

> speculative execution start up condition based on completion time
> -----------------------------------------------------------------
>
>                 Key: HADOOP-2141
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2141
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Koji Noguchi
>            Assignee: Andy Konwinski
>         Attachments: 2141.patch, HADOOP-2141-v2.patch, HADOOP-2141-v3.patch, HADOOP-2141-v4.patch, HADOOP-2141-v5.patch, HADOOP-2141-v6.patch, HADOOP-2141.patch
>
>
> We had one job with speculative execution hang.
> 4 reduce tasks were stuck with 95% completion because of a bad disk. 
> Devaraj pointed out 
> bq. One of the conditions that must be met for launching a speculative instance of a task is that it must be at least 20% behind the average progress, and this is not true here.
> It would be nice if speculative execution also starts up when tasks stop making progress.
> Devaraj suggested 
> bq. Maybe, we should introduce a condition for average completion time for tasks in the speculative execution check. 
