Posted to common-dev@hadoop.apache.org by "Andy Konwinski (JIRA)" <ji...@apache.org> on 2009/05/15 08:14:45 UTC

[jira] Issue Comment Edited: (HADOOP-2141) speculative execution start up condition based on completion time

    [ https://issues.apache.org/jira/browse/HADOOP-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709729#action_12709729 ] 

Andy Konwinski edited comment on HADOOP-2141 at 5/14/09 11:13 PM:
------------------------------------------------------------------

Responding to Devaraj's comments:

re 1) You are right, they were redundant as far as I can tell. I have removed mostRecentStartTime and am now only using dispatchTime, which is now updated in TaskInProgress.getTaskToRun() rather than in JobTracker.assignTasks().
 
re 2) Devaraj, what you are saying about locality makes sense, and I think we need to think about this a bit more, but I want to get this patch submitted with the changes and bug fixes I have made so far.

Also, some other comments:

A)  I have updated isSlowTracker() to better handle the case where a task tracker hasn't successfully completed a task for this job yet. In the last patch (v8) I assumed, to be safe, that such a tracker was a laggard. Now I check whether the TT has been assigned a task for this job yet: if it hasn't, we give it the benefit of the doubt; if it has been assigned a task but hasn't finished it yet, we don't speculate on it. This should address the case Devaraj pointed out earlier of running on a cluster that has more nodes than tasks, or of adding a task tracker in the middle of a long job. It might make more sense to simply assume that nodes that haven't reported back progress (regardless of whether they have been assigned a task for this job) are not laggards. A sketch of the intended logic follows.
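To make the intended behavior concrete, here is a minimal self-contained sketch; the bookkeeping names (assignedTrackers, completedTaskMean) are hypothetical stand-ins for whatever per-(tracker, job) state the patch actually keeps:

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch only: field and method names are hypothetical, not the patch's code.
public class SlowTrackerCheck {
  private final Set<String> assignedTrackers = new HashSet<String>();
  private final Map<String, Double> completedTaskMean = new HashMap<String, Double>();
  private double jobMean;                  // mapTaskStats.mean()
  private double jobStd;                   // mapTaskStats.std()
  private double slowNodeThreshold = 1.0;  // default: one standard deviation

  public boolean isSlowTracker(String tracker) {
    if (!assignedTrackers.contains(tracker)) {
      return false; // never assigned a task for this job: benefit of the doubt
    }
    Double trackerMean = completedTaskMean.get(tracker);
    if (trackerMean == null) {
      return true;  // assigned a task but nothing finished yet: don't speculate on it
    }
    // Normal case (with the fix from B below): slower-than-average trackers are laggards.
    return trackerMean - jobMean > jobStd * slowNodeThreshold;
  }
}
{code}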

B) Finally, Devaraj caught two very serious bugs in my math in isSlowTracker. My current implementation of DataStatistics.std() calculates the variance, not the standard deviation; I should have been taking the square root of my formula. Also, I was considering trackers with faster tasks to be the laggards, when it should obviously be trackers with slower tasks that are considered the laggards.
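For illustration, here is a minimal self-contained sketch of the corrected statistics helper (the class and method names follow the discussion; the actual fields in the patch may differ):

{code:java}
// Minimal sketch of a running-statistics helper that tracks count,
// sum, and sum of squares, as in the worked example below.
public class DataStatistics {
  private double count = 0, sum = 0, sumSquares = 0;

  public void add(double value) {
    count++;
    sum += value;
    sumSquares += value * value;
  }

  public double mean() {
    return count == 0 ? 0 : sum / count;
  }

  // The v8 bug: this returned the variance directly; the fix takes the square root.
  public double std() {
    if (count == 0) {
      return 0;
    }
    double variance = sumSquares / count - mean() * mean();
    return Math.sqrt(Math.max(variance, 0)); // clamp tiny negative rounding error
  }
}
{code}

Feeding it the six task times from the example below gives mean() = 1.5 and std() = 0.5.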

Walking through an example (given by Devaraj):

Two trackers run 3 maps each. TT1 takes 1 second to run each map; TT2 takes 2 seconds. Given these figures, compute mapTaskStats.mean(), mapTaskStats.std(), and TT1's mean()/std(). Now if you assume that TT1 comes asking for a task, TT1 will be declared slow. That should not happen.

The mapTaskStats.mean() would be 1.5 at the end of the 6 tasks. With the v8 bug, mapTaskStats.std() would return the variance, 0.25 (2.5 - 1.5*1.5). TT1's mean() would be 1. The check in isSlowTracker would evaluate to true, since 1 < (1.5 - 0.25) (assuming slowNodeThreshold is 1), flagging the faster tracker as slow. This is obviously wrong.
--

After fixing the bugs, for the numbers above, neither tracker would be considered a laggard:

mapTaskStats.mean() = (1+1+1+2+2+2)/6 = 1.5

mapTaskStats.sumSquares = (1^2 + 1^2 + 1^2 + 2^2 + 2^2 + 2^2) = 15
mapTaskStats.std() = (sumSquares/6 - mean*mean)^(1/2) = (15/6 - 1.5*1.5)^(1/2) = (0.25)^(1/2) = 0.5

Now since we are using the default threshold of one standard deviation, we expect that no more than 1/2 of the tasks will be considered slow. This follows from the one-sided Chebyshev inequality (http://en.wikipedia.org/w/index.php?title=Chebyshev%27s_inequality#Variant:_One-sided_Chebyshev_inequality).
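Spelling that bound out (the standard one-sided Chebyshev statement, not anything specific to this patch): for any distribution with mean m and standard deviation s,

P(X - m >= k*s) <= 1 / (1 + k^2)

so with the default k = 1, at most 1/(1 + 1^2) = 1/2 of the observed task times can lie a full standard deviation above the mean.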

Now, we consider a task tracker to be slow if (tracker's task mean - mapTaskStats.mean > mapTaskStats.std * slowNodeThreshold). Checking both trackers (the small harness after the two cases re-runs these numbers):

* for TT1: (tt1.mean - mapTaskStats.mean > mapTaskStats.std) == (1 - 1.5 > 0.5) == (-0.5 > 0.5) == false
* for TT2: (tt2.mean - mapTaskStats.mean > mapTaskStats.std) == (2 - 1.5 > 0.5) == (0.5 > 0.5) == false
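Here is a tiny hypothetical harness, using the DataStatistics sketch from B above, that re-runs these numbers:

{code:java}
public class SpeculationExample {
  public static void main(String[] args) {
    DataStatistics mapTaskStats = new DataStatistics();
    for (double t : new double[] {1, 1, 1, 2, 2, 2}) {
      mapTaskStats.add(t);  // three 1s tasks (TT1), three 2s tasks (TT2)
    }
    double slowNodeThreshold = 1.0;
    for (double ttMean : new double[] {1.0, 2.0}) {  // TT1 mean, TT2 mean
      boolean slow = ttMean - mapTaskStats.mean()
          > mapTaskStats.std() * slowNodeThreshold;
      System.out.println("tracker mean " + ttMean + "s -> slow? " + slow);
    }
  }
}
{code}

Both lines print false, matching the hand computation above.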

> speculative execution start up condition based on completion time
> -----------------------------------------------------------------
>
>                 Key: HADOOP-2141
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2141
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.21.0
>            Reporter: Koji Noguchi
>            Assignee: Andy Konwinski
>         Attachments: 2141.patch, HADOOP-2141-v2.patch, HADOOP-2141-v3.patch, HADOOP-2141-v4.patch, HADOOP-2141-v5.patch, HADOOP-2141-v6.patch, HADOOP-2141.patch, HADOOP-2141.v7.patch, HADOOP-2141.v8.patch
>
>
> We had one job with speculative execution hang.
> 4 reduce tasks were stuck with 95% completion because of a bad disk. 
> Devaraj pointed out 
> bq. One of the conditions that must be met for launching a speculative instance of a task is that it must be at least 20% behind the average progress, and this is not true here.
> It would be nice if speculative execution also starts up when tasks stop making progress.
> Devaraj suggested 
> bq. Maybe, we should introduce a condition for average completion time for tasks in the speculative execution check. 
