Posted to common-dev@hadoop.apache.org by "Bryan Pendleton (JIRA)" <ji...@apache.org> on 2006/04/20 19:33:05 UTC

[jira] Created: (HADOOP-152) Speculative tasks not being scheduled

Speculative tasks not being scheduled
-------------------------------------

         Key: HADOOP-152
         URL: http://issues.apache.org/jira/browse/HADOOP-152
     Project: Hadoop
        Type: Bug

  Components: mapred  
    Versions: 0.2    
 Environment: ~30 node Opteron cluster
    Reporter: Bryan Pendleton
    Priority: Minor


The criteria for starting a speculative task include a check that "average progress" - "progress" exceeds the speculative gap, currently 0.2.

I don't know if this is the right metric, but it doesn't seem to be calculated correctly. I've regularly seen "average progress" values of less than 0.01 while the "progress" value is in the range 0.90-1.0, and yet no speculative tasks are started. This has caused at least one long-running task to run about 10% longer while overloaded hosts catch up.
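
For reference, the check described above can be sketched roughly as follows. This is only an illustration of the stated criterion, not the actual Hadoop code: the class name, method name, and sample values are hypothetical, and only the 0.2 gap comes from this report.

```java
// Illustrative sketch only; not the actual Hadoop JobTracker code.
final class SpeculativeGapCheck {
    // Gap value quoted in this report.
    static final double SPECULATIVE_GAP = 0.2;

    // True when the task trails the average progress by more than the gap.
    static boolean shouldSpeculate(double averageProgress, double progress) {
        return averageProgress - progress > SPECULATIVE_GAP;
    }

    public static void main(String[] args) {
        // Hypothetical values: a task far behind the average, then one close to it.
        System.out.println(shouldSpeculate(0.95, 0.50)); // difference is about 0.45
        System.out.println(shouldSpeculate(0.95, 0.85)); // difference is about 0.10
    }
}
```

Under this reading, only a task that lags the average by more than 0.2 is a candidate for a speculative copy.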

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira

[jira] Commented: (HADOOP-152) Speculative tasks not being scheduled

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.

    [ http://issues.apache.org/jira/browse/HADOOP-152?page=comments#action_12383306 ] 

Bryan Pendleton commented on HADOOP-152:
----------------------------------------

*bump*

Is anyone else seeing this problem? My cluster is pretty unevenly loaded, and without speculative execution I'm waiting a very long time for tasks to time out on short jobs. Speculative execution is enabled, so there's no reason that, say, two maps out of ~1900 should be holding up execution. I suspect the "progress" accounting in the Job isn't being done correctly.

But even then, perhaps we need more metrics. With the current ones, if a job unit happens to run very slowly on a given node but might run faster elsewhere, it may never be executed on another node, because the progress reported by the slow node can be close enough to done that it never trips speculative execution.
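
To make that scenario concrete, here is a rough worked example. It assumes the "average progress" is a plain mean over all map tasks, which this thread does not confirm; the class and method names are hypothetical, and the task counts simply mirror the "two maps out of ~1900" case above.

```java
// Illustrative arithmetic only; assumes average progress is a plain mean over tasks.
final class StragglerScenario {
    static final double SPECULATIVE_GAP = 0.2;

    // Mean progress when `finished` tasks are at 1.0 and `stragglers` tasks are at p.
    static double averageProgress(int finished, int stragglers, double p) {
        return (finished * 1.0 + stragglers * p) / (finished + stragglers);
    }

    // Applies the gap check to one of the straggling tasks.
    static boolean tripsSpeculation(int finished, int stragglers, double p) {
        return averageProgress(finished, stragglers, p) - p > SPECULATIVE_GAP;
    }

    public static void main(String[] args) {
        // Two slow maps reporting 0.85 while 1898 others are done: the gap is under 0.2.
        System.out.println(tripsSpeculation(1898, 2, 0.85));
        // The same two maps reporting 0.50: the gap is well over 0.2.
        System.out.println(tripsSpeculation(1898, 2, 0.50));
    }
}
```

Under these assumptions, a slow task that has already reported more than about 0.8 progress can never exceed the gap, however long its remaining work takes.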

[jira] Resolved: (HADOOP-152) Speculative tasks not being scheduled

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HADOOP-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-152.
----------------------------------

    Resolution: Duplicate
      Assignee:     (was: Owen O'Malley)

This was fixed when we fixed the thresholds for launching speculative tasks as part of HADOOP-76.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.