Posted to common-dev@hadoop.apache.org by "Wang, Chengwei" <wa...@gatech.edu> on 2010/11/12 18:16:19 UTC

Questions about progress score and speculative execution

Thanks Rekha, it is really helpful!

Could you, or anybody else, please also help me with the following questions?

1. How can I get the progress score of each task (map or reduce)? Can I get it from the log files, either directly or by setting the logging level to "debug", or do I need to change the Hadoop source?

2. For speculative execution, Hadoop looks at the average progress score of the map tasks (or of the reduce tasks) and compares each task's progress score with that average. If a task's score is less than the average minus 0.2, the task is a straggler. For example, if there are 10 map tasks, we first compute the average progress score of the 10 map tasks, then we compare each of the 10 map tasks to the average to find the stragglers. Am I right about the algorithm? Please do correct me if I am wrong.
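
To make the rule in question 2 concrete, here is a minimal sketch of it exactly as described above. It is not taken from the Hadoop source: the function name, the task IDs, and the progress values are all hypothetical, and the 0.2 gap is simply the number quoted in the question.

```python
from statistics import mean

# Gap below the average that marks a straggler. This value comes from the
# question above; it is an assumption, not something read from Hadoop source.
SPECULATIVE_GAP = 0.2


def find_stragglers(progress_scores, gap=SPECULATIVE_GAP):
    """Return the IDs of tasks whose progress score is below (average - gap).

    progress_scores maps a task ID to a progress score in [0, 1].
    Map tasks and reduce tasks would be passed in separately, since the
    question describes one average per task type.
    """
    if not progress_scores:
        return []
    threshold = mean(progress_scores.values()) - gap
    return sorted(
        task_id
        for task_id, score in progress_scores.items()
        if score < threshold
    )


# Hypothetical example: 10 map tasks, nine at 0.9 and one lagging at 0.3.
map_progress = {f"map_{i}": 0.9 for i in range(9)}
map_progress["map_9"] = 0.3

print(find_stragglers(map_progress))
```

In this made-up example the average is 0.84, so the cutoff is 0.64 and only the task at 0.3 falls below it.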

Thanks a lot!
Regards
Chengwei

----- Original Message -----
From: "Rekha Joshi" <re...@yahoo-inc.com>
To: common-dev@hadoop.apache.org
Sent: Friday, November 12, 2010 12:53:25 AM
Subject: Re: about the task statistics in the history directory

Hi Chengwei,

If it helps, reading the Hadoop tutorial and the configuration files, along with the JobHistory* API pages, should give you the main details.
For example: http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/JobHistory.MapAttempt.html
http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/JobHistory.Keys.html

There is a typo in the API docs - http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/JobHistory
"JobHistory.ReduceAttempt <http://hadoop.apache.org/common/docs/r0.20.2/api/org/apache/hadoop/mapred/JobHistory.ReduceAttempt.html>
          Helper class for logging or reading back events related to start, finish or failure of  a Map Attempt on a node."

It should be "Reduce" instead of "Map". Use your judgment. :)

Just an example that only the code is gospel truth; the API docs and other documents are a guiding force.

Thanks & Regards,
/Rekha.

On 11/12/10 7:57 AM, "Wang, Chengwei" <wa...@gatech.edu> wrote:

Hi All,

I just wonder whether there is any doc explaining the terms used in the task statistics under logs/history/, for example 'SPLITS' and 'MapAttempt'?

Thanks a lot for enlightening me.

Regards
Chengwei