Posted to common-user@hadoop.apache.org by Joydeep Sen Sarma <js...@facebook.com> on 2007/10/21 04:27:09 UTC

repeated reduce task timeouts (false alarms)

Running 0.13.1 and hitting this very predictably (some tasks keep timing
out). The pattern looks like this:

- The TaskTracker says the reduce task is not responding:

2007-10-20 18:40:28,225 INFO org.apache.hadoop.mapred.TaskTracker: task_0006_r_000000_38 0.0% reduce > copy >

2007-10-20 18:50:36,772 INFO org.apache.hadoop.mapred.TaskTracker: task_0006_r_000000_38: Task failed to report status for 608 seconds. Killing.

- But the reduce task is still chugging away:

2007-10-20 18:46:18,070 INFO org.apache.hadoop.mapred.ReduceTask: task_0006_r_000000_38 Copying task_0006_m_000003_0 output from hadoop037.sf2p.facebook.com.

2007-10-20 18:46:28,235 INFO org.apache.hadoop.mapred.ReduceTask: task_0006_r_000000_38 done copying task_0006_m_000007_0 output from hadoop021.sf2p.facebook.com.

From the timestamps, the reduce task seems to be working away happily
when the TaskTracker times it out?
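
A quick check of the timestamp arithmetic behind that observation, written as a small Python sketch (Python and the helper names `parse`, `silent_seconds` and `activity_in_window` are illustrative only, not anything from Hadoop). It parses the four log timestamps quoted above and checks that both ReduceTask copy messages fall between the last TaskTracker status line and the kill. The gap works out to about 608.5 s, which lines up with the "608 seconds" in the kill message, and the copy messages land roughly four minutes before the kill. That is consistent with the reducer doing copy work that never reached the TaskTracker as a status update, though the sketch only checks timing and says nothing about the root cause.

```python
from datetime import datetime

# Timestamp format of the log lines quoted above: "YYYY-MM-DD HH:MM:SS,mmm".
FMT = "%Y-%m-%d %H:%M:%S,%f"


def parse(ts: str) -> datetime:
    """Parse a log timestamp; %f right-pads "225" to 225000 microseconds."""
    return datetime.strptime(ts, FMT)


# TaskTracker side: last status line printed for the attempt, then the kill.
LAST_STATUS = parse("2007-10-20 18:40:28,225")
KILLED_AT = parse("2007-10-20 18:50:36,772")

# ReduceTask side: copy-phase activity logged by the same attempt.
REDUCE_ACTIVITY = {
    "copying task_0006_m_000003_0": parse("2007-10-20 18:46:18,070"),
    "done copying task_0006_m_000007_0": parse("2007-10-20 18:46:28,235"),
}


def silent_seconds() -> float:
    """Seconds between the last TaskTracker status line and the kill."""
    return (KILLED_AT - LAST_STATUS).total_seconds()


def activity_in_window() -> dict:
    """For each ReduceTask event: (inside the silent window?, seconds before the kill)."""
    return {
        event: (LAST_STATUS < t < KILLED_AT, (KILLED_AT - t).total_seconds())
        for event, t in REDUCE_ACTIVITY.items()
    }


if __name__ == "__main__":
    print(f"silent window: {silent_seconds():.3f} s")
    for event, (inside, before_kill) in activity_in_window().items():
        print(f"{event}: inside window={inside}, {before_kill:.3f} s before kill")
```

Running it prints `silent window: 608.547 s`, and both ReduceTask events report `inside window=True`.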

Is there a relevant patch I should apply? Any help is appreciated; this
is wreaking havoc.

Thx,

Joydeep