Posted to common-dev@hadoop.apache.org by Thushara Wijeratna <th...@gmail.com> on 2009/05/29 18:29:13 UTC

debugging task timeouts on 0.19.1

how do i debug a job that is being killed this way?

2009-05-29 01:28:56,672 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_200905281652_0006_m_000007_2: Task attempt_200905281652_0006_m_000007_2 failed to report status for 603 seconds. Killing!
2009-05-29 01:28:56,673 INFO org.apache.hadoop.mapred.JobTracker: Adding task 'attempt_200905281652_0006_m_000007_3' to tip task_200905281652_0006_m_000007, for tracker 'tracker_domU-12-31-38-01-79-93.compute-1.internal:localhost.localdomain/127.0.0.1:33837'
2009-05-29 01:28:56,673 INFO org.apache.hadoop.mapred.JobInProgress: Choosing data-local task task_200905281652_0006_m_000007
2009-05-29 01:28:56,673 INFO org.apache.hadoop.mapred.JobTracker: Removed completed task 'attempt_200905281652_0006_m_000007_2' from 'tracker_domU-12-31-38-01-79-93.compute-1.internal:localhost.localdomain/127.0.0.1:33837'
2009-05-29 01:39:15,008 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_200905281652_0006_m_000007_3: Task attempt_200905281652_0006_m_000007_3 failed to report status for 603 seconds. Killing!
2009-05-29 01:39:15,008 INFO org.apache.hadoop.mapred.TaskInProgress: TaskInProgress task_200905281652_0006_m_000007 has failed 4 times.
2009-05-29 01:39:15,009 INFO org.apache.hadoop.mapred.JobInProgress: Aborting job job_200905281652_0006
2009-05-29 01:39:15,009 INFO org.apache.hadoop.mapred.JobInProgress: Killing job 'job_200905281652_0006'
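
for what it's worth, the attempts that were killed for not reporting status can be pulled out of a JobTracker log like the one above with a small scan for the "failed to report status" message. this is only a rough sketch: the class name TimeoutScan and the method timedOut are made up for illustration and are not part of hadoop, and the pattern assumes the message wording shown above and that each attempt id appears unbroken in the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Hadoop): lists the task attempts that the
// JobTracker killed because they did not report status in time.
class TimeoutScan {
    // Matches the message wording seen in the log above. \s+ between words
    // tolerates the line wrapping that mail clients introduce.
    private static final Pattern TIMED_OUT = Pattern.compile(
        "Task\\s+(attempt_\\w+)\\s+failed\\s+to\\s+report\\s+status\\s+for\\s+(\\d+)\\s+seconds");

    // Returns one "attemptId seconds" string per timeout message in the log text.
    static List<String> timedOut(String log) {
        List<String> hits = new ArrayList<>();
        Matcher m = TIMED_OUT.matcher(log);
        while (m.find()) {
            hits.add(m.group(1) + " " + m.group(2));
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample entry copied from the log above.
        String sample = "2009-05-29 01:28:56,672 INFO org.apache.hadoop.mapred.TaskInProgress: Error "
            + "from attempt_200905281652_0006_m_000007_2: Task "
            + "attempt_200905281652_0006_m_000007_2 failed to report status for 603 seconds. Killing!";
        for (String hit : timedOut(sample)) {
            System.out.println(hit);
        }
    }
}
```

in this log it would point at attempts _2 and _3 of the same map task (m_000007), so the problem follows the task rather than one particular attempt.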

this is a pass-through map/reduce job - the map/reduce code doesn't do anything
except report status via Counters, like:

reporter.incrCounter(Counters.MAP_RECORDS, 1);

thanks,
thushara