Posted to mapreduce-issues@hadoop.apache.org by "Eli Collins (Commented) (JIRA)" <ji...@apache.org> on 2011/11/05 02:47:51 UTC

[jira] [Commented] (MAPREDUCE-2960) A single TT disk failure can cause the job to fail

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13144513#comment-13144513 ] 

Eli Collins commented on MAPREDUCE-2960:
----------------------------------------

This affects the LinuxTaskController (LTC) as well. The log claims the job completed; however, it was a wordcount job and it never populated the output directory.

{noformat}
11/11/03 12:51:56 WARN mapred.JobClient: Error reading task outputhttp://eli-thinkpad:50316/tasklog?plaintext=true&attemptid=attempt_201111031152_0006_m_000001_1&filter=stdout
11/11/03 12:51:56 WARN mapred.JobClient: Error reading task outputhttp://eli-thinkpad:50316/tasklog?plaintext=true&attemptid=attempt_201111031152_0006_m_000001_1&filter=stderr
11/11/03 12:51:59 INFO mapred.JobClient: Task Id : attempt_201111031152_0006_m_000001_2, Status : FAILED
Error initializing attempt_201111031152_0006_m_000001_2:
java.io.IOException: Job initialization failed (255)
        at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:192)
        at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1231)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
        at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1206)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1121)
        at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2410)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.util.Shell$ExitCodeException: 
        at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
        at org.apache.hadoop.util.Shell.run(Shell.java:182)
        at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
        at org.apache.hadoop.mapred.LinuxTaskController.initializeJob(LinuxTaskController.java:185)
        ... 8 more

11/11/03 12:51:59 WARN mapred.JobClient: Error reading task outputhttp://eli-thinkpad:50316/tasklog?plaintext=true&attemptid=attempt_201111031152_0006_m_000001_2&filter=stdout
11/11/03 12:51:59 WARN mapred.JobClient: Error reading task outputhttp://eli-thinkpad:50316/tasklog?plaintext=true&attemptid=attempt_201111031152_0006_m_000001_2&filter=stderr
11/11/03 12:52:02 INFO mapred.JobClient: Job complete: job_201111031152_0006
11/11/03 12:52:02 INFO mapred.JobClient: Counters: 4
11/11/03 12:52:02 INFO mapred.JobClient:   Job Counters 
11/11/03 12:52:02 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=0
11/11/03 12:52:02 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
11/11/03 12:52:02 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
11/11/03 12:52:02 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
{noformat}
                
> A single TT disk failure can cause the job to fail
> --------------------------------------------------
>
>                 Key: MAPREDUCE-2960
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2960
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: tasktracker
>    Affects Versions: 0.20.204.0
>            Reporter: Eli Collins
>
> TaskInProgress#kill in the JT fails because TaskStatus#setFinishTimes fails, which in turn happens because no start time was set. There's no start time because TaskTracker#run (DefaultTaskController#initializeJob) failed before it was set. The fix is to have TT#launchTask set the start time before it starts the task runner, so there's a valid start time even if TT#run fails.
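
The ordering problem in the description can be illustrated with a small standalone sketch. This is not Hadoop code: SketchStatus, runTask and launchAndKill are hypothetical names invented for the illustration, and the sketch assumes that setting a finish time without a start time fails with an exception. It only shows why recording the start time before starting the runner leaves a usable status when initialization fails.

```java
import java.io.IOException;

public class StartTimeSketch {

    /** Hypothetical stand-in for a task status record; not Hadoop's TaskStatus. */
    static class SketchStatus {
        private long startTime;   // 0 means "never set"
        private long finishTime;  // 0 means "never set"

        void setStartTime(long t) { startTime = t; }

        /** Assumption: a finish time is rejected when no start time exists. */
        void setFinishTime(long t) {
            if (startTime <= 0) {
                throw new IllegalStateException("no start time set");
            }
            finishTime = t;
        }

        long getFinishTime() { return finishTime; }
    }

    /**
     * Hypothetical runner. In the old ordering the start time is only set
     * here, after initialization, so a failed initialization skips it.
     */
    static void runTask(SketchStatus status, boolean initFails) throws IOException {
        if (initFails) {
            throw new IOException("Job initialization failed");
        }
        status.setStartTime(System.currentTimeMillis());
    }

    /**
     * Launches a task whose initialization fails, then tries to mark it
     * finished (the "kill" step). When setStartFirst is true, the start time
     * is recorded before the runner starts, as the description proposes.
     * Returns true if the finish time could be recorded.
     */
    public static boolean launchAndKill(boolean setStartFirst) {
        SketchStatus status = new SketchStatus();
        if (setStartFirst) {
            status.setStartTime(System.currentTimeMillis());
        }
        try {
            runTask(status, true);
        } catch (IOException e) {
            // Initialization failed; fall through to the kill step.
        }
        try {
            status.setFinishTime(System.currentTimeMillis());
            return true;
        } catch (IllegalStateException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("start time set by runner: kill ok = " + launchAndKill(false));
        System.out.println("start time set at launch: kill ok = " + launchAndKill(true));
    }
}
```

With the start time set only inside the runner, the kill step cannot record a finish time; with it set at launch, the kill step succeeds even though the runner failed.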
