Posted to mapreduce-issues@hadoop.apache.org by "Owen O'Malley (JIRA)" <ji...@apache.org> on 2011/08/19 01:00:28 UTC
[jira] [Commented] (MAPREDUCE-2846) a small % of all tasks fail with DefaultTaskController
[ https://issues.apache.org/jira/browse/MAPREDUCE-2846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087374#comment-13087374 ]
Owen O'Malley commented on MAPREDUCE-2846:
------------------------------------------
Offline, Allen gave me a stack trace:
{quote}
java.io.FileNotFoundException: File /export/apps/hadoop/hadoop-0.20.204.0/logs/userlogs/job_201108100052_0008/attempt_201108100052_0008_r_000145_0/log.tmp does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:371)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:210)
    at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:160)
    at org.apache.hadoop.fs.RawLocalFileSystem.rename(RawLocalFileSystem.java:261)
    at org.apache.hadoop.fs.ChecksumFileSystem.rename(ChecksumFileSystem.java:406)
    at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:345)
    at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:391)
    at org.apache.hadoop.mapred.Child.main(Child.java:235)
{quote}
Based on this, I discovered that writeToIndexFile is missing synchronization. Adding it seems to reduce the failures that Allen is seeing.
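For readers following along, here is a minimal sketch of the kind of race the trace suggests, and of how synchronization closes it. This is not the TaskLog code: the class IndexFileWriter, its methods, and the log.index file name are hypothetical, and only the log.tmp name comes from the trace. The assumption is that the index is written to a temporary file and then renamed into place. If two threads do that concurrently without a lock, one thread's rename can move log.tmp away just before the other thread tries to rename it, which would surface as a "does not exist" error like the one above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; not the actual org.apache.hadoop.mapred.TaskLog code.
public class IndexFileWriter {
    private final Path tmpFile;    // shared temporary file, as in the trace (log.tmp)
    private final Path indexFile;  // final destination (name assumed for this sketch)

    public IndexFileWriter(Path attemptDir) {
        this.tmpFile = attemptDir.resolve("log.tmp");
        this.indexFile = attemptDir.resolve("log.index");
    }

    // 'synchronized' makes write-then-rename a single critical section.
    // Without it, thread A can rename log.tmp away between thread B's
    // write and thread B's rename, so B fails because log.tmp is gone.
    public synchronized void writeIndex(String contents) throws IOException {
        Files.write(tmpFile, contents.getBytes(StandardCharsets.UTF_8));
        Files.move(tmpFile, indexFile, StandardCopyOption.REPLACE_EXISTING);
    }

    // Reads back whatever index contents were most recently renamed into place.
    public String readIndex() throws IOException {
        return new String(Files.readAllBytes(indexFile), StandardCharsets.UTF_8);
    }
}
```

Note that a lock like this only serializes callers inside one JVM. If the temporary file is also touched by another process, it would not help, which may be why the failures are reduced rather than eliminated.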
> a small % of all tasks fail with DefaultTaskController
> ------------------------------------------------------
>
> Key: MAPREDUCE-2846
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2846
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: task, task-controller, tasktracker
> Affects Versions: 0.20.204.0
> Reporter: Allen Wittenauer
> Priority: Blocker
>
> After upgrading our test 0.20.203 grid to 0.20.204-rc2, we ran terasort to verify operation. While the job completed successfully, approx 10% of the tasks failed with task runner execution errors and the inability to create symlinks for attempt logs.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira