Posted to common-dev@hadoop.apache.org by "Devaraj Das (JIRA)" <ji...@apache.org> on 2008/09/08 21:03:44 UTC

[jira] Commented: (HADOOP-4115) Reducer gets stuck in shuffle when local disk out of space

    [ https://issues.apache.org/jira/browse/HADOOP-4115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12629259#action_12629259 ] 

Devaraj Das commented on HADOOP-4115:
-------------------------------------

Ideally, the task should have exited when this exception was thrown. Did you by any chance get a jstack dump of the task while it was hung after the exception? (The output of kill -3 <pid> would also help.) At this point there are two suspects. One is that some thread is not a daemon thread and is preventing the process from exiting. The other is that the task JVM is stuck for some reason in the finally clause of TaskTracker.Child.main(). Do you know whether the TaskTracker (TT) was reachable and whether it logged the task failure?
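
To illustrate the first suspect, here is a minimal, standalone Java sketch (not Hadoop code) that lists the live non-daemon threads in a JVM, which is roughly what one would look for in a jstack dump. The class name NonDaemonThreads, the method liveNonDaemonThreads(), and the thread name "hypothetical-worker" are made up for this example; only standard java.lang and java.util.concurrent calls are used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class NonDaemonThreads {

    /** Returns every live non-daemon thread other than the calling thread. */
    static List<Thread> liveNonDaemonThreads() {
        List<Thread> result = new ArrayList<>();
        // getAllStackTraces() returns a snapshot keyed by live thread.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.isAlive() && !t.isDaemon() && t != Thread.currentThread()) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);

        // A thread inherits daemon status from its creator. main is
        // non-daemon, so this worker is non-daemon too and would keep the
        // JVM alive after main returns for as long as it stays blocked.
        Thread worker = new Thread(() -> {
            try {
                release.await();
            } catch (InterruptedException ignored) {
                // fall through and let the thread end
            }
        }, "hypothetical-worker");
        worker.start();

        for (Thread t : liveNonDaemonThreads()) {
            System.out.println("non-daemon thread still alive: " + t.getName());
        }

        // Unblock the worker so that this demo itself can exit.
        release.countDown();
        worker.join();
    }
}
```

If the release.countDown() call were left out, main would return but the process would not exit, which is the kind of hang suspected here. Whether that is what actually happened in the reduce task can only be confirmed from a thread dump.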

> Reducer gets stuck in shuffle when local disk out of space
> ----------------------------------------------------------
>
>                 Key: HADOOP-4115
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4115
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.2
>            Reporter: Marco Nicosia
>            Priority: Critical
>
> 2008-08-29 23:53:12,357 WARN org.apache.hadoop.mapred.ReduceTask: task_200808291851_0001_r_000245_0 Merging of the local FS files threw an exception: org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
> 	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:199)
> 	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
> 	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
> 	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
> 	at java.io.DataOutputStream.write(DataOutputStream.java:90)
> 	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:339)
> 	at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155)
> 	at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
> 	at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
> 	at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
> 	at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
> 	at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:47)
> 	at java.io.DataOutputStream.write(DataOutputStream.java:90)
> 	at org.apache.hadoop.io.SequenceFile$UncompressedBytes.writeUncompressedBytes(SequenceFile.java:617)
> 	at org.apache.hadoop.io.SequenceFile$Writer.appendRaw(SequenceFile.java:1038)
> 	at org.apache.hadoop.io.SequenceFile$Sorter.writeFile(SequenceFile.java:2626)
> 	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$LocalFSMerger.run(ReduceTask.java:1564)
> Caused by: java.io.IOException: No space left on device
> 	at java.io.FileOutputStream.writeBytes(Native Method)
> 	at java.io.FileOutputStream.write(FileOutputStream.java:260)
> 	at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:197)
> 	... 16 more
> 2008-08-29 23:53:14,013 WARN org.apache.hadoop.mapred.TaskTracker: Error running child
> java.io.IOException: task_200808291851_0001_r_000245_0The reduce copier failed
> 	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:329)
> 	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2122)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.