Posted to common-dev@hadoop.apache.org by "Runping Qi (JIRA)" <ji...@apache.org> on 2008/04/11 15:52:06 UTC

[jira] Created: (HADOOP-3236) Reduce task failed at shuffling time, throwing null pointer exception

Reduce task failed at shuffling time, throwing null pointer exception
---------------------------------------------------------------------

                 Key: HADOOP-3236
                 URL: https://issues.apache.org/jira/browse/HADOOP-3236
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
            Reporter: Runping Qi




This happened on the 0.17.0 branch.

Here is the stack trace:

2008-04-11 13:45:54,171 ERROR org.apache.hadoop.mapred.ReduceTask: Map output copy failure: java.lang.NullPointerException
	at org.apache.hadoop.fs.InMemoryFileSystem$RawInMemoryFileSystem.getFileStatus(InMemoryFileSystem.java:302)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:242)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:853)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:777)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-3236) Reduce task failed at shuffling time, throwing null pointer exception

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588089#action_12588089 ] 

Devaraj Das commented on HADOOP-3236:
-------------------------------------

Actually, whenever there is an error in the merge, the InMemoryFileSystem is left in a state where other threads using it will encounter NPEs. This needs to be fixed. But it is not a correctness issue, since if there is a merge exception the task will die for sure.
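
Something like this minimal sketch (hypothetical class and method names, not the actual InMemoryFileSystem internals) shows the failure mode: a failed merge wipes shared in-memory state while copier threads still hold paths into it, so those threads die with an NPE instead of the root-cause merge exception.

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch: shared in-memory state that a failed merge
    // clears without signalling the copier threads that still use it.
    public class SharedStateNpeDemo {
        private final Map<String, long[]> pathInfo = new HashMap<String, long[]>();

        synchronized void addFile(String path, long len) {
            pathInfo.put(path, new long[] { len });
        }

        // Merge thread: its error path resets the state.
        synchronized void failedMerge() {
            pathInfo.clear();
        }

        // Copier thread: looks up a file it wrote earlier.
        long getLen(String path) {
            long[] info;
            synchronized (this) {
                info = pathInfo.get(path); // null after failedMerge()
            }
            return info[0];                // NullPointerException here
        }

        public static void main(String[] args) {
            SharedStateNpeDemo fs = new SharedStateNpeDemo();
            fs.addFile("/map_1.out", 42L);
            fs.failedMerge();          // simulated merge error path
            fs.getLen("/map_1.out");   // NPE masks the real failure
        }
    }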

So yes, the NPEs are due to the other merge exceptions, as you pointed out. But I am not sure why so many reducers are seeing corrupted map outputs.



[jira] Commented: (HADOOP-3236) Reduce task failed at shuffling time, throwing null pointer exception

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-3236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12588083#action_12588083 ] 

Runping Qi commented on HADOOP-3236:
------------------------------------


It seems there is some bug in the in-memory merge code.
I saw some other tasks throw different exceptions:

2008-04-11 16:37:43,978 WARN org.apache.hadoop.mapred.ReduceTask: task_200804111608_0001_r_000592_0 Intermediate Merge of the inmemory files threw an exception: java.lang.NegativeArraySizeException
	at org.apache.hadoop.io.SequenceFile$UncompressedBytes.reset(SequenceFile.java:604)
	at org.apache.hadoop.io.SequenceFile$UncompressedBytes.access$900(SequenceFile.java:594)
	at org.apache.hadoop.io.SequenceFile$Reader.nextRawValue(SequenceFile.java:2022)
	at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawValue(SequenceFile.java:3014)
	at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.next(SequenceFile.java:2758)
	at org.apache.hadoop.io.SequenceFile$Sorter.writeFile(SequenceFile.java:2625)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:1642)
 
The weird thing is that the map output compression flag for the job was set to false.
The merger must have encountered corrupted data.
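
For illustration, this is exactly what corrupted data does to a length-prefixed format: if the value-length field is read from damaged bytes and happens to decode negative, the array allocation blows up. A minimal sketch (hypothetical reader, not the actual SequenceFile code):

    import java.io.ByteArrayInputStream;
    import java.io.DataInputStream;
    import java.io.IOException;

    // Hypothetical sketch: a damaged length field that decodes negative
    // throws NegativeArraySizeException at the array allocation.
    public class NegativeLengthDemo {
        public static void main(String[] args) throws IOException {
            // Four corrupted bytes that decode to the int -10.
            byte[] corrupt = { (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xF6 };
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(corrupt));
            int valLen = in.readInt();       // -10 instead of a real length
            byte[] value = new byte[valLen]; // NegativeArraySizeException
            in.readFully(value);
        }
    }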

Here is another exception from another task:

2008-04-11 17:22:17,863 WARN org.apache.hadoop.mapred.ReduceTask: task_200804111608_0001_r_000646_0 Intermediate Merge of the inmemory files threw an exception: java.io.IOException: java.io.IOException: /taskTracker/jobcache/job_200804111608_0001/task_200804111608_0001_r_000646_0/output/map_36648.out not a SequenceFile
	at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1479)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1442)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1363)
	at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:2985)
	at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.merge(SequenceFile.java:2802)
	at org.apache.hadoop.io.SequenceFile$Sorter.merge(SequenceFile.java:2556)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:1633)

	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:1640)
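
The "not a SequenceFile" message is the header check failing: a SequenceFile starts with the magic bytes 'S', 'E', 'Q' plus a version byte, so if the first bytes of the copied map output are garbage, the reader rejects the whole file. A simplified sketch of that kind of check (not the actual Reader.init code):

    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;

    // Simplified sketch of a magic-byte check like the one in
    // SequenceFile$Reader.init (the real code also checks a version byte).
    public class MagicCheckDemo {
        private static final byte[] MAGIC = { 'S', 'E', 'Q' };

        static void checkHeader(String file) throws IOException {
            DataInputStream in = new DataInputStream(new FileInputStream(file));
            try {
                byte[] hdr = new byte[MAGIC.length];
                in.readFully(hdr);
                for (int i = 0; i < MAGIC.length; i++) {
                    if (hdr[i] != MAGIC[i]) {
                        throw new IOException(file + " not a SequenceFile");
                    }
                }
            } finally {
                in.close();
            }
        }
    }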


Another different one:

2008-04-11 16:46:08,652 WARN org.apache.hadoop.mapred.ReduceTask: task_200804111608_0001_r_000881_0 Intermediate Merge of the inmemory files threw an exception: java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:52)
	at org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:90)
	at org.apache.hadoop.io.SequenceFile$Reader.nextRawKey(SequenceFile.java:1974)
	at org.apache.hadoop.io.SequenceFile$Sorter$SegmentDescriptor.nextRawKey(SequenceFile.java:3002)
	at org.apache.hadoop.io.SequenceFile$Sorter$MergeQueue.next(SequenceFile.java:2760)
	at org.apache.hadoop.io.SequenceFile$Sorter.writeFile(SequenceFile.java:2625)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:1642)
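
And the OutOfMemoryError fits the same pattern: if a corrupted length field decodes as a huge positive number rather than a negative one, buffering a key of that claimed size keeps growing the in-memory buffer until the heap runs out. A hypothetical sketch (not DataOutputBuffer itself):

    import java.io.ByteArrayOutputStream;

    // Hypothetical sketch: a "key" whose corrupted length claims ~2 GB
    // grows the in-memory buffer until the heap is exhausted.
    public class HugeLengthOomDemo {
        public static void main(String[] args) {
            long claimedKeyLen = Integer.MAX_VALUE;    // garbage length field
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[64 * 1024 * 1024]; // 64 MB per append
            for (long written = 0; written < claimedKeyLen; written += chunk.length) {
                buf.write(chunk, 0, chunk.length);     // OutOfMemoryError here
            }
        }
    }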

