Posted to mapreduce-issues@hadoop.apache.org by "Karthik Kambatla (JIRA)" <ji...@apache.org> on 2013/02/26 04:56:12 UTC
[jira] [Created] (MAPREDUCE-5028) Maps fail when io.sort.mb is set to high value

Karthik Kambatla created MAPREDUCE-5028:
-------------------------------------------

             Summary: Maps fail when io.sort.mb is set to high value
                 Key: MAPREDUCE-5028
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5028
             Project: Hadoop Map/Reduce
          Issue Type: Bug
    Affects Versions: 0.23.5, 2.0.3-alpha, 1.1.1
            Reporter: Karthik Kambatla
            Assignee: Karthik Kambatla
            Priority: Critical


Verified the problem exists on branch-1 with the following configuration:

Pseudo-distributed mode: 2 maps / 1 reduce, mapred.child.java.opts=-Xmx2048m, io.sort.mb=1280, dfs.block.size=2147483648
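
For scale, the values above can be put in the same unit. The following is a minimal sketch, not Hadoop code: the class and method names are hypothetical, and it assumes only that io.sort.mb is interpreted as a count of 2^20-byte megabytes. It does not identify the cause of the failure; it just shows how large the sort buffer is relative to the child heap and to the range of a signed 32-bit int.

```java
// Illustrative arithmetic only. SortBufferSizes and its methods are
// hypothetical helpers and are not part of Hadoop.
class SortBufferSizes {
    static final long MB = 1024L * 1024L;

    // io.sort.mb is given in megabytes; convert to bytes using long arithmetic.
    static long sortBufferBytes(int ioSortMb) {
        return ioSortMb * MB;
    }

    // Fraction of the child JVM heap (-Xmx, in MB) requested for the sort buffer.
    static double heapFraction(int ioSortMb, int heapMb) {
        return (double) ioSortMb / heapMb;
    }

    // True when a byte count does not fit in a signed 32-bit int.
    static boolean exceedsInt(long bytes) {
        return bytes > Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long buffer = sortBufferBytes(1280);   // io.sort.mb=1280
        long block = 2147483648L;              // dfs.block.size from the report
        System.out.println("io.sort.mb in bytes:     " + buffer);
        System.out.println("fraction of 2048m heap:  " + heapFraction(1280, 2048));
        System.out.println("buffer exceeds int max:  " + exceedsInt(buffer));
        System.out.println("block size exceeds int:  " + exceedsInt(block));
    }
}
```

With these settings the sort buffer is 1342177280 bytes, or 62.5% of the 2048 MB child heap. That still fits in an int, while the configured block size is one byte past Integer.MAX_VALUE.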

Run teragen to generate 4 GB of data.
Run wordcount with this configuration; the maps fail with the following error:
{noformat}
java.io.IOException: Spill failed
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1031)
	at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:692)
	at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
	at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:45)
	at org.apache.hadoop.examples.WordCount$TokenizerMapper.map(WordCount.java:34)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:766)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.io.EOFException
	at java.io.DataInputStream.readInt(DataInputStream.java:375)
	at org.apache.hadoop.io.IntWritable.readFields(IntWritable.java:38)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
	at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
	at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
	at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
	at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
	at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1505)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1438)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:855)
	at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1346)
{noformat}

I also marked branch-0.23 and branch-2 as affected because the offending code appears to exist there too.
