Posted to mapreduce-dev@hadoop.apache.org by Radim Kolar <hs...@filez.com> on 2013/05/06 21:46:33 UTC

oom during shuffle in 2.0.4

On 6.5.2013 17:09, Jason Lowe wrote:
> This may be related to MAPREDUCE-5168 
> <https://issues.apache.org/jira/browse/MAPREDUCE-5168>. There's a 
> memory leak of sorts in the shuffle if many map outputs end up being 
> merged from disk.
My case is different. It is an in-memory merge of 281 MB, covering the first 4
segments. The JVM heap limit is set to 600 MB and
mapreduce.reduce.shuffle.input.buffer.percent=0.7, which gives a computed
memory limit of 391 MB. Used memory is 354733233 bytes when it tries to fetch
one more map output and OOMs.
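
For reference, the limits in the MergerManager log line below can be
reproduced, roughly, from the configured heap and buffer percentages. A
minimal sketch of that arithmetic, assuming the defaults 0.25 for
mapreduce.reduce.shuffle.memory.limit.percent and 0.66 for
mapreduce.reduce.shuffle.merge.percent, and a max heap inferred from the
logged memoryLimit (this is not the actual MergeManagerImpl code):

// Sketch only: derives the logged shuffle limits from the configured heap
// and (assumed default) shuffle percentages.
public class ShuffleLimitSketch {
    public static void main(String[] args) {
        long maxHeap = 559284206L;        // assumed Runtime.maxMemory() of the reducer JVM,
                                          // inferred from the logged memoryLimit (less than -Xmx600m)
        float inputBufferPercent = 0.7f;  // mapreduce.reduce.shuffle.input.buffer.percent
        float singleShufflePercent = 0.25f; // mapreduce.reduce.shuffle.memory.limit.percent (assumed default)
        float mergePercent = 0.66f;       // mapreduce.reduce.shuffle.merge.percent (assumed default)

        long memoryLimit = (long) (maxHeap * inputBufferPercent);              // logged: 391498944
        long maxSingleShuffleLimit = (long) (memoryLimit * singleShufflePercent); // logged: 97874736
        long mergeThreshold = (long) (memoryLimit * mergePercent);             // roughly the logged 258389312

        System.out.println("memoryLimit=" + memoryLimit
            + " maxSingleShuffleLimit=" + maxSingleShuffleLimit
            + " mergeThreshold=" + mergeThreshold);
    }
}

Note that usedMemory (354733233) was still below memoryLimit (391498944) when
the fetch of the next map output triggered the OOM.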

<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xms200m -Xmx600m</value>
</property>

There might be a problem with JVM memory management: if you allocate large
chunks of memory on the heap, you can still get an OOM with enough free memory
due to fragmentation. In Cassandra, they allocate memory from the JVM heap in
64K segments.
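
For illustration only, a rough sketch of that slab-style idea: copy values
into fixed 64K segments so the heap holds a few uniform buffers instead of
many variable-sized ones. Class and method names here are made up; this is
not Cassandra's actual allocator:

import java.nio.ByteBuffer;

// Hypothetical slab allocator sketch: values are copied into fixed 64K
// on-heap segments to limit fragmentation.
public class SlabAllocatorSketch {
    private static final int SEGMENT_SIZE = 64 * 1024; // 64K segments, as mentioned above
    private ByteBuffer current = ByteBuffer.allocate(SEGMENT_SIZE);

    /** Copies data into the current segment and returns a slice over the copy. */
    public synchronized ByteBuffer allocate(byte[] data) {
        if (data.length > SEGMENT_SIZE) {
            // Oversized values get their own buffer instead of a slab slice.
            return ByteBuffer.wrap(data.clone());
        }
        if (current.remaining() < data.length) {
            current = ByteBuffer.allocate(SEGMENT_SIZE); // start a fresh segment
        }
        ByteBuffer view = current.duplicate();
        view.limit(current.position() + data.length);
        ByteBuffer out = view.slice();   // view covering exactly data.length bytes
        out.put(data);
        out.flip();
        current.position(current.position() + data.length);
        return out;
    }
}

Whether something like this would help the shuffle buffers here is an open
question; it only avoids many differently sized allocations.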

[main] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: MergerManager: memoryLimit=391498944, maxSingleShuffleLimit=97874736, mergeThreshold=258389312, ioSortFactor=20, memToMemMergeOutputsThreshold=20
2013-05-06 18:06:46,973 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 70567974 bytes from map-output for attempt_1367849990671_0001_m_000000_0
2013-05-06 18:06:46,973 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 70567974, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->70567974
2013-05-06 18:06:47,004 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#2 about to shuffle output of map attempt_1367849990671_0001_m_000001_0 decomp: 70361922 len: 15380635 to MEMORY
2013-05-06 18:06:47,477 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 70361922 bytes from map-output for attempt_1367849990671_0001_m_000001_0
2013-05-06 18:06:47,477 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 70361922, inMemoryMapOutputs.size() -> 2, commitMemory -> 70567974, usedMemory ->140929896
2013-05-06 18:06:47,504 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#2 about to shuffle output of map attempt_1367849990671_0001_m_000002_0 decomp: 70556960 len: 15449457 to MEMORY
2013-05-06 18:06:48,067 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 70556960 bytes from map-output for attempt_1367849990671_0001_m_000002_0
2013-05-06 18:06:48,067 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 70556960, inMemoryMapOutputs.size() -> 3, commitMemory -> 140929896, usedMemory ->211486856
2013-05-06 18:06:48,094 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#2 about to shuffle output of map attempt_1367849990671_0001_m_000003_0 decomp: 69922463 len: 15325476 to MEMORY
2013-05-06 18:06:48,558 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 69922463 bytes from map-output for attempt_1367849990671_0001_m_000003_0
2013-05-06 18:06:48,558 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 69922463, inMemoryMapOutputs.size() -> 4, commitMemory -> 211486856, usedMemory ->281409319
2013-05-06 18:06:48,558 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Starting inMemoryMerger's merge since commitMemory=281409319 > mergeThreshold=258389312. Current usedMemory=281409319
2013-05-06 18:06:48,559 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.MergeThread: InMemoryMerger - Thread to merge in-memory shuffled map-outputs: Starting merge with 4 segments, while ignoring 0 segments
2013-05-06 18:06:48,587 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#2 about to shuffle output of map attempt_1367849990671_0001_m_000005_0 decomp: 73323914 len: 15761255 to MEMORY
2013-05-06 18:06:48,696 INFO [InMemoryMerger - Thread to merge in-memory shuffled map-outputs] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: Initiating in-memory merge with 4 segments...
2013-05-06 18:06:48,698 INFO [InMemoryMerger - Thread to merge in-memory shuffled map-outputs] org.apache.hadoop.mapred.Merger: Merging 4 sorted segments
2013-05-06 18:06:48,699 INFO [InMemoryMerger - Thread to merge in-memory shuffled map-outputs] org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 4 segments left of total size: 281409123 bytes
2013-05-06 18:06:49,030 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 73323914 bytes from map-output for attempt_1367849990671_0001_m_000005_0
2013-05-06 18:06:49,030 INFO [fetcher#2] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 73323914, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->354733233
2013-05-06 18:06:49,187 ERROR [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:crawler (auth:SIMPLE) cause:org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#2
2013-05-06 18:06:49,188 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#2
	at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:121)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:379)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)
Caused by: java.lang.OutOfMemoryError: Java heap space