Posted to common-user@hadoop.apache.org by Travis Woodruff <ap...@yahoo.com> on 2008/02/05 01:41:31 UTC

Possible memory "leak" in MapTask$MapOutputBuffer

I have been using Hadoop for a couple of months now, and I recently moved to an x86_64 platform. When I ran some jobs that had run fine on the 32-bit cluster, a large number of map tasks failed with OutOfMemoryError. I initially chalked this up to the higher object overhead of a 64-bit JVM and increased my task process heap size from 512M to 650M. That reduced the OOMEs, but I'm still seeing them occasionally, so I did some poking around in a heap snapshot, and I think I've found a potential problem with the way the sort buffer is cleaned up.

After MapOutputBuffer calls sortAndSpillToDisk(), it iterates over all the sortImpls and calls close() on each. This close() nulls the keyValBuffer member of BasicTypeSorterBase; however, it does not clear the reference held by the sorter's comparator (WritableComparator.buffer). Because of this, I think it's possible for an old buffer (or even multiple old buffers) to escape GC: if one or more partitions' sorters are used to sort one buffer's contents but not the next one's, the comparators belonging to that first set of sorters will still hold a reference to the first buffer even after the new buffer is allocated.
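To make the retention pattern concrete, here is a minimal, self-contained sketch. This is plain Java, not actual Hadoop code: StickyComparator, Sorter, and sortSomething() are hypothetical stand-ins for WritableComparator, BasicTypeSorterBase, and the sort path, and the 64 MB buffer size is arbitrary.

    // Illustrates how a long-lived comparator can pin a buffer that its
    // owning sorter believes it has released.
    public class ComparatorRetentionSketch {

        // Stand-in for WritableComparator: compare() stashes the byte[]
        // it was handed, the way WritableComparator.buffer does.
        static class StickyComparator {
            private byte[] lastBuffer;  // analogous to WritableComparator.buffer

            int compare(byte[] b, int off1, int off2) {
                lastBuffer = b;         // reference survives the call
                return Byte.compare(b[off1], b[off2]);
            }
        }

        // Stand-in for BasicTypeSorterBase: owns a large key/value buffer
        // and a comparator.
        static class Sorter {
            byte[] keyValBuffer = new byte[64 << 20];  // 64 MB sort buffer
            final StickyComparator comparator = new StickyComparator();

            void sortSomething() {
                comparator.compare(keyValBuffer, 0, 1);
            }

            void close() {
                // Mirrors BasicTypeSorterBase.close(): the sorter's own
                // reference is cleared, but comparator.lastBuffer still
                // points at the 64 MB array, so it cannot be collected.
                keyValBuffer = null;
            }
        }

        public static void main(String[] args) {
            Sorter sorter = new Sorter();
            sorter.sortSomething();
            sorter.close();
            // The array is still strongly reachable through the comparator
            // even though close() has run: this prints "true".
            System.out.println(sorter.comparator.lastBuffer != null);
        }
    }

If a second buffer is allocated while a comparator from the previous round still holds the first one, the process transiently carries both, which matches the extra heap usage I'm seeing.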

Please let me know if you agree with this assessment. If this is indeed a problem, it could (at least partially) explain some of the mysterious memory usage discussed in HADOOP-2751.


Thanks,
Travis
