Posted to user@mahout.apache.org by Ken Williams <zo...@hotmail.com> on 2011/06/18 18:39:09 UTC

RE: OutOfMemoryError: GC overhead limit exceeded

Hi All,

I'm having a problem running a job on Hadoop. Using Mahout, I've been able to run several Bayesian classifiers, training and testing them successfully on increasingly large datasets. Now I'm working on a dataset of 100,000 documents (size 100MB). I've been able to train the classifier, but when I try to run 'testclassifier' all the map tasks fail with a 'Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded' exception before the job itself is killed.

I have a small cluster of 3 machines with plenty of memory and CPU power (3 x 16GB, 2.5GHz quad-core machines). I've tried setting the 'mapred.child.java.opts' heap flags as high as 3GB (-Xms3G -Xmx3G) but still get the same error. I've also tried setting HADOOP_HEAPSIZE to values up to 3000, but this made no difference. While the job is running, 'top' shows that although the CPUs are busy, memory usage rarely goes above 12GB and absolutely no swapping is taking place.
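For reference, this is roughly what I have in mapred-site.xml (trimmed to the one relevant property; the rest of the file omitted):

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xms3G -Xmx3G</value>
    </property>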
I've seen the same exception before, where a program was spending so much time garbage-collecting (more than 90% of its time!) that it was unable to make progress and so threw the 'GC overhead limit exceeded' exception. If I disable that limit (-XX:-UseGCOverheadLimit) in the mapred.child.java.opts property, I see the same behaviour as before, only a slightly different exception is thrown: 'Caused by: java.lang.OutOfMemoryError: Java heap space at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:39)'. I'm guessing my program is spending too much time garbage-collecting for it to make progress, but how do I fix this?

I'm using Hadoop 0.20.2 and the latest Mahout snapshot version. All machines are running 64-bit Ubuntu and Java 6. Any help would be very much appreciated,
Ken
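
P.S. For completeness, the variant I tried with the GC overhead limit disabled looks roughly like this (same property in mapred-site.xml, other settings omitted):

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xms3G -Xmx3G -XX:-UseGCOverheadLimit</value>
    </property>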