Posted to user@spark.apache.org by Yan Fang <ya...@gmail.com> on 2014/09/09 01:37:36 UTC

[Spark Streaming] java.lang.OutOfMemoryError: GC overhead limit exceeded

Hi guys,

My Spark Streaming application hits this "java.lang.OutOfMemoryError: GC
overhead limit exceeded" error in the Spark Streaming driver program. I have
done the following to debug it:

1. increased the driver memory from 1 GB to 2 GB; the error then came after 22
hrs. When the memory was 1 GB, it came after 10 hrs. So I think it is a
memory leak problem.

2. a few hours after starting the application, I killed all the workers. The
driver program kept running and kept filling up memory. I had thought the
cause was too many batches sitting in the queue, but obviously it is not;
otherwise, after killing the workers (and with them the receiver), the memory
usage should have gone down.

3. took a heap dump and ran the Leak Suspects report in Eclipse Memory
Analyzer (MAT), which found:

*"One instance of "org.apache.spark.storage.BlockManager" loaded by
"sun.misc.Launcher$AppClassLoader @ 0x6c002fb90" occupies 1,477,177,296
(72.70%) bytes. The memory is accumulated in one instance of
"java.util.LinkedHashMap" loaded by "<system class loader>".*

*Keywords*
*sun.misc.Launcher$AppClassLoader @ 0x6c002fb90**java.util.LinkedHashMap*
*org.apache.spark.storage.BlockManager "*



What my application mainly does is the following (a rough sketch in code
follows the list):

1. calculate the sum/count in a batch
2. get the average in the batch
3. store the result in DB

4. calculate the sum/count in a window
5. get the average/min/max in the window
6. store the result in DB

7. compare the current batch value with the previous batch value using
updateStateByKey.
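
Roughly, the pipeline looks like the sketch below (simplified; the socket
source, the key/value parsing, the batch and window durations, the checkpoint
path, and the DB writes are placeholders rather than my real code):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object AveragesSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("AveragesSketch")
    val ssc  = new StreamingContext(conf, Seconds(10)) // batch interval: placeholder
    ssc.checkpoint("/tmp/spark-checkpoints")           // required by updateStateByKey; path is a placeholder

    // (key, value) records from some receiver; the socket source is only a stand-in
    val input = ssc.socketTextStream("localhost", 9999).map { line =>
      val Array(k, v) = line.split(",")
      (k, v.toDouble)
    }

    // 1-3: sum/count per batch, then the average, then store it
    val batchAvg = input
      .mapValues(v => (v, 1L))
      .reduceByKey((a, b) => (a._1 + b._1, a._2 + b._2))
      .mapValues { case (sum, count) => sum / count }
    batchAvg.foreachRDD { rdd =>
      rdd.foreachPartition { rows => rows.foreach { case (key, avg) => () } } // DB write goes here
    }

    // 4-6: average/min/max over a sliding window (window and slide sizes are placeholders)
    val windowStats = input
      .groupByKeyAndWindow(Seconds(300), Seconds(10))
      .mapValues(vs => (vs.sum / vs.size, vs.min, vs.max))
    windowStats.foreachRDD { rdd =>
      rdd.foreachPartition { rows => rows.foreach { case (key, stats) => () } } // DB write goes here
    }

    // 7: keep the previous batch's value per key so the current one can be compared to it;
    // the actual comparison logic is elided, the state just carries the latest value forward
    val withPrevious = batchAvg.updateStateByKey[Double] {
      (current: Seq[Double], previous: Option[Double]) =>
        current.lastOption.orElse(previous)
    }
    withPrevious.print()

    ssc.start()
    ssc.awaitTermination()
  }
}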


Any hint as to what causes this leak? Thank you.

Cheers,

Fang, Yan
yanfang724@gmail.com
+1 (206) 849-4108