Posted to user@spark.apache.org by rcollich <rc...@gmail.com> on 2016/01/30 01:53:24 UTC

Garbage collection issue on mapPartitions

Hi all,

I currently have a mapPartitions job which flatMaps each value in the
iterator, and I'm running into an issue where certain executions incur major
GC costs. Some executors take 20 minutes, 15 of which are pure garbage
collection, and I believe a lot of it has to do with the ArrayBuffer I am
outputting. Does anyone have suggestions for how I could emit the output as a
stream instead of materializing it all at once?
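
Roughly what the job looks like, with placeholder input and a trivial split()
standing in for the real (much heavier) expansion logic; the second variant is
the kind of lazy, streaming output I'm hoping is possible:

    import org.apache.spark.{SparkConf, SparkContext}
    import scala.collection.mutable.ArrayBuffer

    object GcSketch {
      def main(args: Array[String]): Unit = {
        val sc  = new SparkContext(
          new SparkConf().setAppName("gc-sketch").setMaster("local[*]"))
        val rdd = sc.parallelize(Seq("a b c", "d e f"))  // stand-in for the real input

        // Buffered version (what the job does today): every expanded record is
        // appended to an ArrayBuffer, so the whole partition's output sits on
        // the heap before anything is emitted downstream.
        val buffered = rdd.mapPartitions { iter =>
          val out = ArrayBuffer.empty[String]
          iter.foreach(v => out ++= v.split(" "))  // split() stands in for the real flatMap logic
          out.iterator
        }

        // Lazy version: flatMap over the partition iterator and return it
        // directly. Downstream consumers pull one record at a time, so only
        // the expansion of the current record needs to be live.
        val lazyOut = rdd.mapPartitions { iter =>
          iter.flatMap(v => v.split(" "))
        }

        println(s"buffered=${buffered.count()} lazy=${lazyOut.count()}")
        sc.stop()
      }
    }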

Also, does anyone have any advice in general for tracking down and addressing
GC issues in Spark?
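
I'm assuming the usual starting point is the per-task GC Time column in the
Spark UI plus verbose GC logging on the executors, along the lines of the
snippet below (standard HotSpot flags passed through
spark.executor.extraJavaOptions, nothing Spark-specific), but I'd welcome
better approaches:

    import org.apache.spark.SparkConf

    // Enable verbose GC logging on the executors so their stdout shows how
    // much time is spent in young vs. full collections.
    val conf = new SparkConf()
      .setAppName("gc-logging")
      .set("spark.executor.extraJavaOptions",
           "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")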



