Posted to dev@spark.apache.org by Renyi Xiong <re...@gmail.com> on 2016/05/15 18:46:33 UTC

Spark shuffling OutOfMemoryError Java heap space

Hi

I am consistently observing driver OutOfMemoryError (Java heap space)
during shuffling operation indicated by the log:

…………

16/05/14 21:57:03 INFO MapOutputTrackerMaster: Size of output statuses for
shuffle 2 is 36060250 bytes  --> shuffle metadata size is big, and the full
metadata will be sent to all workers?

16/05/14 21:57:06 INFO MapOutputTrackerMasterEndpoint: Asked to send map
output locations for shuffle 2 to <host1>:45757

16/05/14 21:57:06 INFO MapOutputTrackerMasterEndpoint: Asked to send map
output locations for shuffle 2 to <host2>:20300

16/05/14 21:57:06 INFO MapOutputTrackerMasterEndpoint: Asked to send map
output locations for shuffle 2 to <host3>:12389

16/05/14 21:57:06 INFO MapOutputTrackerMasterEndpoint: Asked to send map
output locations for shuffle 2 to <host4>:32197

…………

Exception in thread "dispatcher-event-loop-17" Exception in thread
"dispatcher-event-loop-3" Exception in thread "dispatcher-event-loop-6"
16/05/14 21:59:04 INFO MapOutputTrackerMasterEndpoint: Asked to send map
output locations for shuffle 2 to <host5>:19639

Exception in thread "dispatcher-event-loop-21" 16/05/14 21:59:08 INFO
MapOutputTrackerMasterEndpoint: Asked to send map output locations for
shuffle 2 to <host6>:58461

Exception in thread "dispatcher-event-loop-20" Exception in thread
"dispatcher-event-loop-13" Exception in thread
"dispatcher-event-loop-9" java.lang.OutOfMemoryError:
Java heap space

java.lang.OutOfMemoryError: Java heap space

                at java.util.Arrays.copyOf(Arrays.java:2271)
                at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:178)
                at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:103)  --> shuffle metadata duplicated (?) when sending to each executor?
                at org.apache.spark.rpc.netty.NettyRpcEnv.serialize(NettyRpcEnv.scala:252)
                at org.apache.spark.rpc.netty.RemoteNettyRpcCallContext.send(NettyRpcCallContext.scala:64)
                at org.apache.spark.rpc.netty.NettyRpcCallContext.reply(NettyRpcCallContext.scala:32)
                at org.apache.spark.MapOutputTrackerMasterEndpoint$$anonfun$receiveAndReply$1.applyOrElse(MapOutputTracker.scala:62)
                at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:104)
                at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:204)
                at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:100)
                at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
                at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
                at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
                at java.lang.Thread.run(Thread.java:724)
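If that JavaSerializerInstance.serialize frame means the tracker re-serializes the statuses once per requesting executor, then every in-flight reply would pin its own serialized copy on the driver heap. A toy illustration of that pattern (my own sketch, not Spark's actual code):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializePerRequest {
    // Serialize an object to bytes with plain Java serialization.
    // toByteArray() returns a fresh copy, so each call yields a new byte[].
    static byte[] serialize(Serializable obj) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(obj);
            oos.close();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the ~36 MB of map output statuses (kept small here).
        byte[] statuses = new byte[1 << 20];

        // One serialized copy per requesting executor: N live byte arrays
        // with identical contents but distinct identities.
        byte[] replyToHost1 = serialize(statuses);
        byte[] replyToHost2 = serialize(statuses);
        System.out.println(replyToHost1 != replyToHost2); // distinct arrays

        // Caching the serialized form once and reusing it for every reply
        // would keep a single copy instead of one per executor.
    }
}
```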

I enabled a heap dump on OOM and used jhat to analyze it. In the heap
histogram, I found 146 byte-array objects with the exact same size of
36,060,293 bytes.

I wonder whether these 146 big objects are in fact duplicates of the same
shuffle metadata. *Can experts please help confirm whether that's true?*

(8 GB of driver memory was specified for the above run, which should be
plenty for a single 36 MB copy of the shuffle metadata, but probably not for
146 duplicates.)
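Rough arithmetic on those histogram numbers (my own back-of-envelope, assuming all 146 arrays are live at once):

```java
class HeapEstimate {
    // Total footprint of N identical byte arrays of the size seen in the
    // jhat histogram above.
    static long totalBytes(int copies, long bytesEach) {
        return copies * bytesEach;
    }

    public static void main(String[] args) {
        long total = totalBytes(146, 36_060_293L); // 5,264,802,778 bytes
        System.out.printf("146 copies -> %.2f GB%n", total / 1e9);
        // That is ~5.3 GB of byte[] alone out of an 8 GB heap, before any
        // other driver state, so 146 live copies would plausibly OOM.
    }
}
```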

thanks,
Renyi.