Posted to dev@spark.apache.org by "dan@lumity.com" <da...@lumity.com> on 2015/08/25 03:59:02 UTC

ExternalSorter: Thread *** spilling in-memory map of 352.6 MB to disk (38 times so far)

Hi all,

I posted this on the user list but didn't get any responses, so I was hoping
one of you awesome devs could shed some light:


I'm trying to run a Spark 1.5 job with:

 ./spark-shell --driver-java-options "-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=1044 -Xms16g -Xmx48g -Xss128m"

I get lots of messages like this:

15/08/24 20:24:33 INFO ExternalSorter: Thread 172 spilling in-memory map of
352.2 MB to disk (40 times so far) 
15/08/24 20:24:33 INFO ExternalSorter: Thread 179 spilling in-memory map of
352.2 MB to disk (39 times so far) 
15/08/24 20:24:34 INFO ExternalSorter: Thread 197 spilling in-memory map of
352.2 MB to disk (39 times so far) 
15/08/24 20:24:34 INFO ExternalSorter: Thread 192 spilling in-memory map of
352.2 MB to disk (39 times so far) 
15/08/24 20:24:36 INFO ShuffleMemoryManager: TID 798 waiting for at least
1/2N of shuffle memory pool to be free 
15/08/24 20:24:36 INFO ExternalSorter: Thread 170 spilling in-memory map of
352.2 MB to disk (39 times so far) 
15/08/24 20:24:36 INFO ExternalSorter: Thread 171 spilling in-memory map of
352.2 MB to disk (40 times so far) 
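
If I'm reading the 1.x static memory management right, the numbers might make
this spilling expected rather than a bug. Here's a back-of-envelope sketch,
assuming the defaults spark.shuffle.memoryFraction=0.2 and
spark.shuffle.safetyFraction=0.8, and that in local mode all 36 tasks share my
48 GB driver heap:

```shell
# Back-of-envelope for the Spark 1.x static shuffle pool. Assumptions:
# defaults spark.shuffle.memoryFraction=0.2, spark.shuffle.safetyFraction=0.8,
# and 36 concurrently running tasks sharing the 48 GB local-mode heap.
awk -v heap_mib=$((48 * 1024)) -v tasks=36 'BEGIN {
  pool = heap_mib * 0.2 * 0.8   # MiB available to all sorters combined
  printf "shuffle pool = %.0f MiB\n", pool
  # ShuffleMemoryManager grants each task between 1/(2N) and 1/N of the pool
  printf "per-task share = %.0f-%.0f MiB\n", pool / (2 * tasks), pool / tasks
}'
```

If that's right, each sorter is capped at roughly 218 MiB of tracked shuffle
memory, so maps estimated at ~350 MB spilling dozens of times would be
consistent with the pool simply being small relative to 36 concurrent tasks
(though I may be misreading how the grants work).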

When I force a stack trace with jstack, I get stack traces like this for
each of my 36 cores: 

"Executor task launch worker-44" daemon prio=10 tid=0x00007fb9c404d000 nid=0x417f runnable [0x00007fb5e3ffe000]
   java.lang.Thread.State: RUNNABLE
        at scala.collection.mutable.ResizableArray$class.$init$(ResizableArray.scala:32)
        at scala.collection.mutable.ArrayBuffer.<init>(ArrayBuffer.scala:49)
        at scala.collection.mutable.ArrayBuffer.<init>(ArrayBuffer.scala:63)
        at org.apache.spark.util.SizeEstimator$SearchState.<init>(SizeEstimator.scala:154)
        at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:183)
        at org.apache.spark.util.SizeEstimator$$anonfun$sampleArray$1.apply$mcVI$sp(SizeEstimator.scala:262)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:166)
        at org.apache.spark.util.SizeEstimator$.sampleArray(SizeEstimator.scala:254)
        at org.apache.spark.util.SizeEstimator$.visitArray(SizeEstimator.scala:238)
        at org.apache.spark.util.SizeEstimator$.visitSingleObject(SizeEstimator.scala:194)
        at org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(SizeEstimator.scala:186)
        at org.apache.spark.util.SizeEstimator$.estimate(SizeEstimator.scala:54)
        at org.apache.spark.util.collection.SizeTracker$class.takeSample(SizeTracker.scala:78)
        at org.apache.spark.util.collection.SizeTracker$class.afterUpdate(SizeTracker.scala:70)
        at org.apache.spark.util.collection.PartitionedPairBuffer.afterUpdate(PartitionedPairBuffer.scala:30)
        at org.apache.spark.util.collection.PartitionedPairBuffer.insert(PartitionedPairBuffer.scala:53)
        at org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:214)
        at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)



jmap shows:

root@ip-172-31-28-227:~# jmap -heap 16421 
Attaching to process ID 16421, please wait... 
Debugger attached successfully. 
Server compiler detected. 
JVM version is 24.79-b02 

using thread-local object allocation. 
Parallel GC with 25 thread(s) 

Heap Configuration: 
   MinHeapFreeRatio = 0 
   MaxHeapFreeRatio = 100 
   MaxHeapSize      = 51539607552 (49152.0MB) 
   NewSize          = 1310720 (1.25MB) 
   MaxNewSize       = 17592186044415 MB 
   OldSize          = 5439488 (5.1875MB) 
   NewRatio         = 2 
   SurvivorRatio    = 8 
   PermSize         = 21757952 (20.75MB) 
   MaxPermSize      = 268435456 (256.0MB) 
   G1HeapRegionSize = 0 (0.0MB) 

Heap Usage: 
PS Young Generation 
Eden Space: 
   capacity = 15290335232 (14582.0MB) 
   used     = 11958528232 (11404.541236877441MB) 
   free     = 3331807000 (3177.4587631225586MB) 
   78.20971908433302% used 
From Space:
   capacity = 869269504 (829.0MB) 
   used     = 868910560 (828.6576843261719MB) 
   free     = 358944 (0.342315673828125MB) 
   99.9587073976082% used 
To Space: 
   capacity = 965738496 (921.0MB) 
   used     = 0 (0.0MB) 
   free     = 965738496 (921.0MB) 
   0.0% used 
PS Old Generation 
   capacity = 11453595648 (10923.0MB) 
   used     = 1423152248 (1357.223747253418MB) 
   free     = 10030443400 (9565.776252746582MB) 
   12.425375329611077% used 
PS Perm Generation 
   capacity = 107479040 (102.5MB) 
   used     = 107107360 (102.14553833007812MB) 
   free     = 371680 (0.354461669921875MB) 
   99.65418373666158% used 

27990 interned Strings occupying 2892496 bytes. 


This seems to indicate that the threads are spilling extra data to temp files
on disk, but as far as I can tell none of the temp files in /tmp/ are growing.
I think this means the spilling isn't actually happening, or I don't know
where to look for it if it is.
I can't tell what's going on here. Can anyone suggest how to work through
this?
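
In case it helps, here's roughly how I've been trying to watch for spill
output, on the assumption (from my reading of DiskBlockManager, so possibly
wrong) that spill files land in per-application blockmgr-* directories under
spark.local.dir (default /tmp) rather than as top-level /tmp files:

```shell
# Look for block-manager scratch directories instead of top-level /tmp files.
# Assumes the default spark.local.dir=/tmp; the blockmgr-* naming comes from
# my reading of DiskBlockManager and may differ by version.
SPARK_LOCAL_DIR="${SPARK_LOCAL_DIR:-/tmp}"
find "$SPARK_LOCAL_DIR" -maxdepth 2 -type d -name 'blockmgr-*' 2>/dev/null || true
# Per-directory sizes; rerun (or wrap in watch) to see growth while spilling.
du -sh "$SPARK_LOCAL_DIR"/spark-*/blockmgr-* 2>/dev/null || true
```

If those directories aren't growing either, that would at least confirm the
spills aren't landing where I expect.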

As an update, when I reran the job with spark.shuffle.spill = false, the job
moves along very slowly and I get stack traces like:

Thread 20826: (state = IN_VM)
 - java.util.IdentityHashMap.resize(int) @bci=54, line=469 (Compiled frame)
 - java.util.IdentityHashMap.put(java.lang.Object, java.lang.Object) @bci=118, line=445 (Compiled frame)
 - org.apache.spark.util.SizeEstimator$SearchState.enqueue(java.lang.Object) @bci=21, line=159 (Compiled frame)
 - org.apache.spark.util.SizeEstimator$$anonfun$visitSingleObject$1.apply(java.lang.reflect.Field) @bci=12, line=203 (Compiled frame)
 - org.apache.spark.util.SizeEstimator$$anonfun$visitSingleObject$1.apply(java.lang.Object) @bci=5, line=202 (Compiled frame)
 - scala.collection.immutable.List.foreach(scala.Function1) @bci=15, line=381 (Compiled frame)
 - org.apache.spark.util.SizeEstimator$.visitSingleObject(java.lang.Object, org.apache.spark.util.SizeEstimator$SearchState) @bci=75, line=202 (Compiled frame)
 - org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(java.lang.Object, java.util.IdentityHashMap) @bci=32, line=186 (Compiled frame)
 - org.apache.spark.util.SizeEstimator$$anonfun$sampleArray$1.apply$mcVI$sp(int) @bci=71, line=262 (Compiled frame)
 - scala.collection.immutable.Range.foreach$mVc$sp(scala.Function1) @bci=87, line=166 (Compiled frame)
 - org.apache.spark.util.SizeEstimator$.sampleArray(java.lang.Object, org.apache.spark.util.SizeEstimator$SearchState, java.util.Random, org.apache.spark.util.collection.OpenHashSet, int) @bci=39, line=254 (Compiled frame)
 - org.apache.spark.util.SizeEstimator$.visitArray(java.lang.Object, java.lang.Class, org.apache.spark.util.SizeEstimator$SearchState) @bci=185, line=238 (Compiled frame)
 - org.apache.spark.util.SizeEstimator$.visitSingleObject(java.lang.Object, org.apache.spark.util.SizeEstimator$SearchState) @bci=16, line=194 (Compiled frame)
 - org.apache.spark.util.SizeEstimator$.org$apache$spark$util$SizeEstimator$$estimate(java.lang.Object, java.util.IdentityHashMap) @bci=32, line=186 (Compiled frame)
 - org.apache.spark.util.SizeEstimator$.estimate(java.lang.Object) @bci=9, line=54 (Compiled frame)
 - org.apache.spark.util.collection.SizeTracker$class.takeSample(org.apache.spark.util.collection.SizeTracker) @bci=23, line=78 (Compiled frame)
 - org.apache.spark.util.collection.SizeTracker$class.afterUpdate(org.apache.spark.util.collection.SizeTracker) @bci=31, line=70 (Compiled frame)
 - org.apache.spark.util.collection.PartitionedPairBuffer.afterUpdate() @bci=1, line=30 (Compiled frame)
 - org.apache.spark.util.collection.PartitionedPairBuffer.insert(int, java.lang.Object, java.lang.Object) @bci=63, line=53 (Compiled frame)
 - org.apache.spark.util.collection.ExternalSorter.insertAll(scala.collection.Iterator) @bci=200, line=214 (Compiled frame)
 - org.apache.spark.shuffle.sort.SortShuffleWriter.write(scala.collection.Iterator) @bci=203, line=73 (Interpreted frame)
 - org.apache.spark.scheduler.ShuffleMapTask.runTask(org.apache.spark.TaskContext) @bci=186, line=73 (Interpreted frame)
 - org.apache.spark.scheduler.ShuffleMapTask.runTask(org.apache.spark.TaskContext) @bci=2, line=41 (Interpreted frame)
 - org.apache.spark.scheduler.Task.run(long, int, org.apache.spark.metrics.MetricsSystem) @bci=118, line=88 (Interpreted frame)
 - org.apache.spark.executor.Executor$TaskRunner.run() @bci=334, line=214 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor.runWorker(java.util.concurrent.ThreadPoolExecutor$Worker) @bci=95, line=1145 (Interpreted frame)
 - java.util.concurrent.ThreadPoolExecutor$Worker.run() @bci=5, line=615 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=745 (Interpreted frame)
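
The next thing I plan to try is giving the shuffle pool a larger fraction of
the heap and running fewer concurrent tasks, so each task's share is bigger
before it has to spill. A sketch of that invocation, not a verified fix — the
fraction is the Spark 1.x static-memory setting:

```shell
# Sketch of a retry, not a verified fix: enlarge the static shuffle pool
# (spark.shuffle.memoryFraction, default 0.2 in Spark 1.x) and run fewer
# concurrent tasks so each task's 1/N share is larger before it must spill.
./spark-shell \
  --master local[16] \
  --driver-memory 48g \
  --driver-java-options "-Xss128m" \
  --conf spark.shuffle.memoryFraction=0.4
```

Halving the task count doubles each task's 1/N ceiling, and doubling
spark.shuffle.memoryFraction doubles the pool itself, at the cost of memory
available for caching and user objects.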

I'd appreciate any advice or suggestions!

Best, 
Dan 




--
View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/ExternalSorter-Thread-spilling-in-memory-map-of-352-6-MB-to-disk-38-times-so-far-tp13806.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org