Posted to user@spark.apache.org by David McWhorter <mc...@ccri.com> on 2014/12/19 20:35:10 UTC
DAGScheduler StackOverflowError
Hi all,
I'm developing a Spark application in which I need to iteratively update an
RDD over a large number of iterations (1000+). From reading online,
I've found that I should call .checkpoint() to keep the lineage graph from
growing too large. Even when doing this, I keep getting
StackOverflowErrors in the DAGScheduler, such as the one below. I've
attached a sample application that illustrates what I'm trying to do.
Can anyone point out how I can keep the DAG from growing so large that
Spark is unable to process it?
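For reference, here is a minimal sketch of the loop structure I'm describing (the names and the checkpoint interval are placeholders, not taken from the attached sample). My understanding is that checkpoint() is lazy: the lineage is only truncated once an action materializes the RDD, so I force a count() after each checkpoint and only checkpoint every N iterations:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object IterativeUpdate {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("iterative-update"))
    // Checkpoint files must go somewhere durable; path is a placeholder.
    sc.setCheckpointDir("/tmp/spark-checkpoints")

    var rdd: RDD[Long] = sc.parallelize(0L until 1000L)
    val checkpointInterval = 50 // placeholder; tuned by trial

    for (i <- 1 to 1000) {
      rdd = rdd.map(_ + 1) // the iterative update step

      if (i % checkpointInterval == 0) {
        rdd.cache()       // avoid recomputing the chain when materializing
        rdd.checkpoint()  // marks the RDD for checkpointing (lazy)
        rdd.count()       // action forces materialization, truncating lineage
      }
    }

    println(rdd.take(5).mkString(","))
    sc.stop()
  }
}
```

Even with this pattern (checkpoint + cache + a forcing action), I still hit the error below once the iteration count gets large.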
Thank you,
David
java.lang.StackOverflowError
    at scala.collection.generic.GenMapFactory$MapCanBuildFrom.scala$collection$generic$GenMapFactory$MapCanBuildFrom$$$outer(GenMapFactory.scala:57)
    at scala.collection.generic.GenMapFactory$MapCanBuildFrom.apply(GenMapFactory.scala:58)
    at scala.collection.generic.GenMapFactory$MapCanBuildFrom.apply(GenMapFactory.scala:57)
    at scala.collection.TraversableLike$class.$plus$plus(TraversableLike.scala:154)
    at scala.collection.AbstractTraversable.$plus$plus(Traversable.scala:105)
    at scala.collection.immutable.HashMap.$plus(HashMap.scala:60)
    at scala.collection.immutable.Map$Map4.updated(Map.scala:172)
    at scala.collection.immutable.Map$Map4.$plus(Map.scala:173)
    at scala.collection.immutable.Map$Map4.$plus(Map.scala:158)
    at scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:28)
    at scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:24)
    at scala.collection.TraversableOnce$$anonfun$toMap$1.apply(TraversableOnce.scala:280)
    at scala.collection.TraversableOnce$$anonfun$toMap$1.apply(TraversableOnce.scala:279)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
    at scala.collection.TraversableOnce$class.toMap(TraversableOnce.scala:279)
    at scala.collection.AbstractTraversable.toMap(Traversable.scala:105)
    at org.apache.spark.storage.BlockManager$.blockIdsToBlockManagers(BlockManager.scala:1264)
    at org.apache.spark.scheduler.DAGScheduler.getCacheLocs(DAGScheduler.scala:199)
    at org.apache.spark.scheduler.DAGScheduler.visit$3(DAGScheduler.scala:372)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getMissingParentStages(DAGScheduler.scala:389)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:774)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:781)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:780)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:780)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:781)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:780)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:780)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:781)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$submitStage$4.apply(DAGScheduler.scala:780)
    ... (last 4 lines repeating thousands of times)
--
David McWhorter
Software Engineer
Commonwealth Computer Research, Inc.
1422 Sachem Place, Unit #1
Charlottesville, VA 22901
mcwhorter@ccri.com | 434.299.0090x204