Posted to user@spark.apache.org by Panos Str <st...@gmail.com> on 2015/10/30 23:10:53 UTC
Stack overflow error caused by long lineage RDD created after many recursions
Hi all!
Here's a part of a Scala recursion that produces a stack overflow after many
recursions. I've tried many things but I've not managed to solve it.
val eRDD: RDD[(Int, Int)] = ...
val oldRDD: RDD[(Int, Int)] = ...
val result = Algorithm(eRDD, oldRDD)

def Algorithm(eRDD: RDD[(Int, Int)], oldRDD: RDD[(Int, Int)]): RDD[(Int, Int)] = {
  val newRDD = Transformation(eRDD, oldRDD) // only transformations
  if (Compare(oldRDD, newRDD))              // Compare has the "take" action!!
    Algorithm(eRDD, newRDD)
  else
    newRDD
}
The above code is recursive and performs many iterations (until Compare
returns false). After some iterations I get a stack overflow error; the
lineage chain has probably become too long. Is there any way to solve this
(persist/unpersist, checkpoint, saveAsObjectFile)?
Note 1: only the Compare function performs actions on RDDs.
Note 2: I tried some combinations of persist/unpersist but none of them
worked!
I tried checkpointing from spark.streaming. I put a checkpoint at every
recursion but still received an overflow error.
I also tried calling saveAsObjectFile per iteration and then reading the
file back (sc.objectFile) in the next iteration. Unfortunately the folders
created per iteration keep growing in size, while I was expecting them to
stay roughly equal from one iteration to the next.
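Roughly, the save-and-reload attempt looked like this (the path and the
helper name are illustrative, not the actual code):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Illustrative sketch of the per-iteration save-and-reload attempt:
// write the new RDD to disk (an action, so it materializes), then read it
// back so the next iteration starts from a short, file-rooted lineage
// instead of the accumulated transformation chain.
def saveAndReload(sc: SparkContext, newRDD: RDD[(Int, Int)], i: Int): RDD[(Int, Int)] = {
  val path = s"/tmp/algo-iter-$i"   // illustrative output path per iteration
  newRDD.saveAsObjectFile(path)     // action: materializes newRDD to disk
  sc.objectFile[(Int, Int)](path)   // reloaded RDD with a fresh lineage
}
```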
please help!!
--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Stack-overflow-error-caused-by-long-lineage-RDD-created-after-many-recursions-tp25240.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org
Re: Stack overflow error caused by long lineage RDD created after many recursions
Posted by Tathagata Das <td...@databricks.com>.
You have to run some action after rdd.checkpoint() for the checkpointing
to actually occur. Have you done that?
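A minimal sketch of what that could look like, rewritten as a loop (the
checkpoint directory is an assumed path, and the transformation/compare
parameters stand in for your functions):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Iterative sketch of the Algorithm from the question: mark each iteration's
// result for checkpointing, then run an action (count) so the checkpoint is
// actually written and the lineage is truncated before the next iteration
// builds on top of it.
def algorithm(sc: SparkContext,
              eRDD: RDD[(Int, Int)],
              startRDD: RDD[(Int, Int)],
              transformation: (RDD[(Int, Int)], RDD[(Int, Int)]) => RDD[(Int, Int)],
              compare: (RDD[(Int, Int)], RDD[(Int, Int)]) => Boolean): RDD[(Int, Int)] = {
  sc.setCheckpointDir("/tmp/spark-checkpoints") // assumed checkpoint location
  var current = startRDD
  var converged = false
  while (!converged) {
    val next = transformation(eRDD, current)
    next.checkpoint() // marks the RDD for checkpointing...
    next.count()      // ...but only an action makes the checkpoint happen
    converged = !compare(current, next)
    current = next
  }
  current
}
```

Using a loop instead of recursion also avoids growing the driver's call
stack, independently of the lineage issue.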