You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Jack Hu (JIRA)" <ji...@apache.org> on 2015/04/10 11:51:12 UTC
[jira] [Created] (SPARK-6847) Stack overflow on updateStateByKey
which followed by a dstream with checkpoint set
Jack Hu created SPARK-6847:
------------------------------
Summary: Stack overflow on updateStateByKey which followed by a dstream with checkpoint set
Key: SPARK-6847
URL: https://issues.apache.org/jira/browse/SPARK-6847
Project: Spark
Issue Type: Bug
Components: Streaming
Reporter: Jack Hu
The issue happens with the following sample code: uses {{updateStateByKey}} followed by a {{map}} with checkpoint interval 10 seconds
{code}
val sparkConf = new SparkConf().setAppName("test")
val streamingContext = new StreamingContext(sparkConf, Seconds(10))
streamingContext.checkpoint("""checkpoint""")
val source = streamingContext.socketTextStream("localhost", 9999)
val updatedResult = source.map(
(1,_)).updateStateByKey(
(newlist : Seq[String], oldstate : Option[String]) => newlist.headOption.orElse(oldstate))
updatedResult.map(_._2)
.checkpoint(Seconds(10))
.foreachRDD((rdd, t) => {
println("Deep: " + rdd.toDebugString.split("\n").length)
println(t.toString() + ": " + rdd.collect.length)
})
streamingContext.start()
streamingContext.awaitTermination()
{code}
>From the output, we can see that the dependency will increasing time over time, the {{updateStateByKey}} never get check-pointed, and finally, the stack overflow will happen.
Note:
The rdd in {{updatedResult.map(_._2)}} get check-pointed in this case, but not the {{updateStateByKey}}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org