Posted to issues@spark.apache.org by "Jack Hu (JIRA)" <ji...@apache.org> on 2015/04/10 11:51:12 UTC

[jira] [Created] (SPARK-6847) Stack overflow on updateStateByKey which followed by a dstream with checkpoint set

Jack Hu created SPARK-6847:
------------------------------

             Summary: Stack overflow on updateStateByKey which followed by a dstream with checkpoint set
                 Key: SPARK-6847
                 URL: https://issues.apache.org/jira/browse/SPARK-6847
             Project: Spark
          Issue Type: Bug
          Components: Streaming
            Reporter: Jack Hu


The issue can be reproduced with the following sample code, which uses {{updateStateByKey}} followed by a {{map}} with a checkpoint interval of 10 seconds:

{code}
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val sparkConf = new SparkConf().setAppName("test")
    val streamingContext = new StreamingContext(sparkConf, Seconds(10))
    streamingContext.checkpoint("""checkpoint""")
    val source = streamingContext.socketTextStream("localhost", 9999)
    val updatedResult = source
      .map((1, _))
      .updateStateByKey(
        (newlist : Seq[String], oldstate : Option[String]) => newlist.headOption.orElse(oldstate))
    updatedResult.map(_._2)
      .checkpoint(Seconds(10))   // checkpoint interval on the mapped stream: 10 seconds
      .foreachRDD((rdd, t) => {
        println("Deep: " + rdd.toDebugString.split("\n").length)
        println(t.toString() + ": " + rdd.collect.length)
      })
    streamingContext.start()
    streamingContext.awaitTermination()
{code}
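
To drive the example, any simple TCP source on port 9999 will do, e.g. running {{nc -lk 9999}} and typing a few lines, so that non-empty batches arrive every 10 seconds.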

From the output, we can see that the RDD dependency chain grows from batch to batch, the {{updateStateByKey}} state RDD never gets checkpointed, and eventually a stack overflow occurs.

Note:
The RDD produced by {{updatedResult.map(_._2)}} does get checkpointed in this case, but the one produced by {{updateStateByKey}} does not.
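
A possible workaround sketch (my assumption, not verified in this report) is to set a checkpoint interval on the {{updateStateByKey}} stream itself, in addition to the downstream {{map}}, so that the state RDD's lineage is also truncated:

{code}
    // Hypothetical workaround: checkpoint the state DStream directly as well,
    // so its lineage is cut every interval instead of growing without bound.
    val updatedResult = source
      .map((1, _))
      .updateStateByKey(
        (newlist : Seq[String], oldstate : Option[String]) => newlist.headOption.orElse(oldstate))
    updatedResult.checkpoint(Seconds(10))

    updatedResult.map(_._2)
      .checkpoint(Seconds(10))
      .foreachRDD((rdd, t) => {
        println("Deep: " + rdd.toDebugString.split("\n").length)
        println(t.toString() + ": " + rdd.collect.length)
      })
{code}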


