Posted to user@spark.apache.org by tian zhang <tz...@yahoo.com.INVALID> on 2014/11/24 06:31:28 UTC

2 spark streaming questions

Hi, Dear Spark Streaming Developers and Users,
We are prototyping with Spark Streaming and have hit the following 2 issues on which I would like to seek your expertise.
1) We have a Spark Streaming application in Scala that reads data from Kafka into a DStream, does some processing, and outputs a transformed DStream. If for some reason the Kafka connection is not available or times out, the Spark Streaming job starts to emit empty RDDs afterwards. The log is clean, without any ERROR indicator. I googled around and this seems to be a known issue. We believe that the Spark Streaming infrastructure should either retry or return an error/exception. Can you share how you handle this case? A minimal sketch of the kind of guard we are considering is included below, after question 2.
2) We would like to implement a Spark Streaming job that joins a 1-minute-duration DStream of real-time events with a metadata RDD that is read from a database. The metadata changes only slightly each day in the database. What is the best practice for refreshing that RDD daily while keeping the streaming join job running? Is this doable as of Spark 1.1.0? A sketch of the refresh-inside-transform() pattern we are considering is also included below.
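
For question 1, to make it concrete, here is a minimal sketch of the kind of guard we are considering while we look for a better answer: check each batch for emptiness inside foreachRDD and log/alert instead of silently writing empty output downstream. The Kafka parameters, stream, and output path below are placeholders, not our actual configuration.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object KafkaEmptyBatchGuard {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaEmptyBatchGuard")
    val ssc = new StreamingContext(conf, Seconds(60))

    // Placeholder Kafka parameters -- replace with the real ZK quorum, group, and topics.
    val kafkaStream = KafkaUtils.createStream(
      ssc, "zk-host:2181", "my-consumer-group", Map("my-topic" -> 1))

    kafkaStream.map(_._2).foreachRDD { rdd =>
      // take(1) avoids a full count just to detect an empty batch.
      if (rdd.take(1).isEmpty) {
        // Empty batch: possibly a dead or timed-out Kafka connection.
        // Log/alert here instead of letting empty output flow downstream.
        println("WARN: empty batch from Kafka -- check receiver/connection")
      } else {
        // Normal processing path (placeholder output location).
        rdd.saveAsTextFile("/tmp/out/" + System.currentTimeMillis())
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}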
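
For question 2, here is a sketch of the pattern we are considering: keep a reference to the metadata RDD on the driver, reload it periodically, and re-bind it inside transform() so every batch joins against the latest copy without restarting the job. The loadMetadata() helper and the file path are placeholders for our actual database read.

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

object DailyRefreshedJoin {
  // Placeholder loader: in practice this would be a JDBC read or a daily dump on HDFS.
  def loadMetadata(sc: SparkContext): RDD[(String, String)] =
    sc.textFile("/data/metadata/latest").map { line =>
      val Array(k, v) = line.split(",", 2)
      (k, v)
    }.cache()

  def joinWithMetadata(
      events: DStream[(String, String)],
      sc: SparkContext): DStream[(String, (String, String))] = {

    var metadata = loadMetadata(sc)
    var lastRefresh = System.currentTimeMillis()
    val refreshIntervalMs = 24 * 60 * 60 * 1000L  // roughly once a day

    // The transform() closure runs on the driver for every batch, so swapping
    // in a fresh RDD here takes effect on the next micro-batch.
    events.transform { rdd =>
      val now = System.currentTimeMillis()
      if (now - lastRefresh > refreshIntervalMs) {
        metadata.unpersist(blocking = false)
        metadata = loadMetadata(rdd.sparkContext)
        lastRefresh = now
      }
      rdd.join(metadata)
    }
  }
}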
Thanks.
Tian