Posted to user@spark.apache.org by SRK <sw...@gmail.com> on 2017/06/28 00:48:24 UTC

How to reduce the amount of data that is getting written to the checkpoint from Spark Streaming

Hi,

I have checkpointing enabled in Spark Streaming, and I use updateStateByKey and
reduceByKeyAndWindow with inverse functions. How do I reduce the amount of
data that I am writing to the checkpoint, or clear out the data that I don't
care about?
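A minimal sketch of the two levers the question touches, assuming PySpark and a DStream of (str, int) pairs. With updateStateByKey, the update function runs for every existing key in each batch, even keys with no new data, and returning None removes that key from the state. With reduceByKeyAndWindow and an inverse function, keys whose values have all slid out of the window stay around (for counts, with value 0) unless a filterFunc drops them. The names update_state, keep_nonzero, MAX_IDLE_BATCHES and the idle-eviction rule are hypothetical. This does not change how Spark writes checkpoints; it only shrinks the state that Spark then has to checkpoint.

```python
from typing import List, Optional, Tuple

# Hypothetical policy: forget a key after this many consecutive batches
# in which it received no new values.
MAX_IDLE_BATCHES = 3

State = Tuple[int, int]  # (running count, consecutive idle batches)


def update_state(new_values: List[int], last_state: Optional[State]) -> Optional[State]:
    """Update function in the shape PySpark's updateStateByKey expects.

    Returning None removes the key from the state, so it is no longer
    carried into later batches or written to later checkpoints.
    """
    count, idle = last_state if last_state is not None else (0, 0)
    if new_values:
        # New data arrived: add it and reset the idle counter.
        return (count + sum(new_values), 0)
    idle += 1
    if idle >= MAX_IDLE_BATCHES:
        return None  # evict the key
    return (count, idle)


def keep_nonzero(kv: Tuple[str, int]) -> bool:
    """filterFunc for reduceByKeyAndWindow with an inverse function.

    Keeps only pairs whose windowed count is non-zero, dropping keys
    whose values have all slid out of the window.
    """
    return kv[1] != 0


# Wiring, assuming an existing `pairs` DStream of (str, int) and a
# StreamingContext on which ssc.checkpoint(<dir>) has already been called:
#
# state = pairs.updateStateByKey(update_state)
# windowed = pairs.reduceByKeyAndWindow(
#     lambda a, b: a + b,   # reduce
#     lambda a, b: a - b,   # inverse reduce
#     windowDuration=60,    # seconds
#     slideDuration=10,     # seconds
#     filterFunc=keep_nonzero,
# )
```

With MAX_IDLE_BATCHES = 3, a key that stops receiving data is kept for two more batches and dropped on the third.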

Thanks!



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-reduce-the-amount-of-data-that-is-getting-written-to-the-checkpoint-from-Spark-Streaming-tp28798.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscribe@spark.apache.org


Re: How to reduce the amount of data that is getting written to the checkpoint from Spark Streaming

Posted by "Yuval.Itzchakov" <yu...@gmail.com>.
Using a long period between checkpoints lets a long lineage of the graph's
computations build up, since Spark uses checkpointing to cut it, and that long
lineage can also delay the streaming job.
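To make this trade-off concrete, here is a small sketch, assuming PySpark, that picks a checkpoint interval as a multiple of the slide interval and estimates how many batches of lineage can accumulate between checkpoints. The Spark Streaming programming guide suggests 5 to 10 slide intervals as a starting point; the helper name checkpoint_plan and its default of 5 are hypothetical.

```python
from typing import Tuple


def checkpoint_plan(batch_interval_s: int, slide_interval_s: int,
                    multiple: int = 5) -> Tuple[int, int]:
    """Return (checkpoint interval in seconds, batches between checkpoints).

    A larger `multiple` means fewer checkpoint writes but a longer lineage
    for Spark to carry (and recompute on failure) before it is cut.
    """
    if (batch_interval_s <= 0 or slide_interval_s <= 0
            or slide_interval_s % batch_interval_s != 0):
        raise ValueError("slide interval must be a positive multiple of the batch interval")
    if multiple < 1:
        raise ValueError("multiple must be at least 1")
    interval_s = slide_interval_s * multiple
    return interval_s, interval_s // batch_interval_s


# Wiring, assuming `state` is a stateful DStream (e.g. from updateStateByKey);
# PySpark's DStream.checkpoint takes the interval in seconds:
#
# interval_s, _ = checkpoint_plan(batch_interval_s=10, slide_interval_s=10)
# state.checkpoint(interval_s)
```

For example, a 2-second batch with a 10-second slide and multiple=10 gives a 100-second interval, so roughly 50 batches of lineage build up between checkpoints.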



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-reduce-the-amount-of-data-that-is-getting-written-to-the-checkpoint-from-Spark-Streaming-tp28798p28820.html


Re: How to reduce the amount of data that is getting written to the checkpoint from Spark Streaming

Posted by "Yuval.Itzchakov" <yu...@gmail.com>.
You can't. Spark doesn't let you fiddle with the data being checkpointed, as
it's an internal implementation detail.



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-reduce-the-amount-of-data-that-is-getting-written-to-the-checkpoint-from-Spark-Streaming-tp28798p28815.html