You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Etienne Chauchot (Jira)" <ji...@apache.org> on 2021/04/07 09:33:00 UTC
[jira] [Comment Edited] (BEAM-10789) Add support for checkpointing
in Spark streaming
[ https://issues.apache.org/jira/browse/BEAM-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316170#comment-17316170 ]
Etienne Chauchot edited comment on BEAM-10789 at 4/7/21, 9:32 AM:
------------------------------------------------------------------
You are mentioning a pipeline that uses _CreateStream_ transform which is for tests only and that does not need checkpointing. A regular production pipeline uses _Read_ transform that is translated (1) to a spark _SourceDStream_ (2) that supports checkpointing.
[1][https://github.com/apache/beam/blob/3216fcb25287448dca3e78a2fd48aee9ac6422a3/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/StreamingTransformTranslator.java#L537]
[2][https://github.com/apache/beam/blob/3216fcb25287448dca3e78a2fd48aee9ac6422a3/runners/spark/src/main/java/org/apache/beam/runners/spark/io/SparkUnboundedSource.java#L93]
was (Author: echauchot):
You are mentioning a pipeline that uses _CreateStream_ transform which is for tests only and that does not need checkpointing. A regular production pipeline uses _Read_ transform that is translated (1) to a spark _SourceDStream_ (2) that supports checkpointing.
[1][https://github.com/apache/beam/blob/3216fcb25287448dca3e78a2fd48aee9ac6422a3/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/StreamingTransformTranslator.java#L537]
[2]https://github.com/apache/beam/blob/3216fcb25287448dca3e78a2fd48aee9ac6422a3/runners/spark/src/main/java/org/apache/beam/runners/spark/io/SparkUnboundedSource.java#L93
> Add support for checkpointing in Spark streaming
> ------------------------------------------------
>
> Key: BEAM-10789
> URL: https://issues.apache.org/jira/browse/BEAM-10789
> Project: Beam
> Issue Type: Improvement
> Components: runner-spark
> Reporter: Anna Qin
> Priority: P3
> Labels: portability-spark
>
> Spark streaming (both portable and non-portable) currently uses queueStream to create the initial DStream from a queue of RDDs. However, queueStream does not support checkpointing.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)