You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Etienne Chauchot (Jira)" <ji...@apache.org> on 2021/04/07 09:33:00 UTC

[jira] [Comment Edited] (BEAM-10789) Add support for checkpointing in Spark streaming

    [ https://issues.apache.org/jira/browse/BEAM-10789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17316170#comment-17316170 ] 

Etienne Chauchot edited comment on BEAM-10789 at 4/7/21, 9:32 AM:
------------------------------------------------------------------

You are mentioning a pipeline that uses _CreateStream_ transform which is for tests only and that does not need checkpointing. A regular production pipeline uses _Read_ transform that is translated (1) to a  spark _SourceDStream_ (2) that supports checkpointing.

[1][https://github.com/apache/beam/blob/3216fcb25287448dca3e78a2fd48aee9ac6422a3/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/StreamingTransformTranslator.java#L537] 

[2][https://github.com/apache/beam/blob/3216fcb25287448dca3e78a2fd48aee9ac6422a3/runners/spark/src/main/java/org/apache/beam/runners/spark/io/SparkUnboundedSource.java#L93]

 


was (Author: echauchot):
You are mentioning a pipeline that uses _CreateStream_ transform which is for tests only and that does not need checkpointing. A regular production pipeline uses _Read_ transform that is translated (1) to a  spark _SourceDStream_ (2) that supports checkpointing.

[1][https://github.com/apache/beam/blob/3216fcb25287448dca3e78a2fd48aee9ac6422a3/runners/spark/src/main/java/org/apache/beam/runners/spark/translation/streaming/StreamingTransformTranslator.java#L537] 

[2]https://github.com/apache/beam/blob/3216fcb25287448dca3e78a2fd48aee9ac6422a3/runners/spark/src/main/java/org/apache/beam/runners/spark/io/SparkUnboundedSource.java#L93

> Add support for checkpointing in Spark streaming
> ------------------------------------------------
>
>                 Key: BEAM-10789
>                 URL: https://issues.apache.org/jira/browse/BEAM-10789
>             Project: Beam
>          Issue Type: Improvement
>          Components: runner-spark
>            Reporter: Anna Qin
>            Priority: P3
>              Labels: portability-spark
>
> Spark streaming (both portable and non-portable) currently uses queueStream to create the initial DStream from a queue of RDDs. However, queueStream does not support checkpointing.  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)