Posted to issues@spark.apache.org by "Hyukjin Kwon (Jira)" <ji...@apache.org> on 2019/10/08 05:44:18 UTC

[jira] [Resolved] (SPARK-24144) monotonically_increasing_id on streaming dataFrames

     [ https://issues.apache.org/jira/browse/SPARK-24144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-24144.
----------------------------------
    Resolution: Incomplete

> monotonically_increasing_id on streaming dataFrames
> ---------------------------------------------------
>
>                 Key: SPARK-24144
>                 URL: https://issues.apache.org/jira/browse/SPARK-24144
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>    Affects Versions: 2.3.0
>            Reporter: Hemant Bhanawat
>            Priority: Major
>              Labels: bulk-closed
>
> For our use case, we want to assign snapshot ids (incrementing counters) to the incoming records. After a failure, the same record should receive the same id, so that the downstream DB can handle the records correctly.
> We were trying to do this by zipping the streaming RDDs with that counter using a modified version of ZippedWithIndexRDD; a sketch of this approach follows below. There are other ways to do it, but all of them turn out to be cumbersome and error-prone in failure scenarios.
> As suggested on the Spark user/dev list, one way to do this would be to support monotonically_increasing_id on streaming DataFrames in the Spark code base (see the usage sketch further below). This would ensure that the counter keeps incrementing across the records of the stream. Also, since the counter can be checkpointed, it would work correctly in failure scenarios. Last but not least, doing this inside Spark would be the most efficient approach.
>  
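A minimal sketch of the per-batch counter workaround described above, assuming Spark 2.4+'s foreachBatch API (the issue itself targets 2.3.0). The rate source, the assignIds name, and the snapshotOffset variable are illustrative, not from the issue; durably recovering the offset on restart, the step the issue calls error-prone, is elided:

import org.apache.spark.sql.{DataFrame, SparkSession}

object SnapshotIdSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("snapshot-id-sketch").getOrCreate()

    // Illustrative source; any streaming DataFrame would do.
    val stream = spark.readStream.format("rate").load()

    // Running counter. On restart it must be recovered from durable storage
    // so that replayed records get the same ids again (elided here).
    var snapshotOffset = 0L

    // Declared as an explicit Function2 value, which avoids lambda overload
    // ambiguity between the Scala and Java foreachBatch signatures on some
    // Scala versions.
    val assignIds: (DataFrame, Long) => Unit = (batch, batchId) => {
      val base = snapshotOffset
      // zipWithIndex gives a contiguous 0-based index within this batch;
      // adding the running offset makes the ids increase across batches.
      val withIds = batch.rdd.zipWithIndex().map { case (row, i) => (base + i, row) }
      // ... write withIds together with the new offset to the downstream DB
      // in one atomic step, so replays after a failure reuse the same ids ...
      snapshotOffset = base + batch.count()
    }

    val query = stream.writeStream.foreachBatch(assignIds).start()
    query.awaitTermination()
  }
}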

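For contrast, a sketch of what the requested feature would look like from the user's side, if monotonically_increasing_id (an existing function in org.apache.spark.sql.functions for batch DataFrames) were supported on streaming DataFrames with its counter checkpointed as proposed. The snapshot_id column name is illustrative:

import org.apache.spark.sql.functions.monotonically_increasing_id

// Hypothetical usage on the streaming DataFrame from the sketch above;
// the proposal is that these ids survive restarts because the counter
// would be checkpointed along with the rest of the query state.
val withIds = stream.withColumn("snapshot_id", monotonically_increasing_id())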


