Posted to issues@spark.apache.org by "Tathagata Das (JIRA)" <ji...@apache.org> on 2018/07/11 02:36:00 UTC

[jira] [Resolved] (SPARK-24730) Add policy to choose max as global watermark when streaming query has multiple watermarks

     [ https://issues.apache.org/jira/browse/SPARK-24730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tathagata Das resolved SPARK-24730.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 21701
[https://github.com/apache/spark/pull/21701]

> Add policy to choose max as global watermark when streaming query has multiple watermarks
> -----------------------------------------------------------------------------------------
>
>                 Key: SPARK-24730
>                 URL: https://issues.apache.org/jira/browse/SPARK-24730
>             Project: Spark
>          Issue Type: New Feature
>          Components: Structured Streaming
>    Affects Versions: 2.3.1
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>            Priority: Major
>             Fix For: 3.0.0
>
>
> Currently, when a streaming query has multiple watermarks, the policy is to choose the min of them as the global watermark. This is safe because the global watermark moves with the slowest stream, so it does not unexpectedly drop data as late. While this is indeed the safe choice, in some cases you may want the watermark to advance with the fastest stream, that is, take the max of the multiple watermarks. This JIRA is to add that configuration.
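
The difference between the two policies described above can be sketched in plain Python. This is only an illustration under stated assumptions, not Spark's implementation: the function names `global_watermark` and `is_late`, the stream names, and the millisecond values are all hypothetical, and treating a record as late when its event time is strictly below the watermark is a simplification.

```python
from typing import Dict


def global_watermark(per_stream_watermarks: Dict[str, int], policy: str = "min") -> int:
    """Combine per-stream watermarks (epoch millis) into one global watermark.

    Hypothetical helper for illustration only; it is not a Spark API.
    """
    if not per_stream_watermarks:
        raise ValueError("at least one stream watermark is required")
    if policy == "min":
        # Follow the slowest stream: nothing is dropped that any stream still allows.
        return min(per_stream_watermarks.values())
    if policy == "max":
        # Follow the fastest stream: data from slower streams may be dropped as late.
        return max(per_stream_watermarks.values())
    raise ValueError(f"unknown policy: {policy!r}")


def is_late(event_time_ms: int, watermark_ms: int) -> bool:
    """Simplified lateness check (assumption): late if older than the watermark."""
    return event_time_ms < watermark_ms


# Two hypothetical input streams whose watermarks have advanced at different speeds.
watermarks = {"clicks": 100_000, "impressions": 250_000}

wm_min = global_watermark(watermarks, "min")
wm_max = global_watermark(watermarks, "max")

# A record from the slow stream with event time 150_000 ms:
record_time = 150_000
print("min policy:", wm_min, "late?", is_late(record_time, wm_min))
print("max policy:", wm_max, "late?", is_late(record_time, wm_max))
```

Under the min policy the record is still accepted, while under the max policy the same record falls behind the global watermark and would be treated as late. The ticket text does not name the setting; in released Spark versions it appears, as far as I know, as the SQL configuration `spark.sql.streaming.multipleWatermarkPolicy` with values `min` (the default) and `max`.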



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
