
Posted to issues@spark.apache.org by "Tathagata Das (Jira)" <ji...@apache.org> on 2020/09/11 07:16:00 UTC

[jira] [Resolved] (SPARK-32794) Rare corner case error in micro-batch engine with some stateful queries + no-data-batches + V1 streaming sources

     [ https://issues.apache.org/jira/browse/SPARK-32794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tathagata Das resolved SPARK-32794.
-----------------------------------
    Fix Version/s:     (was: 2.4.8)
                   2.4.7
       Resolution: Fixed

Issue resolved by pull request 29700
[https://github.com/apache/spark/pull/29700]

> Rare corner case error in micro-batch engine with some stateful queries + no-data-batches + V1 streaming sources 
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-32794
>                 URL: https://issues.apache.org/jira/browse/SPARK-32794
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.3.4, 2.4.6, 3.0.0, 3.0.1
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>            Priority: Major
>             Fix For: 2.4.7, 3.1.0, 3.0.2
>
>
> The Structured Streaming micro-batch engine has a contract with V1 data sources: after a restart, it will call `source.getBatch()` on the last batch attempted before the restart. However, a very rare sequence of events violates this contract. It occurs only when:
> - The streaming query has specific types of stateful operations with watermarks (e.g., aggregation in append mode, mapGroupsWithState with timeouts).
>     - These queries can execute a batch even without new data when the previous batch updates the watermark and the stateful operations are such that the new watermark can cause new output or state cleanup. Such batches are called no-data-batches.
> - The last batch before termination was an incomplete no-data-batch. Upon restart, the micro-batch engine fails to call `source.getBatch()` when attempting to re-execute the incomplete no-data-batch.
> This occurs because a no-data-batch has the same start and end offsets, and when a batch is executed, calling `source.getBatch()` is skipped if the start and end offsets are the same, since the generated plan is assumed to be empty. This only affects V1 data sources that rely on this invariant to initialize differently depending on whether the query is started from scratch or restarted. How a source misbehaves is very source-specific.
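
The skipped call described above can be illustrated with a toy model. This is a minimal sketch in Python, not Spark's actual code: `ToyV1Source`, `run_batch`, and the `skip_when_equal` flag are hypothetical names introduced only for illustration, and offsets are modeled as plain integers. It shows only that the equal-offsets shortcut leaves the source without the post-restart call the contract promises.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ToyV1Source:
    """Hypothetical stand-in for a V1 source; not a Spark class."""
    calls: List[Tuple[Optional[int], int]] = field(default_factory=list)

    def get_batch(self, start: Optional[int], end: int) -> List[int]:
        # A real V1 source may use the first call after a (re)start to
        # decide how to initialize itself; here we only record the call.
        self.calls.append((start, end))
        lo = 0 if start is None else start
        return list(range(lo, end))


def run_batch(source: ToyV1Source, start: Optional[int], end: int,
              skip_when_equal: bool) -> List[int]:
    """Execute one toy micro-batch over the given offsets."""
    if skip_when_equal and start == end:
        # Shortcut described in the issue: equal offsets are assumed to
        # mean an empty plan, so the source is never consulted.
        return []
    return source.get_batch(start, end)


# Re-executing an incomplete no-data-batch after a restart: start == end.
with_shortcut = ToyV1Source()
run_batch(with_shortcut, 5, 5, skip_when_equal=True)

per_contract = ToyV1Source()
run_batch(per_contract, 5, 5, skip_when_equal=False)

print(with_shortcut.calls)  # [] : the source never sees the restarted batch
print(per_contract.calls)   # [(5, 5)] : the source is called as the contract promises
```

In both cases the batch itself is empty; the only difference is whether the source observes the call, which is the part a V1 source may depend on.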


