Posted to issues@spark.apache.org by "Tathagata Das (Jira)" <ji...@apache.org> on 2020/02/01 00:31:00 UTC

[jira] [Resolved] (SPARK-30657) Streaming limit after streaming dropDuplicates can throw error

     [ https://issues.apache.org/jira/browse/SPARK-30657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tathagata Das resolved SPARK-30657.
-----------------------------------
    Resolution: Fixed

> Streaming limit after streaming dropDuplicates can throw error
> --------------------------------------------------------------
>
>                 Key: SPARK-30657
>                 URL: https://issues.apache.org/jira/browse/SPARK-30657
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 2.3.0, 2.3.1, 2.3.2, 2.3.3, 2.3.4, 2.4.0, 2.4.1, 2.4.2, 2.4.3, 2.4.4
>            Reporter: Tathagata Das
>            Assignee: Tathagata Das
>            Priority: Critical
>             Fix For: 3.0.0
>
>
> {{LocalLimitExec}} does not consume the iterator of its child plan. So if a limit follows a stateful operator such as streaming dedup in append mode (e.g. {{streamingdf.dropDuplicates().limit(5)}}), the state changes of the streaming dedup may never be committed, because most stateful operators commit state changes only after their generated iterator is fully consumed. The next batch then fails with {{java.lang.IllegalStateException: Error reading delta file .../N.delta does not exist}}, since the state store delta file was never written.
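The root cause can be illustrated outside Spark. The sketch below is a toy model, not Spark code: a hypothetical {{DedupOperator}} class stands in for a stateful streaming operator that commits its state only after its output iterator is exhausted, and {{itertools.islice}} plays the role of a limit that stops consuming early, the way {{LocalLimitExec}} did before this fix.

```python
# Toy model (not Spark code) of why an early-stopping limit breaks a
# stateful operator: state is committed only after the child iterator
# is fully consumed, so stopping early skips the commit.

from itertools import islice

class DedupOperator:
    """Hypothetical stand-in for a streaming dedup operator."""
    def __init__(self):
        self.seen = set()
        self.committed = False  # mimics writing the state store delta file

    def process(self, rows):
        for row in rows:
            if row not in self.seen:
                self.seen.add(row)
                yield row
        # Commit runs only once the generator is fully exhausted.
        self.committed = True

rows = [1, 1, 2, 3, 3, 4, 5, 6]

# Fully draining the iterator commits state (the healthy case).
op_full = DedupOperator()
list(op_full.process(rows))
print(op_full.committed)  # True

# A limit that stops after 3 rows never exhausts the generator,
# so the commit never runs; the next batch would then fail to
# find the expected delta file.
op_limited = DedupOperator()
list(islice(op_limited.process(rows), 3))
print(op_limited.committed)  # False
```

The fix in Spark accordingly makes the limit drain (or otherwise trigger the completion of) the child's iterator so stateful operators can commit.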



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
