Posted to issues@spark.apache.org by "Liwen Sun (Jira)" <ji...@apache.org> on 2020/09/02 01:32:00 UTC

[jira] [Created] (SPARK-32776) Limit in streaming should not be optimized away by PropagateEmptyRelation

Liwen Sun created SPARK-32776:
---------------------------------

             Summary: Limit in streaming should not be optimized away by PropagateEmptyRelation
                 Key: SPARK-32776
                 URL: https://issues.apache.org/jira/browse/SPARK-32776
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 3.1.0
            Reporter: Liwen Sun


Currently, the Limit operator in a streaming query may be optimized away by PropagateEmptyRelation when its input relation is empty. This is a problem for stateful streaming queries: the micro-batch in which this happens writes no state store files, so the next batch fails with a file-not-found error when it tries to read them.
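The failure mode can be illustrated with a toy model. This is a rough sketch under stated assumptions, not Spark's state store: `ToyStateStore` and `run_batch` are hypothetical names, and the model assumes each batch loads the state version committed by the previous batch and commits the next version.

```python
class ToyStateStore:
    """Hypothetical stand-in for a versioned state store (not Spark's API)."""

    def __init__(self):
        # version -> number of rows the limit has emitted so far.
        # Version 0 is the initial state and always exists.
        self.versions = {0: 0}

    def load(self, version):
        # Loading a version that was never committed mimics the missing state file.
        if version not in self.versions:
            raise FileNotFoundError(f"state file for version {version} does not exist")
        return self.versions[version]

    def commit(self, version, emitted):
        self.versions[version] = emitted


def run_batch(store, batch_id, rows, limit, limit_optimized_away):
    """Run one micro-batch of a stateful streaming limit (toy model)."""
    if limit_optimized_away:
        # The optimizer replaced the plan with an empty relation, so the
        # stateful limit never runs and no new state version is committed.
        return []
    emitted = store.load(batch_id)  # state committed by the previous batch
    taken = rows[: max(limit - emitted, 0)]
    store.commit(batch_id + 1, emitted + len(taken))
    return taken
```

If batch 1 is empty and its limit is optimized away, batch 2 calls `load(2)` and raises `FileNotFoundError`; if the limit is kept, batch 1 commits version 2 and batch 2 proceeds normally.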

We should not let PropagateEmptyRelation optimize away the Limit operator for streaming queries.
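A minimal sketch of the proposed guard, using a simplified plan model rather than Catalyst: `Relation`, `Limit`, and `propagate_empty_relation` are hypothetical stand-ins, and the sketch assumes the rule can check whether the plan is streaming. A Limit over an empty batch relation collapses to an empty relation, while a Limit over an empty streaming relation is left in place.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical leaf node: the rows it produces and whether it is streaming."""
    rows: tuple
    is_streaming: bool


@dataclass(frozen=True)
class Limit:
    """Hypothetical Limit node over a single child plan."""
    n: int
    child: object

    @property
    def is_streaming(self):
        return self.child.is_streaming


def propagate_empty_relation(plan):
    """Toy version of the rule's Limit case, with a streaming guard."""
    if isinstance(plan, Limit):
        child = propagate_empty_relation(plan.child)
        empty_child = isinstance(child, Relation) and not child.rows
        if empty_child and not child.is_streaming:
            # Batch query: a limit over nothing is nothing, so prune it.
            return Relation((), is_streaming=False)
        # Streaming query (or non-empty input): keep the stateful Limit.
        return Limit(plan.n, child)
    return plan
```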

This ticket is intended to apply a small, safe fix to PropagateEmptyRelation. A more fundamental fix that also prevents this problem in other optimizer rules and in future changes would be preferable, but that is a much larger task.




--
This message was sent by Atlassian Jira
(v8.3.4#803005)
