You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Alex Balikov (Jira)" <ji...@apache.org> on 2022/02/24 17:52:00 UTC

[jira] [Created] (SPARK-38320) (flat)MapGroupsWithState can timeout groups which just received inputs in the same microbatch

Alex Balikov created SPARK-38320:
------------------------------------

             Summary: (flat)MapGroupsWithState can timeout groups which just received inputs in the same microbatch
                 Key: SPARK-38320
                 URL: https://issues.apache.org/jira/browse/SPARK-38320
             Project: Spark
          Issue Type: Bug
          Components: Structured Streaming
    Affects Versions: 3.2.1
            Reporter: Alex Balikov


We have identified an issue where the RocksDB state store iterator will not pick up store updates made after its creation. As a result of this, the _timeoutProcessorIter_ in 

[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala]

will not pick up state changes made during newDataProcessorIter input processing. The user observed behavior is that a group state may receive input records and also be called with timeout in the same micro batch. This contradics the public documentation for GroupState -

https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/streaming/GroupState.html

 
 * The timeout is reset every time the function is called on a group, that is, when the group has new data, or the group has timed out. So the user has to set the timeout duration every time the function is called, otherwise, there will not be any timeout set.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org