You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Alex Balikov (Jira)" <ji...@apache.org> on 2022/02/24 17:52:00 UTC
[jira] [Created] (SPARK-38320) (flat)MapGroupsWithState can timeout groups which just received inputs in the same microbatch
Alex Balikov created SPARK-38320:
------------------------------------
Summary: (flat)MapGroupsWithState can timeout groups which just received inputs in the same microbatch
Key: SPARK-38320
URL: https://issues.apache.org/jira/browse/SPARK-38320
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 3.2.1
Reporter: Alex Balikov
We have identified an issue where the RocksDB state store iterator will not pick up store updates made after its creation. As a result of this, the _timeoutProcessorIter_ in
[https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala]
will not pick up state changes made during newDataProcessorIter input processing. The user observed behavior is that a group state may receive input records and also be called with timeout in the same micro batch. This contradics the public documentation for GroupState -
https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/streaming/GroupState.html
* The timeout is reset every time the function is called on a group, that is, when the group has new data, or the group has timed out. So the user has to set the timeout duration every time the function is called, otherwise, there will not be any timeout set.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org