You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Jungtaek Lim (Jira)" <ji...@apache.org> on 2022/04/05 06:30:00 UTC

[jira] [Updated] (SPARK-38320) (flat)MapGroupsWithState can timeout groups which just received inputs in the same microbatch

     [ https://issues.apache.org/jira/browse/SPARK-38320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim updated SPARK-38320:
---------------------------------
    Labels: correctness releasenotes  (was: correctness)

> (flat)MapGroupsWithState can timeout groups which just received inputs in the same microbatch
> ---------------------------------------------------------------------------------------------
>
>                 Key: SPARK-38320
>                 URL: https://issues.apache.org/jira/browse/SPARK-38320
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 3.2.1
>            Reporter: Alex Balikov
>            Assignee: Alex Balikov
>            Priority: Major
>              Labels: correctness, releasenotes
>             Fix For: 3.3.0, 3.2.2
>
>
> We have identified an issue where the RocksDB state store iterator will not pick up store updates made after its creation. As a result of this, the _timeoutProcessorIter_ in
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/FlatMapGroupsWithStateExec.scala]
> will not pick up state changes made during _newDataProcessorIter_ input processing. The user observed behavior is that a group state may receive input records and also be called with timeout in the same micro batch. This contradics the public documentation for GroupState -
> [https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/streaming/GroupState.html]
>  * The timeout is reset every time the function is called on a group, that is, when the group has new data, or the group has timed out. So the user has to set the timeout duration every time the function is called, otherwise, there will not be any timeout set.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org