Posted to issues@spark.apache.org by "Gengliang Wang (Jira)" <ji...@apache.org> on 2021/08/31 16:53:00 UTC

[jira] [Resolved] (SPARK-36619) HDFSBackedStateStore and RocksDBStateStore have bugs on prefix scan

     [ https://issues.apache.org/jira/browse/SPARK-36619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang resolved SPARK-36619.
------------------------------------
    Fix Version/s: 3.2.0
       Resolution: Fixed

Issue resolved by pull request 33870
[https://github.com/apache/spark/pull/33870]

> HDFSBackedStateStore and RocksDBStateStore have bugs on prefix scan
> -------------------------------------------------------------------
>
>                 Key: SPARK-36619
>                 URL: https://issues.apache.org/jira/browse/SPARK-36619
>             Project: Spark
>          Issue Type: Bug
>          Components: Structured Streaming
>    Affects Versions: 3.2.0
>            Reporter: Jungtaek Lim
>            Assignee: Jungtaek Lim
>            Priority: Blocker
>             Fix For: 3.2.0
>
>
> In the RocksDB state store provider implementation, prefix scan leverages iterators, which are only closed in the rollback() method.
> While this works for now with the session window, since the state store instance in the read physical plan always calls abort, it could cause correctness issues for stateful operators that don't instantiate two different physical plans for read and write.
> We should make sure these iterators get closed so that they don't leak across multiple micro-batches as a side effect.
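The iterator-lifecycle point above can be sketched roughly as follows. This is a hypothetical, simplified illustration (class and method names are invented, not Spark's actual state store API): every iterator handed out by a prefix scan is tracked, and all tracked iterators are closed on both commit() and abort(), so none survive into the next micro-batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: track every iterator handed out by prefixScan()
// and close them all on both commit() and abort(), so no open iterator
// leaks into the next micro-batch as a side effect.
class TrackingStore {
    // Minimal stand-in for a native (e.g. RocksDB) iterator that must
    // be closed explicitly rather than relying on garbage collection.
    static class ScanIterator implements AutoCloseable {
        boolean closed = false;
        @Override public void close() { closed = true; }
    }

    private final List<ScanIterator> openIterators = new ArrayList<>();

    ScanIterator prefixScan() {
        ScanIterator it = new ScanIterator();
        openIterators.add(it);   // remember the handle for later cleanup
        return it;
    }

    private void closeAll() {
        for (ScanIterator it : openIterators) {
            it.close();
        }
        openIterators.clear();
    }

    void commit() { closeAll(); }  // close on the success path too,
    void abort()  { closeAll(); }  // not only on rollback/abort
}
```

The key design point is that cleanup happens on both exit paths; closing only in abort() (as the description notes) silently relies on a read-side plan always aborting.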
> In HDFSBackedStateStore, when creating the new map for writing we copy both the data map and the prefix key map, but the prefix key map is copied as-is, which is a shallow copy on the value side (the value type is a Set). This breaks reloading an aborted version.
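The shallow-copy problem in the last paragraph can be demonstrated with a small, self-contained sketch (generic Java collections, not Spark's actual map types): copying a map whose values are Sets still shares the Set instances, so mutating the new version's Sets also mutates the snapshot of the old version, which is exactly what breaks reloading an aborted version. A deep copy that also clones each value Set keeps versions independent.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the copy bug: the per-version prefix key map
// holds Set values, so copying the map itself (shallow copy) still
// shares the Set instances between the old and new versions.
class PrefixMapCopy {
    // Shallow copy: a new map, but the same Set objects inside.
    // Mutations through the copy are visible in the original.
    static Map<String, Set<String>> shallowCopy(Map<String, Set<String>> src) {
        return new HashMap<>(src);
    }

    // Deep copy: also clone each value Set, so the old version's
    // snapshot survives mutations (and aborts) of the new version.
    static Map<String, Set<String>> deepCopy(Map<String, Set<String>> src) {
        Map<String, Set<String>> dst = new HashMap<>();
        src.forEach((k, v) -> dst.put(k, new HashSet<>(v)));
        return dst;
    }
}
```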



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org