Posted to issues@spark.apache.org by "Tathagata Das (JIRA)" <ji...@apache.org> on 2017/06/19 22:55:00 UTC
[jira] [Created] (SPARK-21145) Restarted queries reuse same
StateStoreProvider, causing multiple concurrent tasks to update same
StateStore
Tathagata Das created SPARK-21145:
-------------------------------------
Summary: Restarted queries reuse same StateStoreProvider, causing multiple concurrent tasks to update same StateStore
Key: SPARK-21145
URL: https://issues.apache.org/jira/browse/SPARK-21145
Project: Spark
Issue Type: Bug
Components: Structured Streaming
Affects Versions: 2.2.0
Reporter: Tathagata Das
Assignee: Tathagata Das
StateStoreProvider instances are loaded on demand in an executor when a query is started. When a query is restarted, the already-loaded provider instance is reused. There is a non-trivial chance that a task from the previous query run is still running when the tasks of the restarted run start. So for a stateful partition, there may be two concurrent tasks working on the same partition, and therefore using the same provider instance. This can lead to inconsistent results and possibly random failures, as state store implementations are not designed to be thread-safe.
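The reuse described above can be illustrated with a toy model. This is only a sketch in Python, not Spark code: StoreId, Provider, ProviderCache and the run-id key are hypothetical names made up for illustration. It shows that a per-executor cache keyed only by the stateful partition hands the same instance to both runs. It also shows one possible mitigation, assumed here and not stated in this report: adding a per-run identifier to the key so that a restarted run loads its own instance.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoreId:
    # Hypothetical key: identifies one stateful partition of one operator.
    checkpoint_location: str
    operator_id: int
    partition_id: int


@dataclass
class Provider:
    # Stand-in for a provider instance holding mutable, non-thread-safe state.
    state: dict = field(default_factory=dict)


class ProviderCache:
    """Toy per-executor cache that loads providers on demand."""

    def __init__(self, include_run_id):
        self.include_run_id = include_run_id
        self.loaded = {}

    def get(self, store_id, run_id):
        # Without the run id in the key, every run of the query maps to
        # the same cache entry for a given stateful partition.
        key = (store_id, run_id) if self.include_run_id else store_id
        if key not in self.loaded:
            self.loaded[key] = Provider()
        return self.loaded[key]


store = StoreId("/tmp/checkpoint", operator_id=0, partition_id=3)
first_run, restarted_run = uuid.uuid4(), uuid.uuid4()

# Behaviour described in this issue: both runs receive the same instance.
shared = ProviderCache(include_run_id=False)
same_instance = shared.get(store, first_run) is shared.get(store, restarted_run)

# Assumed mitigation: a per-run key gives each run its own instance.
isolated = ProviderCache(include_run_id=True)
distinct_instances = (
    isolated.get(store, first_run) is not isolated.get(store, restarted_run)
)
```

In this model, a lingering task from the first run and a new task from the restarted run would both mutate the same Provider.state when the key omits the run id, which is the unsafe sharing reported here.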