You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Apache Spark (JIRA)" <ji...@apache.org> on 2019/04/25 16:56:01 UTC

[jira] [Assigned] (SPARK-27237) Introduce State schema validation among query restart

     [ https://issues.apache.org/jira/browse/SPARK-27237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-27237:
------------------------------------

    Assignee: Apache Spark

> Introduce State schema validation among query restart
> -----------------------------------------------------
>
>                 Key: SPARK-27237
>                 URL: https://issues.apache.org/jira/browse/SPARK-27237
>             Project: Spark
>          Issue Type: Improvement
>          Components: Structured Streaming
>    Affects Versions: 3.0.0
>            Reporter: Jungtaek Lim
>            Assignee: Apache Spark
>            Priority: Major
>
> Even though Spark structured streaming guide page clearly documents that "Any change in number or type of grouping keys or aggregates is not allowed.", Spark doesn't do anything when end users try to do it, which would end up with indeterministic outputs or unexpected exceptions.
> Even worse, if the query doesn't crash by chance it could write the new messed values to state which completely breaks state unless end users roll back to specific batch via manually editing checkpoint.
> The restriction is clear, the number of columns, and data type for each must not be modified among query runs. We can store schema of state along with state, and verify whether the (maybe) new schema is compatible if state schema is modified. With this validation we can prevent query runs and shows indeterministic behavior when schema is incompatible, as well as we can give more informative error messages to end users.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org