Posted to issues@spark.apache.org by "Jungtaek Lim (Jira)" <ji...@apache.org> on 2023/10/26 05:58:00 UTC

[jira] [Updated] (SPARK-45672) Provide a unified user-facing schema for state format versions in state data source - reader

     [ https://issues.apache.org/jira/browse/SPARK-45672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim updated SPARK-45672:
---------------------------------
        Parent: SPARK-45511
    Issue Type: Sub-task  (was: Improvement)

> Provide a unified user-facing schema for state format versions in state data source - reader
> --------------------------------------------------------------------------------------------
>
>                 Key: SPARK-45672
>                 URL: https://issues.apache.org/jira/browse/SPARK-45672
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Structured Streaming
>    Affects Versions: 4.0.0
>            Reporter: Jungtaek Lim
>            Priority: Major
>
> As of now, except for stream-stream join with the joinSide option specified, the state data source provides the state "as it is" in the state store. This means the state data source exposes a different schema for operators that have multiple state format versions.
> From the users' perspective, they do not care about the state format version, so they may be confused when the state data source produces different schemas.
> That said, we could consider defining and providing a single, unified user-facing schema for each operator.
> *Note that this would need further discussion* before writing any code, because there is a clear trade-off: it creates a strong coupling between the state data source and the implementation of stateful operators. Also, regarding the argument that the output schema is unpredictable, users can call printSchema() to see the output schema before querying.
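
A minimal sketch of the workaround mentioned above: inspecting the output schema with printSchema() before querying the state. This assumes the reader is registered under the "statestore" format with a "path" option pointing at the streaming query's checkpoint location (as proposed under SPARK-45511); the format and option names are illustrative and may differ in the final implementation.

    import org.apache.spark.sql.SparkSession

    object InspectStateSchema {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("inspect-state-schema")
          .master("local[*]")
          .getOrCreate()

        // Read state from a hypothetical checkpoint location (path is illustrative).
        val stateDf = spark.read
          .format("statestore")
          .option("path", "/tmp/checkpoints/my-streaming-query")
          .load()

        // printSchema() shows the schema the state data source will produce for this
        // operator and state format version, before running any query against it.
        stateDf.printSchema()

        spark.stop()
      }
    }

With a unified user-facing schema per operator, the output of printSchema() would no longer vary across state format versions of the same operator.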



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org