Posted to issues@spark.apache.org by "Jungtaek Lim (Jira)" <ji...@apache.org> on 2023/10/26 05:44:00 UTC

[jira] [Created] (SPARK-45671) Implement an option similar to corrupt record column in State Data Source Reader

Jungtaek Lim created SPARK-45671:
------------------------------------

             Summary: Implement an option similar to corrupt record column in State Data Source Reader
                 Key: SPARK-45671
                 URL: https://issues.apache.org/jira/browse/SPARK-45671
             Project: Spark
          Issue Type: Sub-task
          Components: Structured Streaming
    Affects Versions: 4.0.0
            Reporter: Jungtaek Lim


Querying the state will most likely fail if the underlying state file is corrupted. There is also the case where the raw binary data the state store reads from the state file does not conform to the state schema, resulting in an exception or fatal error at runtime.

(We cannot catch the case where data is loaded with an incorrect schema but no exception is thrown; we cannot attach a schema to every piece of data.)

To handle the above cases without failing the query, we want to return state rows for valid entries while also exposing the binary data for corrupted rows (as we do for CSV/JSON), if users specify an option.
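To illustrate the intent, here is a minimal, self-contained sketch of the PERMISSIVE-style pattern the ticket borrows from the CSV/JSON readers. This is not the actual State Data Source implementation: the column name, the option, and the toy two-int32 "state schema" are all assumptions for illustration. Rows that deserialize cleanly become normal rows; rows that do not are surfaced with their raw bytes in a corrupt-record column instead of failing the scan.

```python
import struct

# Column name assumed for illustration, mirroring the CSV/JSON default
# "_corrupt_record"; the real option name for the state reader is TBD.
CORRUPT_COL = "_corrupt_record"

def read_state_rows(raw_rows):
    """Decode each raw row against a toy state schema of two big-endian
    int32s (key, value). Rows that fail to decode are kept, with their
    raw binary preserved in the corrupt-record column, rather than
    raising and failing the whole query."""
    out = []
    for raw in raw_rows:
        try:
            key, value = struct.unpack(">ii", raw)
            out.append({"key": key, "value": value, CORRUPT_COL: None})
        except struct.error:
            # Schema mismatch or corruption: expose the raw bytes instead
            # of throwing, so valid rows in the same file still come back.
            out.append({"key": None, "value": None, CORRUPT_COL: raw})
    return out
```

Under this scheme a caller can filter on the corrupt-record column being non-null to isolate bad rows for inspection, which is exactly how the corrupt record column is typically used with the CSV/JSON sources.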



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
