
Posted to reviews@spark.apache.org by "jingz-db (via GitHub)" <gi...@apache.org> on 2024/01/29 05:16:38 UTC

[PR] [SS] Add a check for stateful operator change for streaming [spark]

jingz-db opened a new pull request, #44927:
URL: https://github.com/apache/spark/pull/44927

### What changes were proposed in this pull request?

Currently, if a user restarts a query from the same checkpoint location after changing its stateful operator, they get a misleading `org.apache.spark.sql.execution.streaming.state.StateSchemaNotCompatible` error. This PR catches such errors and throws a new error with an informative message.

After physical planning and before the execution phase, we read the state metadata for the current operator ID to fetch the operator name recorded by the committed batch for that same operator ID. If the operator names do not match, we throw the new error.
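A minimal sketch of the comparison step, assuming the committed state metadata has already been read into a map from operator ID to operator name. Every name here (`StatefulOperatorChangedException`, `StatefulOperatorCheck.verify`, and the operator names used in the usage example) is hypothetical and for illustration only; this is not the code in this PR or Spark's actual API. It also assumes that an operator ID with no committed metadata (for example, on a first run) is simply skipped.

```scala
// Hypothetical error type: carries the operator ID plus the old and new names
// so the message points at the real root cause instead of a schema mismatch.
final case class StatefulOperatorChangedException(
    operatorId: Long,
    committedName: String,
    currentName: String)
  extends RuntimeException(
    s"Stateful operator with id $operatorId changed from '$committedName' to " +
      s"'$currentName' since the last committed batch. Restarting a query from " +
      "the same checkpoint location with a different stateful operator is not supported.")

object StatefulOperatorCheck {
  /**
   * Compares operator names from the new physical plan against the names
   * recorded in the committed batch's state metadata (assumed already loaded).
   *
   * @param committed operator ID -> operator name from the committed batch
   * @param current   (operator ID, operator name) pairs from the new plan
   */
  def verify(committed: Map[Long, String], current: Seq[(Long, String)]): Unit =
    current.foreach { case (id, name) =>
      committed.get(id) match {
        // Same ID, different operator: fail fast with the informative error.
        case Some(prev) if prev != name =>
          throw StatefulOperatorChangedException(id, prev, name)
        // Name unchanged, or no committed metadata for this ID: nothing to check.
        case _ => ()
      }
    }
}
```

For example, `StatefulOperatorCheck.verify(Map(0L -> "aggregate"), Seq(0L -> "dedup"))` throws, while passing `Seq(0L -> "aggregate")` returns normally. Running the check after planning but before execution means the mismatch surfaces before any state store is opened, which is where the misleading schema error would otherwise originate.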

### Why are the changes needed?

The current error message is misleading to users. We should provide a message that guides them to the real root cause of the error.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
