Posted to reviews@spark.apache.org by "anishshri-db (via GitHub)" <gi...@apache.org> on 2023/10/14 01:21:13 UTC
[PR] [SPARK-45539] Add assert and log to indicate watermark definition is required for streaming aggregation queries in append mode [spark]
anishshri-db opened a new pull request, #43370:
URL: https://github.com/apache/spark/pull/43370
### What changes were proposed in this pull request?
Add assert and log to indicate watermark definition is required for streaming aggregation queries in append mode
### Why are the changes needed?
We have a check in UnsupportedOperationChecker that ensures watermark attributes are specified in append mode. However, in some cases we have received reports of users hitting this stack trace:
```
org.apache.spark.SparkException: Exception thrown in awaitResult: Job aborted due to stage failure: Task 3 in stage 32.0 failed 4 times, most recent failure: Lost task 3.3 in stage 32.0 (TID 606) (10.5.71.29 executor 0): java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:529)
at scala.None$.get(Option.scala:527)
at org.apache.spark.sql.execution.streaming.StateStoreSaveExec.$anonfun$doExecute$9(statefulOperators.scala:472)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:708)
at org.apache.spark.sql.execution.streaming.StateStoreWriter.timeTakenMs(statefulOperators.scala:145)
at org.apache.spark.sql.execution.streaming.StateStoreWriter.timeTakenMs$(statefulOperators.scala:145)
at org.apache.spark.sql.execution.streaming.StateStoreSaveExec.timeTakenMs(statefulOperators.scala:414)
at org.apache.spark.sql.execution.streaming.StateStoreSaveExec.$anonfun$doExecute$5(statefulOperators.scala:470)
at org.apache.spark.sql.execution.streaming.state.package$StateStoreOps.$anonfun$mapPartitionsWithStateStore$1(package.scala:63)
at org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:127)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
```
In this case, the reason for the failure is not immediately clear. Hence, this PR adds an assert and a log message to indicate why the query failed on the executor.
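To illustrate the difference, here is a minimal sketch in plain Scala with no Spark dependency. The object `WatermarkCheckSketch`, the `Row` and `Predicate` types, both method names, and the wording of the assertion message are all hypothetical stand-ins for illustration; this is not the code changed in `statefulOperators.scala`. It only shows how an unchecked `Option.get` surfaces as `None.get`, while a guarded version reports the likely cause.

```scala
// Hypothetical sketch: models an operator holding an optional watermark predicate.
// None of these names come from Spark; they are assumptions for illustration.
object WatermarkCheckSketch {
  type Row = Map[String, Long]
  type Predicate = Row => Boolean

  // Unchecked: if no watermark predicate was planned, `.get` on None throws
  // java.util.NoSuchElementException with the message "None.get".
  def evictUnchecked(pred: Option[Predicate], rows: Seq[Row]): Seq[Row] =
    rows.filterNot(pred.get)

  // Checked: log and assert first, so the failure names the missing watermark.
  // The message text here is illustrative, not the one used in the PR.
  def evictChecked(pred: Option[Predicate], rows: Seq[Row]): Seq[Row] = {
    if (pred.isEmpty) {
      // Stand-in for a real logger call.
      System.err.println("Watermark predicate is missing for append-mode aggregation")
    }
    assert(pred.isDefined,
      "Watermark needs to be defined for streaming aggregation query in append mode")
    rows.filterNot(pred.get)
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(Map("ts" -> 5L), Map("ts" -> 50L))

    try evictUnchecked(None, rows)
    catch { case e: NoSuchElementException => println(s"unchecked: ${e.getMessage}") }

    try evictChecked(None, rows)
    catch { case e: AssertionError => println(s"checked: ${e.getMessage}") }
  }
}
```

Note that Scala's `assert` can be elided by compiler flags, so a sketch like this relies on assertions being enabled.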
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing unit tests
### Was this patch authored or co-authored using generative AI tooling?
No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
Re: [PR] [SPARK-45539][SS] Add assert and log to indicate watermark definition is required for streaming aggregation queries in append mode [spark]
Posted by "anishshri-db (via GitHub)" <gi...@apache.org>.
anishshri-db commented on PR #43370:
URL: https://github.com/apache/spark/pull/43370#issuecomment-1763744409
Seems to be failing for other PRs too - https://github.com/apache/spark/actions/runs/6527209920/job/17723040668
Re: [PR] [SPARK-45539][SS] Add assert and log to indicate watermark definition is required for streaming aggregation queries in append mode [spark]
Posted by "anishshri-db (via GitHub)" <gi...@apache.org>.
anishshri-db commented on PR #43370:
URL: https://github.com/apache/spark/pull/43370#issuecomment-1763741620
The Kafka issue seems to be intermittent and unrelated to this change. I was able to run the test locally:
```
[info] No tests to run for mllib / Test / testOnly
[info] - stress test with multiple topics and partitions (56 seconds, 819 milliseconds)
[info] Run completed in 1 minute, 3 seconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```
Re: [PR] [SPARK-45539][SS] Add assert and log to indicate watermark definition is required for streaming aggregation queries in append mode [spark]
Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR closed pull request #43370: [SPARK-45539][SS] Add assert and log to indicate watermark definition is required for streaming aggregation queries in append mode
URL: https://github.com/apache/spark/pull/43370
Re: [PR] [SPARK-45539][SS] Add assert and log to indicate watermark definition is required for streaming aggregation queries in append mode [spark]
Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR commented on PR #43370:
URL: https://github.com/apache/spark/pull/43370#issuecomment-1763759671
I agree the failure is unrelated.
Re: [PR] [SPARK-45539][SS] Add assert and log to indicate watermark definition is required for streaming aggregation queries in append mode [spark]
Posted by "anishshri-db (via GitHub)" <gi...@apache.org>.
anishshri-db commented on PR #43370:
URL: https://github.com/apache/spark/pull/43370#issuecomment-1762462455
cc - @HeartSaVioR - PTAL, thx
Re: [PR] [SPARK-45539][SS] Add assert and log to indicate watermark definition is required for streaming aggregation queries in append mode [spark]
Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR commented on PR #43370:
URL: https://github.com/apache/spark/pull/43370#issuecomment-1763759789
Thanks! Merging to master.
Re: [PR] [SPARK-45539][SS] Add assert and log to indicate watermark definition is required for streaming aggregation queries in append mode [spark]
Posted by "smileyboy2019 (via GitHub)" <gi...@apache.org>.
smileyboy2019 commented on PR #43370:
URL: https://github.com/apache/spark/pull/43370#issuecomment-1765554151
CEP support in Spark Streaming