Posted to reviews@spark.apache.org by "anishshri-db (via GitHub)" <gi...@apache.org> on 2023/10/14 01:21:13 UTC

[PR] [SPARK-45539] Add assert and log to indicate watermark definition is required for streaming aggregation queries in append mode [spark]

anishshri-db opened a new pull request, #43370:
URL: https://github.com/apache/spark/pull/43370

   ### What changes were proposed in this pull request?
   Add an assert and a log message to indicate that a watermark definition is required for streaming aggregation queries in append mode.
   
   
   ### Why are the changes needed?
   We have a check in UnsupportedOperationChecker that ensures watermark attributes are specified in append mode. However, in some cases we have received reports where users hit this stack trace:
   
   ```
   org.apache.spark.SparkException: Exception thrown in awaitResult: Job aborted due to stage failure: Task 3 in stage 32.0 failed 4 times, most recent failure: Lost task 3.3 in stage 32.0 (TID 606) (10.5.71.29 executor 0): java.util.NoSuchElementException: None.get
           at scala.None$.get(Option.scala:529)
           at scala.None$.get(Option.scala:527)
           at org.apache.spark.sql.execution.streaming.StateStoreSaveExec.$anonfun$doExecute$9(statefulOperators.scala:472)
           at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
           at org.apache.spark.util.Utils$.timeTakenMs(Utils.scala:708)
           at org.apache.spark.sql.execution.streaming.StateStoreWriter.timeTakenMs(statefulOperators.scala:145)
           at org.apache.spark.sql.execution.streaming.StateStoreWriter.timeTakenMs$(statefulOperators.scala:145)
           at org.apache.spark.sql.execution.streaming.StateStoreSaveExec.timeTakenMs(statefulOperators.scala:414)
           at org.apache.spark.sql.execution.streaming.StateStoreSaveExec.$anonfun$doExecute$5(statefulOperators.scala:470)
           at org.apache.spark.sql.execution.streaming.state.package$StateStoreOps.$anonfun$mapPartitionsWithStateStore$1(package.scala:63)
           at org.apache.spark.sql.execution.streaming.state.StateStoreRDD.compute(StateStoreRDD.scala:127)
           at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:406)
   ```
   
   In this case, the reason for the failure is not immediately clear. Hence, this PR adds an assert and a log message to indicate why the query failed on the executor.
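   
   As a rough illustration only (this is not the code from the patch), the sketch below contrasts a bare `Option.get`, which fails with the opaque `None.get` seen in the trace above, with an assert that names the missing watermark. The object `WatermarkCheckSketch`, both method names, the predicate type, and the message text are hypothetical stand-ins and are not taken from `StateStoreSaveExec`.
   
   ```scala
   // Hypothetical stand-in for an operator that evicts rows once they fall behind the watermark.
   // Nothing here is Spark's real code; names and the message text are assumptions.
   object WatermarkCheckSketch {
   
     // Bare Option.get: when no watermark predicate exists, this throws
     // java.util.NoSuchElementException with the unhelpful message "None.get".
     def evictBare(watermarkPredicate: Option[Long => Boolean], eventTimes: Seq[Long]): Seq[Long] =
       eventTimes.filterNot(watermarkPredicate.get)
   
     // Same logic, but the precondition is asserted first so the failure explains itself.
     def evictChecked(watermarkPredicate: Option[Long => Boolean], eventTimes: Seq[Long]): Seq[Long] = {
       assert(watermarkPredicate.isDefined,
         "Watermark needs to be defined for streaming aggregation query in append mode")
       eventTimes.filterNot(watermarkPredicate.get)
     }
   
     def main(args: Array[String]): Unit = {
       val times = Seq(5L, 15L)
   
       // With a predicate defined, rows at or below the (assumed) watermark of 10 are dropped.
       println(evictChecked(Some((t: Long) => t <= 10L), times))
   
       // Without a predicate, compare the two failure messages.
       try evictBare(None, times)
       catch { case e: NoSuchElementException => println(s"bare: ${e.getMessage}") }
   
       try evictChecked(None, times)
       catch { case e: AssertionError => println(s"checked: ${e.getMessage}") }
     }
   }
   ```
   
   Note that Scala's `assert` can be elided by compiler flags, so a check like this is a diagnostic aid rather than a replacement for the planner-side validation.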
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   Existing unit tests
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


Re: [PR] [SPARK-45539][SS] Add assert and log to indicate watermark definition is required for streaming aggregation queries in append mode [spark]

Posted by "anishshri-db (via GitHub)" <gi...@apache.org>.
anishshri-db commented on PR #43370:
URL: https://github.com/apache/spark/pull/43370#issuecomment-1763744409

   Seems to be failing for other PRs too - https://github.com/apache/spark/actions/runs/6527209920/job/17723040668




Re: [PR] [SPARK-45539][SS] Add assert and log to indicate watermark definition is required for streaming aggregation queries in append mode [spark]

Posted by "anishshri-db (via GitHub)" <gi...@apache.org>.
anishshri-db commented on PR #43370:
URL: https://github.com/apache/spark/pull/43370#issuecomment-1763741620

   The Kafka issue seems to be intermittent and unrelated to this change. I was able to run the test locally:
   
   ```
   [info] No tests to run for mllib / Test / testOnly
   [info] - stress test with multiple topics and partitions (56 seconds, 819 milliseconds)
   [info] Run completed in 1 minute, 3 seconds.
   [info] Total number of tests run: 1
   [info] Suites: completed 1, aborted 0
   [info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
   [info] All tests passed.
   ```




Re: [PR] [SPARK-45539][SS] Add assert and log to indicate watermark definition is required for streaming aggregation queries in append mode [spark]

Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR closed pull request #43370: [SPARK-45539][SS] Add assert and log to indicate watermark definition is required for streaming aggregation queries in append mode
URL: https://github.com/apache/spark/pull/43370




Re: [PR] [SPARK-45539][SS] Add assert and log to indicate watermark definition is required for streaming aggregation queries in append mode [spark]

Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR commented on PR #43370:
URL: https://github.com/apache/spark/pull/43370#issuecomment-1763759671

   I agree the failure is unrelated.




Re: [PR] [SPARK-45539][SS] Add assert and log to indicate watermark definition is required for streaming aggregation queries in append mode [spark]

Posted by "anishshri-db (via GitHub)" <gi...@apache.org>.
anishshri-db commented on PR #43370:
URL: https://github.com/apache/spark/pull/43370#issuecomment-1762462455

   cc - @HeartSaVioR - PTAL, thx




Re: [PR] [SPARK-45539][SS] Add assert and log to indicate watermark definition is required for streaming aggregation queries in append mode [spark]

Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR commented on PR #43370:
URL: https://github.com/apache/spark/pull/43370#issuecomment-1763759789

   Thanks! Merging to master.




Re: [PR] [SPARK-45539][SS] Add assert and log to indicate watermark definition is required for streaming aggregation queries in append mode [spark]

Posted by "smileyboy2019 (via GitHub)" <gi...@apache.org>.
smileyboy2019 commented on PR #43370:
URL: https://github.com/apache/spark/pull/43370#issuecomment-1765554151

   CEP support in Spark Streaming

