You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by "HeartSaVioR (via GitHub)" <gi...@apache.org> on 2023/09/01 13:49:33 UTC

[GitHub] [spark] HeartSaVioR opened a new pull request, #42773: [SPARK-45045][SS] Revert back the behavior of idle progress for StreamingQuery API from SPARK-43183

HeartSaVioR opened a new pull request, #42773:
URL: https://github.com/apache/spark/pull/42773

   ### What changes were proposed in this pull request?
   
   This PR proposes to revert back the behavior of idle progress for StreamingQuery API from [SPARK-43183](https://issues.apache.org/jira/browse/SPARK-43183), to avoid breakage of tests from 3rd party data sources.
   
   ### Why are the changes needed?
   
   We indicated that the behavioral change from SPARK-43183 broke many tests in 3rd party data sources.
   (Short summary of SPARK-43183: we changed the behavior of idle progress to only provide idle event callback, instead of making progress update callback as well as adding progress for StreamingQuery API to provide as recent progresses/last progress.)
   
   The main rationale of SPARK-43183 was to avoid making progress update callback for idle event, which had been confused users. That is more about streaming query listener, and not necessarily had to change the behavior of StreamingQuery API as well.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, but the user-facing change is technically reduced before this PR, as we revert back the behavioral change partially from SPARK-43183, which wasn't released yet. 
   
   ### How was this patch tested?
   
   Modified tests. Also manually ran 3rd party data source tests which were broken with Spark 3.5.0 RC which succeeded with this change.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a diff in pull request #42773: [SPARK-45045][SS] Revert back the behavior of idle progress for StreamingQuery API from SPARK-43183

Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR commented on code in PR #42773:
URL: https://github.com/apache/spark/pull/42773#discussion_r1313611430


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala:
##########
@@ -172,96 +188,102 @@ trait ProgressReporter extends Logging {
       currentTriggerLatestOffsets != null)
     currentTriggerEndTimestamp = triggerClock.getTimeMillis()
 
-    if (hasExecuted) {
-      val executionStats = extractExecutionStats(hasNewData)
-      val processingTimeMills = currentTriggerEndTimestamp - currentTriggerStartTimestamp
-      val processingTimeSec = Math.max(1L, processingTimeMills).toDouble / MILLIS_PER_SECOND
+    val executionStats = extractExecutionStats(hasNewData, hasExecuted)

Review Comment:
   Context: L191-L251 is technically revert of SPARK-43183.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #42773: [SPARK-45045][SS] Revert back the behavior of idle progress for StreamingQuery API from SPARK-43183

Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR commented on PR #42773:
URL: https://github.com/apache/spark/pull/42773#issuecomment-1702821705

   DISCLAIMER: I wanted to revert relevant commits and re-work on supporting idle event which would be easier for reviewers to look at, but this was also touched with Spark connect effort hence very hard to revert.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] anishshri-db commented on pull request #42773: [SPARK-45045][SS] Revert back the behavior of idle progress for StreamingQuery API from SPARK-43183

Posted by "anishshri-db (via GitHub)" <gi...@apache.org>.
anishshri-db commented on PR #42773:
URL: https://github.com/apache/spark/pull/42773#issuecomment-1703443324

   @HeartSaVioR - revert itself looks fine. Is there a way to check if lastProgress outputs the same result before SPARK-43183 and with this change ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #42773: [SPARK-45045][SS] Revert back the behavior of idle progress for StreamingQuery API from SPARK-43183

Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR commented on PR #42773:
URL: https://github.com/apache/spark/pull/42773#issuecomment-1702788366

   Will have a separate PR for branch-3.5 to reduce a hop in case there is merge conflict.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR closed pull request #42773: [SPARK-45045][SS] Revert back the behavior of idle progress for StreamingQuery API from SPARK-43183

Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR closed pull request #42773: [SPARK-45045][SS] Revert back the behavior of idle progress for StreamingQuery API from SPARK-43183
URL: https://github.com/apache/spark/pull/42773


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #42773: [SPARK-45045][SS] Revert back the behavior of idle progress for StreamingQuery API from SPARK-43183

Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR commented on PR #42773:
URL: https://github.com/apache/spark/pull/42773#issuecomment-1704533152

   Thanks for reviewing! Merging to master.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on pull request #42773: [SPARK-45045][SS] Revert back the behavior of idle progress for StreamingQuery API from SPARK-43183

Posted by "HeartSaVioR (via GitHub)" <gi...@apache.org>.
HeartSaVioR commented on PR #42773:
URL: https://github.com/apache/spark/pull/42773#issuecomment-1702827928

   cc. @xuanyuanking 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org