You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/26 19:41:07 UTC

[GitHub] [spark] brkyvz opened a new pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

brkyvz opened a new pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040
 
 
   ### What changes were proposed in this pull request?
   
   In Structured Streaming, we provide progress updates every 10 seconds when a stream doesn't have any new data upstream. When providing this progress though, we zero out the input information but not the output information. This PR fixes that bug.
   
   ### Why are the changes needed?
   
   Fixes a bug around incorrect metrics
   
   ### Does this PR introduce any user-facing change?
   
   Fixes a bug in the metrics
   
   ### How was this patch tested?
   
   New regression test

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606194695
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25312/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605522048
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] brkyvz commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
brkyvz commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r398978806
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -203,46 +203,41 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
       }
 
       def stateOperatorProgresses: Seq[StateOperatorProgress] = {
-        val operatorProgress = mutable.ArrayBuffer[StateOperatorProgress]()
-        var progress = query.recentProgress.last
-
-        operatorProgress ++= progress.stateOperators.map { op => op.copy(op.numRowsUpdated) }
-        if (progress.numInputRows == 0) {
-          // empty batch, merge metrics from previous batch as well
-          progress = query.recentProgress.takeRight(2).head
-          operatorProgress.zipWithIndex.foreach { case (sop, index) =>
-            // "numRowsUpdated" should be merged, as it could be updated in both batches.
-            // (for now it is only updated from previous batch, but things can be changed.)
-            // other metrics represent current status of state so picking up the latest values.
-            val newOperatorProgress = sop.copy(
-              sop.numRowsUpdated + progress.stateOperators(index).numRowsUpdated)
-            operatorProgress(index) = newOperatorProgress
-          }
-        }
-
-        operatorProgress
+        query.recentProgress.last.stateOperators
       }
     }
 
+    val clock = new StreamManualClock()
+
     testStream(aggWithWatermark)(
+      StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
 
 Review comment:
   @tdas This test tests append mode output for no-data microbatches

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604763466
 
 
   **[Test build #120444 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120444/testReport)** for PR 28040 at commit [`4655611`](https://github.com/apache/spark/commit/465561182d48632ad6ea1b4e4079532a8cc5ac69).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r399723128
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala
 ##########
 @@ -253,9 +253,11 @@ abstract class FileStreamSinkSuite extends StreamTest {
 
       addTimestamp(104, 123) // watermark = 90 before this, watermark = 123 - 10 = 113 after this
       check((100L, 105L) -> 2L)  // no-data-batch emits results on 100-105,
+      assert(query.lastProgress.sink.numOutputRows === 1)
 
 Review comment:
   The value is -1 instead of 0 if it doesn't support output metrics, and as you can see the error message in Jenkins build, here the value is 0 instead of -1, because the patch overwrites the value to 0 when the batch hasn't run. So yes the last progress here is for "no data & no run", though the new commit should fix this problem.
   
   > V1 suite
   
   ```
   {
     "id" : "1bbf91ac-0a24-4da7-bfe8-54ce1ac63e0f",
     "runId" : "ac738c58-a976-4c87-9547-b4a1ee1c2560",
     "name" : null,
     "timestamp" : "2020-03-28T23:33:08.567Z",
     "batchId" : 0,
     "numInputRows" : 1,
     "inputRowsPerSecond" : 83.33333333333333,
     "processedRowsPerSecond" : 0.3835826620636747,
     "durationMs" : {
       "addBatch" : 2055,
       "getBatch" : 2,
       "latestOffset" : 0,
       "queryPlanning" : 449,
       "triggerExecution" : 2607,
       "walCommit" : 49
     },
     "eventTime" : {
       "avg" : "1970-01-01T00:01:40.000Z",
       "max" : "1970-01-01T00:01:40.000Z",
       "min" : "1970-01-01T00:01:40.000Z",
       "watermark" : "1970-01-01T00:00:00.000Z"
     },
     "stateOperators" : [ {
       "numRowsTotal" : 1,
       "numRowsUpdated" : 1,
       "memoryUsedBytes" : 1400,
       "customMetrics" : {
         "loadedMapCacheHitCount" : 0,
         "loadedMapCacheMissCount" : 0,
         "stateOnCurrentVersionSizeBytes" : 680
       }
     } ],
     "sources" : [ {
       "description" : "MemoryStream[value#1L]",
       "startOffset" : null,
       "endOffset" : 0,
       "numInputRows" : 1,
       "inputRowsPerSecond" : 83.33333333333333,
       "processedRowsPerSecond" : 0.3835826620636747
     } ],
     "sink" : {
       "description" : "FileSink[/private/var/folders/wn/3hpqx8015hjbmq43hmrw78z40000gn/T/stream.output-cf800c40-1e18-405e-b48f-71a08348a298]",
       "numOutputRows" : -1
     }
   }
   {
     "id" : "1bbf91ac-0a24-4da7-bfe8-54ce1ac63e0f",
     "runId" : "ac738c58-a976-4c87-9547-b4a1ee1c2560",
     "name" : null,
     "timestamp" : "2020-03-28T23:33:11.185Z",
     "batchId" : 1,
     "numInputRows" : 0,
     "inputRowsPerSecond" : 0.0,
     "processedRowsPerSecond" : 0.0,
     "durationMs" : {
       "addBatch" : 935,
       "getBatch" : 0,
       "latestOffset" : 0,
       "queryPlanning" : 52,
       "triggerExecution" : 1101,
       "walCommit" : 70
     },
     "eventTime" : {
       "watermark" : "1970-01-01T00:01:30.000Z"
     },
     "stateOperators" : [ {
       "numRowsTotal" : 1,
       "numRowsUpdated" : 0,
       "memoryUsedBytes" : 2272,
       "customMetrics" : {
         "loadedMapCacheHitCount" : 10,
         "loadedMapCacheMissCount" : 0,
         "stateOnCurrentVersionSizeBytes" : 720
       }
     } ],
     "sources" : [ {
       "description" : "MemoryStream[value#1L]",
       "startOffset" : 0,
       "endOffset" : 0,
       "numInputRows" : 0,
       "inputRowsPerSecond" : 0.0,
       "processedRowsPerSecond" : 0.0
     } ],
     "sink" : {
       "description" : "FileSink[/private/var/folders/wn/3hpqx8015hjbmq43hmrw78z40000gn/T/stream.output-cf800c40-1e18-405e-b48f-71a08348a298]",
       "numOutputRows" : -1
     }
   }
   {
     "id" : "1bbf91ac-0a24-4da7-bfe8-54ce1ac63e0f",
     "runId" : "ac738c58-a976-4c87-9547-b4a1ee1c2560",
     "name" : null,
     "timestamp" : "2020-03-28T23:33:12.287Z",
     "batchId" : 2,
     "numInputRows" : 0,
     "inputRowsPerSecond" : 0.0,
     "processedRowsPerSecond" : 0.0,
     "durationMs" : {
       "latestOffset" : 0,
       "triggerExecution" : 0
     },
     "eventTime" : {
       "watermark" : "1970-01-01T00:01:30.000Z"
     },
     "stateOperators" : [ {
       "numRowsTotal" : 1,
       "numRowsUpdated" : 0,
       "memoryUsedBytes" : 2272,
       "customMetrics" : {
         "loadedMapCacheHitCount" : 10,
         "loadedMapCacheMissCount" : 0,
         "stateOnCurrentVersionSizeBytes" : 720
       }
     } ],
     "sources" : [ {
       "description" : "MemoryStream[value#1L]",
       "startOffset" : 0,
       "endOffset" : 0,
       "numInputRows" : 0,
       "inputRowsPerSecond" : 0.0,
       "processedRowsPerSecond" : 0.0
     } ],
     "sink" : {
       "description" : "FileSink[/private/var/folders/wn/3hpqx8015hjbmq43hmrw78z40000gn/T/stream.output-cf800c40-1e18-405e-b48f-71a08348a298]",
       "numOutputRows" : 0
     }
   }
   {
     "id" : "1bbf91ac-0a24-4da7-bfe8-54ce1ac63e0f",
     "runId" : "ac738c58-a976-4c87-9547-b4a1ee1c2560",
     "name" : null,
     "timestamp" : "2020-03-28T23:33:13.066Z",
     "batchId" : 2,
     "numInputRows" : 2,
     "inputRowsPerSecond" : 153.84615384615384,
     "processedRowsPerSecond" : 3.2258064516129035,
     "durationMs" : {
       "addBatch" : 482,
       "getBatch" : 0,
       "latestOffset" : 0,
       "queryPlanning" : 50,
       "triggerExecution" : 620,
       "walCommit" : 44
     },
     "eventTime" : {
       "avg" : "1970-01-01T00:01:53.500Z",
       "max" : "1970-01-01T00:02:03.000Z",
       "min" : "1970-01-01T00:01:44.000Z",
       "watermark" : "1970-01-01T00:01:30.000Z"
     },
     "stateOperators" : [ {
       "numRowsTotal" : 2,
       "numRowsUpdated" : 2,
       "memoryUsedBytes" : 2584,
       "customMetrics" : {
         "loadedMapCacheHitCount" : 20,
         "loadedMapCacheMissCount" : 0,
         "stateOnCurrentVersionSizeBytes" : 920
       }
     } ],
     "sources" : [ {
       "description" : "MemoryStream[value#1L]",
       "startOffset" : 0,
       "endOffset" : 1,
       "numInputRows" : 2,
       "inputRowsPerSecond" : 153.84615384615384,
       "processedRowsPerSecond" : 3.2258064516129035
     } ],
     "sink" : {
       "description" : "FileSink[/private/var/folders/wn/3hpqx8015hjbmq43hmrw78z40000gn/T/stream.output-cf800c40-1e18-405e-b48f-71a08348a298]",
       "numOutputRows" : -1
     }
   }
   {
     "id" : "1bbf91ac-0a24-4da7-bfe8-54ce1ac63e0f",
     "runId" : "ac738c58-a976-4c87-9547-b4a1ee1c2560",
     "name" : null,
     "timestamp" : "2020-03-28T23:33:13.688Z",
     "batchId" : 3,
     "numInputRows" : 0,
     "inputRowsPerSecond" : 0.0,
     "processedRowsPerSecond" : 0.0,
     "durationMs" : {
       "addBatch" : 987,
       "getBatch" : 0,
       "latestOffset" : 0,
       "queryPlanning" : 43,
       "triggerExecution" : 1117,
       "walCommit" : 44
     },
     "eventTime" : {
       "watermark" : "1970-01-01T00:01:53.000Z"
     },
     "stateOperators" : [ {
       "numRowsTotal" : 1,
       "numRowsUpdated" : 0,
       "memoryUsedBytes" : 2512,
       "customMetrics" : {
         "loadedMapCacheHitCount" : 30,
         "loadedMapCacheMissCount" : 0,
         "stateOnCurrentVersionSizeBytes" : 720
       }
     } ],
     "sources" : [ {
       "description" : "MemoryStream[value#1L]",
       "startOffset" : 1,
       "endOffset" : 1,
       "numInputRows" : 0,
       "inputRowsPerSecond" : 0.0,
       "processedRowsPerSecond" : 0.0
     } ],
     "sink" : {
       "description" : "FileSink[/private/var/folders/wn/3hpqx8015hjbmq43hmrw78z40000gn/T/stream.output-cf800c40-1e18-405e-b48f-71a08348a298]",
       "numOutputRows" : -1
     }
   }
   {
     "id" : "1bbf91ac-0a24-4da7-bfe8-54ce1ac63e0f",
     "runId" : "ac738c58-a976-4c87-9547-b4a1ee1c2560",
     "name" : null,
     "timestamp" : "2020-03-28T23:33:14.806Z",
     "batchId" : 4,
     "numInputRows" : 0,
     "inputRowsPerSecond" : 0.0,
     "processedRowsPerSecond" : 0.0,
     "durationMs" : {
       "latestOffset" : 0,
       "triggerExecution" : 0
     },
     "eventTime" : {
       "watermark" : "1970-01-01T00:01:53.000Z"
     },
     "stateOperators" : [ {
       "numRowsTotal" : 1,
       "numRowsUpdated" : 0,
       "memoryUsedBytes" : 2512,
       "customMetrics" : {
         "loadedMapCacheHitCount" : 30,
         "loadedMapCacheMissCount" : 0,
         "stateOnCurrentVersionSizeBytes" : 720
       }
     } ],
     "sources" : [ {
       "description" : "MemoryStream[value#1L]",
       "startOffset" : 1,
       "endOffset" : 1,
       "numInputRows" : 0,
       "inputRowsPerSecond" : 0.0,
       "processedRowsPerSecond" : 0.0
     } ],
     "sink" : {
       "description" : "FileSink[/private/var/folders/wn/3hpqx8015hjbmq43hmrw78z40000gn/T/stream.output-cf800c40-1e18-405e-b48f-71a08348a298]",
       "numOutputRows" : 0
     }
   }
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604815707
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120444/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-608172028
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] brkyvz commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
brkyvz commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r405174295
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -202,47 +202,68 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
         }
       }
 
-      def stateOperatorProgresses: Seq[StateOperatorProgress] = {
-        val operatorProgress = mutable.ArrayBuffer[StateOperatorProgress]()
-        var progress = query.recentProgress.last
-
-        operatorProgress ++= progress.stateOperators.map { op => op.copy(op.numRowsUpdated) }
-        if (progress.numInputRows == 0) {
-          // empty batch, merge metrics from previous batch as well
-          progress = query.recentProgress.takeRight(2).head
-          operatorProgress.zipWithIndex.foreach { case (sop, index) =>
-            // "numRowsUpdated" should be merged, as it could be updated in both batches.
-            // (for now it is only updated from previous batch, but things can be changed.)
-            // other metrics represent current status of state so picking up the latest values.
-            val newOperatorProgress = sop.copy(
-              sop.numRowsUpdated + progress.stateOperators(index).numRowsUpdated)
-            operatorProgress(index) = newOperatorProgress
-          }
-        }
+      // Pick the latest progress that actually ran a batch
+      def lastExecutedBatch: StreamingQueryProgress = {
+        query.recentProgress.filter(_.durationMs.containsKey("addBatch")).last
+      }
 
-        operatorProgress
+      def stateOperatorProgresses: Seq[StateOperatorProgress] = {
+        lastExecutedBatch.stateOperators
       }
     }
 
+    val clock = new StreamManualClock()
+
     testStream(aggWithWatermark)(
+      // batchId 0
       AddData(inputData, 15),
-      CheckAnswer(), // watermark = 5
+      StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
+      CheckAnswer(), // watermark = 0
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
+      AssertOnQuery { _.lastExecutedBatch.sink.numOutputRows == 0 },
+
+      // batchId 1 without data
+      AdvanceManualClock(1000L), // watermark = 5
+      Execute { q =>             // wait for the no data batch to complete
 
 Review comment:
   yeah, it'd be nice to provide an inbuilt function for it if this pattern is used more over time

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606312400
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606194662
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-609083867
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607570238
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120702/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607559379
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r405159015
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -202,47 +202,68 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
         }
       }
 
-      def stateOperatorProgresses: Seq[StateOperatorProgress] = {
-        val operatorProgress = mutable.ArrayBuffer[StateOperatorProgress]()
-        var progress = query.recentProgress.last
-
-        operatorProgress ++= progress.stateOperators.map { op => op.copy(op.numRowsUpdated) }
-        if (progress.numInputRows == 0) {
-          // empty batch, merge metrics from previous batch as well
-          progress = query.recentProgress.takeRight(2).head
-          operatorProgress.zipWithIndex.foreach { case (sop, index) =>
-            // "numRowsUpdated" should be merged, as it could be updated in both batches.
-            // (for now it is only updated from previous batch, but things can be changed.)
-            // other metrics represent current status of state so picking up the latest values.
-            val newOperatorProgress = sop.copy(
-              sop.numRowsUpdated + progress.stateOperators(index).numRowsUpdated)
-            operatorProgress(index) = newOperatorProgress
-          }
-        }
+      // Pick the latest progress that actually ran a batch
+      def lastExecutedBatch: StreamingQueryProgress = {
+        query.recentProgress.filter(_.durationMs.containsKey("addBatch")).last
+      }
 
-        operatorProgress
+      def stateOperatorProgresses: Seq[StateOperatorProgress] = {
+        lastExecutedBatch.stateOperators
       }
     }
 
+    val clock = new StreamManualClock()
+
     testStream(aggWithWatermark)(
+      // batchId 0
       AddData(inputData, 15),
-      CheckAnswer(), // watermark = 5
+      StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
+      CheckAnswer(), // watermark = 0
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
+      AssertOnQuery { _.lastExecutedBatch.sink.numOutputRows == 0 },
+
+      // batchId 1 without data
+      AdvanceManualClock(1000L), // watermark = 5
+      Execute { q =>             // wait for the no data batch to complete
 
 Review comment:
   (Good to have) It might be good to have a new operation to consolidate waiting for "no data batch" and checking the answer (as they have same pattern except the desired batch ID).
   
   Not mandatory to do it in this PR.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604700923
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607570230
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604645541
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25146/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607602162
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120701/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tdas commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
tdas commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r398888067
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala
 ##########
 @@ -170,9 +170,8 @@ trait ProgressReporter extends Logging {
       )
     }
 
-    val sinkProgress = SinkProgress(
-      sink.toString,
-      sinkCommitProgress.map(_.numOutputRows))
+    val sinkOutput = if (hasExecuted) sinkCommitProgress.map(_.numOutputRows) else Some(0L)
 
 Review comment:
   its not clear to me why this needs to be done. MicroBatchExecution.runBatch sets this `sinkCommitProgress` variable after every batch, even if the batch is empty. So I would expect this progress to have update metrics, which would naturally be 0 for an empty batch that did not produce anything.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR edited a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604767751
 
 
   I've proposed similar issue (different bug but the approach to resolve would be similar) in #25987 in Oct. 2019. It didn't get some love. Could we please revisit it as well? Thanks in advance.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605603528
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25259/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-608235721
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120743/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605522052
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25245/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604645533
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610579190
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120931/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606182856
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120603/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606182432
 
 
   **[Test build #120603 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120603/testReport)** for PR 28040 at commit [`d7792fd`](https://github.com/apache/spark/commit/d7792fda48c9b0a7192818660be0b0ecdf3f32fa).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tdas commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
tdas commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r404637776
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -203,46 +203,53 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
       }
 
       def stateOperatorProgresses: Seq[StateOperatorProgress] = {
-        val operatorProgress = mutable.ArrayBuffer[StateOperatorProgress]()
-        var progress = query.recentProgress.last
-
-        operatorProgress ++= progress.stateOperators.map { op => op.copy(op.numRowsUpdated) }
-        if (progress.numInputRows == 0) {
-          // empty batch, merge metrics from previous batch as well
-          progress = query.recentProgress.takeRight(2).head
-          operatorProgress.zipWithIndex.foreach { case (sop, index) =>
-            // "numRowsUpdated" should be merged, as it could be updated in both batches.
-            // (for now it is only updated from previous batch, but things can be changed.)
-            // other metrics represent current status of state so picking up the latest values.
-            val newOperatorProgress = sop.copy(
-              sop.numRowsUpdated + progress.stateOperators(index).numRowsUpdated)
-            operatorProgress(index) = newOperatorProgress
-          }
-        }
-
-        operatorProgress
+        query.recentProgress.last.stateOperators
       }
     }
 
+    val clock = new StreamManualClock()
+
     testStream(aggWithWatermark)(
       AddData(inputData, 15),
-      CheckAnswer(), // watermark = 5
+      StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
+      AdvanceManualClock(1000L), // triggers first batch
+      CheckAnswer(), // watermark = 0
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
       AddData(inputData, 10, 12, 14),
+      AdvanceManualClock(1000L), // watermark = 5, runs no-data microbatch
+      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AdvanceManualClock(1000L), // runs with new data from above
       CheckAnswer(), // watermark = 5
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 2 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
       AddData(inputData, 25),
-      CheckAnswer((10, 3)), // watermark = 15
+      AdvanceManualClock(1000L), // actually runs batch with data
+      CheckAnswer(), // watermark = 5, will update to 15 next batch
       AssertOnQuery { _.stateNodes.size === 1 },
-      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 1 },
+      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 3 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AdvanceManualClock(1000L), // runs batch with no new data and watermark progresses
 
 Review comment:
   @brkyvz here is my proposed test. 
   ```
       testStream(aggWithWatermark)(
         // batchId 0
         AddData(inputData, 15),
         StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
         CheckAnswer(), // watermark = 0
         AssertOnQuery { _.stateNodes.size === 1 },
         AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
         AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
         AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
         AssertOnQuery { _.lastExecutedBatch.sink.numOutputRows == 0 },
   
         // batchId 1 without data
         AdvanceManualClock(1000L), // watermark = 5
         Execute { q =>             // wait for the no data batch to complete
           eventually(timeout(streamingTimeout)) { assert(q.lastProgress.batchId === 1) }
         },
         AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
         AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 0 },
         AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
         AssertOnQuery { _.lastExecutedBatch.sink.numOutputRows == 0 },
   
         // batchId 2 with data
         AddData(inputData, 10, 12, 14),
         AdvanceManualClock(1000L), // watermark = 5
         CheckAnswer(),
         AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
         AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
         AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 2 },
         AssertOnQuery { _.lastExecutedBatch.sink.numOutputRows == 0 },
   
         // batchId 3 with data
         AddData(inputData, 25),
         AdvanceManualClock(1000L), // watermark = 5
         CheckAnswer(),
         AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
         AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
         AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 3 },
         AssertOnQuery { _.lastExecutedBatch.sink.numOutputRows == 0 },
   
         // batchId 4 without data
         AdvanceManualClock(1000L), // watermark = 15
         Execute { q =>             // wait for the no data batch to complete
           eventually(timeout(streamingTimeout)) { assert(q.lastProgress.batchId === 4) }
         },
         AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 1 },
         AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 0 },
         AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 2 },
         AssertOnQuery { _.lastExecutedBatch.sink.numOutputRows == 1 }
       )
     }
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605551672
 
 
   **[Test build #120544 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120544/testReport)** for PR 28040 at commit [`5fbbf41`](https://github.com/apache/spark/commit/5fbbf4137413ad359f70085c9692a496e53c4cf6).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607573039
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25401/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605521878
 
 
   **[Test build #120539 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120539/testReport)** for PR 28040 at commit [`62044dc`](https://github.com/apache/spark/commit/62044dcb6ea4c77d075fa2284ee872a9ff056e38).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] asfgit closed pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
asfgit closed pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] brkyvz commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
brkyvz commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607571781
 
 
   retest this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] brkyvz commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
brkyvz commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r399710719
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala
 ##########
 @@ -253,9 +253,11 @@ abstract class FileStreamSinkSuite extends StreamTest {
 
       addTimestamp(104, 123) // watermark = 90 before this, watermark = 123 - 10 = 113 after this
       check((100L, 105L) -> 2L)  // no-data-batch emits results on 100-105,
+      assert(query.lastProgress.sink.numOutputRows === 1)
 
 Review comment:
   actually no. FileStreamSink is a V1 sink and doesn't support output metrics it seems

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607557679
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25399/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r399130697
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -203,46 +203,41 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
       }
 
       def stateOperatorProgresses: Seq[StateOperatorProgress] = {
-        val operatorProgress = mutable.ArrayBuffer[StateOperatorProgress]()
-        var progress = query.recentProgress.last
-
-        operatorProgress ++= progress.stateOperators.map { op => op.copy(op.numRowsUpdated) }
-        if (progress.numInputRows == 0) {
-          // empty batch, merge metrics from previous batch as well
-          progress = query.recentProgress.takeRight(2).head
-          operatorProgress.zipWithIndex.foreach { case (sop, index) =>
-            // "numRowsUpdated" should be merged, as it could be updated in both batches.
-            // (for now it is only updated from previous batch, but things can be changed.)
-            // other metrics represent current status of state so picking up the latest values.
-            val newOperatorProgress = sop.copy(
-              sop.numRowsUpdated + progress.stateOperators(index).numRowsUpdated)
-            operatorProgress(index) = newOperatorProgress
-          }
-        }
-
-        operatorProgress
+        query.recentProgress.last.stateOperators
       }
     }
 
+    val clock = new StreamManualClock()
+
     testStream(aggWithWatermark)(
+      StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
       AddData(inputData, 15),
+      AdvanceManualClock(1000L),
       CheckAnswer(), // watermark = 5
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
       AddData(inputData, 10, 12, 14),
+      AdvanceManualClock(1000L),
       CheckAnswer(), // watermark = 5
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 2 },
       AddData(inputData, 25),
+      AdvanceManualClock(1000L),
+      CheckAnswer(), // watermark = 15, but nothing yet
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AdvanceManualClock(1000L),
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
 
 Review comment:
   This confused me - this value is actually updated in previous time so the better place of this line is after L231.
   
   This makes me thinking the delay of output if the trigger interval is quite huge. It should be uncommon, but it's not impossible with cron-ed continuous execution of once trigger.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604700923
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605529676
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120539/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605345388
 
 
   I'm not sure I get it. My PR fixes a different bug so while there may be conflict between twos, twos are valid by theirselves. Why not deal with this PR (or my PR) quickly and do rebase, and deal with other? I think this PR is getting close to merge.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] brkyvz commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
brkyvz commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r398978932
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -203,46 +203,41 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
       }
 
       def stateOperatorProgresses: Seq[StateOperatorProgress] = {
-        val operatorProgress = mutable.ArrayBuffer[StateOperatorProgress]()
 
 Review comment:
   I didn't find this code readable, therefore removed the hack here

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604701470
 
 
   **[Test build #120438 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120438/testReport)** for PR 28040 at commit [`7def4de`](https://github.com/apache/spark/commit/7def4ded7a7f2bc4796db7405bca69e42d1405f1).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tdas commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
tdas commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r398899659
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryStatusAndProgressSuite.scala
 ##########
 @@ -241,6 +241,7 @@ class StreamingQueryStatusAndProgressSuite extends StreamTest with Eventually {
           assert(nextProgress.numInputRows === 0)
           assert(nextProgress.stateOperators.head.numRowsTotal === 2)
           assert(nextProgress.stateOperators.head.numRowsUpdated === 0)
+          assert(nextProgress.sink.numOutputRows === 0)
 
 Review comment:
   this tests the original bug you wanted to solve. but I am afraid that in the no-data-batches, the correctness of numOutputRows may not be tested in the unit tests. Can you verify that? And if not can you add a few additional checks in the relevant tests?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-609109226
 
 
   **[Test build #120814 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120814/testReport)** for PR 28040 at commit [`786d921`](https://github.com/apache/spark/commit/786d921fcabef333bfa421f2fdd8255cc79ee997).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r399723489
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala
 ##########
 @@ -189,14 +195,14 @@ trait ProgressReporter extends Logging {
       sink = sinkProgress,
       observedMetrics = new java.util.HashMap(observedMetrics.asJava))
 
-    if (hasNewData) {
-      // Reset noDataEventTimestamp if we processed any data
-      lastNoDataProgressEventTime = Long.MinValue
+    if (hasExecuted) {
+      // Reset lastProgressEventTime if we processed any data
+      lastProgressEventTime = triggerClock.getTimeMillis()
       updateProgress(newProgress)
 
 Review comment:
   As we changed the semantic of the variable, I think we can update it in `updateProgress` to ensure the variable is updated whenever progress update is happening.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604815495
 
 
   **[Test build #120444 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120444/testReport)** for PR 28040 at commit [`4655611`](https://github.com/apache/spark/commit/465561182d48632ad6ea1b4e4079532a8cc5ac69).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604763881
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r401991562
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -226,7 +226,7 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
       AssertOnQuery { _.lastExecutedBatch.sink.numOutputRows == 0 },
       AddData(inputData, 10, 12, 14),
-      AdvanceManualClock(1000L), // watermark = 5, runs with the just added data
+      AdvanceManualClock(1000L), // watermark = 0, runs with the just added data
 
 Review comment:
   Let's just remove watermark here in comment as you've done with further `AdvanceManualClock`

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607602021
 
 
   **[Test build #120701 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120701/testReport)** for PR 28040 at commit [`d501127`](https://github.com/apache/spark/commit/d50112747aa14a17e8ebe52bf8935736fa25edf0).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606115648
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25307/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604700437
 
 
   **[Test build #120442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120442/testReport)** for PR 28040 at commit [`b57aa74`](https://github.com/apache/spark/commit/b57aa7412adf91c23bfb14f9171478f89835c302).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tdas commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
tdas commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610679788
 
 
   LGTM for this PR. @brkyvz feel free to merge it. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r399130697
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -203,46 +203,41 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
       }
 
       def stateOperatorProgresses: Seq[StateOperatorProgress] = {
-        val operatorProgress = mutable.ArrayBuffer[StateOperatorProgress]()
-        var progress = query.recentProgress.last
-
-        operatorProgress ++= progress.stateOperators.map { op => op.copy(op.numRowsUpdated) }
-        if (progress.numInputRows == 0) {
-          // empty batch, merge metrics from previous batch as well
-          progress = query.recentProgress.takeRight(2).head
-          operatorProgress.zipWithIndex.foreach { case (sop, index) =>
-            // "numRowsUpdated" should be merged, as it could be updated in both batches.
-            // (for now it is only updated from previous batch, but things can be changed.)
-            // other metrics represent current status of state so picking up the latest values.
-            val newOperatorProgress = sop.copy(
-              sop.numRowsUpdated + progress.stateOperators(index).numRowsUpdated)
-            operatorProgress(index) = newOperatorProgress
-          }
-        }
-
-        operatorProgress
+        query.recentProgress.last.stateOperators
       }
     }
 
+    val clock = new StreamManualClock()
+
     testStream(aggWithWatermark)(
+      StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
       AddData(inputData, 15),
+      AdvanceManualClock(1000L),
       CheckAnswer(), // watermark = 5
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
       AddData(inputData, 10, 12, 14),
+      AdvanceManualClock(1000L),
       CheckAnswer(), // watermark = 5
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 2 },
       AddData(inputData, 25),
+      AdvanceManualClock(1000L),
+      CheckAnswer(), // watermark = 15, but nothing yet
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AdvanceManualClock(1000L),
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
 
 Review comment:
   This confused me - this value is actually updated in previous time so the better place of this line is after L231. It just works because lastProgress is not updated here.
   
   This also makes me thinking the delay of output from no-data microbatch if the trigger interval is quite huge. It should be uncommon as the default trigger is "immediate", but it's not impossible with cron-ed continuous execution of once trigger.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604815704
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r399724222
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -203,46 +203,53 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
       }
 
       def stateOperatorProgresses: Seq[StateOperatorProgress] = {
-        val operatorProgress = mutable.ArrayBuffer[StateOperatorProgress]()
-        var progress = query.recentProgress.last
-
-        operatorProgress ++= progress.stateOperators.map { op => op.copy(op.numRowsUpdated) }
-        if (progress.numInputRows == 0) {
-          // empty batch, merge metrics from previous batch as well
-          progress = query.recentProgress.takeRight(2).head
-          operatorProgress.zipWithIndex.foreach { case (sop, index) =>
-            // "numRowsUpdated" should be merged, as it could be updated in both batches.
-            // (for now it is only updated from previous batch, but things can be changed.)
-            // other metrics represent current status of state so picking up the latest values.
-            val newOperatorProgress = sop.copy(
-              sop.numRowsUpdated + progress.stateOperators(index).numRowsUpdated)
-            operatorProgress(index) = newOperatorProgress
-          }
-        }
-
-        operatorProgress
+        query.recentProgress.last.stateOperators
       }
     }
 
+    val clock = new StreamManualClock()
+
     testStream(aggWithWatermark)(
       AddData(inputData, 15),
-      CheckAnswer(), // watermark = 5
+      StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
+      AdvanceManualClock(1000L), // triggers first batch
+      CheckAnswer(), // watermark = 0
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
       AddData(inputData, 10, 12, 14),
+      AdvanceManualClock(1000L), // watermark = 5, runs no-data microbatch
 
 Review comment:
   This surprises me, although it's not directly related to this PR so treat it as OFF-TOPIC.
   
   Based on the test, it sounds to me as we need to wait for next trigger interval to run no-data microbatch, and we need to run no-data microbatch even input is available. The input is handled in next trigger. 
   
   My expectation was that no-data microbatch is consolidated with data microbatch if there's input available. And ideally thinking, no data microbatch should not require to wait for next trigger interval.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610694104
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120932/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r405157677
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -203,46 +203,53 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
       }
 
       def stateOperatorProgresses: Seq[StateOperatorProgress] = {
-        val operatorProgress = mutable.ArrayBuffer[StateOperatorProgress]()
-        var progress = query.recentProgress.last
-
-        operatorProgress ++= progress.stateOperators.map { op => op.copy(op.numRowsUpdated) }
-        if (progress.numInputRows == 0) {
-          // empty batch, merge metrics from previous batch as well
-          progress = query.recentProgress.takeRight(2).head
-          operatorProgress.zipWithIndex.foreach { case (sop, index) =>
-            // "numRowsUpdated" should be merged, as it could be updated in both batches.
-            // (for now it is only updated from previous batch, but things can be changed.)
-            // other metrics represent current status of state so picking up the latest values.
-            val newOperatorProgress = sop.copy(
-              sop.numRowsUpdated + progress.stateOperators(index).numRowsUpdated)
-            operatorProgress(index) = newOperatorProgress
-          }
-        }
-
-        operatorProgress
+        query.recentProgress.last.stateOperators
       }
     }
 
+    val clock = new StreamManualClock()
+
     testStream(aggWithWatermark)(
       AddData(inputData, 15),
-      CheckAnswer(), // watermark = 5
+      StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
+      AdvanceManualClock(1000L), // triggers first batch
+      CheckAnswer(), // watermark = 0
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
       AddData(inputData, 10, 12, 14),
+      AdvanceManualClock(1000L), // watermark = 5, runs no-data microbatch
+      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AdvanceManualClock(1000L), // runs with new data from above
       CheckAnswer(), // watermark = 5
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 2 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
       AddData(inputData, 25),
-      CheckAnswer((10, 3)), // watermark = 15
+      AdvanceManualClock(1000L), // actually runs batch with data
+      CheckAnswer(), // watermark = 5, will update to 15 next batch
       AssertOnQuery { _.stateNodes.size === 1 },
-      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 1 },
+      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 3 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AdvanceManualClock(1000L), // runs batch with no new data and watermark progresses
 
 Review comment:
   @tdas 
   
   > I think all the confusion is starting from the fact that you dont need to advance manual clock after StartStream to trigger the first batch.
   > Rather what it is doing is advancing the clock thus allowing the 2nd batch to be automatically triggered as soon as the first batch finishes.
   > This weird asynchronousness despite using the manual clock makes the test incomprehensible and is also a perfect recipe for flakiness.
   
   Ah, nice finding. Great analysis. That's what I've missed. The proposal looks great and provides better understanding. I have comments for new proposal but since the proposal is reflected in PR, I'll comment directly to the PR.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-609109397
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604645541
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25146/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-608172031
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25441/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604743435
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120442/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-609109399
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120814/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r399723354
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala
 ##########
 @@ -253,9 +253,11 @@ abstract class FileStreamSinkSuite extends StreamTest {
 
       addTimestamp(104, 123) // watermark = 90 before this, watermark = 123 - 10 = 113 after this
       check((100L, 105L) -> 2L)  // no-data-batch emits results on 100-105,
+      assert(query.lastProgress.sink.numOutputRows === 1)
 
 Review comment:
   That said, we may not want to overwrite the value to 0 if the value is negative - it may be odd if the value has been `-1` because the sink doesn't support numOutputRows but sometimes the value is 0.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607572692
 
 
   **[Test build #120703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120703/testReport)** for PR 28040 at commit [`68848d9`](https://github.com/apache/spark/commit/68848d993a7a2fc7c7b860d4129bf27147c57a81).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r405159015
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -202,47 +202,68 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
         }
       }
 
-      def stateOperatorProgresses: Seq[StateOperatorProgress] = {
-        val operatorProgress = mutable.ArrayBuffer[StateOperatorProgress]()
-        var progress = query.recentProgress.last
-
-        operatorProgress ++= progress.stateOperators.map { op => op.copy(op.numRowsUpdated) }
-        if (progress.numInputRows == 0) {
-          // empty batch, merge metrics from previous batch as well
-          progress = query.recentProgress.takeRight(2).head
-          operatorProgress.zipWithIndex.foreach { case (sop, index) =>
-            // "numRowsUpdated" should be merged, as it could be updated in both batches.
-            // (for now it is only updated from previous batch, but things can be changed.)
-            // other metrics represent current status of state so picking up the latest values.
-            val newOperatorProgress = sop.copy(
-              sop.numRowsUpdated + progress.stateOperators(index).numRowsUpdated)
-            operatorProgress(index) = newOperatorProgress
-          }
-        }
+      // Pick the latest progress that actually ran a batch
+      def lastExecutedBatch: StreamingQueryProgress = {
+        query.recentProgress.filter(_.durationMs.containsKey("addBatch")).last
+      }
 
-        operatorProgress
+      def stateOperatorProgresses: Seq[StateOperatorProgress] = {
+        lastExecutedBatch.stateOperators
       }
     }
 
+    val clock = new StreamManualClock()
+
     testStream(aggWithWatermark)(
+      // batchId 0
       AddData(inputData, 15),
-      CheckAnswer(), // watermark = 5
+      StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
+      CheckAnswer(), // watermark = 0
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
+      AssertOnQuery { _.lastExecutedBatch.sink.numOutputRows == 0 },
+
+      // batchId 1 without data
+      AdvanceManualClock(1000L), // watermark = 5
+      Execute { q =>             // wait for the no data batch to complete
 
 Review comment:
   (Good to have) It might be good to have a new operation to wait for "no data batch" (as they have same pattern except the desired batch ID). Not mandatory to do it in this PR.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607638549
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120703/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610552002
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605595626
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604763885
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25152/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604743350
 
 
   **[Test build #120442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120442/testReport)** for PR 28040 at commit [`b57aa74`](https://github.com/apache/spark/commit/b57aa7412adf91c23bfb14f9171478f89835c302).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605595628
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120544/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-608173872
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25442/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610694096
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610579120
 
 
   **[Test build #120931 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120931/testReport)** for PR 28040 at commit [`68bf147`](https://github.com/apache/spark/commit/68bf14793b950f2a71f9cc61005f991577fc3774).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610552002
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606312400
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610604146
 
 
   **[Test build #120932 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120932/testReport)** for PR 28040 at commit [`68bf147`](https://github.com/apache/spark/commit/68bf14793b950f2a71f9cc61005f991577fc3774).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605595628
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120544/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610604146
 
 
   **[Test build #120932 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120932/testReport)** for PR 28040 at commit [`68bf147`](https://github.com/apache/spark/commit/68bf14793b950f2a71f9cc61005f991577fc3774).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-609083870
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25513/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606192458
 
 
   **[Test build #120608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120608/testReport)** for PR 28040 at commit [`d7792fd`](https://github.com/apache/spark/commit/d7792fda48c9b0a7192818660be0b0ecdf3f32fa).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605722265
 
 
   Oh github didn't post my comment - the minor comment is, we renamed the field due to the semantic change, and it's being reverted now even though the purpose of revert is not relevant to this. Could you rename it again? It's OK you'd like me to do it as a follow-up, or even a part of #25987.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-608172028
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r399094221
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala
 ##########
 @@ -253,9 +253,11 @@ abstract class FileStreamSinkSuite extends StreamTest {
 
       addTimestamp(104, 123) // watermark = 90 before this, watermark = 123 - 10 = 113 after this
       check((100L, 105L) -> 2L)  // no-data-batch emits results on 100-105,
+      assert(query.lastProgress.sink.numOutputRows === 1)
 
 Review comment:
   The last progress here is for "no data & no run" because of the reason I commented earlier - that's why the test fails.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tdas commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
tdas commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r404637776
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -203,46 +203,53 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
       }
 
       def stateOperatorProgresses: Seq[StateOperatorProgress] = {
-        val operatorProgress = mutable.ArrayBuffer[StateOperatorProgress]()
-        var progress = query.recentProgress.last
-
-        operatorProgress ++= progress.stateOperators.map { op => op.copy(op.numRowsUpdated) }
-        if (progress.numInputRows == 0) {
-          // empty batch, merge metrics from previous batch as well
-          progress = query.recentProgress.takeRight(2).head
-          operatorProgress.zipWithIndex.foreach { case (sop, index) =>
-            // "numRowsUpdated" should be merged, as it could be updated in both batches.
-            // (for now it is only updated from previous batch, but things can be changed.)
-            // other metrics represent current status of state so picking up the latest values.
-            val newOperatorProgress = sop.copy(
-              sop.numRowsUpdated + progress.stateOperators(index).numRowsUpdated)
-            operatorProgress(index) = newOperatorProgress
-          }
-        }
-
-        operatorProgress
+        query.recentProgress.last.stateOperators
       }
     }
 
+    val clock = new StreamManualClock()
+
     testStream(aggWithWatermark)(
       AddData(inputData, 15),
-      CheckAnswer(), // watermark = 5
+      StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
+      AdvanceManualClock(1000L), // triggers first batch
+      CheckAnswer(), // watermark = 0
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
       AddData(inputData, 10, 12, 14),
+      AdvanceManualClock(1000L), // watermark = 5, runs no-data microbatch
+      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AdvanceManualClock(1000L), // runs with new data from above
       CheckAnswer(), // watermark = 5
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 2 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
       AddData(inputData, 25),
-      CheckAnswer((10, 3)), // watermark = 15
+      AdvanceManualClock(1000L), // actually runs batch with data
+      CheckAnswer(), // watermark = 5, will update to 15 next batch
       AssertOnQuery { _.stateNodes.size === 1 },
-      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 1 },
+      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 3 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AdvanceManualClock(1000L), // runs batch with no new data and watermark progresses
 
 Review comment:
   @brkyvz here is my proposed test. @HeartSaVioR please take a look and see whether this is more understandable.
   ```
       testStream(aggWithWatermark)(
         // batchId 0
         AddData(inputData, 15),
         StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
         CheckAnswer(), // watermark = 0
         AssertOnQuery { _.stateNodes.size === 1 },
         AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
         AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
         AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
         AssertOnQuery { _.lastExecutedBatch.sink.numOutputRows == 0 },
   
         // batchId 1 without data
         AdvanceManualClock(1000L), // watermark = 5
         Execute { q =>             // wait for the no data batch to complete
           eventually(timeout(streamingTimeout)) { assert(q.lastProgress.batchId === 1) }
         },
         AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
         AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 0 },
         AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
         AssertOnQuery { _.lastExecutedBatch.sink.numOutputRows == 0 },
   
         // batchId 2 with data
         AddData(inputData, 10, 12, 14),
         AdvanceManualClock(1000L), // watermark = 5
         CheckAnswer(),
         AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
         AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
         AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 2 },
         AssertOnQuery { _.lastExecutedBatch.sink.numOutputRows == 0 },
   
         // batchId 3 with data
         AddData(inputData, 25),
         AdvanceManualClock(1000L), // watermark = 5
         CheckAnswer(),
         AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
         AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
         AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 3 },
         AssertOnQuery { _.lastExecutedBatch.sink.numOutputRows == 0 },
   
         // batchId 4 without data
         AdvanceManualClock(1000L), // watermark = 15
         Execute { q =>             // wait for the no data batch to complete
           eventually(timeout(streamingTimeout)) { assert(q.lastProgress.batchId === 4) }
         },
         AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 1 },
         AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 0 },
         AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 2 },
         AssertOnQuery { _.lastExecutedBatch.sink.numOutputRows == 1 }
       )
     }
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tdas commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
tdas commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r398897735
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala
 ##########
 @@ -143,7 +143,7 @@ trait ProgressReporter extends Logging {
   }
 
   /** Finalizes the query progress and adds it to list of recent status updates. */
-  protected def finishTrigger(hasNewData: Boolean): Unit = {
+  protected def finishTrigger(hasNewData: Boolean, hasExecuted: Boolean): Unit = {
 
 Review comment:
   Add scala docs to explain the difference between these two flags. This is pretty confusing, and easy mix up when passing the two booleans.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607559385
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25400/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605603528
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25259/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607573039
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25401/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604701673
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120438/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605595563
 
 
   **[Test build #120544 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120544/testReport)** for PR 28040 at commit [`5fbbf41`](https://github.com/apache/spark/commit/5fbbf4137413ad359f70085c9692a496e53c4cf6).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-608173872
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25442/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606312406
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120608/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604648600
 
 
   **[Test build #120438 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120438/testReport)** for PR 28040 at commit [`7def4de`](https://github.com/apache/spark/commit/7def4ded7a7f2bc4796db7405bca69e42d1405f1).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604767751
 
 
   I've proposed similar issue (not same issue but the approach to resolve would be similar) in #25987 in Oct. 2019. It didn't get some love. Could we please revisit it as well? Thanks in advance.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610694104
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120932/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] brkyvz commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
brkyvz commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r398895257
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala
 ##########
 @@ -170,9 +170,8 @@ trait ProgressReporter extends Logging {
       )
     }
 
-    val sinkProgress = SinkProgress(
-      sink.toString,
-      sinkCommitProgress.map(_.numOutputRows))
+    val sinkOutput = if (hasExecuted) sinkCommitProgress.map(_.numOutputRows) else Some(0L)
 
 Review comment:
   but in this case, we're not calling runBatch, because there was no new data and no zero-data microbatch

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605603527
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r405159015
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -202,47 +202,68 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
         }
       }
 
-      def stateOperatorProgresses: Seq[StateOperatorProgress] = {
-        val operatorProgress = mutable.ArrayBuffer[StateOperatorProgress]()
-        var progress = query.recentProgress.last
-
-        operatorProgress ++= progress.stateOperators.map { op => op.copy(op.numRowsUpdated) }
-        if (progress.numInputRows == 0) {
-          // empty batch, merge metrics from previous batch as well
-          progress = query.recentProgress.takeRight(2).head
-          operatorProgress.zipWithIndex.foreach { case (sop, index) =>
-            // "numRowsUpdated" should be merged, as it could be updated in both batches.
-            // (for now it is only updated from previous batch, but things can be changed.)
-            // other metrics represent current status of state so picking up the latest values.
-            val newOperatorProgress = sop.copy(
-              sop.numRowsUpdated + progress.stateOperators(index).numRowsUpdated)
-            operatorProgress(index) = newOperatorProgress
-          }
-        }
+      // Pick the latest progress that actually ran a batch
+      def lastExecutedBatch: StreamingQueryProgress = {
+        query.recentProgress.filter(_.durationMs.containsKey("addBatch")).last
+      }
 
-        operatorProgress
+      def stateOperatorProgresses: Seq[StateOperatorProgress] = {
+        lastExecutedBatch.stateOperators
       }
     }
 
+    val clock = new StreamManualClock()
+
     testStream(aggWithWatermark)(
+      // batchId 0
       AddData(inputData, 15),
-      CheckAnswer(), // watermark = 5
+      StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
+      CheckAnswer(), // watermark = 0
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
+      AssertOnQuery { _.lastExecutedBatch.sink.numOutputRows == 0 },
+
+      // batchId 1 without data
+      AdvanceManualClock(1000L), // watermark = 5
+      Execute { q =>             // wait for the no data batch to complete
 
 Review comment:
   (Good to have) It might be good to have a new operation to wait for "no data batch" and check the answer (as they have same pattern except the desired batch ID).
   
   Not mandatory to do it in this PR.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610604736
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25624/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606115639
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] brkyvz edited a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
brkyvz edited a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607557312
 
 
   @tdas I added 50 retries to the test to alleviate your concerns around flakiness. I'll remove the 50 retries once the tests pass

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] brkyvz commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
brkyvz commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610602033
 
 
   retest this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605521878
 
 
   **[Test build #120539 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120539/testReport)** for PR 28040 at commit [`62044dc`](https://github.com/apache/spark/commit/62044dcb6ea4c77d075fa2284ee872a9ff056e38).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605522048
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607559385
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25400/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606114789
 
 
   **[Test build #120603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120603/testReport)** for PR 28040 at commit [`d7792fd`](https://github.com/apache/spark/commit/d7792fda48c9b0a7192818660be0b0ecdf3f32fa).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610604736
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25624/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606182847
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r399139344
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -203,46 +203,41 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
       }
 
       def stateOperatorProgresses: Seq[StateOperatorProgress] = {
-        val operatorProgress = mutable.ArrayBuffer[StateOperatorProgress]()
-        var progress = query.recentProgress.last
-
-        operatorProgress ++= progress.stateOperators.map { op => op.copy(op.numRowsUpdated) }
-        if (progress.numInputRows == 0) {
-          // empty batch, merge metrics from previous batch as well
-          progress = query.recentProgress.takeRight(2).head
-          operatorProgress.zipWithIndex.foreach { case (sop, index) =>
-            // "numRowsUpdated" should be merged, as it could be updated in both batches.
-            // (for now it is only updated from previous batch, but things can be changed.)
-            // other metrics represent current status of state so picking up the latest values.
-            val newOperatorProgress = sop.copy(
-              sop.numRowsUpdated + progress.stateOperators(index).numRowsUpdated)
-            operatorProgress(index) = newOperatorProgress
-          }
-        }
-
-        operatorProgress
+        query.recentProgress.last.stateOperators
       }
     }
 
+    val clock = new StreamManualClock()
+
     testStream(aggWithWatermark)(
+      StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
       AddData(inputData, 15),
+      AdvanceManualClock(1000L),
       CheckAnswer(), // watermark = 5
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
       AddData(inputData, 10, 12, 14),
+      AdvanceManualClock(1000L),
       CheckAnswer(), // watermark = 5
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 2 },
       AddData(inputData, 25),
+      AdvanceManualClock(1000L),
+      CheckAnswer(), // watermark = 15, but nothing yet
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AdvanceManualClock(1000L),
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
 
 Review comment:
   I'd also add `AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 3 },`  along with moving this line to make clear no-data microbatch "decreases" the number of total rows in state.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] brkyvz commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
brkyvz commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610682577
 
 
   Thanks @HeartSaVioR @tdas for the review. Merging to master/3.0. Let's jump on #25987 next.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604763466
 
 
   **[Test build #120444 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120444/testReport)** for PR 28040 at commit [`4655611`](https://github.com/apache/spark/commit/465561182d48632ad6ea1b4e4079532a8cc5ac69).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610579190
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120931/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604701665
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607602159
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605551758
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607559120
 
 
   **[Test build #120702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120702/testReport)** for PR 28040 at commit [`68848d9`](https://github.com/apache/spark/commit/68848d993a7a2fc7c7b860d4129bf27147c57a81).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r405157677
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -203,46 +203,53 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
       }
 
       def stateOperatorProgresses: Seq[StateOperatorProgress] = {
-        val operatorProgress = mutable.ArrayBuffer[StateOperatorProgress]()
-        var progress = query.recentProgress.last
-
-        operatorProgress ++= progress.stateOperators.map { op => op.copy(op.numRowsUpdated) }
-        if (progress.numInputRows == 0) {
-          // empty batch, merge metrics from previous batch as well
-          progress = query.recentProgress.takeRight(2).head
-          operatorProgress.zipWithIndex.foreach { case (sop, index) =>
-            // "numRowsUpdated" should be merged, as it could be updated in both batches.
-            // (for now it is only updated from previous batch, but things can be changed.)
-            // other metrics represent current status of state so picking up the latest values.
-            val newOperatorProgress = sop.copy(
-              sop.numRowsUpdated + progress.stateOperators(index).numRowsUpdated)
-            operatorProgress(index) = newOperatorProgress
-          }
-        }
-
-        operatorProgress
+        query.recentProgress.last.stateOperators
       }
     }
 
+    val clock = new StreamManualClock()
+
     testStream(aggWithWatermark)(
       AddData(inputData, 15),
-      CheckAnswer(), // watermark = 5
+      StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
+      AdvanceManualClock(1000L), // triggers first batch
+      CheckAnswer(), // watermark = 0
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
       AddData(inputData, 10, 12, 14),
+      AdvanceManualClock(1000L), // watermark = 5, runs no-data microbatch
+      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AdvanceManualClock(1000L), // runs with new data from above
       CheckAnswer(), // watermark = 5
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 2 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
       AddData(inputData, 25),
-      CheckAnswer((10, 3)), // watermark = 15
+      AdvanceManualClock(1000L), // actually runs batch with data
+      CheckAnswer(), // watermark = 5, will update to 15 next batch
       AssertOnQuery { _.stateNodes.size === 1 },
-      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 1 },
+      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 3 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AdvanceManualClock(1000L), // runs batch with no new data and watermark progresses
 
 Review comment:
   > I think all the confusion is starting from the fact that you dont need to advance manual clock after StartStream to trigger the first batch.
   > Rather what it is doing is advancing the clock thus allowing the 2nd batch to be automatically triggered as soon as the first batch finishes.
   > This weird asynchronousness despite using the manual clock makes the test incomprehensible and is also a perfect recipe for flakiness.
   
   Ah, nice finding. Great analysis. That's what I've missed. The proposal looks great and provides better understanding. I have comments for new proposal but since the proposal is reflected in PR, I'll comment directly to the PR.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607559379
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607570186
 
 
   **[Test build #120702 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120702/testReport)** for PR 28040 at commit [`68848d9`](https://github.com/apache/spark/commit/68848d993a7a2fc7c7b860d4129bf27147c57a81).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] brkyvz commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
brkyvz commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r398843875
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala
 ##########
 @@ -189,7 +188,7 @@ trait ProgressReporter extends Logging {
       sink = sinkProgress,
       observedMetrics = new java.util.HashMap(observedMetrics.asJava))
 
-    if (hasNewData) {
+    if (hasExecuted) {
 
 Review comment:
   This was also incorrect for no new data micro batches

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] brkyvz commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
brkyvz commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r399578018
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingDeduplicationSuite.scala
 ##########
 @@ -280,7 +280,8 @@ class StreamingDeduplicationSuite extends StateStoreMetricsTest {
         { // State should have been cleaned if flag is set, otherwise should not have been cleaned
           if (flag) assertNumStateRows(total = 1, updated = 1)
           else assertNumStateRows(total = 7, updated = 1)
-        }
+        },
+        AssertOnQuery(q => q.lastProgress.sink.numOutputRows == 0L)
 
 Review comment:
   done, added to more suites

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604700932
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25150/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605551758
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604701673
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120438/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-608172031
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25441/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607573033
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r399092630
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala
 ##########
 @@ -189,7 +188,7 @@ trait ProgressReporter extends Logging {
       sink = sinkProgress,
       observedMetrics = new java.util.HashMap(observedMetrics.asJava))
 
-    if (hasNewData) {
+    if (hasExecuted) {
 
 Review comment:
   Nice finding. We don't recognize the bug because lastNoDataProgressEventTime is set to Long.MinValue which makes next no new data micro batch to update the progress immediately, which hides the bug. (If that's intentional, well, then it's too tricky and we should have commented here.)
   
   Maybe we should also rename lastNoDataProgressEventTime as well as the fix changes the semantic?
   
   And we may want to revisit that our intention is updating progress immediately whenever the batch has not run after any batch run.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605630026
 
 
   **[Test build #120553 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120553/testReport)** for PR 28040 at commit [`5fbbf41`](https://github.com/apache/spark/commit/5fbbf4137413ad359f70085c9692a496e53c4cf6).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607602162
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120701/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607557679
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25399/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-608173496
 
 
   **[Test build #120743 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120743/testReport)** for PR 28040 at commit [`2c9ed55`](https://github.com/apache/spark/commit/2c9ed55045fd5ff685f651f1e9799fb94dff497a).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605551761
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25250/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605529676
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120539/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-609083751
 
 
   **[Test build #120814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120814/testReport)** for PR 28040 at commit [`786d921`](https://github.com/apache/spark/commit/786d921fcabef333bfa421f2fdd8255cc79ee997).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r405157677
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -203,46 +203,53 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
       }
 
       def stateOperatorProgresses: Seq[StateOperatorProgress] = {
-        val operatorProgress = mutable.ArrayBuffer[StateOperatorProgress]()
-        var progress = query.recentProgress.last
-
-        operatorProgress ++= progress.stateOperators.map { op => op.copy(op.numRowsUpdated) }
-        if (progress.numInputRows == 0) {
-          // empty batch, merge metrics from previous batch as well
-          progress = query.recentProgress.takeRight(2).head
-          operatorProgress.zipWithIndex.foreach { case (sop, index) =>
-            // "numRowsUpdated" should be merged, as it could be updated in both batches.
-            // (for now it is only updated from previous batch, but things can be changed.)
-            // other metrics represent current status of state so picking up the latest values.
-            val newOperatorProgress = sop.copy(
-              sop.numRowsUpdated + progress.stateOperators(index).numRowsUpdated)
-            operatorProgress(index) = newOperatorProgress
-          }
-        }
-
-        operatorProgress
+        query.recentProgress.last.stateOperators
       }
     }
 
+    val clock = new StreamManualClock()
+
     testStream(aggWithWatermark)(
       AddData(inputData, 15),
-      CheckAnswer(), // watermark = 5
+      StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
+      AdvanceManualClock(1000L), // triggers first batch
+      CheckAnswer(), // watermark = 0
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
       AddData(inputData, 10, 12, 14),
+      AdvanceManualClock(1000L), // watermark = 5, runs no-data microbatch
+      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AdvanceManualClock(1000L), // runs with new data from above
       CheckAnswer(), // watermark = 5
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 2 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
       AddData(inputData, 25),
-      CheckAnswer((10, 3)), // watermark = 15
+      AdvanceManualClock(1000L), // actually runs batch with data
+      CheckAnswer(), // watermark = 5, will update to 15 next batch
       AssertOnQuery { _.stateNodes.size === 1 },
-      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 1 },
+      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 3 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AdvanceManualClock(1000L), // runs batch with no new data and watermark progresses
 
 Review comment:
   @tdas 
   
   > I think all the confusion is starting from the fact that you dont need to advance manual clock after StartStream to trigger the first batch.
   > Rather what it is doing is advancing the clock thus allowing the 2nd batch to be automatically triggered as soon as the first batch finishes.
   > This weird asynchronousness despite using the manual clock makes the test incomprehensible and is also a perfect recipe for flakiness.
   
   Ah, nice finding. Great analysis. That's what I've missed (and very confusing behavior TBH). The proposal looks great and provides better understanding. I have comments for new proposal but since the proposal is reflected in PR, I'll comment directly to the PR.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606182847
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606115639
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605604131
 
 
   **[Test build #120553 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120553/testReport)** for PR 28040 at commit [`5fbbf41`](https://github.com/apache/spark/commit/5fbbf4137413ad359f70085c9692a496e53c4cf6).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605529672
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604700437
 
 
   **[Test build #120442 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120442/testReport)** for PR 28040 at commit [`b57aa74`](https://github.com/apache/spark/commit/b57aa7412adf91c23bfb14f9171478f89835c302).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604763881
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-608173864
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604700932
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25150/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610693393
 
 
   **[Test build #120932 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120932/testReport)** for PR 28040 at commit [`68bf147`](https://github.com/apache/spark/commit/68bf14793b950f2a71f9cc61005f991577fc3774).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607557673
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606115648
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25307/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605630257
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120553/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610694096
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606311849
 
 
   **[Test build #120608 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120608/testReport)** for PR 28040 at commit [`d7792fd`](https://github.com/apache/spark/commit/d7792fda48c9b0a7192818660be0b0ecdf3f32fa).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] brkyvz commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
brkyvz commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r401100955
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -203,46 +203,53 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
       }
 
       def stateOperatorProgresses: Seq[StateOperatorProgress] = {
-        val operatorProgress = mutable.ArrayBuffer[StateOperatorProgress]()
-        var progress = query.recentProgress.last
-
-        operatorProgress ++= progress.stateOperators.map { op => op.copy(op.numRowsUpdated) }
-        if (progress.numInputRows == 0) {
-          // empty batch, merge metrics from previous batch as well
-          progress = query.recentProgress.takeRight(2).head
-          operatorProgress.zipWithIndex.foreach { case (sop, index) =>
-            // "numRowsUpdated" should be merged, as it could be updated in both batches.
-            // (for now it is only updated from previous batch, but things can be changed.)
-            // other metrics represent current status of state so picking up the latest values.
-            val newOperatorProgress = sop.copy(
-              sop.numRowsUpdated + progress.stateOperators(index).numRowsUpdated)
-            operatorProgress(index) = newOperatorProgress
-          }
-        }
-
-        operatorProgress
+        query.recentProgress.last.stateOperators
       }
     }
 
+    val clock = new StreamManualClock()
+
     testStream(aggWithWatermark)(
       AddData(inputData, 15),
-      CheckAnswer(), // watermark = 5
+      StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
+      AdvanceManualClock(1000L), // triggers first batch
+      CheckAnswer(), // watermark = 0
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
       AddData(inputData, 10, 12, 14),
+      AdvanceManualClock(1000L), // watermark = 5, runs no-data microbatch
 
 Review comment:
   I agree. I think this is because of manual clock synchronization and the next batch is already planned before the data appears to be there

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604815707
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120444/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606182856
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120603/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610552008
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25623/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607557673
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605603420
 
 
   Retest this, please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605630255
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-608235721
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120743/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610554898
 
 
   **[Test build #120931 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120931/testReport)** for PR 28040 at commit [`68bf147`](https://github.com/apache/spark/commit/68bf14793b950f2a71f9cc61005f991577fc3774).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604648600
 
 
   **[Test build #120438 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120438/testReport)** for PR 28040 at commit [`7def4de`](https://github.com/apache/spark/commit/7def4ded7a7f2bc4796db7405bca69e42d1405f1).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607602159
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604743430
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606312406
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120608/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r399130697
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -203,46 +203,41 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
       }
 
       def stateOperatorProgresses: Seq[StateOperatorProgress] = {
-        val operatorProgress = mutable.ArrayBuffer[StateOperatorProgress]()
-        var progress = query.recentProgress.last
-
-        operatorProgress ++= progress.stateOperators.map { op => op.copy(op.numRowsUpdated) }
-        if (progress.numInputRows == 0) {
-          // empty batch, merge metrics from previous batch as well
-          progress = query.recentProgress.takeRight(2).head
-          operatorProgress.zipWithIndex.foreach { case (sop, index) =>
-            // "numRowsUpdated" should be merged, as it could be updated in both batches.
-            // (for now it is only updated from previous batch, but things can be changed.)
-            // other metrics represent current status of state so picking up the latest values.
-            val newOperatorProgress = sop.copy(
-              sop.numRowsUpdated + progress.stateOperators(index).numRowsUpdated)
-            operatorProgress(index) = newOperatorProgress
-          }
-        }
-
-        operatorProgress
+        query.recentProgress.last.stateOperators
       }
     }
 
+    val clock = new StreamManualClock()
+
     testStream(aggWithWatermark)(
+      StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
       AddData(inputData, 15),
+      AdvanceManualClock(1000L),
       CheckAnswer(), // watermark = 5
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
       AddData(inputData, 10, 12, 14),
+      AdvanceManualClock(1000L),
       CheckAnswer(), // watermark = 5
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 2 },
       AddData(inputData, 25),
+      AdvanceManualClock(1000L),
+      CheckAnswer(), // watermark = 15, but nothing yet
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AdvanceManualClock(1000L),
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
 
 Review comment:
   This confused me - this value is actually updated in previous time so the better place of this line is after L231. It just works because lastProgress is not updated here.
   
   This also makes me thinking the delay of output from no-data microbatch if the trigger interval is quite huge. It should be uncommon, but it's not impossible with cron-ed continuous execution of once trigger.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605603527
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r399723128
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala
 ##########
 @@ -253,9 +253,11 @@ abstract class FileStreamSinkSuite extends StreamTest {
 
       addTimestamp(104, 123) // watermark = 90 before this, watermark = 123 - 10 = 113 after this
       check((100L, 105L) -> 2L)  // no-data-batch emits results on 100-105,
+      assert(query.lastProgress.sink.numOutputRows === 1)
 
 Review comment:
   The value is -1 instead of 0 if it doesn't support output metrics, and as you can see the error message in build, here the value is 0 instead of -1, because the patch overwrites the value to 0 when the batch hasn't run. So yes the last progress here is for "no data & no run", though the new commit should fix this problem.
   
   > V1 suite
   
   ```
   {
     "id" : "1bbf91ac-0a24-4da7-bfe8-54ce1ac63e0f",
     "runId" : "ac738c58-a976-4c87-9547-b4a1ee1c2560",
     "name" : null,
     "timestamp" : "2020-03-28T23:33:08.567Z",
     "batchId" : 0,
     "numInputRows" : 1,
     "inputRowsPerSecond" : 83.33333333333333,
     "processedRowsPerSecond" : 0.3835826620636747,
     "durationMs" : {
       "addBatch" : 2055,
       "getBatch" : 2,
       "latestOffset" : 0,
       "queryPlanning" : 449,
       "triggerExecution" : 2607,
       "walCommit" : 49
     },
     "eventTime" : {
       "avg" : "1970-01-01T00:01:40.000Z",
       "max" : "1970-01-01T00:01:40.000Z",
       "min" : "1970-01-01T00:01:40.000Z",
       "watermark" : "1970-01-01T00:00:00.000Z"
     },
     "stateOperators" : [ {
       "numRowsTotal" : 1,
       "numRowsUpdated" : 1,
       "memoryUsedBytes" : 1400,
       "customMetrics" : {
         "loadedMapCacheHitCount" : 0,
         "loadedMapCacheMissCount" : 0,
         "stateOnCurrentVersionSizeBytes" : 680
       }
     } ],
     "sources" : [ {
       "description" : "MemoryStream[value#1L]",
       "startOffset" : null,
       "endOffset" : 0,
       "numInputRows" : 1,
       "inputRowsPerSecond" : 83.33333333333333,
       "processedRowsPerSecond" : 0.3835826620636747
     } ],
     "sink" : {
       "description" : "FileSink[/private/var/folders/wn/3hpqx8015hjbmq43hmrw78z40000gn/T/stream.output-cf800c40-1e18-405e-b48f-71a08348a298]",
       "numOutputRows" : -1
     }
   }
   {
     "id" : "1bbf91ac-0a24-4da7-bfe8-54ce1ac63e0f",
     "runId" : "ac738c58-a976-4c87-9547-b4a1ee1c2560",
     "name" : null,
     "timestamp" : "2020-03-28T23:33:11.185Z",
     "batchId" : 1,
     "numInputRows" : 0,
     "inputRowsPerSecond" : 0.0,
     "processedRowsPerSecond" : 0.0,
     "durationMs" : {
       "addBatch" : 935,
       "getBatch" : 0,
       "latestOffset" : 0,
       "queryPlanning" : 52,
       "triggerExecution" : 1101,
       "walCommit" : 70
     },
     "eventTime" : {
       "watermark" : "1970-01-01T00:01:30.000Z"
     },
     "stateOperators" : [ {
       "numRowsTotal" : 1,
       "numRowsUpdated" : 0,
       "memoryUsedBytes" : 2272,
       "customMetrics" : {
         "loadedMapCacheHitCount" : 10,
         "loadedMapCacheMissCount" : 0,
         "stateOnCurrentVersionSizeBytes" : 720
       }
     } ],
     "sources" : [ {
       "description" : "MemoryStream[value#1L]",
       "startOffset" : 0,
       "endOffset" : 0,
       "numInputRows" : 0,
       "inputRowsPerSecond" : 0.0,
       "processedRowsPerSecond" : 0.0
     } ],
     "sink" : {
       "description" : "FileSink[/private/var/folders/wn/3hpqx8015hjbmq43hmrw78z40000gn/T/stream.output-cf800c40-1e18-405e-b48f-71a08348a298]",
       "numOutputRows" : -1
     }
   }
   {
     "id" : "1bbf91ac-0a24-4da7-bfe8-54ce1ac63e0f",
     "runId" : "ac738c58-a976-4c87-9547-b4a1ee1c2560",
     "name" : null,
     "timestamp" : "2020-03-28T23:33:12.287Z",
     "batchId" : 2,
     "numInputRows" : 0,
     "inputRowsPerSecond" : 0.0,
     "processedRowsPerSecond" : 0.0,
     "durationMs" : {
       "latestOffset" : 0,
       "triggerExecution" : 0
     },
     "eventTime" : {
       "watermark" : "1970-01-01T00:01:30.000Z"
     },
     "stateOperators" : [ {
       "numRowsTotal" : 1,
       "numRowsUpdated" : 0,
       "memoryUsedBytes" : 2272,
       "customMetrics" : {
         "loadedMapCacheHitCount" : 10,
         "loadedMapCacheMissCount" : 0,
         "stateOnCurrentVersionSizeBytes" : 720
       }
     } ],
     "sources" : [ {
       "description" : "MemoryStream[value#1L]",
       "startOffset" : 0,
       "endOffset" : 0,
       "numInputRows" : 0,
       "inputRowsPerSecond" : 0.0,
       "processedRowsPerSecond" : 0.0
     } ],
     "sink" : {
       "description" : "FileSink[/private/var/folders/wn/3hpqx8015hjbmq43hmrw78z40000gn/T/stream.output-cf800c40-1e18-405e-b48f-71a08348a298]",
       "numOutputRows" : 0
     }
   }
   {
     "id" : "1bbf91ac-0a24-4da7-bfe8-54ce1ac63e0f",
     "runId" : "ac738c58-a976-4c87-9547-b4a1ee1c2560",
     "name" : null,
     "timestamp" : "2020-03-28T23:33:13.066Z",
     "batchId" : 2,
     "numInputRows" : 2,
     "inputRowsPerSecond" : 153.84615384615384,
     "processedRowsPerSecond" : 3.2258064516129035,
     "durationMs" : {
       "addBatch" : 482,
       "getBatch" : 0,
       "latestOffset" : 0,
       "queryPlanning" : 50,
       "triggerExecution" : 620,
       "walCommit" : 44
     },
     "eventTime" : {
       "avg" : "1970-01-01T00:01:53.500Z",
       "max" : "1970-01-01T00:02:03.000Z",
       "min" : "1970-01-01T00:01:44.000Z",
       "watermark" : "1970-01-01T00:01:30.000Z"
     },
     "stateOperators" : [ {
       "numRowsTotal" : 2,
       "numRowsUpdated" : 2,
       "memoryUsedBytes" : 2584,
       "customMetrics" : {
         "loadedMapCacheHitCount" : 20,
         "loadedMapCacheMissCount" : 0,
         "stateOnCurrentVersionSizeBytes" : 920
       }
     } ],
     "sources" : [ {
       "description" : "MemoryStream[value#1L]",
       "startOffset" : 0,
       "endOffset" : 1,
       "numInputRows" : 2,
       "inputRowsPerSecond" : 153.84615384615384,
       "processedRowsPerSecond" : 3.2258064516129035
     } ],
     "sink" : {
       "description" : "FileSink[/private/var/folders/wn/3hpqx8015hjbmq43hmrw78z40000gn/T/stream.output-cf800c40-1e18-405e-b48f-71a08348a298]",
       "numOutputRows" : -1
     }
   }
   {
     "id" : "1bbf91ac-0a24-4da7-bfe8-54ce1ac63e0f",
     "runId" : "ac738c58-a976-4c87-9547-b4a1ee1c2560",
     "name" : null,
     "timestamp" : "2020-03-28T23:33:13.688Z",
     "batchId" : 3,
     "numInputRows" : 0,
     "inputRowsPerSecond" : 0.0,
     "processedRowsPerSecond" : 0.0,
     "durationMs" : {
       "addBatch" : 987,
       "getBatch" : 0,
       "latestOffset" : 0,
       "queryPlanning" : 43,
       "triggerExecution" : 1117,
       "walCommit" : 44
     },
     "eventTime" : {
       "watermark" : "1970-01-01T00:01:53.000Z"
     },
     "stateOperators" : [ {
       "numRowsTotal" : 1,
       "numRowsUpdated" : 0,
       "memoryUsedBytes" : 2512,
       "customMetrics" : {
         "loadedMapCacheHitCount" : 30,
         "loadedMapCacheMissCount" : 0,
         "stateOnCurrentVersionSizeBytes" : 720
       }
     } ],
     "sources" : [ {
       "description" : "MemoryStream[value#1L]",
       "startOffset" : 1,
       "endOffset" : 1,
       "numInputRows" : 0,
       "inputRowsPerSecond" : 0.0,
       "processedRowsPerSecond" : 0.0
     } ],
     "sink" : {
       "description" : "FileSink[/private/var/folders/wn/3hpqx8015hjbmq43hmrw78z40000gn/T/stream.output-cf800c40-1e18-405e-b48f-71a08348a298]",
       "numOutputRows" : -1
     }
   }
   {
     "id" : "1bbf91ac-0a24-4da7-bfe8-54ce1ac63e0f",
     "runId" : "ac738c58-a976-4c87-9547-b4a1ee1c2560",
     "name" : null,
     "timestamp" : "2020-03-28T23:33:14.806Z",
     "batchId" : 4,
     "numInputRows" : 0,
     "inputRowsPerSecond" : 0.0,
     "processedRowsPerSecond" : 0.0,
     "durationMs" : {
       "latestOffset" : 0,
       "triggerExecution" : 0
     },
     "eventTime" : {
       "watermark" : "1970-01-01T00:01:53.000Z"
     },
     "stateOperators" : [ {
       "numRowsTotal" : 1,
       "numRowsUpdated" : 0,
       "memoryUsedBytes" : 2512,
       "customMetrics" : {
         "loadedMapCacheHitCount" : 30,
         "loadedMapCacheMissCount" : 0,
         "stateOnCurrentVersionSizeBytes" : 720
       }
     } ],
     "sources" : [ {
       "description" : "MemoryStream[value#1L]",
       "startOffset" : 1,
       "endOffset" : 1,
       "numInputRows" : 0,
       "inputRowsPerSecond" : 0.0,
       "processedRowsPerSecond" : 0.0
     } ],
     "sink" : {
       "description" : "FileSink[/private/var/folders/wn/3hpqx8015hjbmq43hmrw78z40000gn/T/stream.output-cf800c40-1e18-405e-b48f-71a08348a298]",
       "numOutputRows" : 0
     }
   }
   ```
   
   > V2 suite
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-609083751
 
 
   **[Test build #120814 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120814/testReport)** for PR 28040 at commit [`786d921`](https://github.com/apache/spark/commit/786d921fcabef333bfa421f2fdd8255cc79ee997).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] brkyvz commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
brkyvz commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606191057
 
 
   retest this please

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r399723128
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala
 ##########
 @@ -253,9 +253,11 @@ abstract class FileStreamSinkSuite extends StreamTest {
 
       addTimestamp(104, 123) // watermark = 90 before this, watermark = 123 - 10 = 113 after this
       check((100L, 105L) -> 2L)  // no-data-batch emits results on 100-105,
+      assert(query.lastProgress.sink.numOutputRows === 1)
 
 Review comment:
   The value is -1 instead of 0 if it doesn't support output metrics, and as you can see the error message in build, here the value is 0 instead of -1, because the patch overwrites the value to 0 when the batch hasn't run. So yes the last progress here is for "no data & no run", though the new commit should fix this problem.
   
   > V1 suite
   
   ```
   {
     "id" : "1bbf91ac-0a24-4da7-bfe8-54ce1ac63e0f",
     "runId" : "ac738c58-a976-4c87-9547-b4a1ee1c2560",
     "name" : null,
     "timestamp" : "2020-03-28T23:33:08.567Z",
     "batchId" : 0,
     "numInputRows" : 1,
     "inputRowsPerSecond" : 83.33333333333333,
     "processedRowsPerSecond" : 0.3835826620636747,
     "durationMs" : {
       "addBatch" : 2055,
       "getBatch" : 2,
       "latestOffset" : 0,
       "queryPlanning" : 449,
       "triggerExecution" : 2607,
       "walCommit" : 49
     },
     "eventTime" : {
       "avg" : "1970-01-01T00:01:40.000Z",
       "max" : "1970-01-01T00:01:40.000Z",
       "min" : "1970-01-01T00:01:40.000Z",
       "watermark" : "1970-01-01T00:00:00.000Z"
     },
     "stateOperators" : [ {
       "numRowsTotal" : 1,
       "numRowsUpdated" : 1,
       "memoryUsedBytes" : 1400,
       "customMetrics" : {
         "loadedMapCacheHitCount" : 0,
         "loadedMapCacheMissCount" : 0,
         "stateOnCurrentVersionSizeBytes" : 680
       }
     } ],
     "sources" : [ {
       "description" : "MemoryStream[value#1L]",
       "startOffset" : null,
       "endOffset" : 0,
       "numInputRows" : 1,
       "inputRowsPerSecond" : 83.33333333333333,
       "processedRowsPerSecond" : 0.3835826620636747
     } ],
     "sink" : {
       "description" : "FileSink[/private/var/folders/wn/3hpqx8015hjbmq43hmrw78z40000gn/T/stream.output-cf800c40-1e18-405e-b48f-71a08348a298]",
       "numOutputRows" : -1
     }
   }
   {
     "id" : "1bbf91ac-0a24-4da7-bfe8-54ce1ac63e0f",
     "runId" : "ac738c58-a976-4c87-9547-b4a1ee1c2560",
     "name" : null,
     "timestamp" : "2020-03-28T23:33:11.185Z",
     "batchId" : 1,
     "numInputRows" : 0,
     "inputRowsPerSecond" : 0.0,
     "processedRowsPerSecond" : 0.0,
     "durationMs" : {
       "addBatch" : 935,
       "getBatch" : 0,
       "latestOffset" : 0,
       "queryPlanning" : 52,
       "triggerExecution" : 1101,
       "walCommit" : 70
     },
     "eventTime" : {
       "watermark" : "1970-01-01T00:01:30.000Z"
     },
     "stateOperators" : [ {
       "numRowsTotal" : 1,
       "numRowsUpdated" : 0,
       "memoryUsedBytes" : 2272,
       "customMetrics" : {
         "loadedMapCacheHitCount" : 10,
         "loadedMapCacheMissCount" : 0,
         "stateOnCurrentVersionSizeBytes" : 720
       }
     } ],
     "sources" : [ {
       "description" : "MemoryStream[value#1L]",
       "startOffset" : 0,
       "endOffset" : 0,
       "numInputRows" : 0,
       "inputRowsPerSecond" : 0.0,
       "processedRowsPerSecond" : 0.0
     } ],
     "sink" : {
       "description" : "FileSink[/private/var/folders/wn/3hpqx8015hjbmq43hmrw78z40000gn/T/stream.output-cf800c40-1e18-405e-b48f-71a08348a298]",
       "numOutputRows" : -1
     }
   }
   {
     "id" : "1bbf91ac-0a24-4da7-bfe8-54ce1ac63e0f",
     "runId" : "ac738c58-a976-4c87-9547-b4a1ee1c2560",
     "name" : null,
     "timestamp" : "2020-03-28T23:33:12.287Z",
     "batchId" : 2,
     "numInputRows" : 0,
     "inputRowsPerSecond" : 0.0,
     "processedRowsPerSecond" : 0.0,
     "durationMs" : {
       "latestOffset" : 0,
       "triggerExecution" : 0
     },
     "eventTime" : {
       "watermark" : "1970-01-01T00:01:30.000Z"
     },
     "stateOperators" : [ {
       "numRowsTotal" : 1,
       "numRowsUpdated" : 0,
       "memoryUsedBytes" : 2272,
       "customMetrics" : {
         "loadedMapCacheHitCount" : 10,
         "loadedMapCacheMissCount" : 0,
         "stateOnCurrentVersionSizeBytes" : 720
       }
     } ],
     "sources" : [ {
       "description" : "MemoryStream[value#1L]",
       "startOffset" : 0,
       "endOffset" : 0,
       "numInputRows" : 0,
       "inputRowsPerSecond" : 0.0,
       "processedRowsPerSecond" : 0.0
     } ],
     "sink" : {
       "description" : "FileSink[/private/var/folders/wn/3hpqx8015hjbmq43hmrw78z40000gn/T/stream.output-cf800c40-1e18-405e-b48f-71a08348a298]",
       "numOutputRows" : 0
     }
   }
   {
     "id" : "1bbf91ac-0a24-4da7-bfe8-54ce1ac63e0f",
     "runId" : "ac738c58-a976-4c87-9547-b4a1ee1c2560",
     "name" : null,
     "timestamp" : "2020-03-28T23:33:13.066Z",
     "batchId" : 2,
     "numInputRows" : 2,
     "inputRowsPerSecond" : 153.84615384615384,
     "processedRowsPerSecond" : 3.2258064516129035,
     "durationMs" : {
       "addBatch" : 482,
       "getBatch" : 0,
       "latestOffset" : 0,
       "queryPlanning" : 50,
       "triggerExecution" : 620,
       "walCommit" : 44
     },
     "eventTime" : {
       "avg" : "1970-01-01T00:01:53.500Z",
       "max" : "1970-01-01T00:02:03.000Z",
       "min" : "1970-01-01T00:01:44.000Z",
       "watermark" : "1970-01-01T00:01:30.000Z"
     },
     "stateOperators" : [ {
       "numRowsTotal" : 2,
       "numRowsUpdated" : 2,
       "memoryUsedBytes" : 2584,
       "customMetrics" : {
         "loadedMapCacheHitCount" : 20,
         "loadedMapCacheMissCount" : 0,
         "stateOnCurrentVersionSizeBytes" : 920
       }
     } ],
     "sources" : [ {
       "description" : "MemoryStream[value#1L]",
       "startOffset" : 0,
       "endOffset" : 1,
       "numInputRows" : 2,
       "inputRowsPerSecond" : 153.84615384615384,
       "processedRowsPerSecond" : 3.2258064516129035
     } ],
     "sink" : {
       "description" : "FileSink[/private/var/folders/wn/3hpqx8015hjbmq43hmrw78z40000gn/T/stream.output-cf800c40-1e18-405e-b48f-71a08348a298]",
       "numOutputRows" : -1
     }
   }
   {
     "id" : "1bbf91ac-0a24-4da7-bfe8-54ce1ac63e0f",
     "runId" : "ac738c58-a976-4c87-9547-b4a1ee1c2560",
     "name" : null,
     "timestamp" : "2020-03-28T23:33:13.688Z",
     "batchId" : 3,
     "numInputRows" : 0,
     "inputRowsPerSecond" : 0.0,
     "processedRowsPerSecond" : 0.0,
     "durationMs" : {
       "addBatch" : 987,
       "getBatch" : 0,
       "latestOffset" : 0,
       "queryPlanning" : 43,
       "triggerExecution" : 1117,
       "walCommit" : 44
     },
     "eventTime" : {
       "watermark" : "1970-01-01T00:01:53.000Z"
     },
     "stateOperators" : [ {
       "numRowsTotal" : 1,
       "numRowsUpdated" : 0,
       "memoryUsedBytes" : 2512,
       "customMetrics" : {
         "loadedMapCacheHitCount" : 30,
         "loadedMapCacheMissCount" : 0,
         "stateOnCurrentVersionSizeBytes" : 720
       }
     } ],
     "sources" : [ {
       "description" : "MemoryStream[value#1L]",
       "startOffset" : 1,
       "endOffset" : 1,
       "numInputRows" : 0,
       "inputRowsPerSecond" : 0.0,
       "processedRowsPerSecond" : 0.0
     } ],
     "sink" : {
       "description" : "FileSink[/private/var/folders/wn/3hpqx8015hjbmq43hmrw78z40000gn/T/stream.output-cf800c40-1e18-405e-b48f-71a08348a298]",
       "numOutputRows" : -1
     }
   }
   {
     "id" : "1bbf91ac-0a24-4da7-bfe8-54ce1ac63e0f",
     "runId" : "ac738c58-a976-4c87-9547-b4a1ee1c2560",
     "name" : null,
     "timestamp" : "2020-03-28T23:33:14.806Z",
     "batchId" : 4,
     "numInputRows" : 0,
     "inputRowsPerSecond" : 0.0,
     "processedRowsPerSecond" : 0.0,
     "durationMs" : {
       "latestOffset" : 0,
       "triggerExecution" : 0
     },
     "eventTime" : {
       "watermark" : "1970-01-01T00:01:53.000Z"
     },
     "stateOperators" : [ {
       "numRowsTotal" : 1,
       "numRowsUpdated" : 0,
       "memoryUsedBytes" : 2512,
       "customMetrics" : {
         "loadedMapCacheHitCount" : 30,
         "loadedMapCacheMissCount" : 0,
         "stateOnCurrentVersionSizeBytes" : 720
       }
     } ],
     "sources" : [ {
       "description" : "MemoryStream[value#1L]",
       "startOffset" : 1,
       "endOffset" : 1,
       "numInputRows" : 0,
       "inputRowsPerSecond" : 0.0,
       "processedRowsPerSecond" : 0.0
     } ],
     "sink" : {
       "description" : "FileSink[/private/var/folders/wn/3hpqx8015hjbmq43hmrw78z40000gn/T/stream.output-cf800c40-1e18-405e-b48f-71a08348a298]",
       "numOutputRows" : 0
     }
   }
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604815704
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tdas commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
tdas commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r404608550
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -202,47 +202,55 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
         }
       }
 
-      def stateOperatorProgresses: Seq[StateOperatorProgress] = {
-        val operatorProgress = mutable.ArrayBuffer[StateOperatorProgress]()
-        var progress = query.recentProgress.last
-
-        operatorProgress ++= progress.stateOperators.map { op => op.copy(op.numRowsUpdated) }
-        if (progress.numInputRows == 0) {
-          // empty batch, merge metrics from previous batch as well
-          progress = query.recentProgress.takeRight(2).head
-          operatorProgress.zipWithIndex.foreach { case (sop, index) =>
-            // "numRowsUpdated" should be merged, as it could be updated in both batches.
-            // (for now it is only updated from previous batch, but things can be changed.)
-            // other metrics represent current status of state so picking up the latest values.
-            val newOperatorProgress = sop.copy(
-              sop.numRowsUpdated + progress.stateOperators(index).numRowsUpdated)
-            operatorProgress(index) = newOperatorProgress
-          }
-        }
+      // Pick the latest progress that actually ran a batch
+      def lastExecutedBatch: StreamingQueryProgress = {
+        query.recentProgress.filter(_.durationMs.containsKey("addBatch")).last
+      }
 
-        operatorProgress
+      def stateOperatorProgresses: Seq[StateOperatorProgress] = {
+        lastExecutedBatch.stateOperators
       }
     }
 
+    val clock = new StreamManualClock()
+
     testStream(aggWithWatermark)(
+      StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
       AddData(inputData, 15),
-      CheckAnswer(), // watermark = 5
+      AdvanceManualClock(1000L), // triggers first batch
+      CheckAnswer(), // watermark = 0
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
+      AssertOnQuery { _.lastExecutedBatch.sink.numOutputRows == 0 },
       AddData(inputData, 10, 12, 14),
+      AdvanceManualClock(1000L), // watermark = 0, runs with the just added data
       CheckAnswer(), // watermark = 5
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 2 },
+      AssertOnQuery { _.lastExecutedBatch.sink.numOutputRows == 0 },
       AddData(inputData, 25),
-      CheckAnswer((10, 3)), // watermark = 15
+      AdvanceManualClock(1000L), // actually runs batch with data
+      CheckAnswer(), // watermark = 5, will update to 15 next batch
       AssertOnQuery { _.stateNodes.size === 1 },
-      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 1 },
+      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 3 },
+      AssertOnQuery { _.lastExecutedBatch.sink.numOutputRows == 0 },
+      AdvanceManualClock(1000L), // runs batch with no new data and watermark progresses
+      CheckAnswer(), // watermark = 15, but nothing yet
 
 Review comment:
   I feel like this will flaky. CheckAnswer() works reliably only when there is new data to process because it waits for the new data's offset to be reported as processed. Here there is no new data in the no-data-batch, so its possible that this CheckAnswer wont wait for the no-data-batch to finish before starting the last progress checks.
   
   Instead its more reliable (probably) to use eventually, where you check that the lastprogress has the expected batchId.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610604730
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607559120
 
 
   **[Test build #120702 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120702/testReport)** for PR 28040 at commit [`68848d9`](https://github.com/apache/spark/commit/68848d993a7a2fc7c7b860d4129bf27147c57a81).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607638539
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605529651
 
 
   **[Test build #120539 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120539/testReport)** for PR 28040 at commit [`62044dc`](https://github.com/apache/spark/commit/62044dcb6ea4c77d075fa2284ee872a9ff056e38).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604645533
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tdas commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
tdas commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r404625165
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -203,46 +203,53 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
       }
 
       def stateOperatorProgresses: Seq[StateOperatorProgress] = {
-        val operatorProgress = mutable.ArrayBuffer[StateOperatorProgress]()
-        var progress = query.recentProgress.last
-
-        operatorProgress ++= progress.stateOperators.map { op => op.copy(op.numRowsUpdated) }
-        if (progress.numInputRows == 0) {
-          // empty batch, merge metrics from previous batch as well
-          progress = query.recentProgress.takeRight(2).head
-          operatorProgress.zipWithIndex.foreach { case (sop, index) =>
-            // "numRowsUpdated" should be merged, as it could be updated in both batches.
-            // (for now it is only updated from previous batch, but things can be changed.)
-            // other metrics represent current status of state so picking up the latest values.
-            val newOperatorProgress = sop.copy(
-              sop.numRowsUpdated + progress.stateOperators(index).numRowsUpdated)
-            operatorProgress(index) = newOperatorProgress
-          }
-        }
-
-        operatorProgress
+        query.recentProgress.last.stateOperators
       }
     }
 
+    val clock = new StreamManualClock()
+
     testStream(aggWithWatermark)(
       AddData(inputData, 15),
-      CheckAnswer(), // watermark = 5
+      StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
+      AdvanceManualClock(1000L), // triggers first batch
+      CheckAnswer(), // watermark = 0
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
       AddData(inputData, 10, 12, 14),
+      AdvanceManualClock(1000L), // watermark = 5, runs no-data microbatch
+      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AdvanceManualClock(1000L), // runs with new data from above
       CheckAnswer(), // watermark = 5
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 2 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
       AddData(inputData, 25),
-      CheckAnswer((10, 3)), // watermark = 15
+      AdvanceManualClock(1000L), // actually runs batch with data
+      CheckAnswer(), // watermark = 5, will update to 15 next batch
       AssertOnQuery { _.stateNodes.size === 1 },
-      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 1 },
+      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 3 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AdvanceManualClock(1000L), // runs batch with no new data and watermark progresses
 
 Review comment:
   This is not off-topic actually because (i) not understanding this correctly can lead to flaky tests, and (ii) I was afraid that fixes made in this PR actually changed the semantic behavior of no data batches. But that is not the case. I tested in this unit test myself. I think all the confusion is starting from the fact that you dont need to advance manual clock after StartStream to trigger the first batch. So the first `AdvanceManualClock` not really necessary. Rather what it is doing is advancing the clock thus allowing the 2nd batch to be automatically triggered as soon as the first batch finishes. This is what is leading to the confusion on why is the second batch not picking up the new data ... that is because the next batch has been unblocked already (i.e., before `AddData(10, 12, 14)`) with the first `AdvancedManualClock`. This weird asynchronousness despite using the manual clock makes the test incomprehensible and is also a perfect recipe for flakiness.
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] brkyvz commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
brkyvz commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-609083311
 
 
   https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120743/testReport/ passed without flakiness

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-609109399
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120814/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605536655
 
 
   Looks like the change brought side-effect and build failure is related. Could you please fix these tests as well?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-608235152
 
 
   **[Test build #120743 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120743/testReport)** for PR 28040 at commit [`2c9ed55`](https://github.com/apache/spark/commit/2c9ed55045fd5ff685f651f1e9799fb94dff497a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607637753
 
 
   **[Test build #120703 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120703/testReport)** for PR 28040 at commit [`68848d9`](https://github.com/apache/spark/commit/68848d993a7a2fc7c7b860d4129bf27147c57a81).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607557366
 
 
   **[Test build #120701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120701/testReport)** for PR 28040 at commit [`d501127`](https://github.com/apache/spark/commit/d50112747aa14a17e8ebe52bf8935736fa25edf0).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605630255
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607557366
 
 
   **[Test build #120701 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120701/testReport)** for PR 28040 at commit [`d501127`](https://github.com/apache/spark/commit/d50112747aa14a17e8ebe52bf8935736fa25edf0).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605529672
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604763885
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25152/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-609083867
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-608173496
 
 
   **[Test build #120743 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120743/testReport)** for PR 28040 at commit [`2c9ed55`](https://github.com/apache/spark/commit/2c9ed55045fd5ff685f651f1e9799fb94dff497a).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607570230
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610604730
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] brkyvz commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
brkyvz commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607557312
 
 
   @tdas I added 50 retries to the test to alleviate your concerns around flakiness.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r399092630
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala
 ##########
 @@ -189,7 +188,7 @@ trait ProgressReporter extends Logging {
       sink = sinkProgress,
       observedMetrics = new java.util.HashMap(observedMetrics.asJava))
 
-    if (hasNewData) {
+    if (hasExecuted) {
 
 Review comment:
   Nice finding. We haven't recognized the bug because lastNoDataProgressEventTime is set to Long.MinValue which makes next no new data micro batch to update the progress immediately, which hides the bug. (If that's intentional, well, then it's too tricky and we should have commented here.)
   
   Maybe we should also rename lastNoDataProgressEventTime as well as the fix changes the semantic?
   
   And we may want to revisit that our intention is updating progress immediately whenever the batch has not run after any batch run.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-608235712
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] brkyvz commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
brkyvz commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605551531
 
 
   I reverted the changes with respect to the empty progress. Seemed to be a bit more risky than I'd like, as I'd like to warmfix this into Spark 3.0

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606192458
 
 
   **[Test build #120608 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120608/testReport)** for PR 28040 at commit [`d7792fd`](https://github.com/apache/spark/commit/d7792fda48c9b0a7192818660be0b0ecdf3f32fa).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-609109397
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r399724883
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -203,46 +203,53 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
       }
 
       def stateOperatorProgresses: Seq[StateOperatorProgress] = {
-        val operatorProgress = mutable.ArrayBuffer[StateOperatorProgress]()
-        var progress = query.recentProgress.last
-
-        operatorProgress ++= progress.stateOperators.map { op => op.copy(op.numRowsUpdated) }
-        if (progress.numInputRows == 0) {
-          // empty batch, merge metrics from previous batch as well
-          progress = query.recentProgress.takeRight(2).head
-          operatorProgress.zipWithIndex.foreach { case (sop, index) =>
-            // "numRowsUpdated" should be merged, as it could be updated in both batches.
-            // (for now it is only updated from previous batch, but things can be changed.)
-            // other metrics represent current status of state so picking up the latest values.
-            val newOperatorProgress = sop.copy(
-              sop.numRowsUpdated + progress.stateOperators(index).numRowsUpdated)
-            operatorProgress(index) = newOperatorProgress
-          }
-        }
-
-        operatorProgress
+        query.recentProgress.last.stateOperators
       }
     }
 
+    val clock = new StreamManualClock()
+
     testStream(aggWithWatermark)(
       AddData(inputData, 15),
-      CheckAnswer(), // watermark = 5
+      StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
+      AdvanceManualClock(1000L), // triggers first batch
+      CheckAnswer(), // watermark = 0
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
       AddData(inputData, 10, 12, 14),
+      AdvanceManualClock(1000L), // watermark = 5, runs no-data microbatch
+      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AdvanceManualClock(1000L), // runs with new data from above
       CheckAnswer(), // watermark = 5
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 2 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
       AddData(inputData, 25),
-      CheckAnswer((10, 3)), // watermark = 15
+      AdvanceManualClock(1000L), // actually runs batch with data
+      CheckAnswer(), // watermark = 5, will update to 15 next batch
       AssertOnQuery { _.stateNodes.size === 1 },
-      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 1 },
+      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 3 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AdvanceManualClock(1000L), // runs batch with no new data and watermark progresses
 
 Review comment:
   This is also something hard to understand (requires two trigger intervals instead of one) but yes this is OFF-TOPIC.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605604131
 
 
   **[Test build #120553 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120553/testReport)** for PR 28040 at commit [`5fbbf41`](https://github.com/apache/spark/commit/5fbbf4137413ad359f70085c9692a496e53c4cf6).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-609083870
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25513/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606194695
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25312/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r399724883
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingAggregationSuite.scala
 ##########
 @@ -203,46 +203,53 @@ class StreamingAggregationSuite extends StateStoreMetricsTest with Assertions {
       }
 
       def stateOperatorProgresses: Seq[StateOperatorProgress] = {
-        val operatorProgress = mutable.ArrayBuffer[StateOperatorProgress]()
-        var progress = query.recentProgress.last
-
-        operatorProgress ++= progress.stateOperators.map { op => op.copy(op.numRowsUpdated) }
-        if (progress.numInputRows == 0) {
-          // empty batch, merge metrics from previous batch as well
-          progress = query.recentProgress.takeRight(2).head
-          operatorProgress.zipWithIndex.foreach { case (sop, index) =>
-            // "numRowsUpdated" should be merged, as it could be updated in both batches.
-            // (for now it is only updated from previous batch, but things can be changed.)
-            // other metrics represent current status of state so picking up the latest values.
-            val newOperatorProgress = sop.copy(
-              sop.numRowsUpdated + progress.stateOperators(index).numRowsUpdated)
-            operatorProgress(index) = newOperatorProgress
-          }
-        }
-
-        operatorProgress
+        query.recentProgress.last.stateOperators
       }
     }
 
+    val clock = new StreamManualClock()
+
     testStream(aggWithWatermark)(
       AddData(inputData, 15),
-      CheckAnswer(), // watermark = 5
+      StartStream(Trigger.ProcessingTime("interval 1 second"), clock),
+      AdvanceManualClock(1000L), // triggers first batch
+      CheckAnswer(), // watermark = 0
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
       AddData(inputData, 10, 12, 14),
+      AdvanceManualClock(1000L), // watermark = 5, runs no-data microbatch
+      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 1 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AdvanceManualClock(1000L), // runs with new data from above
       CheckAnswer(), // watermark = 5
       AssertOnQuery { _.stateNodes.size === 1 },
       AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
       AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 2 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
       AddData(inputData, 25),
-      CheckAnswer((10, 3)), // watermark = 15
+      AdvanceManualClock(1000L), // actually runs batch with data
+      CheckAnswer(), // watermark = 5, will update to 15 next batch
       AssertOnQuery { _.stateNodes.size === 1 },
-      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 1 },
+      AssertOnQuery { _.stateNodes.head.metrics("numOutputRows").value === 0 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsUpdated === 1 },
+      AssertOnQuery { _.stateOperatorProgresses.head.numRowsTotal === 3 },
+      AssertOnQuery { _.lastProgress.sink.numOutputRows == 0 },
+      AdvanceManualClock(1000L), // runs batch with no new data and watermark progresses
 
 Review comment:
   This is also something hard to understand (requires two trigger intervals instead of one - ideally zero - to run no-data microbatch) but yes this is OFF-TOPIC.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605551761
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25250/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-608235712
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605630257
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120553/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610554898
 
 
   **[Test build #120931 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120931/testReport)** for PR 28040 at commit [`68bf147`](https://github.com/apache/spark/commit/68bf14793b950f2a71f9cc61005f991577fc3774).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610579180
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604743430
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605522052
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25245/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607573033
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] brkyvz commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
brkyvz commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605146352
 
 
   @HeartSaVioR Thanks for bringing that PR to my attention. We should get that in as well! Would you like to take over both?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607572692
 
 
   **[Test build #120703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120703/testReport)** for PR 28040 at commit [`68848d9`](https://github.com/apache/spark/commit/68848d993a7a2fc7c7b860d4129bf27147c57a81).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607570238
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120702/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604743435
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120442/
   Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] tdas commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
tdas commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r398916867
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingDeduplicationSuite.scala
 ##########
 @@ -280,7 +280,8 @@ class StreamingDeduplicationSuite extends StateStoreMetricsTest {
         { // State should have been cleaned if flag is set, otherwise should not have been cleaned
           if (flag) assertNumStateRows(total = 1, updated = 1)
           else assertNumStateRows(total = 7, updated = 1)
-        }
+        },
+        AssertOnQuery(q => q.lastProgress.sink.numOutputRows == 0L)
 
 Review comment:
   arent there other tests for the no-data flag in other stateful query suites? basically, there is no test till now that is testing whether # output rows generated by no-data-batches are computed correctly. Here is there are no rows that are output, so cant say whether this 0 is some sort of default 0 or computed correctly to be 0. And streaming dedup wont be non-zero in no-data-batches. So maybe try streaming aggs in append mode?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] brkyvz commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
brkyvz commented on a change in pull request #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#discussion_r399577927
 
 

 ##########
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/ProgressReporter.scala
 ##########
 @@ -189,7 +188,7 @@ trait ProgressReporter extends Logging {
       sink = sinkProgress,
       observedMetrics = new java.util.HashMap(observedMetrics.asJava))
 
-    if (hasNewData) {
+    if (hasExecuted) {
 
 Review comment:
   oh, that's why I'm facing issues... I understand better now

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605551672
 
 
   **[Test build #120544 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120544/testReport)** for PR 28040 at commit [`5fbbf41`](https://github.com/apache/spark/commit/5fbbf4137413ad359f70085c9692a496e53c4cf6).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] brkyvz commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
brkyvz commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610601995
 
 
   Unrelated flaky test: org.apache.spark.sql.hive.thriftserver.CliSuite.SPARK-11188 Analysis error reporting

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-608173864
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-605595626
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HeartSaVioR commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607607804
 
 
   Looks like the flaky test is below (and this PR made it flaky), not in streaming aggregation.
   
   ```
   org.apache.spark.sql.kafka010.KafkaSinkMicroBatchStreamingSuite.streaming - sink progress is produced
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610579180
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607638549
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120703/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606194662
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-610552008
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25623/
   Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-606114789
 
 
   **[Test build #120603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120603/testReport)** for PR 28040 at commit [`d7792fd`](https://github.com/apache/spark/commit/d7792fda48c9b0a7192818660be0b0ecdf3f32fa).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-604701665
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28040: [DO NOT MERGE][SPARK-31278][SS] Fix StreamingQuery output rows metric
URL: https://github.com/apache/spark/pull/28040#issuecomment-607638539
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org