Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/11/19 12:57:17 UTC

[GitHub] [spark] HeartSaVioR opened a new pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

HeartSaVioR opened a new pull request #30427:
URL: https://github.com/apache/spark/pull/30427


   
   ### What changes were proposed in this pull request?
   
   This PR proposes to add watermark gap information to the SS UI page. Please refer to the screenshots below to see what we'd like to show in the UI.
   
   ![Screen Shot 2020-11-19 at 6 56 47 PM](https://user-images.githubusercontent.com/1317309/99669029-d5d4c080-2ab1-11eb-9c63-d05b3e1ab391.png)
   ![Screen Shot 2020-11-19 at 7 00 21 PM](https://user-images.githubusercontent.com/1317309/99669049-dbcaa180-2ab1-11eb-8789-10b35857dda0.png)
   
   Please note that this PR doesn't plot the watermark value - knowing the gap between actual wall clock and watermark looks more useful than the absolute value.
   
   ### Why are the changes needed?
   
   Watermark is one of the major metrics that end users need to track for stateful queries. The watermark defines "when" the output will be emitted in append mode, so knowing the gap between wall clock and watermark (input data) is very helpful for setting expectations about the output.
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, the SS UI query page will contain the watermark gap information.
   
   ### How was this patch tested?
   
   Basic UT added. Manually tested with two queries:
   
   > simple case
   
   You'll see a consistent watermark gap of (15 seconds + a): 10 seconds come from the delay in the watermark definition, and 5 seconds come from the trigger interval.
   
   ```
   import org.apache.spark.sql.streaming.Trigger
   
   spark.conf.set("spark.sql.shuffle.partitions", "10")
   
   val query = spark
     .readStream
     .format("rate")
     .option("rowsPerSecond", 1000)
     .option("rampUpTime", "10s")
     .load()
     .selectExpr("timestamp", "mod(value, 100) as mod", "value")
     .withWatermark("timestamp", "10 seconds")
     .groupBy(window($"timestamp", "1 minute", "10 seconds"), $"mod")
     .agg(max("value").as("max_value"), min("value").as("min_value"), avg("value").as("avg_value"))
     .writeStream
     .format("console")
     .trigger(Trigger.ProcessingTime("5 seconds"))
     .outputMode("append")
     .start()
   
   query.awaitTermination()
   ```
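   
   As a rough sanity check on the 15 seconds figure, the sketch below models the simple case in plain Scala. It is an illustration only, not code from this patch: `SimpleCaseGap` and `gapsSeconds` are hypothetical names, and the model assumes that batch k starts exactly at k trigger intervals, that the newest event it reads is stamped with its own start time, and that the watermark visible in a batch is derived from the previous batch's max event time.
   
   ```scala
   object SimpleCaseGap {
     // Simplified model (an assumption, not Spark's actual code path):
     // batch k starts at k * triggerMs, and its max event time equals its start time.
     def gapsSeconds(batches: Int, triggerMs: Long, delayMs: Long): Seq[Double] = {
       val batchStarts = (0 until batches).map(_ * triggerMs)
       // Pair each batch with its predecessor; the first batch has no watermark yet.
       batchStarts.sliding(2).collect { case Seq(prev, cur) =>
         // Watermark seen by the current batch: previous max event time minus the delay.
         val watermark = prev - delayMs
         // Gap between the batch (wall clock) time and the watermark, in seconds.
         (cur - watermark) / 1000.0
       }.toList
     }
   }
   ```
   
   Under these assumptions, `SimpleCaseGap.gapsSeconds(4, 5000L, 10000L)` gives 15.0 for every batch after the first, i.e. the watermark delay plus one trigger interval.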
   
   > complicated case
   
   This randomizes the timestamp, hence producing a random watermark gap. The gap won't be smaller than 15 seconds, as described earlier.
   
   ```
   import org.apache.spark.sql.streaming.Trigger
   
   spark.conf.set("spark.sql.shuffle.partitions", "10")
   
   val query = spark
     .readStream
     .format("rate")
     .option("rowsPerSecond", 1000)
     .option("rampUpTime", "10s")
     .load()
     .selectExpr("timestamp", "mod(value, 100) as mod", "value")
     .withWatermark("timestamp", "10 seconds")
     .groupBy(window($"timestamp", "1 minute", "10 seconds"), $"mod")
     .agg(max("value").as("max_value"), min("value").as("min_value"), avg("value").as("avg_value"))
     .writeStream
     .format("console")
     .trigger(Trigger.ProcessingTime("5 seconds"))
     .outputMode("append")
     .start()
   
   query.awaitTermination()
   
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730864569


   **[Test build #131380 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131380/testReport)** for PR 30427 at commit [`2f1081a`](https://github.com/apache/spark/commit/2f1081a4490e62c86e80740ef9a5f0645b78fd2c).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] HeartSaVioR commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r527267476



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,54 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): NodeBuffer = {

Review comment:
       Yeah, I simply copied and pasted, and struggled to understand why it requires multiple nodes (hence `&+`). My bad.






[GitHub] [spark] HeartSaVioR commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732365550


   retest this, please




[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732827618


   **[Test build #131649 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131649/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).




[GitHub] [spark] gaborgsomogyi commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730475567


   I've double-checked the graphs manually and they work fine.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731165797








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731034235








[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732923488








[GitHub] [spark] viirya commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r528895988



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,58 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): Seq[Node] = {
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress != null)
+    if (query.lastProgress.eventTime.containsKey("watermark")) {
+      val watermarkData = query.recentProgress.flatMap { p =>
+        val batchTimestamp = parseProgressTimestamp(p.timestamp)
+        val watermarkValue = parseProgressTimestamp(p.eventTime.get("watermark"))
+        if (watermarkValue > 0L) {
+          // seconds
+          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))

Review comment:
       Thanks @xuanyuanking for raising this discussion.
   
   > OK I took too many efforts to write the comment and wrote too late. In short, if we can pick the second metric to compare with global watermark, it should be min instead of max. If Spark also picks min event time to construct watermark, we should pick max to see how much the output is lagging due to slow watermark advance.
   
   I agree. Actually, my first thought was to use min event time instead of batch time in this graph.
   
   I think an ideal approach would let users select a different base time for constructing this graph, e.g. min event time or batch time. I am not sure whether the current UI component supports this kind of feature. But for the current change, I think it is good enough for all use cases except those where event time is far from clock time. That is why I gave +1 for this PR.
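   
   A selectable base time could look roughly like the sketch below. This is only an illustration of the idea, not something the current UI component offers: `BaseTimeSketch`, `GapBase`, `ProgressMillis` and `gapSeconds` are hypothetical names, and the progress fields are assumed to be already converted to epoch milliseconds.
   
   ```scala
   object BaseTimeSketch {
     // Hypothetical choice of reference time for the gap.
     sealed trait GapBase
     case object BatchTime extends GapBase
     case object MinEventTime extends GapBase
   
     // Epoch-millis view of one progress entry (field names are assumptions).
     final case class ProgressMillis(batchTime: Long, minEventTime: Long, watermark: Long)
   
     // Gap in seconds between the chosen base time and the watermark.
     def gapSeconds(p: ProgressMillis, base: GapBase): Double = {
       val baseMillis = base match {
         case BatchTime    => p.batchTime
         case MinEventTime => p.minEventTime
       }
       (baseMillis - p.watermark) / 1000.0
     }
   }
   ```
   
   Keeping the base as a sealed trait means another reference time could be added later without changing the call sites that plot the graph.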
   






[GitHub] [spark] xuanyuanking commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r528546502



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,58 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): Seq[Node] = {
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress != null)
+    if (query.lastProgress.eventTime.containsKey("watermark")) {
+      val watermarkData = query.recentProgress.flatMap { p =>
+        val batchTimestamp = parseProgressTimestamp(p.timestamp)
+        val watermarkValue = parseProgressTimestamp(p.eventTime.get("watermark"))
+        if (watermarkValue > 0L) {
+          // seconds
+          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))

Review comment:
       Fully agree with `knowing the gap between actual wall clock and watermark looks more useful than the absolute value.` and thanks for the super useful watermark info!
   
   I only have one concern, the same as @viirya's in https://github.com/apache/spark/pull/30427#issuecomment-730852396. Maybe we can address both scenarios (event time is/isn't close to clock time) by using the max event time received in this batch? I mean:
   ```
            if (watermarkValue > 0L) {
              // seconds
   -          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))
   +          val maxEventTime = parseProgressTimestamp(p.eventTime.get("max"))
   +          Some((batchTimestamp, (maxEventTime - watermarkValue) / 1000.0))
            } else {
              None
            }
   ```
   
   Of course, this proposal changes the meaning of this chart: it would represent `The gap between the latest event and global watermark for the batch`. We might also need to add more explanation here, since the number will be negative when all the data in the current batch is later than the current watermark (this can be reproduced with the complex demo provided). WDYT? @HeartSaVioR @viirya
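   
   To make the two definitions concrete, here is a rough sketch; it is not the code in this PR. `ProgressSketch`, `GapSketch`, `batchTimeGap` and `maxEventTimeGap` are hypothetical names, and `Instant.parse` stands in for `parseProgressTimestamp` on the assumption that the progress timestamps are ISO-8601 strings.
   
   ```scala
   import java.time.Instant
   
   // Hypothetical, trimmed-down view of one progress entry (all fields ISO-8601 strings).
   final case class ProgressSketch(timestamp: String, watermark: String, maxEventTime: String)
   
   object GapSketch {
     // Stand-in for parseProgressTimestamp: ISO-8601 string to epoch millis.
     private def toMillis(ts: String): Long = Instant.parse(ts).toEpochMilli
   
     // Current change: batch (wall clock) time minus watermark, in seconds.
     def batchTimeGap(p: ProgressSketch): Double =
       (toMillis(p.timestamp) - toMillis(p.watermark)) / 1000.0
   
     // Proposal: max event time of the batch minus watermark, in seconds.
     // Negative when even the newest event in the batch is older than the watermark.
     def maxEventTimeGap(p: ProgressSketch): Double =
       (toMillis(p.maxEventTime) - toMillis(p.watermark)) / 1000.0
   }
   ```
   
   For a batch whose newest event is 10 seconds behind the watermark, `maxEventTimeGap` returns a negative value while `batchTimeGap` stays positive, which is the case that would need extra explanation in the chart.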






[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732367630


   **[Test build #131571 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131571/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730550847








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732695791








[GitHub] [spark] HeartSaVioR commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r528936430



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,58 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): Seq[Node] = {
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress != null)
+    if (query.lastProgress.eventTime.containsKey("watermark")) {
+      val watermarkData = query.recentProgress.flatMap { p =>
+        val batchTimestamp = parseProgressTimestamp(p.timestamp)
+        val watermarkValue = parseProgressTimestamp(p.eventTime.get("watermark"))
+        if (watermarkValue > 0L) {
+          // seconds
+          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))

Review comment:
       Yeah, it would be great if someone who is familiar with the FE volunteered to revise the graphs. I actually think the other existing graphs are also not ideal (in terms of auto-scaling, plotting multiple lines, and better units, values and tooltips).






[GitHub] [spark] HyukjinKwon commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730721036


   cc @xuanyuanking too FYI




[GitHub] [spark] HeartSaVioR commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733277464


   Just rebased. I'll merge once either GitHub Actions or Jenkins is happy with the change.




[GitHub] [spark] dongjoon-hyun commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730726341


   Thank you for the confirmation, @gaborgsomogyi and @HeartSaVioR !




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731028233


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730549134


   **[Test build #131348 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131348/testReport)** for PR 30427 at commit [`d82702a`](https://github.com/apache/spark/commit/d82702af591ebb1d10fa49fcc50d81373debaafd).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732678901








[GitHub] [spark] viirya commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730834615


   > Watermark is one of the major metrics that end users need to track for stateful queries. The watermark defines "when" the output will be emitted in append mode, so knowing the gap between wall clock and watermark (input data) is very helpful for setting expectations about the output.
   
   Hmm, my question is: the watermark should be derived from event time rather than processing time (which I think is the wall clock here?). In the examples, it looks like the event time is the same as the processing time, IIUC. So once the event time in the data differs from the processing time, is this graph still useful?




[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731056832


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36015/
   




[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731008057








[GitHub] [spark] HeartSaVioR commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730730024


   cc. @tdas @zsxwing @jose-torres @sarutak as well




[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730550847








[GitHub] [spark] viirya commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733130176


   Jenkins seems unstable. GA actually passed, so I think it should be okay.




[GitHub] [spark] HeartSaVioR commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r528586781



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,58 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): Seq[Node] = {
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress != null)
+    if (query.lastProgress.eventTime.containsKey("watermark")) {
+      val watermarkData = query.recentProgress.flatMap { p =>
+        val batchTimestamp = parseProgressTimestamp(p.timestamp)
+        val watermarkValue = parseProgressTimestamp(p.eventTime.get("watermark"))
+        if (watermarkValue > 0L) {
+          // seconds
+          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))

Review comment:
       More precisely, the question is how we define the "processing time" (y axis) in the graph in https://github.com/apache/spark/pull/30427#issuecomment-730844687.
   
   A query which pulls recent events is expected to have wall clock time as its processing time. That only breaks when we deal with historical data, which doesn't have an "ideal" processing time. One approach to get a rough guess would be tracking event time, but given that Spark takes the max event time to calculate the watermark (while other engines take the min event time), the gap is likely to be pretty much the same across batches.
   
   In the historical case, as well as the real-time case (as Spark picks the max event time), tracking the gap between the global watermark and the min event time would be more helpful, as we can at least see whether the watermark delay is enough to cover the min event time of the next batch. This is pretty specific to Spark's case, though.
   
   (So, as I said, there are several useful lines to plot which can be compared with each other to produce meaning. I'm just not taking the step of becoming a frontend engineer.)
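
To make the two candidate lines concrete, here is a minimal sketch of how both gaps could be derived from one progress entry. It assumes the event-time stats map carries ISO-8601 strings under the keys `watermark` and `min`; the `Progress` case class and the function names are hypothetical stand-ins, not the types used in the PR.

```scala
import java.time.Instant

object WatermarkGapsSketch {
  // Hypothetical stand-in for one progress entry: the batch timestamp plus
  // the event-time stats map (keys "watermark" and "min", ISO-8601 strings).
  final case class Progress(timestamp: String, eventTime: Map[String, String])

  private def millis(ts: String): Long = Instant.parse(ts).toEpochMilli

  // Gap between the batch timestamp (wall clock) and the global watermark, in seconds.
  def wallClockGap(p: Progress): Option[Double] =
    p.eventTime.get("watermark").map(w => (millis(p.timestamp) - millis(w)) / 1000.0)

  // Gap between the min event time of the batch and the global watermark, in seconds.
  // A negative value means some input was already older than the watermark.
  def minEventTimeGap(p: Progress): Option[Double] =
    for {
      w <- p.eventTime.get("watermark")
      m <- p.eventTime.get("min")
    } yield (millis(m) - millis(w)) / 1000.0
}
```

Both functions return `None` when the needed key is absent, so a caller could plot whichever lines are available for a given batch.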






[GitHub] [spark] HeartSaVioR edited a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730855993


   > If we process history data or some simulation data, the event time could be far different to processing time. For example, if we process some data from 2010 to 2019, now the gap is current time - 2010-xx-xx...?
   
   You understand it correctly, though that's just one of the use cases. Given they are running a "streaming workload", one of the main goals is to capture recent outputs (e.g. trends). The watermark would still work for such historical use cases as well, but what to plot to provide value in that situation remains an open question. (What would be the "ideal" timestamp to calculate the gap against in this case?)
   
   EDIT: for that case, adjusting the range of the y axis would probably help; otherwise we only see a nearly linear "line", like the one I commented on above in https://github.com/apache/spark/pull/30427#issuecomment-730701075.
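
As a rough illustration of the y-axis idea, the sketch below derives the plotted range from the observed gaps instead of starting at 0. The helper name `yRange`, the padding ratio, and the one-second minimum padding are all assumptions made for this example; nothing like it is part of the PR.

```scala
object YAxisRangeSketch {
  // Returns (minY, maxY) around the observed gaps (in seconds), padded by a
  // fraction of their spread so that the line is not drawn on the chart border.
  def yRange(gaps: Seq[Double], padRatio: Double = 0.1): Option[(Double, Double)] =
    if (gaps.isEmpty) None
    else {
      val lo = gaps.min
      val hi = gaps.max
      // Use at least one second of padding so a perfectly flat line stays visible.
      val pad = math.max((hi - lo) * padRatio, 1.0)
      Some((math.max(0.0, lo - pad), hi + pad))
    }
}
```

With historical data the gaps are huge but close to each other, so a range like this would zoom in on the variation rather than on the distance from zero.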




[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732823908








[GitHub] [spark] xuanyuanking commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r528579332



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,58 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): Seq[Node] = {
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress != null)
+    if (query.lastProgress.eventTime.containsKey("watermark")) {
+      val watermarkData = query.recentProgress.flatMap { p =>
+        val batchTimestamp = parseProgressTimestamp(p.timestamp)
+        val watermarkValue = parseProgressTimestamp(p.eventTime.get("watermark"))
+        if (watermarkValue > 0L) {
+          // seconds
+          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))

Review comment:
       `we can represent various aspect of views if we plot all points`
   Yeah, agreed. A multi-line chart might be helpful here if one is available! :)
   
   Actually, this idea came up when I was thinking about the `ideal line` for historical streaming data, which should use event time rather than the current clock time to represent processing time.
   
   Anyway, I just wanted to post a different interpretation of the `watermark gap` here; the current changes LGTM. If others think it's worth having another event-based gap, maybe we can do it at another timeline.






[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733037213








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732653733








[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732685418


   **[Test build #131622 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131622/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730402325








[GitHub] [spark] gaborgsomogyi commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r526907069



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,54 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): NodeBuffer = {

Review comment:
       Not sure what complications it would cause in `generateStatTable`, but I think we can return `Node` here.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,54 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): NodeBuffer = {
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress != null)
+    if (query.lastProgress.eventTime.containsKey("watermark")) {
+      val watermarkData = query.recentProgress.flatMap { p =>
+        val batchTimestamp = parseProgressTimestamp(p.timestamp)
+        val watermarkValue = parseProgressTimestamp(p.eventTime.get("watermark"))
+        if (watermarkValue > 0L) {
+          // seconds
+          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))
+        } else {
+          None
+        }
+      }
+      val maxWatermark = watermarkData.maxBy(_._2)._2
+      val graphUIDataForWatermark =
+        new GraphUIData(
+          "watermark-gap-timeline",
+          "watermark-gap-histogram",
+          watermarkData,
+          minBatchTime,
+          maxBatchTime,
+          0,
+          maxWatermark,
+          "seconds")
+      graphUIDataForWatermark.generateDataJs(jsCollector)
+
+      // scalastyle:off
+      new NodeBuffer() &+
+        <tr>
+          <td style="vertical-align: middle;">
+            <div style="width: 160px;">
+              <div><strong>Global Watermark Gap {SparkUIUtils.tooltip("The gap between timestamp and global watermark for the batch.", "right")}</strong></div>

Review comment:
       I understand that `timestamp` here means "now", but maybe we can be more explicit.

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/ui/UISeleniumSuite.scala
##########
@@ -51,6 +53,7 @@ class UISeleniumSuite extends SparkFunSuite with WebBrowser with Matchers with B
     val conf = new SparkConf()
       .setMaster(master)
       .setAppName("ui-test")
+      .set(SHUFFLE_PARTITIONS, 5)

Review comment:
       Just curious, is this to speed up the unit test by not starting 200 tasks?






[GitHub] [spark] HeartSaVioR commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730844687


   The `complicated case` in the manual test demonstrates the use case of "event time processing". Please take a look at the code to see how I randomize the event timestamp in the input rows.
   
   Technically, the graph is almost meaningless for processing time, because the event timestamp would be nearly the same as the batch timestamp. Even if the query is lagging, once the next batch is launched, the event timestamps of its inputs will match the batch timestamp.
   
   The graph will be helpful if they're using either "ingest time" (timestamped not by Spark, but when ingested into the input storage), which could show the processing lag, or "event time", which is the best case for showing the gap.
   
   If you haven't read the articles below, I strongly recommend reading them, or reading the book "Streaming Systems".
   https://www.oreilly.com/radar/the-world-beyond-batch-streaming-101/
   https://www.oreilly.com/radar/the-world-beyond-batch-streaming-102/
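
The point about processing time can be sketched with a toy model. It assumes an idealised schedule in which batch `k` starts exactly at `k` times the trigger interval, and in which the watermark visible in a batch comes from the max event time of the previous batch minus the delay. The 5 second trigger and 10 second delay match the simple manual test in the PR description; everything else is a simplification, not Spark's actual implementation.

```scala
object ConstantGapSketch {
  val triggerIntervalMs = 5000L
  val watermarkDelayMs = 10000L

  // Batch k starts at k * trigger interval (hypothetical, idealised schedule).
  def batchTimestamp(k: Int): Long = k * triggerIntervalMs

  // Simplified model: the watermark visible in batch k is derived from the
  // max event time of batch k - 1, which under processing-time stamping is
  // roughly that batch's own timestamp.
  def watermark(k: Int): Long = batchTimestamp(k - 1) - watermarkDelayMs

  // Gap between batch timestamp and watermark, in seconds.
  def gapSeconds(k: Int): Double = (batchTimestamp(k) - watermark(k)) / 1000.0

  // Gaps for the first five batches that have a predecessor.
  val gaps: Seq[Double] = (1 to 5).map(gapSeconds)
}
```

Under these assumptions every batch reports the same 15 second gap (trigger interval plus delay), which is why the plot carries little information unless the timestamps come from ingest time or event time.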




[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS][WEBUI] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733414031








[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733173017








[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733283254


   **[Test build #131704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131704/testReport)** for PR 30427 at commit [`a6db726`](https://github.com/apache/spark/commit/a6db726c10ba999077a03d90231c8224d2a4a621).




[GitHub] [spark] HeartSaVioR commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r527480459



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,53 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): Seq[Node] = {
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress != null)
+    if (query.lastProgress.eventTime.containsKey("watermark")) {
+      val watermarkData = query.recentProgress.flatMap { p =>
+        val batchTimestamp = parseProgressTimestamp(p.timestamp)
+        val watermarkValue = parseProgressTimestamp(p.eventTime.get("watermark"))
+        if (watermarkValue > 0L) {
+          // seconds
+          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))
+        } else {
+          None
+        }
+      }
+      val maxWatermark = watermarkData.maxBy(_._2)._2

Review comment:
       Nice catch! It looks to be broken while filtering out 0L. Will fix.
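
One plausible reading of the problem (an assumption, since the original review remark is not quoted here) is that `watermarkData` can end up empty once every entry with a `0L` watermark is filtered out, and `maxBy` throws on an empty collection. A minimal sketch of a guarded variant, with the hypothetical helper name `maxGap`:

```scala
object MaxGapSketch {
  // (batch timestamp in ms, gap in seconds), mirroring the shape of watermarkData.
  type Point = (Long, Double)

  // Returns None when every point was filtered out, instead of throwing
  // the UnsupportedOperationException that maxBy raises on an empty Seq.
  def maxGap(points: Seq[Point]): Option[Double] =
    if (points.isEmpty) None else Some(points.maxBy(_._2)._2)
}
```

A caller could then skip rendering the row when the result is `None`, or fall back to a default upper bound for the y axis.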






[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730383757


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35952/
   




[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730792596


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35983/
   




[GitHub] [spark] viirya commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
viirya commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r527490802



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,53 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): Seq[Node] = {
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress != null)
+    if (query.lastProgress.eventTime.containsKey("watermark")) {
+      val watermarkData = query.recentProgress.flatMap { p =>
+        val batchTimestamp = parseProgressTimestamp(p.timestamp)
+        val watermarkValue = parseProgressTimestamp(p.eventTime.get("watermark"))
+        if (watermarkValue > 0L) {
+          // seconds
+          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))
+        } else {
+          None
+        }
+      }
+      val maxWatermark = watermarkData.maxBy(_._2)._2
+      val graphUIDataForWatermark =
+        new GraphUIData(
+          "watermark-gap-timeline",
+          "watermark-gap-histogram",
+          watermarkData,
+          minBatchTime,
+          maxBatchTime,
+          0,
+          maxWatermark,
+          "seconds")
+      graphUIDataForWatermark.generateDataJs(jsCollector)
+
+      // scalastyle:off
+      <tr>
+        <td style="vertical-align: middle;">
+          <div style="width: 160px;">
+            <div><strong>Global Watermark Gap {SparkUIUtils.tooltip("The gap between batch timestamp and global watermark for the batch.", "right")}</strong></div>
+          </div>
+        </td>
+        <td class="watermark-gap-timeline">{graphUIDataForWatermark.generateTimelineHtml(jsCollector)}</td>
+        <td class="watermark-gap-timeline">{graphUIDataForWatermark.generateHistogramHtml(jsCollector)}</td>

Review comment:
       watermark-gap-histogram?






[GitHub] [spark] HeartSaVioR edited a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730855993


   > If we process history data or some simulation data, the event time could be far different to processing time. For example, if we process some data from 2010 to 2019, now the gap is current time - 2010-xx-xx...?
   
   You understand it correctly, though that's just one of the use cases. Given they are running a "streaming workload", one of the main goals is to capture recent outputs (e.g. trends). The watermark would still work for such use cases as well, but what to plot to provide value in that situation remains an open question. (What would be the "ideal" timestamp to calculate the gap against in this case?)
   
   EDIT: for that case, adjusting the range of the y axis would probably help; otherwise we only see a nearly linear "line", like the one I commented on above.




[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS][WEBUI] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733466002








[GitHub] [spark] gaborgsomogyi commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r528545469



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,54 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): NodeBuffer = {

Review comment:
       Now I see, and I agree it isn't worth the hassle.






[GitHub] [spark] HeartSaVioR edited a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730844687


   The `complicated case` in the manual test demonstrates the use case of "event time processing". Please take a look at the code to see how I randomize the event timestamp in the input rows.
   
   Technically, the graph is almost meaningless for processing time, because the event timestamp would be nearly the same as the batch timestamp. Even if the query is lagging, once the next batch is launched, the event timestamps of its inputs will match the batch timestamp.
   
   The graph will be helpful if they're using either "ingest time" (timestamped not by Spark, but when ingested into the input storage), which could show the processing lag, or "event time", which is the best case for showing the gap.
   
   ![Figure_05_-_Event_Time_vs_Processing_Time](https://user-images.githubusercontent.com/1317309/99758506-3a852f00-2b35-11eb-9f40-5f7c5aba7ec2.png)
   
   The graph is borrowed from the gold articles below. If you haven't read the articles below, I strongly recommend reading them, or reading the book "Streaming Systems".
   
   https://www.oreilly.com/radar/the-world-beyond-batch-streaming-101/
   https://www.oreilly.com/radar/the-world-beyond-batch-streaming-102/




[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730966978


   **[Test build #131405 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131405/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).




[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731019379


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36010/
   




[GitHub] [spark] HeartSaVioR commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731023576


   retest this, please




[GitHub] [spark] HeartSaVioR commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r528588455



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,58 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): Seq[Node] = {
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress != null)
+    if (query.lastProgress.eventTime.containsKey("watermark")) {
+      val watermarkData = query.recentProgress.flatMap { p =>
+        val batchTimestamp = parseProgressTimestamp(p.timestamp)
+        val watermarkValue = parseProgressTimestamp(p.eventTime.get("watermark"))
+        if (watermarkValue > 0L) {
+          // seconds
+          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))

Review comment:
       OK, I spent too long writing the comment and posted it too late. In short, if we pick a second metric to compare with the global watermark, it should be min instead of max.






[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732725370


   **[Test build #131630 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131630/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).




[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS][WEBUI] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733449070








[GitHub] [spark] sarutak commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
sarutak commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732711022


   retest this please.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730799688


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732646755


   **[Test build #131596 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131596/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] HeartSaVioR commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r528568048



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,58 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): Seq[Node] = {
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress != null)
+    if (query.lastProgress.eventTime.containsKey("watermark")) {
+      val watermarkData = query.recentProgress.flatMap { p =>
+        val batchTimestamp = parseProgressTimestamp(p.timestamp)
+        val watermarkValue = parseProgressTimestamp(p.eventTime.get("watermark"))
+        if (watermarkValue > 0L) {
+          // seconds
+          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))

Review comment:
       We could probably present several views if we plotted all the points (wall clock, max event time, min event time, global watermark, etc.). As I'm not a front-end engineer and would rather not hack on the graph code, I stuck with a single line and chose to plot the gap between wall clock and global watermark.
   
   If someone is interested in experimenting with the graph, plotting all the lines, finding the relations among them, and deciding which lines to keep would be valuable.
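
A minimal sketch of the data side of such an experiment, assuming each progress entry exposes a batch timestamp and an event-time map of ISO-8601 UTC strings. The key names other than `watermark` (here `max` and `min`) and the helper name `gapsSeconds` are assumptions made for illustration. The sketch mirrors the `watermarkValue > 0L` check in the diff by dropping values that are still at the epoch.

```scala
import java.time.Instant

object EventTimeGapsSketch {
  // For every event-time entry, compute (batch timestamp - entry) in seconds.
  // Entries still at the epoch (0 ms) are dropped, as in the diff above.
  def gapsSeconds(
      batchTimestamp: String,
      eventTime: Map[String, String]): Map[String, Double] = {
    val batchMs = Instant.parse(batchTimestamp).toEpochMilli
    eventTime
      .map { case (key, ts) => (key, Instant.parse(ts).toEpochMilli) }
      .filter { case (_, ms) => ms > 0L }
      .map { case (key, ms) => (key, (batchMs - ms) / 1000.0) }
  }
}
```

Each key in the result would correspond to one candidate line on the graph, so the series can be compared before deciding which ones are worth keeping.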






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730799693


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35983/
   Test FAILed.




[GitHub] [spark] HeartSaVioR commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r528553014



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,58 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): Seq[Node] = {
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress != null)
+    if (query.lastProgress.eventTime.containsKey("watermark")) {
+      val watermarkData = query.recentProgress.flatMap { p =>
+        val batchTimestamp = parseProgressTimestamp(p.timestamp)
+        val watermarkValue = parseProgressTimestamp(p.eventTime.get("watermark"))
+        if (watermarkValue > 0L) {
+          // seconds
+          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))

Review comment:
       I'm sorry, but when you pick the max event time, you're discarding the gap between wall clock and event time. The intention of showing the watermark gap is to show the "gap" between "the event time" of events (which ultimately produces the watermark) and the "wall clock".
   
   Would this answer your comment?






[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732695791








[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732408198


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36172/
   




[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732653733








[GitHub] [spark] SparkQA removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732725370


   **[Test build #131630 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131630/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).




[GitHub] [spark] HeartSaVioR commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730701075


   AFAIK we don't support auto-scaling for the other graphs. It sounds like an improvement, but I'm not a front-end engineer, and we don't even seem to rely on a graph library (which might provide rich functionality) but implement our own, so it is harder to improve.
   
   The value can go very high if you set the watermark delay to a couple of hours or more (2 hours = 7,200 seconds). While scaling the unit could be confusing, adjusting the min/max of the y axis might help. I'm just hesitant to make that change, as all existing graphs have 0 as the min value of the y axis.
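
The y-axis adjustment mentioned above could look roughly like the sketch below. It is only an illustration: `YAxisBoundsSketch`, `bounds`, and `paddingRatio` are hypothetical names, the padding rule is arbitrary, and nothing here reflects how the existing graph code picks its axis range.

```scala
object YAxisBoundsSketch {
  // Returns (minY, maxY) around the data, padded by a fraction of the data
  // range (at least 1.0) and never going below zero.
  def bounds(values: Seq[Double], paddingRatio: Double = 0.1): (Double, Double) = {
    require(values.nonEmpty, "need at least one data point")
    val lo = values.min
    val hi = values.max
    val pad = math.max((hi - lo) * paddingRatio, 1.0)
    (math.max(0.0, lo - pad), hi + pad)
  }
}
```

For gaps hovering around 7,200 seconds (two hours), this keeps the axis close to the data instead of stretching it from 0, at the cost of being inconsistent with the other graphs, which is the trade-off raised in the comment.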




[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732420393


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36172/
   




[GitHub] [spark] HeartSaVioR commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r528553014



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,58 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): Seq[Node] = {
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress != null)
+    if (query.lastProgress.eventTime.containsKey("watermark")) {
+      val watermarkData = query.recentProgress.flatMap { p =>
+        val batchTimestamp = parseProgressTimestamp(p.timestamp)
+        val watermarkValue = parseProgressTimestamp(p.eventTime.get("watermark"))
+        if (watermarkValue > 0L) {
+          // seconds
+          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))

Review comment:
       I'm sorry, but when you pick the max event time, you're discarding the gap between wall clock and event time. The intention of showing the watermark gap is to show the "gap" between "the event time" of events and the "wall clock".
   
   Would this answer your comment?






[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731073384


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36015/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733173017








[GitHub] [spark] SparkQA removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730731852


   **[Test build #131380 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131380/testReport)** for PR 30427 at commit [`2f1081a`](https://github.com/apache/spark/commit/2f1081a4490e62c86e80740ef9a5f0645b78fd2c).




[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731028233








[GitHub] [spark] HeartSaVioR commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730362340


   One thing to think about: should we automatically scale the unit for the watermark gap? I picked seconds, which looks neither too small nor too big, but if the input event time is delayed by hours the value is going to be huge. (That's definitely not a good sign, though.)
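
For reference, automatic unit scaling could be sketched as below. This only illustrates the option raised in the question, not what the PR does (the PR plots seconds). The thresholds and the names `GapUnitSketch` and `scale` are assumptions.

```scala
object GapUnitSketch {
  // Picks a display unit so the scaled value stays reasonably small.
  // The 60 s and 3600 s thresholds are arbitrary choices for this sketch.
  def scale(gapSeconds: Double): (Double, String) = {
    val magnitude = math.abs(gapSeconds)
    if (magnitude >= 3600.0) (gapSeconds / 3600.0, "hours")
    else if (magnitude >= 60.0) (gapSeconds / 60.0, "minutes")
    else (gapSeconds, "seconds")
  }
}
```

A gap of 15 seconds stays in seconds, while a gap of 7,200 seconds would be displayed as 2 hours. A graph would need one unit for the whole series rather than per point, so the unit would have to be chosen from the largest value.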




[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730799688








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS][WEBUI] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733465981








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731073407








[GitHub] [spark] HeartSaVioR commented on pull request #30427: [SPARK-33224][SS][WEBUI] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733451967


   Thanks all for reviewing! Merged to master.




[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731164598


   **[Test build #131409 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131409/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] HeartSaVioR commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r528568048



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,58 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): Seq[Node] = {
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress != null)
+    if (query.lastProgress.eventTime.containsKey("watermark")) {
+      val watermarkData = query.recentProgress.flatMap { p =>
+        val batchTimestamp = parseProgressTimestamp(p.timestamp)
+        val watermarkValue = parseProgressTimestamp(p.eventTime.get("watermark"))
+        if (watermarkValue > 0L) {
+          // seconds
+          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))

Review comment:
       We could probably present several views if we plotted all of the points (wall clock, max event time, min event time, global watermark, etc.). As I'm not a FE engineer and would rather not hack on the graph code, I stuck with a single line: the gap between wall clock and global watermark.
   
   If someone is interested in experimenting with the graph, plotting all of the lines, finding the relations among them, and deciding which lines to keep would be valuable.
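
A minimal sketch of the single plotted series described here, assuming each progress entry carries a batch timestamp and a watermark as ISO-8601 strings. `Progress`, `toEpochMillis`, and `gapSeries` are hypothetical names for illustration; `Instant.parse` stands in for the PR's `parseProgressTimestamp`, whose implementation is not shown in this thread.

```scala
import java.time.Instant

object WatermarkGapSketch {
  // Hypothetical stand-in for one progress entry: batch timestamp and
  // watermark, both as ISO-8601 strings.
  final case class Progress(timestamp: String, watermark: String)

  // Assumption: timestamps are ISO-8601 UTC strings that Instant.parse accepts.
  def toEpochMillis(ts: String): Long = Instant.parse(ts).toEpochMilli

  // One (batch time in ms, gap in seconds) point per batch. Batches whose
  // watermark is still at the epoch (not yet advanced) are skipped,
  // mirroring the `watermarkValue > 0L` check in the diff above.
  def gapSeries(progresses: Seq[Progress]): Seq[(Long, Double)] =
    progresses.flatMap { p =>
      val batchMs = toEpochMillis(p.timestamp)
      val watermarkMs = toEpochMillis(p.watermark)
      if (watermarkMs > 0L) Some((batchMs, (batchMs - watermarkMs) / 1000.0))
      else None
    }
}
```

Plotting further lines (max event time, min event time) would amount to producing one such series per metric over the same batch timestamps.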






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733037213








[GitHub] [spark] dongjoon-hyun commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733156769








[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730731852


   **[Test build #131380 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131380/testReport)** for PR 30427 at commit [`2f1081a`](https://github.com/apache/spark/commit/2f1081a4490e62c86e80740ef9a5f0645b78fd2c).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730865527








[GitHub] [spark] HeartSaVioR commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r527267919



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,54 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): NodeBuffer = {
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress != null)
+    if (query.lastProgress.eventTime.containsKey("watermark")) {
+      val watermarkData = query.recentProgress.flatMap { p =>
+        val batchTimestamp = parseProgressTimestamp(p.timestamp)
+        val watermarkValue = parseProgressTimestamp(p.eventTime.get("watermark"))
+        if (watermarkValue > 0L) {
+          // seconds
+          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))
+        } else {
+          None
+        }
+      }
+      val maxWatermark = watermarkData.maxBy(_._2)._2
+      val graphUIDataForWatermark =
+        new GraphUIData(
+          "watermark-gap-timeline",
+          "watermark-gap-histogram",
+          watermarkData,
+          minBatchTime,
+          maxBatchTime,
+          0,
+          maxWatermark,
+          "seconds")
+      graphUIDataForWatermark.generateDataJs(jsCollector)
+
+      // scalastyle:off
+      new NodeBuffer() &+
+        <tr>
+          <td style="vertical-align: middle;">
+            <div style="width: 160px;">
+              <div><strong>Global Watermark Gap {SparkUIUtils.tooltip("The gap between timestamp and global watermark for the batch.", "right")}</strong></div>

Review comment:
       Yes. Probably better to say `batch timestamp` explicitly.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732823908








[GitHub] [spark] HeartSaVioR edited a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730701075


   AFAIK we don't support auto scaling for the other graphs. That sounds like an improvement, but I'm not a FE engineer, and we don't even seem to rely on a graph library (which might provide richer functionality) but implement our own, so it is harder to improve.
   
   The value can go very high if you set the watermark delay to a couple of hours or more (2 hours = 7,200 seconds). If the differences in watermark gap among batches are tiny compared to the delay, the graph will just keep showing a "nearly" horizontal line. While scaling the unit would confuse us, adjusting the min/max of the y axis might help. I just hesitate to make the change, as all existing graphs have 0 as the min value of the y axis.
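
To make the "nearly horizontal line" concern concrete, here is a small sketch that measures how much of the y-axis height the data actually spans. `usedFraction` is a hypothetical helper, not part of the Spark UI code, and the sample gaps assume a 2-hour watermark delay (7,200 seconds) plus a few seconds of jitter.

```scala
object YAxisRangeSketch {
  // Fraction of the y-axis height covered by the spread of the data.
  def usedFraction(values: Seq[Double], yMin: Double, yMax: Double): Double = {
    require(values.nonEmpty && yMax > yMin)
    (values.max - values.min) / (yMax - yMin)
  }

  // Illustrative gaps in seconds: about 2 hours of delay, 10 seconds of spread.
  val gaps: Seq[Double] = Seq(7215.0, 7220.0, 7225.0)

  // Axis starting at 0, as the existing graphs do.
  val zeroBased: Double = usedFraction(gaps, 0.0, gaps.max)

  // Axis fitted to the data's own min and max.
  val fitted: Double = usedFraction(gaps, gaps.min, gaps.max)
}
```

With these sample values the zero-based axis uses roughly 0.14% of its height for the variation, while the fitted axis uses all of it, which is the trade-off against keeping every graph anchored at 0.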




[GitHub] [spark] HeartSaVioR edited a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730844687


   The `complicated case` in the manual test demonstrates the "event time processing" use case. Please take a look at the code to see how I randomize the event timestamp in input rows.
   
   Technically, the graph is almost meaningless for processing time, because the event timestamp would be nearly the same as the batch timestamp. Even if the query is lagging, once the next batch is launched, the event timestamps of its inputs will match the batch timestamp.
   
   The graph will be helpful if the query uses either "ingest time" (timestamped not by Spark but when the data is ingested into the input storage), which could show the processing lag, or "event time", which is the best case for showing the gap.
   
   ![Figure_05_-_Event_Time_vs_Processing_Time](https://user-images.githubusercontent.com/1317309/99758506-3a852f00-2b35-11eb-9f40-5f7c5aba7ec2.png)
   
   The graph is borrowed from the gold-standard articles below. If you haven't read them, I strongly recommend doing so, or reading the book "Streaming Systems".
   
   https://www.oreilly.com/radar/the-world-beyond-batch-streaming-101/
   https://www.oreilly.com/radar/the-world-beyond-batch-streaming-102/




[GitHub] [spark] viirya commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730852396


   > Technically, the graph is almost meaningless on processing time, because the event timestamp would be nearly same as batch timestamp. Even the query is lagging, once the next batch is launched, the event timestamp of inputs will be matched to the batch timestamp.
   > 
   > The graph will be helpful if they're either using "ingest time" (not timestamped by Spark, but timestamped when ingested to the input storage) which could show the lag of process, or using "event time" which is the best case of showing the gap.
   
   The gap is calculated as the difference between the batch timestamp (this should be processing time, right? The trigger clock is `SystemClock` by default) and the watermark. My previous question may not have been clear. If we process historical data or some simulated data, the event time could be far from the processing time. For example, if we process data from 2010 to 2019, is the gap now current time - 2010-xx-xx...?
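
A worked version of this example, assuming the gap is batch timestamp minus watermark as in the diff under review. `gapSeconds` is a hypothetical helper and the two instants are illustrative values, not taken from a real query.

```scala
import java.time.{Duration, Instant}

object HistoricalGapSketch {
  // Gap in seconds between the batch timestamp (wall clock) and the watermark.
  def gapSeconds(batchTimestamp: Instant, watermark: Instant): Double =
    Duration.between(watermark, batchTimestamp).toMillis / 1000.0

  // Replaying data from 2010 in a batch that runs in November 2020.
  val batch: Instant = Instant.parse("2020-11-20T00:00:00Z")
  val watermark: Instant = Instant.parse("2010-01-01T00:00:00Z")
  val gap: Double = gapSeconds(batch, watermark)
}
```

Under that assumption the plotted value is more than ten years' worth of seconds, so for replayed historical data the graph reflects the age of the data rather than any processing lag.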




[GitHub] [spark] SparkQA removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731024742


   **[Test build #131409 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131409/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).




[GitHub] [spark] SparkQA removed a comment on pull request #30427: [SPARK-33224][SS][WEBUI] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733283254


   **[Test build #131704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131704/testReport)** for PR 30427 at commit [`a6db726`](https://github.com/apache/spark/commit/a6db726c10ba999077a03d90231c8224d2a4a621).




[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732958631








[GitHub] [spark] SparkQA removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732827618


   **[Test build #131649 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131649/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).




[GitHub] [spark] SparkQA removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732654738


   **[Test build #131622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131622/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).




[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732815872


   **[Test build #131630 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131630/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] gaborgsomogyi commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730391259


   I've just had a quick look at this, and I don't think switching is needed. We don't do this where bytes are shown. If I were a user looking at a graph and all of a sudden the axis meant something different, I would be confused.




[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731034220


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36010/
   




[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730799678


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35983/
   




[GitHub] [spark] HeartSaVioR edited a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730844687


   The `complicated case` in the manual test demonstrates the "event time processing" use case. Please take a look at the code to see how I randomize the event timestamp in input rows.
   
   Technically, the graph is almost meaningless for processing time, because the event timestamp would be nearly the same as the batch timestamp. Even if the query is lagging, once the next batch is launched, the event timestamps of its inputs will match the batch timestamp.
   
   The graph will be helpful if the query uses either "ingest time" (timestamped not by Spark but when the data is ingested into the input storage), which could show the processing lag, or "event time", which is the best case for showing the gap.
   
   ![Figure_05_-_Event_Time_vs_Processing_Time](https://user-images.githubusercontent.com/1317309/99758506-3a852f00-2b35-11eb-9f40-5f7c5aba7ec2.png)
   
   The figure is borrowed from the gold-standard articles below. If you haven't read them, I strongly recommend doing so, or reading the book "Streaming Systems".
   
   https://www.oreilly.com/radar/the-world-beyond-batch-streaming-101/
   https://www.oreilly.com/radar/the-world-beyond-batch-streaming-102/




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731008057


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730966978


   **[Test build #131405 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131405/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).




[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731007752


   **[Test build #131405 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131405/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] HeartSaVioR commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r527324138



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,54 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): NodeBuffer = {

Review comment:
       Changed to `Seq[Node]` as I don't see a way to provide an empty Node.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733258147








[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733255004


   **[Test build #131692 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131692/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] gaborgsomogyi commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732819837


   retest this please




[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732503897


   **[Test build #131596 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131596/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).




[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733173573


   **[Test build #131692 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131692/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).




[GitHub] [spark] HeartSaVioR commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r528588455



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,58 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): Seq[Node] = {
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress != null)
+    if (query.lastProgress.eventTime.containsKey("watermark")) {
+      val watermarkData = query.recentProgress.flatMap { p =>
+        val batchTimestamp = parseProgressTimestamp(p.timestamp)
+        val watermarkValue = parseProgressTimestamp(p.eventTime.get("watermark"))
+        if (watermarkValue > 0L) {
+          // seconds
+          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))

Review comment:
       OK, I spent too much effort writing the comment and posted it too late. In short, if we can pick a second metric to compare with the global watermark, it should be min instead of max. If Spark also picks the min event time to construct the watermark, we should pick max to see how much the output is lagging due to slow watermark advance.






[GitHub] [spark] gaborgsomogyi commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730492798


   > do we support automatically scale the unit in the other graph
   
   No




[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730359694


   **[Test build #131348 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131348/testReport)** for PR 30427 at commit [`d82702a`](https://github.com/apache/spark/commit/d82702af591ebb1d10fa49fcc50d81373debaafd).




[GitHub] [spark] viirya commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
viirya commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730845680


   > The `complicated case` in manual test demonstrates the use case of "event time processing". Please take a look at the code how I randomize the event timestamp in input rows.
   > 
   
   Am I missing anything? The two code snippets are the same.




[GitHub] [spark] SparkQA removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733173573


   **[Test build #131692 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131692/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).




[GitHub] [spark] SparkQA removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730359694


   **[Test build #131348 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131348/testReport)** for PR 30427 at commit [`d82702a`](https://github.com/apache/spark/commit/d82702af591ebb1d10fa49fcc50d81373debaafd).




[GitHub] [spark] sarutak commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
sarutak commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r527463165



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,53 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): Seq[Node] = {
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress != null)
+    if (query.lastProgress.eventTime.containsKey("watermark")) {
+      val watermarkData = query.recentProgress.flatMap { p =>
+        val batchTimestamp = parseProgressTimestamp(p.timestamp)
+        val watermarkValue = parseProgressTimestamp(p.eventTime.get("watermark"))
+        if (watermarkValue > 0L) {
+          // seconds
+          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))
+        } else {
+          None
+        }
+      }
+      val maxWatermark = watermarkData.maxBy(_._2)._2

Review comment:
       If we access the UI immediately after starting a streaming query, `watermarkData` can be empty.
   
   ![empty maxBy](https://user-images.githubusercontent.com/4736016/99766540-a4590500-2b44-11eb-9113-835dc7debb46.png)
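
One possible guard is sketched below. In Scala, `maxBy` throws `UnsupportedOperationException` on an empty collection, so the sketch returns `None` until at least one batch has reported a watermark. The object name, `maxGap`, and the sample values are hypothetical and used only for illustration; this is not the code that was adopted in the PR.

```scala
object EmptyWatermarkGuard {
  // (batchTimestampMs, gapSeconds) pairs, mirroring the shape of watermarkData in the diff.
  type Point = (Long, Double)

  // Hypothetical helper: the largest gap, or None when no batch has reported a watermark yet.
  def maxGap(points: Seq[Point]): Option[Double] =
    if (points.isEmpty) None else Some(points.maxBy(_._2)._2)

  def main(args: Array[String]): Unit = {
    // Right after the query starts there may be no data points at all.
    println(maxGap(Seq.empty))
    // Once batches have run, the largest gap is available.
    println(maxGap(Seq((1000L, 15.2), (6000L, 15.7))))
  }
}
```

With a guard like this, the caller can skip rendering the graph when the result is `None` instead of failing the whole page.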
   






[GitHub] [spark] sarutak commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
sarutak commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732647606


   retest this please.




[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731034235








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS][WEBUI] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733414031








[GitHub] [spark] HeartSaVioR commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730855993


   > If we process history data or some simulation data, the event time could be far different to processing time. For example, if we process some data from 2010 to 2019, now the gap is current time - 2010-xx-xx...?
   
   You understand it correctly, though that's just one of the use cases. Given that they are running a "streaming workload", one of the main goals is to capture recent outputs (e.g. trends). Watermark would still work for such use cases as well, but what to plot to provide value even in that situation remains the question. (What would be the "ideal" timestamp to calculate the gap in this case?)
   




[GitHub] [spark] HeartSaVioR commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r527268569



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/ui/UISeleniumSuite.scala
##########
@@ -51,6 +53,7 @@ class UISeleniumSuite extends SparkFunSuite with WebBrowser with Matchers with B
     val conf = new SparkConf()
       .setMaster(master)
       .setAppName("ui-test")
+      .set(SHUFFLE_PARTITIONS, 5)

Review comment:
       Yes. Once I changed the active query a bit to set a watermark, it struggled to make progress within 30 seconds (so the UI check failed because there was no queryProgress) and the test failed. This fixed the issue.






[GitHub] [spark] xuanyuanking commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r528568246



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,58 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): Seq[Node] = {
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress != null)
+    if (query.lastProgress.eventTime.containsKey("watermark")) {
+      val watermarkData = query.recentProgress.flatMap { p =>
+        val batchTimestamp = parseProgressTimestamp(p.timestamp)
+        val watermarkValue = parseProgressTimestamp(p.eventTime.get("watermark"))
+        if (watermarkValue > 0L) {
+          // seconds
+          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))

Review comment:
       Yeah, that depends on how we define `watermark gap` here.
   These two definitions will not produce different results in the ideal case. My point is that, since the watermark is decided by event time, it seems to make more sense to use event time on both sides when computing the `gap`.
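
A small worked example of the two definitions under discussion, as a sketch only. The function names and all timestamps are made up for illustration: `wallClockGap` follows the formula in the diff (batch timestamp minus watermark), and `eventTimeGap` follows the alternative suggested in this thread (max event time minus watermark). Both return seconds.

```scala
object WatermarkGapDefinitions {
  // All timestamps are epoch milliseconds; results are in seconds, as in the diff.
  def wallClockGap(batchTimestamp: Long, watermark: Long): Double =
    (batchTimestamp - watermark) / 1000.0

  def eventTimeGap(maxEventTime: Long, watermark: Long): Double =
    (maxEventTime - watermark) / 1000.0

  def main(args: Array[String]): Unit = {
    val batchTimestamp = 1605790000000L

    // Ideal case: the newest event arrives at wall-clock time and the
    // watermark trails it by 15 seconds, so both definitions agree.
    val liveWatermark = batchTimestamp - 15000L
    println(wallClockGap(batchTimestamp, liveWatermark))
    println(eventTimeGap(batchTimestamp, liveWatermark))

    // Replay case: the newest event is one hour old. The wall-clock gap
    // includes that hour, while the event-time gap stays at 15 seconds.
    val replayMaxEventTime = batchTimestamp - 3600000L
    val replayWatermark = replayMaxEventTime - 15000L
    println(wallClockGap(batchTimestamp, replayWatermark))
    println(eventTimeGap(replayMaxEventTime, replayWatermark))
  }
}
```

The replay case is where the two definitions diverge: the wall-clock definition shows how far the output lags behind real time, and the event-time definition shows only how far the watermark trails the newest input.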






[GitHub] [spark] HeartSaVioR commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r528553014



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,58 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): Seq[Node] = {
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress != null)
+    if (query.lastProgress.eventTime.containsKey("watermark")) {
+      val watermarkData = query.recentProgress.flatMap { p =>
+        val batchTimestamp = parseProgressTimestamp(p.timestamp)
+        val watermarkValue = parseProgressTimestamp(p.eventTime.get("watermark"))
+        if (watermarkValue > 0L) {
+          // seconds
+          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))

Review comment:
       I'm sorry, but when you pick the max event time, you're discarding the gap between wall clock and event time. The intention of showing the watermark gap is to show the "gap" between "the event time" of events (which finally produces the watermark) and the "wall clock".
   
   Would this answer your comment?






[GitHub] [spark] xuanyuanking commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
xuanyuanking commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r528546502



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,58 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): Seq[Node] = {
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress != null)
+    if (query.lastProgress.eventTime.containsKey("watermark")) {
+      val watermarkData = query.recentProgress.flatMap { p =>
+        val batchTimestamp = parseProgressTimestamp(p.timestamp)
+        val watermarkValue = parseProgressTimestamp(p.eventTime.get("watermark"))
+        if (watermarkValue > 0L) {
+          // seconds
+          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))

Review comment:
       I fully agree that `knowing the gap between actual wall clock and watermark looks more useful than the absolute value.`, and thanks for the super useful watermark info!
   
   I have only one concern, the same as @viirya's (https://github.com/apache/spark/pull/30427#issuecomment-730852396). Maybe we can address both scenarios (event time is or isn't close to clock time) by using the max event time received in this batch? I mean:
   ```
            if (watermarkValue > 0L) {
              // seconds
   -          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))
   +          val maxEventTime = parseProgressTimestamp(p.eventTime.get("max"))
   +          Some((batchTimestamp, (maxEventTime - watermarkValue) / 1000.0))
            } else {
              None
            }
   ```
   
   






[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732950127


   **[Test build #131649 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131649/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732420408








[GitHub] [spark] HeartSaVioR commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r528467906



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,54 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): NodeBuffer = {

Review comment:
       Changing to `Option[Node]` forces us to wrap `<tr>...</tr>` in an Option, whereas leaving the return type as `NodeBuffer` or `Seq[Node]` doesn't. (`scala.xml.Node` has an interesting implementation: `Node` extends `NodeSeq`.)
   
   I think either `NodeBuffer` or `Seq[Node]` is simpler.
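
A minimal sketch of why `Seq[Node]` needs no wrapping, assuming the scala-xml library is on the classpath (as it is for Spark's UI code). Because `Node` extends `NodeSeq`, which is itself a `Seq[Node]`, a single XML element can be returned directly, and the "nothing to show" case is an empty sequence. The object and method names are hypothetical stand-ins, not the PR's `generateWatermark`.

```scala
import scala.xml.Node

object SeqNodeReturnType {
  // Hypothetical stand-in for generateWatermark: emits a row only when there is data.
  def row(hasWatermark: Boolean): Seq[Node] =
    if (hasWatermark) {
      // A single element is already a Seq[Node] of length one; no wrapping needed.
      <tr><td>Watermark gap</td></tr>
    } else {
      Seq.empty[Node]
    }

  def main(args: Array[String]): Unit = {
    println(row(hasWatermark = true).size)
    println(row(hasWatermark = false).isEmpty)
  }
}
```

With `Option[Node]` the first branch would have to be written as `Some(<tr>...</tr>)`, and callers would need to unwrap it before embedding the result in the surrounding markup.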






[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730402305


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35952/
   




[GitHub] [spark] HeartSaVioR commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r527324138



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,54 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): NodeBuffer = {

Review comment:
       Changed to `Seq[Node]` as I don't see a way to instantiate an empty `Node`.






[GitHub] [spark] dongjoon-hyun commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730465034


   cc @viirya 




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732678901








[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731073407








[GitHub] [spark] HeartSaVioR commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r528467906



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,54 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): NodeBuffer = {

Review comment:
       Changing to `Option[Node]` forces us to wrap `<tr>...</tr>` in an Option, whereas leaving the return type as `NodeBuffer` or `Seq[Node]` doesn't. I think either `NodeBuffer` or `Seq[Node]` is simpler.
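
The trade-off in this comment can be sketched as follows. This is a minimal illustration, not the code from the PR: `Node` here is a hypothetical stand-in for `scala.xml.Node` so that the snippet needs no extra dependency, and the helper names are made up. In scala-xml a `Node` is itself a `Seq[Node]`, so an XML literal can be returned as `Seq[Node]` without any wrapping, which the stand-in cannot model directly.

```scala
object ReturnTypeSketch {
  // Hypothetical stand-in for scala.xml.Node, used only to keep the sketch self-contained.
  final case class Node(html: String)

  // Option-returning variant: the row has to be wrapped in Some(...) explicitly.
  def rowAsOption(hasWatermark: Boolean): Option[Node] =
    if (hasWatermark) Some(Node("<tr>...</tr>")) else None

  // Seq-returning variant: an empty Seq simply means "no row to render".
  def rowAsSeq(hasWatermark: Boolean): Seq[Node] =
    if (hasWatermark) Seq(Node("<tr>...</tr>")) else Seq.empty

  def main(args: Array[String]): Unit = {
    val otherRows = Seq(Node("<tr>input rate</tr>"))
    // Both variants yield the same table rows once combined with the others.
    val viaSeq = otherRows ++ rowAsSeq(hasWatermark = true)
    val viaOption = otherRows ++ rowAsOption(hasWatermark = true).toSeq
    println(viaSeq == viaOption)
  }
}
```

Either shape works for the caller; the difference is only how much wrapping the method body needs.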






[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731024742


   **[Test build #131409 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131409/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732958621








[GitHub] [spark] HeartSaVioR edited a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR edited a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730855993


   > If we process history data or some simulation data, the event time could be far different to processing time. For example, if we process some data from 2010 to 2019, now the gap is current time - 2010-xx-xx...?
   
   You understand it correctly, though that's just one of the use cases. Given that they are running a "streaming workload", one of the main goals is to capture recent outputs (e.g. trends). Watermark would still work for such use cases as well, but what to plot to provide value even in that situation remains an open question. (What would be the "ideal" timestamp to calculate the gap from in this case?)
   
   EDIT: for that case, adjusting the range of the y axis would probably help; otherwise we only see a nearly linear "line" plotted, like what I commented above in https://github.com/apache/spark/pull/30427#issuecomment-730701075.
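
To make the arithmetic concrete, here is a small sketch of the gap calculation discussed above: batch timestamp minus watermark, expressed in seconds. The object name, the helper and the timestamp values are illustrative assumptions, not taken from the PR, and it assumes both timestamps are ISO 8601 strings in UTC.

```scala
import java.time.Instant

object WatermarkGapSketch {
  // Gap in seconds between the batch timestamp (wall clock) and the watermark (event time).
  def gapSeconds(batchTimestamp: String, watermark: String): Double = {
    val batchMs = Instant.parse(batchTimestamp).toEpochMilli
    val watermarkMs = Instant.parse(watermark).toEpochMilli
    (batchMs - watermarkMs) / 1000.0
  }

  def main(args: Array[String]): Unit = {
    // Live data: the watermark trails the batch time by the delay plus the trigger interval.
    println(gapSeconds("2020-11-19T12:00:15.000Z", "2020-11-19T12:00:00.000Z"))
    // Historical replay: event time starts in 2010, so the gap is about a decade of seconds.
    println(gapSeconds("2020-11-19T12:00:15.000Z", "2010-01-01T00:00:00.000Z"))
  }
}
```

In the replay case the gap is dominated by the age of the data, so on a y axis that starts at 0 the per-batch changes are hard to see, which is why adjusting the y-axis range comes up.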




[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS][WEBUI] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733445279


   **[Test build #131704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131704/testReport)** for PR 30427 at commit [`a6db726`](https://github.com/apache/spark/commit/a6db726c10ba999077a03d90231c8224d2a4a621).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731016991


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36009/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732923488








[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732654738


   **[Test build #131622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131622/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).




[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730865527








[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732420408








[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733258147








[GitHub] [spark] gaborgsomogyi commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r527541509



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,54 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): NodeBuffer = {

Review comment:
       Option?






[GitHub] [spark] HeartSaVioR commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r527493073



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatisticsPage.scala
##########
@@ -126,6 +126,53 @@ private[ui] class StreamingQueryStatisticsPage(parent: StreamingQueryTab)
     <br />
   }
 
+  def generateWatermark(
+      query: StreamingQueryUIData,
+      minBatchTime: Long,
+      maxBatchTime: Long,
+      jsCollector: JsCollector): Seq[Node] = {
+    // This is made sure on caller side but put it here to be defensive
+    require(query.lastProgress != null)
+    if (query.lastProgress.eventTime.containsKey("watermark")) {
+      val watermarkData = query.recentProgress.flatMap { p =>
+        val batchTimestamp = parseProgressTimestamp(p.timestamp)
+        val watermarkValue = parseProgressTimestamp(p.eventTime.get("watermark"))
+        if (watermarkValue > 0L) {
+          // seconds
+          Some((batchTimestamp, ((batchTimestamp - watermarkValue) / 1000.0)))
+        } else {
+          None
+        }
+      }
+      val maxWatermark = watermarkData.maxBy(_._2)._2
+      val graphUIDataForWatermark =
+        new GraphUIData(
+          "watermark-gap-timeline",
+          "watermark-gap-histogram",
+          watermarkData,
+          minBatchTime,
+          maxBatchTime,
+          0,
+          maxWatermark,
+          "seconds")
+      graphUIDataForWatermark.generateDataJs(jsCollector)
+
+      // scalastyle:off
+      <tr>
+        <td style="vertical-align: middle;">
+          <div style="width: 160px;">
+            <div><strong>Global Watermark Gap {SparkUIUtils.tooltip("The gap between batch timestamp and global watermark for the batch.", "right")}</strong></div>
+          </div>
+        </td>
+        <td class="watermark-gap-timeline">{graphUIDataForWatermark.generateTimelineHtml(jsCollector)}</td>
+        <td class="watermark-gap-timeline">{graphUIDataForWatermark.generateHistogramHtml(jsCollector)}</td>

Review comment:
       My bad. Thanks for finding!






[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731165797








[GitHub] [spark] HeartSaVioR closed pull request #30427: [SPARK-33224][SS][WEBUI] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR closed pull request #30427:
URL: https://github.com/apache/spark/pull/30427


   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731028246


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/36009/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731008103


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/131405/
   Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731028217


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/36009/
   




[GitHub] [spark] HeartSaVioR commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-731818236


   @viirya @gaborgsomogyi Could you please go through another round of review?
   
   Otherwise I'll merge this in a couple of days if there are no further review comments.




[GitHub] [spark] SparkQA removed a comment on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-732503897


   **[Test build #131596 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131596/testReport)** for PR 30427 at commit [`d19fd10`](https://github.com/apache/spark/commit/d19fd10dab7c4fc28d1c4a893a2db74405d4ff9f).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #30427: [SPARK-33224][SS][WEBUI] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-733449070








[GitHub] [spark] AmplabJenkins commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730402325








[GitHub] [spark] gaborgsomogyi commented on a change in pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
gaborgsomogyi commented on a change in pull request #30427:
URL: https://github.com/apache/spark/pull/30427#discussion_r527541032



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/ui/UISeleniumSuite.scala
##########
@@ -51,6 +53,7 @@ class UISeleniumSuite extends SparkFunSuite with WebBrowser with Matchers with B
     val conf = new SparkConf()
       .setMaster(master)
       .setAppName("ui-test")
+      .set(SHUFFLE_PARTITIONS, 5)

Review comment:
       I had a similar problem before, so I just wanted to double check. Thanks!






[GitHub] [spark] HeartSaVioR commented on pull request #30427: [SPARK-33224][SS] Add watermark gap information into SS UI page

Posted by GitBox <gi...@apache.org>.
HeartSaVioR commented on pull request #30427:
URL: https://github.com/apache/spark/pull/30427#issuecomment-730846093


   Sorry, copy & paste error. Just updated.

