
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/06/10 09:12:18 UTC

[GitHub] [spark] uncleGen opened a new pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

uncleGen opened a new pull request #28781:
URL: https://github.com/apache/spark/pull/28781


   ### What changes were proposed in this pull request?
   
   Add Spark Structured Streaming History Server Support.
   
   ### Why are the changes needed?
   
   Add a streaming query history server plugin.
   
   ![image](https://user-images.githubusercontent.com/7402327/84248291-d26cfe80-ab3b-11ea-86d2-98205fa2bcc4.png)
   ![image](https://user-images.githubusercontent.com/7402327/84248347-e44ea180-ab3b-11ea-81de-eefe207656f2.png)
   ![image](https://user-images.githubusercontent.com/7402327/84248396-f0d2fa00-ab3b-11ea-9b0d-e410115471b0.png)
   
   - Follow-ups
     - Query duration should not update in history UI.
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### How was this patch tested?
   Update UT.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666049254







[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642158735







[GitHub] [spark] asfgit closed pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

asfgit closed pull request #28781:
URL: https://github.com/apache/spark/pull/28781


   



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666973362


   **[Test build #126871 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126871/testReport)** for PR 28781 at commit [`aa7a682`](https://github.com/apache/spark/commit/aa7a6828628da713dc17b6f15760eb21fef21378).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-686367631







[GitHub] [spark] uncleGen commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666898656


   retest this please.



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-735817122







[GitHub] [spark] xuanyuanking commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

xuanyuanking commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r440937299



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryStatusStore.scala
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.sql.streaming.ui.{StreamingQueryProgressWrapper, StreamingQuerySummary, StreamingQueryUIData}
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+
+/**
+ * Provides a view of a KVStore with methods that make it easy to query Streaming Query state.
+ * There's no state kept in this class, so it's ok to have multiple instances of it in an
+ * application.
+ */
+class StreamingQueryStatusStore(store: ElementTrackingStore) {
+
+  def queriesCount(): Long = store.count(classOf[StreamingQuerySummary])

Review comment:
       No usage for `queriesCount`? Do you plan to use it in `StreamingQueryHistoryServerPlugin`?

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryStatusStore.scala
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.sql.streaming.ui.{StreamingQueryProgressWrapper, StreamingQuerySummary, StreamingQueryUIData}
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+
+/**
+ * Provides a view of a KVStore with methods that make it easy to query Streaming Query state.
+ * There's no state kept in this class, so it's ok to have multiple instances of it in an
+ * application.
+ */
+class StreamingQueryStatusStore(store: ElementTrackingStore) {

Review comment:
       Forgot to address this comment https://github.com/apache/spark/pull/28781#discussion_r438704385? I think using `store: KVStore` here is enough.
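
The suggestion above amounts to having a read-only view depend on the narrowest store type it actually needs. A minimal sketch of that idea follows, assuming hypothetical names (`ReadableStore`, `TrackingStore`, `SummaryRecord`, `StatusView`) that stand in for the real Spark classes and are not part of Spark's API:

```scala
// Hypothetical minimal interface, for illustration only; not Spark's KVStore API.
trait ReadableStore {
  def count(klass: Class[_]): Long
}

// A richer store type that adds behaviour the read-only view never uses.
class TrackingStore(counts: Map[Class[_], Long]) extends ReadableStore {
  override def count(klass: Class[_]): Long = counts.getOrElse(klass, 0L)

  // Placeholder for write-side functionality; intentionally a no-op in this sketch.
  def addTrigger(klass: Class[_], threshold: Long)(action: Long => Unit): Unit = ()
}

// Hypothetical record type standing in for a stored query summary.
final case class SummaryRecord(name: String)

// The view only reads, so it asks for the narrowest type that supports reading.
class StatusView(store: ReadableStore) {
  def queriesCount(): Long = store.count(classOf[SummaryRecord])
}

object StatusViewDemo {
  def main(args: Array[String]): Unit = {
    val counts: Map[Class[_], Long] = Map(classOf[SummaryRecord] -> 2L)
    // A TrackingStore can still be passed, because it is a ReadableStore.
    val view = new StatusView(new TrackingStore(counts))
    println(view.queriesCount())
  }
}
```

With this shape, callers that already hold the richer store can pass it unchanged, while the view itself makes no promise about write-side behaviour.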

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala
##########
@@ -95,7 +95,11 @@ class StreamingQueryListenerBus(sparkListenerBus: LiveListenerBus)
         // synchronously and the ones attached to LiveListenerBus asynchronously. Therefore,
         // we need to ignore QueryStartedEvent if this method is called within SparkListenerBus
         // thread
-        if (!LiveListenerBus.withinListenerThread.value || !e.isInstanceOf[QueryStartedEvent]) {
+        //
+        // When loaded by Spark History Server, we should process all event coming from replay
+        // listener bus.
+        if (!live || !LiveListenerBus.withinListenerThread.value ||
+          !e.isInstanceOf[QueryStartedEvent])  {

Review comment:
       nit: add 2 more spaces here.
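
The condition in the hunk above can be read as a three-input predicate. The following stand-alone sketch models only that boolean logic; `ProcessDecision` and `shouldProcess` are hypothetical names, and the three flags are plain booleans standing in for `live`, `LiveListenerBus.withinListenerThread.value`, and `e.isInstanceOf[QueryStartedEvent]`:

```scala
// Hypothetical stand-alone model of the condition in the diff; not Spark code.
object ProcessDecision {
  /**
   * @param live                 false when events are replayed (e.g. by the History Server)
   * @param withinListenerThread true when called from the listener bus thread
   * @param isQueryStarted       true when the event is a QueryStartedEvent
   */
  def shouldProcess(live: Boolean, withinListenerThread: Boolean, isQueryStarted: Boolean): Boolean =
    !live || !withinListenerThread || !isQueryStarted

  def main(args: Array[String]): Unit = {
    // Print the full truth table for the three flags.
    for {
      live <- Seq(true, false)
      within <- Seq(true, false)
      started <- Seq(true, false)
    } println(s"live=$live within=$within started=$started -> ${shouldProcess(live, within, started)}")
  }
}
```

Under this reading, the only combination that is skipped is a live bus seeing a `QueryStartedEvent` from within the listener thread; with `live = false` every event is processed.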

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryHistoryServerPlugin.scala
##########
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.SparkConf
+import org.apache.spark.scheduler.SparkListener
+import org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus
+import org.apache.spark.sql.streaming.ui.{StreamingQueryStatusListener, StreamingQueryTab}
+import org.apache.spark.status.{AppHistoryServerPlugin, ElementTrackingStore}
+import org.apache.spark.ui.SparkUI
+
+class StreamingQueryHistoryServerPlugin extends AppHistoryServerPlugin {
+
+  override def createListeners(conf: SparkConf, store: ElementTrackingStore): Seq[SparkListener] = {
+    val listenerBus = new StreamingQueryListenerBus(None, live = false)
+    listenerBus.addListener(new StreamingQueryStatusListener(conf, store))
+    Seq(listenerBus)
+  }
+
+  override def setupUI(ui: SparkUI): Unit = {
+    val trackingStore = new ElementTrackingStore(ui.store.store, ui.conf)

Review comment:
       Ditto: no need to create a new `ElementTrackingStore` here?





[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-644219054







[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-686286109


   **[Test build #128225 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128225/testReport)** for PR 28781 at commit [`9eb353c`](https://github.com/apache/spark/commit/9eb353c649573181741cf52b91401b3b57b50a77).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-644040894


   **[Test build #124051 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124051/testReport)** for PR 28781 at commit [`f280157`](https://github.com/apache/spark/commit/f2801571c222f42b56c661a0693039482d8ad2fd).



[GitHub] [spark] uncleGen commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r509029910



##########
File path: sql/core/src/test/resources/META-INF/services/org.apache.spark.status.AppHistoryServerPlugin
##########
@@ -0,0 +1,2 @@
+org.apache.spark.sql.execution.ui.SQLHistoryServerPlugin

Review comment:
       +1





[GitHub] [spark] uncleGen commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r438496827



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryHistoryServerPlugin.scala
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.config.Status.ASYNC_TRACKING_ENABLED
+import org.apache.spark.scheduler.SparkListener
+import org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus
+import org.apache.spark.sql.streaming.ui.{StreamingQueryStatusListener, StreamingQueryTab}
+import org.apache.spark.status.{AppHistoryServerPlugin, ElementTrackingStore}
+import org.apache.spark.ui.SparkUI
+
+class StreamingQueryHistoryServerPlugin extends AppHistoryServerPlugin {
+
+  override def createListeners(conf: SparkConf, store: ElementTrackingStore): Seq[SparkListener] = {
+    val listenerBus = new StreamingQueryListenerBus(None)
+    listenerBus.addListener(new StreamingQueryStatusListener(conf, store))
+    Seq(listenerBus)
+  }
+
+  override def setupUI(ui: SparkUI): Unit = {
+    val replayConf = ui.conf.clone.clone().set(ASYNC_TRACKING_ENABLED, false)

Review comment:
       sorry, typo





[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-736978616


   **[Test build #132018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132018/testReport)** for PR 28781 at commit [`e2758d7`](https://github.com/apache/spark/commit/e2758d76a05a9793147b95d83e63118eee5f2d4f).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-686287143







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-686453954







[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-648635479


   **[Test build #124469 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124469/testReport)** for PR 28781 at commit [`584813d`](https://github.com/apache/spark/commit/584813d32c43cb9206f4209143d007d346557828).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class StreamingQueryStatusStore(store: KVStore) `



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-667083841


   **[Test build #126871 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126871/testReport)** for PR 28781 at commit [`aa7a682`](https://github.com/apache/spark/commit/aa7a6828628da713dc17b6f15760eb21fef21378).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-734786670







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642455264







[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642361040







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-643863164







[GitHub] [spark] uncleGen commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r438497690



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryHistoryServerPlugin.scala
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.config.Status.ASYNC_TRACKING_ENABLED
+import org.apache.spark.scheduler.SparkListener
+import org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus
+import org.apache.spark.sql.streaming.ui.{StreamingQueryStatusListener, StreamingQueryTab}
+import org.apache.spark.status.{AppHistoryServerPlugin, ElementTrackingStore}
+import org.apache.spark.ui.SparkUI
+
+class StreamingQueryHistoryServerPlugin extends AppHistoryServerPlugin {
+
+  override def createListeners(conf: SparkConf, store: ElementTrackingStore): Seq[SparkListener] = {
+    val listenerBus = new StreamingQueryListenerBus(None)
+    listenerBus.addListener(new StreamingQueryStatusListener(conf, store))
+    Seq(listenerBus)
+  }
+
+  override def setupUI(ui: SparkUI): Unit = {
+    val replayConf = ui.conf.clone.clone().set(ASYNC_TRACKING_ENABLED, false)
+    val store = new ElementTrackingStore(ui.store.store, replayConf)
+    val streamingQueryStatusStore = new StreamingQueryStatusStore(store)

Review comment:
       `ElementTrackingStore` is a `KVStore` wrapper that tracks the number of elements of specific types and triggers actions once a count reaches a threshold. Here, we use an `ElementTrackingStore` to wrap `ui.store.store`.
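
       To illustrate the counting-and-trigger idea described above, here is a minimal, self-contained sketch. It is not Spark's `ElementTrackingStore`: `ToyTrackingStore`, `QueryRecord`, `removeOldest` and `count` are hypothetical names invented for this example, and the in-memory buffers only stand in for a real `KVStore`. The sketch assumes a trigger fires when the element count for a type exceeds its threshold, and that the action trims the oldest entries.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a KVStore wrapper; NOT Spark's ElementTrackingStore.
final class ToyTrackingStore {
  // Elements grouped by their runtime class, oldest first.
  private val data = mutable.Map.empty[Class[_], mutable.ArrayBuffer[AnyRef]]
  // Per-class list of (threshold, action) pairs.
  private val triggers = mutable.Map.empty[Class[_], List[(Int, Int => Unit)]]

  /** Register `action` to run when the number of `klass` elements exceeds `threshold`. */
  def addTrigger(klass: Class[_], threshold: Int)(action: Int => Unit): Unit =
    triggers(klass) = (threshold, action) :: triggers.getOrElse(klass, Nil)

  /** Store `value`; optionally check the triggers registered for its class. */
  def write(value: AnyRef, checkTriggers: Boolean = false): Unit = {
    val klass = value.getClass
    val bucket = data.getOrElseUpdate(klass, mutable.ArrayBuffer.empty[AnyRef])
    bucket += value
    if (checkTriggers) {
      val n = bucket.size
      triggers.getOrElse(klass, Nil).foreach { case (threshold, action) =>
        if (n > threshold) action(n)
      }
    }
  }

  /** Drop up to `n` of the oldest elements of type `klass`. */
  def removeOldest(klass: Class[_], n: Int): Unit =
    data.get(klass).foreach(buf => buf.remove(0, math.min(n, buf.size)))

  def count(klass: Class[_]): Int = data.get(klass).map(_.size).getOrElse(0)
}

// Hypothetical element type used only for this illustration.
final case class QueryRecord(runId: String)

object ToyTrackingStoreDemo {
  def main(args: Array[String]): Unit = {
    val store = new ToyTrackingStore
    val maxRetained = 3
    // Once more than `maxRetained` records exist, trim the surplus.
    store.addTrigger(classOf[QueryRecord], maxRetained) { count =>
      store.removeOldest(classOf[QueryRecord], count - maxRetained)
    }
    (1 to 5).foreach(i => store.write(QueryRecord(s"run-$i"), checkTriggers = true))
    println(store.count(classOf[QueryRecord])) // prints 3
  }
}
```

       In this sketch the trigger keeps the store bounded: after five writes only the three newest records remain. The wrapper in the PR plays a similar role for the UI data replayed by the history server, though its actual API and cleanup policy may differ from what is shown here.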





[GitHub] [spark] uncleGen removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666898656







[GitHub] [spark] uncleGen commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-648622520


   @gengliangwang Makes sense. Let me prepare a streaming query workload to build a test case.



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666973707







[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-643862941


   **[Test build #124023 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124023/testReport)** for PR 28781 at commit [`c721d8c`](https://github.com/apache/spark/commit/c721d8c7299d503f1010cc8626ffb77d492c1bda).



[GitHub] [spark] xuanyuanking commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

xuanyuanking commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r438704728



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryHistoryServerPlugin.scala
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.config.Status.ASYNC_TRACKING_ENABLED
+import org.apache.spark.scheduler.SparkListener
+import org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus
+import org.apache.spark.sql.streaming.ui.{StreamingQueryStatusListener, StreamingQueryTab}
+import org.apache.spark.status.{AppHistoryServerPlugin, ElementTrackingStore}
+import org.apache.spark.ui.SparkUI
+
+class StreamingQueryHistoryServerPlugin extends AppHistoryServerPlugin {
+
+  override def createListeners(conf: SparkConf, store: ElementTrackingStore): Seq[SparkListener] = {
+    val listenerBus = new StreamingQueryListenerBus(None)
+    listenerBus.addListener(new StreamingQueryStatusListener(conf, store))
+    Seq(listenerBus)
+  }
+
+  override def setupUI(ui: SparkUI): Unit = {
+    val replayConf = ui.conf.clone().set(ASYNC_TRACKING_ENABLED, false)
+    val store = new ElementTrackingStore(ui.store.store, replayConf)
+    val streamingQueryStatusStore = new StreamingQueryStatusStore(store)

Review comment:
       Is this also related to https://github.com/apache/spark/pull/28781#discussion_r438683751?





[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642158735


   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713449993


   **[Test build #130087 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130087/testReport)** for PR 28781 at commit [`2023c62`](https://github.com/apache/spark/commit/2023c625923899678572df6a8c853180f8a604dc).



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-737131326







[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-735761635







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-737041583







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666898975







[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642358394


   **[Test build #123805 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123805/testReport)** for PR 28781 at commit [`71cfbcd`](https://github.com/apache/spark/commit/71cfbcd2040295bcfd80d0bf2170347fbffaf206).



[GitHub] [spark] uncleGen commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-737033196


   retest this please.



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713489009


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34696/
   



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713405114


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34688/
   



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-735522605







[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666708927


   **[Test build #126820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126820/testReport)** for PR 28781 at commit [`aa7a682`](https://github.com/apache/spark/commit/aa7a6828628da713dc17b6f15760eb21fef21378).



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713351314


   **[Test build #130075 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130075/testReport)** for PR 28781 at commit [`2023c62`](https://github.com/apache/spark/commit/2023c625923899678572df6a8c853180f8a604dc).



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713388222


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34684/
   



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-735522605







[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-648635586







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-643202806


   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666048818







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-641870375







[GitHub] [spark] uncleGen commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r439170994



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,161 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
-import scala.collection.mutable
+import scala.collection.immutable.Queue
+
+import com.fasterxml.jackson.annotation.JsonIgnore
 
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
  * TODO: Add support for history server.

Review comment:
       ok





[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-648643664


   **[Test build #124471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124471/testReport)** for PR 28781 at commit [`584813d`](https://github.com/apache/spark/commit/584813d32c43cb9206f4209143d007d346557828).



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666649293







[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-643133365


   **[Test build #123907 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123907/testReport)** for PR 28781 at commit [`38b9506`](https://github.com/apache/spark/commit/38b9506ff7ca47c48be7f6af83b83812bbcfd97c).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642604747







[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713388067


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34688/
   



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713357221







[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666973362


   **[Test build #126871 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126871/testReport)** for PR 28781 at commit [`aa7a682`](https://github.com/apache/spark/commit/aa7a6828628da713dc17b6f15760eb21fef21378).



[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-643862941


   **[Test build #124023 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124023/testReport)** for PR 28781 at commit [`c721d8c`](https://github.com/apache/spark/commit/c721d8c7299d503f1010cc8626ffb77d492c1bda).



[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-735513600


   **[Test build #131947 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131947/testReport)** for PR 28781 at commit [`0e7db4b`](https://github.com/apache/spark/commit/0e7db4b2af33cf11f48f5930d28c42e9bf439ce0).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713432250


   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642363026







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-648624435







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642361040







[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-686232996


   **[Test build #128225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128225/testReport)** for PR 28781 at commit [`9eb353c`](https://github.com/apache/spark/commit/9eb353c649573181741cf52b91401b3b57b50a77).



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666968279







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-735553220







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-648635586


   Merged build finished. Test FAILed.



[GitHub] [spark] xuanyuanking commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

xuanyuanking commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-686502711


   Thanks for working on this! The changes look good. Adding more reviewers to have a look.
   cc @rednaxelafx @gengliangwang @zsxwing @HeartSaVioR @cloud-fan 



[GitHub] [spark] xuanyuanking commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

xuanyuanking commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r438018288



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryHistoryServerPlugin.scala
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.config.Status.ASYNC_TRACKING_ENABLED
+import org.apache.spark.scheduler.SparkListener
+import org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus
+import org.apache.spark.sql.streaming.ui.{StreamingQueryStatusListener, StreamingQueryTab}
+import org.apache.spark.status.{AppHistoryServerPlugin, ElementTrackingStore}
+import org.apache.spark.ui.SparkUI
+
+class StreamingQueryHistoryServerPlugin extends AppHistoryServerPlugin {
+
+  override def createListeners(conf: SparkConf, store: ElementTrackingStore): Seq[SparkListener] = {
+    val listenerBus = new StreamingQueryListenerBus(None)
+    listenerBus.addListener(new StreamingQueryStatusListener(conf, store))
+    Seq(listenerBus)
+  }
+
+  override def setupUI(ui: SparkUI): Unit = {
+    val replayConf = ui.conf.clone.clone().set(ASYNC_TRACKING_ENABLED, false)

Review comment:
       `clone.clone()`?
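
The point of the question above can be illustrated with a minimal, self-contained sketch. It does not use Spark's `SparkConf`; `Settings`, its `copy()` method, and the `"async"` key are hypothetical stand-ins chosen only to mimic the copy-then-override pattern. Under that assumption, one copy is already enough to keep the original untouched, so a second copy would only create an extra object.

```scala
import scala.collection.mutable

object CloneSketch {
  // Hypothetical mutable settings holder; it is NOT Spark's SparkConf.
  final class Settings private (private val entries: mutable.Map[String, String]) {
    def this() = this(mutable.Map.empty[String, String])

    // Mutates this instance and returns it, so calls can be chained.
    def set(key: String, value: String): Settings = {
      entries(key) = value
      this
    }

    def get(key: String): Option[String] = entries.get(key)

    // Returns an independent copy backed by its own map.
    def copy(): Settings = new Settings(entries.clone())
  }
}
```

With this model, `original.copy().set(...)` changes only the copy; chaining a second `copy()` before `set` would give the same observable result.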

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryHistoryServerPlugin.scala
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.config.Status.ASYNC_TRACKING_ENABLED
+import org.apache.spark.scheduler.SparkListener
+import org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus
+import org.apache.spark.sql.streaming.ui.{StreamingQueryStatusListener, StreamingQueryTab}
+import org.apache.spark.status.{AppHistoryServerPlugin, ElementTrackingStore}
+import org.apache.spark.ui.SparkUI
+
+class StreamingQueryHistoryServerPlugin extends AppHistoryServerPlugin {
+
+  override def createListeners(conf: SparkConf, store: ElementTrackingStore): Seq[SparkListener] = {
+    val listenerBus = new StreamingQueryListenerBus(None)
+    listenerBus.addListener(new StreamingQueryStatusListener(conf, store))
+    Seq(listenerBus)
+  }
+
+  override def setupUI(ui: SparkUI): Unit = {
+    val replayConf = ui.conf.clone.clone().set(ASYNC_TRACKING_ENABLED, false)
+    val store = new ElementTrackingStore(ui.store.store, replayConf)
+    val streamingQueryStatusStore = new StreamingQueryStatusStore(store)

Review comment:
       Why do we need to create a new `ElementTrackingStore` here instead of passing `ui.store.store` directly?

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala
##########
@@ -35,12 +35,12 @@ import org.apache.spark.util.ListenerBus
  * and StreamingQueryManager. So this bus will dispatch events to registered listeners for only
  * those queries that were started in the associated SparkSession.
  */
-class StreamingQueryListenerBus(sparkListenerBus: LiveListenerBus)
+class StreamingQueryListenerBus(sparkListenerBus: Option[LiveListenerBus])

Review comment:
       Nit: how about keeping a `live: Boolean` parameter here to separate the live and history scenarios explicitly?





[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-686233387







[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642463010


   **[Test build #123833 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123833/testReport)** for PR 28781 at commit [`c1c600f`](https://github.com/apache/spark/commit/c1c600fc4832154d8fce9b2f881096106e206e16).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666876537







[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713489023







[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-735513600


   **[Test build #131947 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131947/testReport)** for PR 28781 at commit [`0e7db4b`](https://github.com/apache/spark/commit/0e7db4b2af33cf11f48f5930d28c42e9bf439ce0).



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-736998496







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-667085440


   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-643930255







[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-643863164







[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-686287143







[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-667085440







[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-686452670


   **[Test build #128247 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128247/testReport)** for PR 28781 at commit [`fa5c6b1`](https://github.com/apache/spark/commit/fa5c6b124be84699ba32935d1a98c82fa04394fc).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-686233387







[GitHub] [spark] uncleGen commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642461839


   retest this please.



[GitHub] [spark] uncleGen commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r437984275



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala
##########
@@ -35,12 +35,12 @@ import org.apache.spark.util.ListenerBus
  * and StreamingQueryManager. So this bus will dispatch events to registered listeners for only
  * those queries that were started in the associated SparkSession.
  */
-class StreamingQueryListenerBus(sparkListenerBus: LiveListenerBus)
+class StreamingQueryListenerBus(sparkListenerBus: Option[LiveListenerBus])

Review comment:
       Make `sparkListenerBus` optional. When loaded by the History Server, Spark uses the `ReplayListenerBus` instead of the `LiveListenerBus`. In the live UI, a streaming query posts its events to the `StreamingQueryListenerBus`, which forwards them to the `LiveListenerBus`. The `StreamingQueryListenerBus` also subscribes to `LiveListenerBus` events, so it gets the posted events back on a different thread. In the history UI, the `StreamingQueryListenerBus` subscribes to the `ReplayListenerBus` and processes those events directly through `onOtherEvent()`.
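
The two event paths described above can be modelled with a small, self-contained sketch. None of the classes below are Spark's: `AppBus`, `QueryBus`, `QueryEvent`, and `drain()` are hypothetical stand-ins, and delivery happens through an explicit `drain()` call on the caller's thread rather than on a separate dispatch thread. The no-op behaviour of `post` when no bus is supplied is also an assumption of this sketch, not something the diff confirms.

```scala
import scala.collection.mutable

object ListenerBusSketch {
  // Hypothetical event type; Spark's real events carry much more data.
  final case class QueryEvent(runId: String, kind: String)

  // Stand-in for an application-wide bus: it queues events and later
  // delivers them to every subscriber.
  final class AppBus {
    private val subscribers = mutable.ArrayBuffer.empty[QueryEvent => Unit]
    private val pending = mutable.Queue.empty[QueryEvent]

    def subscribe(handler: QueryEvent => Unit): Unit = subscribers += handler

    def post(event: QueryEvent): Unit = pending.enqueue(event)

    // Delivers all queued events to all subscribers.
    def drain(): Unit = while (pending.nonEmpty) {
      val event = pending.dequeue()
      subscribers.foreach(handler => handler(event))
    }
  }

  // Stand-in for the streaming query bus. Some(bus) models the live UI,
  // None models replay in the history UI.
  final class QueryBus(appBus: Option[AppBus]) {
    val seen = mutable.ArrayBuffer.empty[QueryEvent]

    // Live mode: subscribe so that posted events come back through onOtherEvent.
    appBus.foreach(bus => bus.subscribe(event => onOtherEvent(event)))

    // Live mode: forward to the application bus. Replay mode: nothing to
    // forward to, so this is a no-op (an assumption of this sketch).
    def post(event: QueryEvent): Unit = appBus.foreach(bus => bus.post(event))

    // Both modes end up here; in replay mode the replaying bus calls it directly.
    def onOtherEvent(event: QueryEvent): Unit = seen += event
  }
}
```

In the live case an event reaches `seen` only after the round trip through `AppBus`; in the replay case the caller hands the recorded event straight to `onOtherEvent`.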





[GitHub] [spark] uncleGen commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r509034217



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,139 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
 import scala.collection.mutable
 
+import com.fasterxml.jackson.annotation.JsonIgnore
+
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQuerySummary], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private val queryToProgress = new mutable.HashMap[UUID, mutable.Queue[String]]()
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQuerySummary]).index("active").first(false).last(false)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_ => true)
+    toDelete.foreach { e =>
+      store.delete(e.getClass, e.runId)
+      store.removeAllByIndexValues(
+        classOf[StreamingQueryProgressWrapper], "runId", e.runId.toString)
+    }
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    store.write(new StreamingQuerySummary(
+      event.name,
+      event.id,
+      event.runId,
+      Array.empty[String],
+      startTimestamp,
+      true,
+      None
+    ))
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    if (!queryToProgress.contains(event.progress.runId)) {

Review comment:
       +1

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,139 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
 import scala.collection.mutable
 
+import com.fasterxml.jackson.annotation.JsonIgnore
+
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQuerySummary], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private val queryToProgress = new mutable.HashMap[UUID, mutable.Queue[String]]()
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQuerySummary]).index("active").first(false).last(false)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_ => true)
+    toDelete.foreach { e =>
+      store.delete(e.getClass, e.runId)
+      store.removeAllByIndexValues(
+        classOf[StreamingQueryProgressWrapper], "runId", e.runId.toString)
+    }
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    store.write(new StreamingQuerySummary(
+      event.name,
+      event.id,
+      event.runId,
+      Array.empty[String],
+      startTimestamp,
+      true,
+      None
+    ))
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    if (!queryToProgress.contains(event.progress.runId)) {
+      queryToProgress.put(event.progress.runId, mutable.Queue.empty[String])
+    }
+    val progressIds = queryToProgress(event.progress.runId)
+    progressIds.enqueue(getUniqueId(runId, batchId, timestamp))
+    store.write(new StreamingQueryProgressWrapper(event.progress))
+    while(progressIds.length >= streamingProgressRetention) {
+      val uniqueId = progressIds.dequeue
+      store.delete(classOf[StreamingQueryProgressWrapper], uniqueId)
+    }
   }
 
   override def onQueryTerminated(
       event: StreamingQueryListener.QueryTerminatedEvent): Unit = synchronized {
-    val queryStatus = activeQueryStatus.remove(event.runId)
-    if (queryStatus != null) {
-      queryStatus.queryTerminated(event)
-      inactiveQueryStatus += queryStatus
-      while (inactiveQueryStatus.length >= inactiveQueryStatusRetention) {
-        inactiveQueryStatus.dequeue()
-      }
-    }
+    val querySummary = store.read(classOf[StreamingQuerySummary], event.runId)
+    store.write(new StreamingQuerySummary(
+      querySummary.name,
+      querySummary.id,
+      querySummary.runId,
+      querySummary.progressIds,
+      querySummary.startTimestamp,
+      false,
+      querySummary.exception
+    ))
+    queryToProgress.remove(event.runId)
   }
+}
 
-  def allQueryStatus: Seq[StreamingQueryUIData] = synchronized {
-    activeQueryStatus.values().asScala.toSeq ++ inactiveQueryStatus
-  }
+private[sql] class StreamingQuerySummary(
+    val name: String,
+    val id: UUID,
+    @KVIndexParam val runId: UUID,
+    val progressIds: Array[String],
+    val startTimestamp: Long,
+    val isActive: Boolean,
+    val exception: Option[String]) {
+  @JsonIgnore @KVIndex("active")
+  private def activeIndex: Boolean = isActive
+  @JsonIgnore @KVIndex("startTimestamp")
+  private def startTimestampIndex: Long = startTimestamp
 }
 
 /**
  * This class contains all message related to UI display, each instance corresponds to a single
  * [[org.apache.spark.sql.streaming.StreamingQuery]].
  */
-private[ui] class StreamingQueryUIData(
-    val name: String,
-    val id: UUID,
-    val runId: UUID,
-    val startTimestamp: Long) {
-
-  /** Holds the most recent query progress updates. */
-  private val progressBuffer = new mutable.Queue[StreamingQueryProgress]()
-
-  private var _isActive = true
-  private var _exception: Option[String] = None
-
-  def isActive: Boolean = synchronized { _isActive }
-
-  def exception: Option[String] = synchronized { _exception }
-
-  def queryTerminated(event: StreamingQueryListener.QueryTerminatedEvent): Unit = synchronized {
-    _isActive = false
-    _exception = event.exception
-  }
-
-  def updateProcess(
-      newProgress: StreamingQueryProgress, retentionNum: Int): Unit = progressBuffer.synchronized {
-    progressBuffer += newProgress
-    while (progressBuffer.length >= retentionNum) {
-      progressBuffer.dequeue()
-    }
-  }
-
-  def recentProgress: Array[StreamingQueryProgress] = progressBuffer.synchronized {
-    progressBuffer.toArray
-  }
+private[sql] case class StreamingQueryUIData(
+    summary: StreamingQuerySummary,
+    recentProgress: Array[StreamingQueryProgress],
+    lastProgress: StreamingQueryProgress)

Review comment:
       +1

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,139 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
 import scala.collection.mutable
 
+import com.fasterxml.jackson.annotation.JsonIgnore
+
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQuerySummary], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private val queryToProgress = new mutable.HashMap[UUID, mutable.Queue[String]]()
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQuerySummary]).index("active").first(false).last(false)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_ => true)
+    toDelete.foreach { e =>
+      store.delete(e.getClass, e.runId)
+      store.removeAllByIndexValues(
+        classOf[StreamingQueryProgressWrapper], "runId", e.runId.toString)
+    }
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    store.write(new StreamingQuerySummary(
+      event.name,
+      event.id,
+      event.runId,
+      Array.empty[String],
+      startTimestamp,
+      true,
+      None
+    ))
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    if (!queryToProgress.contains(event.progress.runId)) {
+      queryToProgress.put(event.progress.runId, mutable.Queue.empty[String])
+    }
+    val progressIds = queryToProgress(event.progress.runId)
+    progressIds.enqueue(getUniqueId(runId, batchId, timestamp))
+    store.write(new StreamingQueryProgressWrapper(event.progress))
+    while(progressIds.length >= streamingProgressRetention) {
+      val uniqueId = progressIds.dequeue
+      store.delete(classOf[StreamingQueryProgressWrapper], uniqueId)
+    }
   }
 
   override def onQueryTerminated(
       event: StreamingQueryListener.QueryTerminatedEvent): Unit = synchronized {
-    val queryStatus = activeQueryStatus.remove(event.runId)
-    if (queryStatus != null) {
-      queryStatus.queryTerminated(event)
-      inactiveQueryStatus += queryStatus
-      while (inactiveQueryStatus.length >= inactiveQueryStatusRetention) {
-        inactiveQueryStatus.dequeue()
-      }
-    }
+    val querySummary = store.read(classOf[StreamingQuerySummary], event.runId)
+    store.write(new StreamingQuerySummary(
+      querySummary.name,
+      querySummary.id,
+      querySummary.runId,
+      querySummary.progressIds,
+      querySummary.startTimestamp,
+      false,
+      querySummary.exception
+    ))
+    queryToProgress.remove(event.runId)
   }
+}
 
-  def allQueryStatus: Seq[StreamingQueryUIData] = synchronized {
-    activeQueryStatus.values().asScala.toSeq ++ inactiveQueryStatus
-  }
+private[sql] class StreamingQuerySummary(

Review comment:
       ok





[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-648624428







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666968279


   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642603617


   **[Test build #123833 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123833/testReport)** for PR 28781 at commit [`c1c600f`](https://github.com/apache/spark/commit/c1c600fc4832154d8fce9b2f881096106e206e16).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class StreamingQueryListenerBus(sparkListenerBus: Option[LiveListenerBus], live: Boolean = true)`



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713432261


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/130079/
   Test FAILed.



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-735523446


   **[Test build #131954 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131954/testReport)** for PR 28781 at commit [`8a7de59`](https://github.com/apache/spark/commit/8a7de597c978c47f0e33d1f646c50ad4225405cd).



[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-737042534


   **[Test build #132029 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132029/testReport)** for PR 28781 at commit [`e2758d7`](https://github.com/apache/spark/commit/e2758d76a05a9793147b95d83e63118eee5f2d4f).



[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713351314


   **[Test build #130075 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130075/testReport)** for PR 28781 at commit [`2023c62`](https://github.com/apache/spark/commit/2023c625923899678572df6a8c853180f8a604dc).



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-686366899


   **[Test build #128246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128246/testReport)** for PR 28781 at commit [`9709405`](https://github.com/apache/spark/commit/970940581369f1868d6f5ba13464bb80cb383ebb).



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666708927


   **[Test build #126820 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126820/testReport)** for PR 28781 at commit [`aa7a682`](https://github.com/apache/spark/commit/aa7a6828628da713dc17b6f15760eb21fef21378).



[GitHub] [spark] uncleGen commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-648640753


   retest this please



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642454624







[GitHub] [spark] xuanyuanking commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

xuanyuanking commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713448735


   retest this please.



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-735554451


   **[Test build #131954 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131954/testReport)** for PR 28781 at commit [`8a7de59`](https://github.com/apache/spark/commit/8a7de597c978c47f0e33d1f646c50ad4225405cd).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] uncleGen commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666875388


   retest this please.



[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-686371014


   **[Test build #128247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128247/testReport)** for PR 28781 at commit [`fa5c6b1`](https://github.com/apache/spark/commit/fa5c6b124be84699ba32935d1a98c82fa04394fc).



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713526558


   **[Test build #130087 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130087/testReport)** for PR 28781 at commit [`2023c62`](https://github.com/apache/spark/commit/2023c625923899678572df6a8c853180f8a604dc).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-735538056







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713357221







[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666967780


   **[Test build #126847 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126847/testReport)** for PR 28781 at commit [`aa7a682`](https://github.com/apache/spark/commit/aa7a6828628da713dc17b6f15760eb21fef21378).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666845431


   **[Test build #126820 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126820/testReport)** for PR 28781 at commit [`aa7a682`](https://github.com/apache/spark/commit/aa7a6828628da713dc17b6f15760eb21fef21378).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] uncleGen removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666875388


   retest this please.



[GitHub] [spark] uncleGen commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r444673455



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryStatusStore.scala
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.sql.streaming.ui.{StreamingQueryProgressWrapper, StreamingQuerySummary, StreamingQueryUIData}
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+
+/**
+ * Provides a view of a KVStore with methods that make it easy to query Streaming Query state.
+ * There's no state kept in this class, so it's ok to have multiple instances of it in an
+ * application.
+ */
+class StreamingQueryStatusStore(store: ElementTrackingStore) {
+
+  def queriesCount(): Long = store.count(classOf[StreamingQuerySummary])

Review comment:
       removed





[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-686453954







[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713374797


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34684/
   



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713388249







[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-737072512







[GitHub] [spark] uncleGen commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r439170902



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,161 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
-import scala.collection.mutable
+import scala.collection.immutable.Queue
+
+import com.fasterxml.jackson.annotation.JsonIgnore
 
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
  * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQueryUIData], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQueryUIData]).index("startTimestamp").first(0L)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_.isActive == false)
+    toDelete.foreach { e =>
+      store.delete(e.getClass(), e.runId)
+      store.removeAllByIndexValues(
+        classOf[StreamingQueryProgressWrapper], "runId", e.runId.toString)
+    }
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    val querySummary = new StreamingQueryUIData(
+      event.name,
+      event.id,
+      event.runId,
+      Queue.empty[String],
+      startTimestamp,
+      true,
+      None)
+    store.write(querySummary)
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    val querySummary = store.read(classOf[StreamingQueryUIData], runId)
+    val progressIdQueue =
+      querySummary.progressIdQueue ++ Seq(getUniqueId(runId, batchId, timestamp))
+    store.write(new StreamingQueryProgressWrapper(event.progress))
+    while(progressIdQueue.length >= streamingProgressRetention) {
+      val uniqueId = progressIdQueue.dequeue
+      store.delete(classOf[StreamingQueryProgressWrapper], uniqueId)
+    }
+    store.delete(classOf[StreamingQueryUIData], runId)
+    store.write(new StreamingQueryUIData(
+      querySummary.name,
+      querySummary.id,
+      querySummary.runId,
+      progressIdQueue,
+      querySummary.startTimestamp,
+      querySummary.isActive,
+      querySummary.exception
+    ))
   }
 
   override def onQueryTerminated(
       event: StreamingQueryListener.QueryTerminatedEvent): Unit = synchronized {
-    val queryStatus = activeQueryStatus.remove(event.runId)
-    if (queryStatus != null) {
-      queryStatus.queryTerminated(event)
-      inactiveQueryStatus += queryStatus
-      while (inactiveQueryStatus.length >= inactiveQueryStatusRetention) {
-        inactiveQueryStatus.dequeue()
-      }
-    }
-  }
-
-  def allQueryStatus: Seq[StreamingQueryUIData] = synchronized {
-    activeQueryStatus.values().asScala.toSeq ++ inactiveQueryStatus
+    val querySummary = store.read(classOf[StreamingQueryUIData], event.runId)
+    store.delete(classOf[StreamingQueryUIData], event.runId)
+    store.write(new StreamingQueryUIData(
+      querySummary.name,
+      querySummary.id,
+      querySummary.runId,
+      querySummary.progressIdQueue,
+      querySummary.startTimestamp,
+      false,
+      querySummary.exception
+    ))
   }
 }
 
 /**
  * This class contains all message related to UI display, each instance corresponds to a single
  * [[org.apache.spark.sql.streaming.StreamingQuery]].
  */
-private[ui] class StreamingQueryUIData(
+private[sql] class StreamingQueryUIData(
     val name: String,
     val id: UUID,
-    val runId: UUID,
-    val startTimestamp: Long) {
+    @KVIndexParam val runId: UUID,
+    val progressIdQueue: Queue[String],
+    val startTimestamp: Long,
+    val isActive: Boolean,
+    val exception: Option[String]) {
 
-  /** Holds the most recent query progress updates. */
-  private val progressBuffer = new mutable.Queue[StreamingQueryProgress]()
+  private var storeOption: Option[ElementTrackingStore] = None

Review comment:
       OK, let me materialize UI data directly instead of putting `KVStore` into `StreamingQueryUIData`.
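
A rough sketch of what "materializing UI data directly" could look like, as opposed to holding a store reference inside the UI data object. Every name here (`ProgressSnapshot`, `QueryUIData`, `InMemoryStore`, `UIDataBuilder`) is a hypothetical, simplified stand-in and not a class from this PR or from Spark. The assumption is that the UI object should be a plain immutable snapshot built at read time.

```scala
import java.util.UUID

// Hypothetical, simplified progress record (not Spark's StreamingQueryProgress).
final case class ProgressSnapshot(runId: UUID, batchId: Long, timestamp: String)

// Immutable UI view: it carries the progress data itself, not a handle to a store.
final case class QueryUIData(
    name: String,
    runId: UUID,
    isActive: Boolean,
    recentProgress: Seq[ProgressSnapshot])

// Hypothetical in-memory stand-in for a key-value store.
final class InMemoryStore {
  private var progress = Vector.empty[ProgressSnapshot]

  def write(p: ProgressSnapshot): Unit = synchronized {
    progress = progress :+ p
  }

  def progressFor(runId: UUID): Seq[ProgressSnapshot] = synchronized {
    progress.filter(_.runId == runId)
  }
}

object UIDataBuilder {
  // Reads everything the page needs up front and returns a self-contained value.
  def materialize(
      store: InMemoryStore,
      name: String,
      runId: UUID,
      isActive: Boolean): QueryUIData =
    QueryUIData(name, runId, isActive, store.progressFor(runId))
}
```

With this shape, a value returned by `materialize` does not change when the store is written to afterwards; a page that needs fresh data calls `materialize` again.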





[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-734784088


   **[Test build #131883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131883/testReport)** for PR 28781 at commit [`3bf1c40`](https://github.com/apache/spark/commit/3bf1c40672f7d35a71cb72981d0b52f21dc6dbcd).



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642463010


   **[Test build #123833 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123833/testReport)** for PR 28781 at commit [`c1c600f`](https://github.com/apache/spark/commit/c1c600fc4832154d8fce9b2f881096106e206e16).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-643202814


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123907/
   Test FAILed.



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-643202653


   **[Test build #123907 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123907/testReport)** for PR 28781 at commit [`38b9506`](https://github.com/apache/spark/commit/38b9506ff7ca47c48be7f6af83b83812bbcfd97c).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642362709







[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713527738







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713527738







[GitHub] [spark] uncleGen commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713357699


   retest this please.



[GitHub] [spark] uncleGen commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r437985761



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala
##########
@@ -110,7 +116,15 @@ class StreamingQueryListenerBus(sparkListenerBus: LiveListenerBus)
       listener: StreamingQueryListener,
       event: StreamingQueryListener.Event): Unit = {
     def shouldReport(runId: UUID): Boolean = {
-      activeQueryRunIds.synchronized { activeQueryRunIds.contains(runId) }
+      // When loaded by Spark History Server, we should process all event coming from replay
+      // listener bus.
+      if (sparkListenerBus.isEmpty) {

Review comment:
       In the live UI, when a streaming query event is posted, `StreamingQueryListenerBus` manages the query status. In the history UI there is no need to manage this status, so we should process all replayed events.
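
The distinction described above can be sketched as follows. This is a minimal illustration, not the actual `StreamingQueryListenerBus` code: `RunIdFilter` and its method names are hypothetical, and the `live` flag is an assumption standing in for the `sparkListenerBus.isEmpty` check shown in the diff.

```scala
import java.util.UUID
import scala.collection.mutable

// Hypothetical stand-in for the listener bus's run-id bookkeeping.
final class RunIdFilter(live: Boolean) {
  private val activeQueryRunIds = mutable.HashSet.empty[UUID]

  def onQueryStarted(runId: UUID): Unit =
    activeQueryRunIds.synchronized { activeQueryRunIds += runId }

  def onQueryTerminated(runId: UUID): Unit =
    activeQueryRunIds.synchronized { activeQueryRunIds -= runId }

  // Live mode: report only events for runs this bus has seen start.
  // Replay mode (history server): no bookkeeping applies, so report every event.
  def shouldReport(runId: UUID): Boolean =
    if (!live) true
    else activeQueryRunIds.synchronized { activeQueryRunIds.contains(runId) }
}
```

In replay mode the filter never consults the set, so events for runs that were never registered through `onQueryStarted` are still reported.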





[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666846461


   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666846473


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126820/
   Test FAILed.



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-648780252


   **[Test build #124471 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124471/testReport)** for PR 28781 at commit [`584813d`](https://github.com/apache/spark/commit/584813d32c43cb9206f4209143d007d346557828).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `class StreamingQueryStatusStore(store: KVStore) `



[GitHub] [spark] gengliangwang commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

gengliangwang commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-648607182


   @uncleGen Thanks for the work!
   I think the major issue with this PR is that there are no related test cases. How about uploading an event log of a streaming workload and creating test cases like `HistoryServerSuite`?
   



[GitHub] [spark] zsxwing commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

zsxwing commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r495634040



##########
File path: sql/core/src/test/resources/META-INF/services/org.apache.spark.status.AppHistoryServerPlugin
##########
@@ -0,0 +1,2 @@
+org.apache.spark.sql.execution.ui.SQLHistoryServerPlugin

Review comment:
       Why do we need this file? It's already in main/resources so we should be able to access it in tests.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryHistoryServerPlugin.scala
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.SparkConf
+import org.apache.spark.scheduler.SparkListener
+import org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus
+import org.apache.spark.sql.streaming.ui.{StreamingQueryStatusListener, StreamingQueryTab}
+import org.apache.spark.status.{AppHistoryServerPlugin, ElementTrackingStore}
+import org.apache.spark.ui.SparkUI
+
+class StreamingQueryHistoryServerPlugin extends AppHistoryServerPlugin {
+
+  override def createListeners(conf: SparkConf, store: ElementTrackingStore): Seq[SparkListener] = {
+    val listenerBus = new StreamingQueryListenerBus(None, live = false)
+    listenerBus.addListener(new StreamingQueryStatusListener(conf, store))
+    Seq(listenerBus)
+  }
+
+  override def setupUI(ui: SparkUI): Unit = {
+    val streamingQueryStatusStore = new StreamingQueryStatusStore(ui.store.store)
+    new StreamingQueryTab(streamingQueryStatusStore, ui)

Review comment:
       nit: we can add a check to avoid displaying the tab when there is no streaming query:
   ```
   if (streamingQueryStatusStore.allQueryUIData.nonEmpty) {
     new StreamingQueryTab(streamingQueryStatusStore, ui)
   }
   ```

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryStatusStore.scala
##########
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.sql.streaming.ui.{StreamingQueryProgressWrapper, StreamingQuerySummary, StreamingQueryUIData}
+import org.apache.spark.status.KVUtils
+import org.apache.spark.util.kvstore.KVStore
+
+/**
+ * Provides a view of a KVStore with methods that make it easy to query Streaming Query state.
+ * There's no state kept in this class, so it's ok to have multiple instances of it in an
+ * application.
+ */
+class StreamingQueryStatusStore(store: KVStore) {
+
+  def allQueryUIData: Seq[StreamingQueryUIData] = synchronized {
+    val view = store.view(classOf[StreamingQuerySummary]).index("startTimestamp").first(0L)
+    KVUtils.viewToSeq(view, Int.MaxValue)(_ => true).map(makeUIData)
+  }
+
+  // Visible for testing.
+  private[sql] def activeQueryUIData(): Seq[StreamingQueryUIData] = {

Review comment:
       If we remove these two methods, we won't need the `active` index. Can we remove them and update the tests to use `allQueryUIData` instead?

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryStatusStore.scala
##########
@@ -0,0 +1,62 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.sql.streaming.ui.{StreamingQueryProgressWrapper, StreamingQuerySummary, StreamingQueryUIData}
+import org.apache.spark.status.KVUtils
+import org.apache.spark.util.kvstore.KVStore
+
+/**
+ * Provides a view of a KVStore with methods that make it easy to query Streaming Query state.
+ * There's no state kept in this class, so it's ok to have multiple instances of it in an
+ * application.
+ */
+class StreamingQueryStatusStore(store: KVStore) {
+
+  def allQueryUIData: Seq[StreamingQueryUIData] = synchronized {

Review comment:
       `synchronized` is not needed.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,139 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
 import scala.collection.mutable
 
+import com.fasterxml.jackson.annotation.JsonIgnore
+
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQuerySummary], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private val queryToProgress = new mutable.HashMap[UUID, mutable.Queue[String]]()
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQuerySummary]).index("active").first(false).last(false)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_ => true)
+    toDelete.foreach { e =>
+      store.delete(e.getClass, e.runId)

Review comment:
       What's the time complexity of this operation and the one below? Any chance it will be O(number of all query progresses)?

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,139 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
 import scala.collection.mutable
 
+import com.fasterxml.jackson.annotation.JsonIgnore
+
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQuerySummary], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private val queryToProgress = new mutable.HashMap[UUID, mutable.Queue[String]]()
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQuerySummary]).index("active").first(false).last(false)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_ => true)
+    toDelete.foreach { e =>
+      store.delete(e.getClass, e.runId)
+      store.removeAllByIndexValues(
+        classOf[StreamingQueryProgressWrapper], "runId", e.runId.toString)
+    }
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    store.write(new StreamingQuerySummary(
+      event.name,
+      event.id,
+      event.runId,
+      Array.empty[String],
+      startTimestamp,
+      true,
+      None
+    ))
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    if (!queryToProgress.contains(event.progress.runId)) {
+      queryToProgress.put(event.progress.runId, mutable.Queue.empty[String])
+    }
+    val progressIds = queryToProgress(event.progress.runId)
+    progressIds.enqueue(getUniqueId(runId, batchId, timestamp))
+    store.write(new StreamingQueryProgressWrapper(event.progress))
+    while(progressIds.length >= streamingProgressRetention) {
+      val uniqueId = progressIds.dequeue
+      store.delete(classOf[StreamingQueryProgressWrapper], uniqueId)
+    }
   }
 
   override def onQueryTerminated(
       event: StreamingQueryListener.QueryTerminatedEvent): Unit = synchronized {
-    val queryStatus = activeQueryStatus.remove(event.runId)
-    if (queryStatus != null) {
-      queryStatus.queryTerminated(event)
-      inactiveQueryStatus += queryStatus
-      while (inactiveQueryStatus.length >= inactiveQueryStatusRetention) {
-        inactiveQueryStatus.dequeue()
-      }
-    }
+    val querySummary = store.read(classOf[StreamingQuerySummary], event.runId)
+    store.write(new StreamingQuerySummary(
+      querySummary.name,
+      querySummary.id,
+      querySummary.runId,
+      querySummary.progressIds,
+      querySummary.startTimestamp,
+      false,
+      querySummary.exception
+    ))
+    queryToProgress.remove(event.runId)
   }
+}
 
-  def allQueryStatus: Seq[StreamingQueryUIData] = synchronized {
-    activeQueryStatus.values().asScala.toSeq ++ inactiveQueryStatus
-  }
+private[sql] class StreamingQuerySummary(

Review comment:
       nit: maybe call it `StreamingQueryData` to make it consistent with the other names in core and sql.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala
##########
@@ -110,7 +120,7 @@ class StreamingQueryListenerBus(sparkListenerBus: LiveListenerBus)
       listener: StreamingQueryListener,
       event: StreamingQueryListener.Event): Unit = {
     def shouldReport(runId: UUID): Boolean = {
-      activeQueryRunIds.synchronized { activeQueryRunIds.contains(runId) }
+      !live || activeQueryRunIds.synchronized { activeQueryRunIds.contains(runId) }

Review comment:
       It would be great if we could add a comment explaining why we should always report in SHS.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,139 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
 import scala.collection.mutable
 
+import com.fasterxml.jackson.annotation.JsonIgnore
+
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQuerySummary], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private val queryToProgress = new mutable.HashMap[UUID, mutable.Queue[String]]()
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQuerySummary]).index("active").first(false).last(false)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_ => true)
+    toDelete.foreach { e =>
+      store.delete(e.getClass, e.runId)
+      store.removeAllByIndexValues(
+        classOf[StreamingQueryProgressWrapper], "runId", e.runId.toString)
+    }
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    store.write(new StreamingQuerySummary(
+      event.name,
+      event.id,
+      event.runId,
+      Array.empty[String],
+      startTimestamp,
+      true,
+      None
+    ))
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    if (!queryToProgress.contains(event.progress.runId)) {

Review comment:
       We need `synchronized`. Why not use `ConcurrentHashMap`?
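       A minimal sketch of that suggestion, not the PR's code: `ProgressTracker` and `record` are hypothetical names, and keeping at most `retention` ids per run is an assumption about the intended bound. `computeIfAbsent` replaces the separate `contains`/`put` calls with one atomic step, and the per-run queue is still locked because append-then-trim is a compound operation:
   ```scala
   import java.util.UUID
   import java.util.concurrent.ConcurrentHashMap
   import scala.collection.mutable

   // Hypothetical stand-in for the listener's bookkeeping, not the PR's code.
   class ProgressTracker(retention: Int) {
     private val queryToProgress = new ConcurrentHashMap[UUID, mutable.Queue[String]]()

     // Records one progress id and returns the ids evicted to respect `retention`.
     def record(runId: UUID, progressId: String): List[String] = {
       // Atomic get-or-create instead of a separate contains/put pair.
       val ids = queryToProgress.computeIfAbsent(runId, _ => mutable.Queue.empty[String])
       // Append-then-trim is a compound operation, so the queue itself is locked.
       ids.synchronized {
         ids += progressId
         val evicted = mutable.ListBuffer.empty[String]
         while (ids.length > retention) {
           evicted += ids.dequeue()
         }
         evicted.toList
       }
     }

     // Drops all bookkeeping for a terminated run.
     def remove(runId: UUID): Unit = queryToProgress.remove(runId)
   }
   ```
       In the listener, the returned ids would be the keys to delete from the store.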

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,139 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
 import scala.collection.mutable
 
+import com.fasterxml.jackson.annotation.JsonIgnore
+
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQuerySummary], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private val queryToProgress = new mutable.HashMap[UUID, mutable.Queue[String]]()
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQuerySummary]).index("active").first(false).last(false)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_ => true)
+    toDelete.foreach { e =>
+      store.delete(e.getClass, e.runId)
+      store.removeAllByIndexValues(
+        classOf[StreamingQueryProgressWrapper], "runId", e.runId.toString)
+    }
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    store.write(new StreamingQuerySummary(
+      event.name,
+      event.id,
+      event.runId,
+      Array.empty[String],
+      startTimestamp,
+      true,
+      None
+    ))
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    if (!queryToProgress.contains(event.progress.runId)) {
+      queryToProgress.put(event.progress.runId, mutable.Queue.empty[String])
+    }
+    val progressIds = queryToProgress(event.progress.runId)
+    progressIds.enqueue(getUniqueId(runId, batchId, timestamp))
+    store.write(new StreamingQueryProgressWrapper(event.progress))
+    while(progressIds.length >= streamingProgressRetention) {
+      val uniqueId = progressIds.dequeue
+      store.delete(classOf[StreamingQueryProgressWrapper], uniqueId)
+    }
   }
 
   override def onQueryTerminated(
       event: StreamingQueryListener.QueryTerminatedEvent): Unit = synchronized {
-    val queryStatus = activeQueryStatus.remove(event.runId)
-    if (queryStatus != null) {
-      queryStatus.queryTerminated(event)
-      inactiveQueryStatus += queryStatus
-      while (inactiveQueryStatus.length >= inactiveQueryStatusRetention) {
-        inactiveQueryStatus.dequeue()
-      }
-    }
+    val querySummary = store.read(classOf[StreamingQuerySummary], event.runId)
+    store.write(new StreamingQuerySummary(
+      querySummary.name,
+      querySummary.id,
+      querySummary.runId,
+      querySummary.progressIds,
+      querySummary.startTimestamp,
+      false,
+      querySummary.exception
+    ))
+    queryToProgress.remove(event.runId)
   }
+}
 
-  def allQueryStatus: Seq[StreamingQueryUIData] = synchronized {
-    activeQueryStatus.values().asScala.toSeq ++ inactiveQueryStatus
-  }
+private[sql] class StreamingQuerySummary(
+    val name: String,
+    val id: UUID,
+    @KVIndexParam val runId: UUID,
+    val progressIds: Array[String],
+    val startTimestamp: Long,
+    val isActive: Boolean,
+    val exception: Option[String]) {
+  @JsonIgnore @KVIndex("active")
+  private def activeIndex: Boolean = isActive
+  @JsonIgnore @KVIndex("startTimestamp")
+  private def startTimestampIndex: Long = startTimestamp
 }
 
 /**
  * This class contains all message related to UI display, each instance corresponds to a single
  * [[org.apache.spark.sql.streaming.StreamingQuery]].
  */
-private[ui] class StreamingQueryUIData(
-    val name: String,
-    val id: UUID,
-    val runId: UUID,
-    val startTimestamp: Long) {
-
-  /** Holds the most recent query progress updates. */
-  private val progressBuffer = new mutable.Queue[StreamingQueryProgress]()
-
-  private var _isActive = true
-  private var _exception: Option[String] = None
-
-  def isActive: Boolean = synchronized { _isActive }
-
-  def exception: Option[String] = synchronized { _exception }
-
-  def queryTerminated(event: StreamingQueryListener.QueryTerminatedEvent): Unit = synchronized {
-    _isActive = false
-    _exception = event.exception
-  }
-
-  def updateProcess(
-      newProgress: StreamingQueryProgress, retentionNum: Int): Unit = progressBuffer.synchronized {
-    progressBuffer += newProgress
-    while (progressBuffer.length >= retentionNum) {
-      progressBuffer.dequeue()
-    }
-  }
-
-  def recentProgress: Array[StreamingQueryProgress] = progressBuffer.synchronized {
-    progressBuffer.toArray
-  }
+private[sql] case class StreamingQueryUIData(
+    summary: StreamingQuerySummary,
+    recentProgress: Array[StreamingQueryProgress],
+    lastProgress: StreamingQueryProgress)

Review comment:
       We can get `lastProgress` from `recentProgress`. No need to pass it into the constructor.
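       A rough sketch of the idea, using simplified hypothetical stand-ins (`Progress`, `QueryUIData`) rather than the real classes. Returning an `Option` is an assumption made here to cover the case where no progress has been reported yet:
   ```scala
   // Hypothetical simplified stand-ins; the real classes carry many more fields.
   final case class Progress(batchId: Long)

   final case class QueryUIData(name: String, recentProgress: Array[Progress]) {
     // Derived on demand from recentProgress; None when nothing has been reported yet.
     def lastProgress: Option[Progress] = recentProgress.lastOption
   }
   ```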





[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-686371014


   **[Test build #128247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128247/testReport)** for PR 28781 at commit [`fa5c6b1`](https://github.com/apache/spark/commit/fa5c6b124be84699ba32935d1a98c82fa04394fc).



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-735554962







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-686371738







[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642604747







[GitHub] [spark] uncleGen commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r509030242



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryHistoryServerPlugin.scala
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.SparkConf
+import org.apache.spark.scheduler.SparkListener
+import org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus
+import org.apache.spark.sql.streaming.ui.{StreamingQueryStatusListener, StreamingQueryTab}
+import org.apache.spark.status.{AppHistoryServerPlugin, ElementTrackingStore}
+import org.apache.spark.ui.SparkUI
+
+class StreamingQueryHistoryServerPlugin extends AppHistoryServerPlugin {
+
+  override def createListeners(conf: SparkConf, store: ElementTrackingStore): Seq[SparkListener] = {
+    val listenerBus = new StreamingQueryListenerBus(None, live = false)
+    listenerBus.addListener(new StreamingQueryStatusListener(conf, store))
+    Seq(listenerBus)
+  }
+
+  override def setupUI(ui: SparkUI): Unit = {
+    val streamingQueryStatusStore = new StreamingQueryStatusStore(ui.store.store)
+    new StreamingQueryTab(streamingQueryStatusStore, ui)

Review comment:
       +1





[GitHub] [spark] uncleGen commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r437985761



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala
##########
@@ -110,7 +116,15 @@ class StreamingQueryListenerBus(sparkListenerBus: LiveListenerBus)
       listener: StreamingQueryListener,
       event: StreamingQueryListener.Event): Unit = {
     def shouldReport(runId: UUID): Boolean = {
-      activeQueryRunIds.synchronized { activeQueryRunIds.contains(runId) }
+      // When loaded by Spark History Server, we should process all event coming from replay
+      // listener bus.
+      if (sparkListenerBus.isEmpty) {

Review comment:
       In the live UI, when a streaming query event is posted, `StreamingQueryListenerBus` manages the query status itself. In the history UI, however, `StreamingQueryListenerBus` subscribes to the `ReplayListenerBus`, so it should process all events.
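
The distinction described in this comment can be sketched in isolation. This is a minimal illustration, not Spark's actual class: `QueryEventFilter` and its `liveBus: Option[String]` parameter are hypothetical stand-ins for `StreamingQueryListenerBus` and its optional `sparkListenerBus`. The only behaviour taken from the diff is that an empty bus means "history mode, report everything".

```scala
import java.util.UUID
import scala.collection.mutable

// Hypothetical stand-in for the listener bus; not Spark's StreamingQueryListenerBus.
// `liveBus` is empty when the filter is created for event-log replay (history mode).
final class QueryEventFilter(liveBus: Option[String]) {
  private val activeQueryRunIds = mutable.HashSet.empty[UUID]

  // In live mode, a query must be registered before its events are reported.
  def registerActive(runId: UUID): Unit =
    activeQueryRunIds.synchronized { activeQueryRunIds += runId }

  def shouldReport(runId: UUID): Boolean =
    if (liveBus.isEmpty) {
      // History mode: every replayed event is processed.
      true
    } else {
      // Live mode: only events of queries tracked by this instance are processed.
      activeQueryRunIds.synchronized { activeQueryRunIds.contains(runId) }
    }
}
```

With this shape, a filter built with `None` reports any run ID, while one built with `Some(...)` reports a run ID only after `registerActive` has been called for it.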





[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-641869987


   **[Test build #123750 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123750/testReport)** for PR 28781 at commit [`e2180ad`](https://github.com/apache/spark/commit/e2180add4fbe324f9c6edd54369ea052ec9e9587).



[GitHub] [spark] uncleGen commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r444673665



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryStatusStore.scala
##########
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.sql.streaming.ui.{StreamingQueryProgressWrapper, StreamingQuerySummary, StreamingQueryUIData}
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+
+/**
+ * Provides a view of a KVStore with methods that make it easy to query Streaming Query state.
+ * There's no state kept in this class, so it's ok to have multiple instances of it in an
+ * application.
+ */
+class StreamingQueryStatusStore(store: ElementTrackingStore) {

Review comment:
       done





[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-734788312







[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-644040894


   **[Test build #124051 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124051/testReport)** for PR 28781 at commit [`f280157`](https://github.com/apache/spark/commit/f2801571c222f42b56c661a0693039482d8ad2fd).



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-735730256


   **[Test build #131981 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131981/testReport)** for PR 28781 at commit [`9531ac9`](https://github.com/apache/spark/commit/9531ac99292b7f7bb829eabcbacae41e2b2f3cce).



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-735553220







[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-686232996


   **[Test build #128225 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128225/testReport)** for PR 28781 at commit [`9eb353c`](https://github.com/apache/spark/commit/9eb353c649573181741cf52b91401b3b57b50a77).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713405136







[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-648781386







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-648644289







[GitHub] [spark] zsxwing commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

zsxwing commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-737593766


   Merging to master.



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642362709


   **[Test build #123807 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123807/testReport)** for PR 28781 at commit [`c1c600f`](https://github.com/apache/spark/commit/c1c600fc4832154d8fce9b2f881096106e206e16).



[GitHub] [spark] uncleGen commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r462696730



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryStatusStore.scala
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.sql.streaming.ui.{StreamingQueryProgressWrapper, StreamingQuerySummary, StreamingQueryUIData}
+import org.apache.spark.status.KVUtils
+import org.apache.spark.util.kvstore.KVStore
+
+/**
+ * Provides a view of a KVStore with methods that make it easy to query Streaming Query state.
+ * There's no state kept in this class, so it's ok to have multiple instances of it in an
+ * application.
+ */
+class StreamingQueryStatusStore(store: KVStore) {
+
+  def allQueryUIData: Seq[StreamingQueryUIData] = synchronized {
+    val view = store.view(classOf[StreamingQuerySummary]).index("startTimestamp").first(0L)
+    KVUtils.viewToSeq(view, Int.MaxValue)(_ => true).map(makeUIData)
+  }
+
+  // Visible for testing.
+  private[sql] def activeQueryUIData(): Seq[StreamingQueryUIData] = {
+    allQueryUIData.filter(_.summary.isActive)

Review comment:
       fixed

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryStatusStore.scala
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.sql.streaming.ui.{StreamingQueryProgressWrapper, StreamingQuerySummary, StreamingQueryUIData}
+import org.apache.spark.status.KVUtils
+import org.apache.spark.util.kvstore.KVStore
+
+/**
+ * Provides a view of a KVStore with methods that make it easy to query Streaming Query state.
+ * There's no state kept in this class, so it's ok to have multiple instances of it in an
+ * application.
+ */
+class StreamingQueryStatusStore(store: KVStore) {
+
+  def allQueryUIData: Seq[StreamingQueryUIData] = synchronized {
+    val view = store.view(classOf[StreamingQuerySummary]).index("startTimestamp").first(0L)
+    KVUtils.viewToSeq(view, Int.MaxValue)(_ => true).map(makeUIData)
+  }
+
+  // Visible for testing.
+  private[sql] def activeQueryUIData(): Seq[StreamingQueryUIData] = {
+    allQueryUIData.filter(_.summary.isActive)
+  }
+
+  // Visible for testing.
+  private[sql] def inactiveQueryUIData(): Seq[StreamingQueryUIData] = {
+    allQueryUIData.filter(!_.summary.isActive)

Review comment:
       fixed

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,141 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
-import scala.collection.mutable
+import scala.collection.immutable.Queue
+
+import com.fasterxml.jackson.annotation.JsonIgnore
 
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQuerySummary], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQuerySummary]).index("startTimestamp").first(0L)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_.isActive == false)

Review comment:
       fixed
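
For readers following the retention logic in the quoted hunk, here is a minimal in-memory sketch of the selection step. `Summary` and `InactiveQueryCleanup` are hypothetical names used only for illustration. The sketch assumes, as the diff suggests, that entries are visited in ascending `startTimestamp` order and that only inactive ones are eligible for deletion.

```scala
// Hypothetical in-memory stand-in for the stored query summary; not Spark's class.
final case class Summary(runId: String, startTimestamp: Long, isActive: Boolean)

object InactiveQueryCleanup {
  // Returns the oldest inactive summaries to delete once the total count
  // exceeds `retention`; returns nothing while the count is within the limit.
  def selectForDeletion(all: Seq[Summary], retention: Int): Seq[Summary] = {
    val countToDelete = all.size - retention
    if (countToDelete <= 0) {
      Seq.empty
    } else {
      all.sortBy(_.startTimestamp).filter(!_.isActive).take(countToDelete)
    }
  }
}
```

Note that active queries count toward the total but are never selected, so fewer than `countToDelete` entries may be returned when most stored queries are still active.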

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,141 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
-import scala.collection.mutable
+import scala.collection.immutable.Queue
+
+import com.fasterxml.jackson.annotation.JsonIgnore
 
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQuerySummary], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQuerySummary]).index("startTimestamp").first(0L)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_.isActive == false)
+    toDelete.foreach { e =>
+      store.delete(e.getClass, e.runId)
+      store.removeAllByIndexValues(
+        classOf[StreamingQueryProgressWrapper], "runId", e.runId.toString)
+    }
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    val querySummary = new StreamingQuerySummary(
+      event.name,
+      event.id,
+      event.runId,
+      Queue.empty[String],
+      startTimestamp,
+      true,
+      None)
+    store.write(querySummary)
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    val querySummary = store.read(classOf[StreamingQuerySummary], runId)
+    val progressIds =
+      querySummary.progressIds ++ Seq(getUniqueId(runId, batchId, timestamp))
+    store.write(new StreamingQueryProgressWrapper(event.progress))
+    while(progressIds.length >= streamingProgressRetention) {
+      val uniqueId = progressIds.dequeue
+      store.delete(classOf[StreamingQueryProgressWrapper], uniqueId)
+    }
+    store.delete(classOf[StreamingQuerySummary], runId)

Review comment:
       fixed
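
One detail worth keeping in mind when reading the quoted loop: `dequeue` on `scala.collection.immutable.Queue` returns a pair `(head, remainingQueue)` and leaves the original queue unchanged, so a length check on the original value never changes. The sketch below shows one way to trim such a queue. `ProgressRetention.trimToRetention` is a hypothetical helper, not code from this PR, and it assumes the intent is to keep at most `retention` IDs.

```scala
import scala.collection.immutable.Queue

object ProgressRetention {
  // Hypothetical helper: splits `ids` into (evicted, kept), oldest first,
  // so that `kept` holds at most `retention` elements.
  def trimToRetention(ids: Queue[String], retention: Int): (Seq[String], Queue[String]) = {
    val excess = math.max(ids.length - retention, 0)
    // splitAt returns two new queues; the input queue is not modified.
    val (evicted, kept) = ids.splitAt(excess)
    (evicted, kept)
  }
}
```

A caller would delete the stored progress entry for each evicted ID and write the kept queue back with the updated summary.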

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,141 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
-import scala.collection.mutable
+import scala.collection.immutable.Queue
+
+import com.fasterxml.jackson.annotation.JsonIgnore
 
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQuerySummary], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQuerySummary]).index("startTimestamp").first(0L)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_.isActive == false)
+    toDelete.foreach { e =>
+      store.delete(e.getClass, e.runId)
+      store.removeAllByIndexValues(
+        classOf[StreamingQueryProgressWrapper], "runId", e.runId.toString)
+    }
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    val querySummary = new StreamingQuerySummary(
+      event.name,
+      event.id,
+      event.runId,
+      Queue.empty[String],
+      startTimestamp,
+      true,
+      None)
+    store.write(querySummary)
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    val querySummary = store.read(classOf[StreamingQuerySummary], runId)
+    val progressIds =
+      querySummary.progressIds ++ Seq(getUniqueId(runId, batchId, timestamp))
+    store.write(new StreamingQueryProgressWrapper(event.progress))
+    while(progressIds.length >= streamingProgressRetention) {
+      val uniqueId = progressIds.dequeue
+      store.delete(classOf[StreamingQueryProgressWrapper], uniqueId)
+    }
+    store.delete(classOf[StreamingQuerySummary], runId)
+    store.write(new StreamingQuerySummary(
+      querySummary.name,
+      querySummary.id,
+      querySummary.runId,
+      progressIds,
+      querySummary.startTimestamp,
+      querySummary.isActive,
+      querySummary.exception
+    ))
   }
 
   override def onQueryTerminated(
       event: StreamingQueryListener.QueryTerminatedEvent): Unit = synchronized {
-    val queryStatus = activeQueryStatus.remove(event.runId)
-    if (queryStatus != null) {
-      queryStatus.queryTerminated(event)
-      inactiveQueryStatus += queryStatus
-      while (inactiveQueryStatus.length >= inactiveQueryStatusRetention) {
-        inactiveQueryStatus.dequeue()
-      }
-    }
+    val querySummary = store.read(classOf[StreamingQuerySummary], event.runId)
+    store.delete(classOf[StreamingQuerySummary], event.runId)

Review comment:
       fixed





[GitHub] [spark] xuanyuanking commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

xuanyuanking commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r438683751



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,161 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
-import scala.collection.mutable
+import scala.collection.immutable.Queue
+
+import com.fasterxml.jackson.annotation.JsonIgnore
 
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
  * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQueryUIData], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQueryUIData]).index("startTimestamp").first(0L)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_.isActive == false)
+    toDelete.foreach { e =>
+      store.delete(e.getClass(), e.runId)
+      store.removeAllByIndexValues(
+        classOf[StreamingQueryProgressWrapper], "runId", e.runId.toString)
+    }
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    val querySummary = new StreamingQueryUIData(
+      event.name,
+      event.id,
+      event.runId,
+      Queue.empty[String],
+      startTimestamp,
+      true,
+      None)
+    store.write(querySummary)
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    val querySummary = store.read(classOf[StreamingQueryUIData], runId)
+    val progressIdQueue =
+      querySummary.progressIdQueue ++ Seq(getUniqueId(runId, batchId, timestamp))
+    store.write(new StreamingQueryProgressWrapper(event.progress))
+    while(progressIdQueue.length >= streamingProgressRetention) {
+      val uniqueId = progressIdQueue.dequeue
+      store.delete(classOf[StreamingQueryProgressWrapper], uniqueId)
+    }
+    store.delete(classOf[StreamingQueryUIData], runId)
+    store.write(new StreamingQueryUIData(
+      querySummary.name,
+      querySummary.id,
+      querySummary.runId,
+      progressIdQueue,
+      querySummary.startTimestamp,
+      querySummary.isActive,
+      querySummary.exception
+    ))
   }
 
   override def onQueryTerminated(
       event: StreamingQueryListener.QueryTerminatedEvent): Unit = synchronized {
-    val queryStatus = activeQueryStatus.remove(event.runId)
-    if (queryStatus != null) {
-      queryStatus.queryTerminated(event)
-      inactiveQueryStatus += queryStatus
-      while (inactiveQueryStatus.length >= inactiveQueryStatusRetention) {
-        inactiveQueryStatus.dequeue()
-      }
-    }
-  }
-
-  def allQueryStatus: Seq[StreamingQueryUIData] = synchronized {
-    activeQueryStatus.values().asScala.toSeq ++ inactiveQueryStatus
+    val querySummary = store.read(classOf[StreamingQueryUIData], event.runId)
+    store.delete(classOf[StreamingQueryUIData], event.runId)
+    store.write(new StreamingQueryUIData(
+      querySummary.name,
+      querySummary.id,
+      querySummary.runId,
+      querySummary.progressIdQueue,
+      querySummary.startTimestamp,
+      false,
+      querySummary.exception
+    ))
   }
 }
 
 /**
  * This class contains all message related to UI display, each instance corresponds to a single
  * [[org.apache.spark.sql.streaming.StreamingQuery]].
  */
-private[ui] class StreamingQueryUIData(
+private[sql] class StreamingQueryUIData(
     val name: String,
     val id: UUID,
-    val runId: UUID,
-    val startTimestamp: Long) {
+    @KVIndexParam val runId: UUID,
+    val progressIdQueue: Queue[String],
+    val startTimestamp: Long,
+    val isActive: Boolean,
+    val exception: Option[String]) {
 
-  /** Holds the most recent query progress updates. */
-  private val progressBuffer = new mutable.Queue[StreamingQueryProgress]()
+  private var storeOption: Option[ElementTrackingStore] = None

Review comment:
       Can we avoid saving the KVStore in UIData? I think all store-related operations should be placed in `StreamingQueryStatusStore`.
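
A minimal sketch of the separation suggested above, assuming a simplified in-memory stand-in for the KVStore. `InMemoryStore`, `QueryData`, and `QueryStatusStore` are hypothetical names used only for illustration and are not the classes from this PR; the point is only that the data class carries no store reference and every read goes through one store-facing class.

```scala
import java.util.UUID
import scala.collection.mutable

// Plain immutable UI data: it holds no reference to any store.
final case class QueryData(
    name: String,
    runId: UUID,
    startTimestamp: Long,
    isActive: Boolean)

// Hypothetical stand-in for a KVStore; the real ElementTrackingStore API differs.
final class InMemoryStore {
  private val data = mutable.Map.empty[UUID, QueryData]
  def write(d: QueryData): Unit = data.update(d.runId, d)
  def read(runId: UUID): Option[QueryData] = data.get(runId)
  def all: Seq[QueryData] = data.values.toSeq
}

// All store access is concentrated here, so UI pages only depend on this class.
final class QueryStatusStore(store: InMemoryStore) {
  def allQueries: Seq[QueryData] = store.all.sortBy(_.startTimestamp)
  def activeQueries: Seq[QueryData] = allQueries.filter(_.isActive)
}
```

With this shape, a page would ask `QueryStatusStore` for what it needs, and the data objects could be serialized without dragging a store handle along.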

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryHistoryServerPlugin.scala
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.config.Status.ASYNC_TRACKING_ENABLED
+import org.apache.spark.scheduler.SparkListener
+import org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus
+import org.apache.spark.sql.streaming.ui.{StreamingQueryStatusListener, StreamingQueryTab}
+import org.apache.spark.status.{AppHistoryServerPlugin, ElementTrackingStore}
+import org.apache.spark.ui.SparkUI
+
+class StreamingQueryHistoryServerPlugin extends AppHistoryServerPlugin {
+
+  override def createListeners(conf: SparkConf, store: ElementTrackingStore): Seq[SparkListener] = {
+    val listenerBus = new StreamingQueryListenerBus(None, live = false)
+    listenerBus.addListener(new StreamingQueryStatusListener(conf, store))
+    Seq(listenerBus)
+  }
+
+  override def setupUI(ui: SparkUI): Unit = {
+    val replayConf = ui.conf.clone().set(ASYNC_TRACKING_ENABLED, false)
+    val trackingStore = new ElementTrackingStore(ui.store.store, replayConf)
+    val streamingQueryStatusStore = new StreamingQueryStatusStore(trackingStore)

Review comment:
       Same question here: can we use `ui.store.store` directly?

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala
##########
@@ -95,7 +95,13 @@ class StreamingQueryListenerBus(sparkListenerBus: LiveListenerBus)
         // synchronously and the ones attached to LiveListenerBus asynchronously. Therefore,
         // we need to ignore QueryStartedEvent if this method is called within SparkListenerBus
         // thread
-        if (!LiveListenerBus.withinListenerThread.value || !e.isInstanceOf[QueryStartedEvent]) {
+        //
+        // When loaded by Spark History Server, we should process all event coming from replay
+        // listener bus.
+        if (!live) {
+          postToAll(e)

Review comment:
       The if and else branches contain the same code; let's combine them.
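
As an illustration of the suggestion above, here is a sketch with hypothetical stand-ins for the event types and flags. The else branch is not visible in the quoted hunk, so this assumes it keeps the original `withinListenerThread` / `QueryStartedEvent` condition and also calls `postToAll(e)`. `Event`, `QueryStarted`, `QueryProgress`, and `PostDecision` are invented for this example.

```scala
// Hypothetical stand-ins; the real bus checks LiveListenerBus.withinListenerThread
// and matches on StreamingQueryListener events.
sealed trait Event
case object QueryStarted extends Event
case object QueryProgress extends Event

object PostDecision {
  // Branched shape: two branches lead to the same action (modelled as `true`).
  def shouldPostBranched(live: Boolean, withinListenerThread: Boolean, e: Event): Boolean =
    if (!live) {
      true
    } else if (!withinListenerThread || e != QueryStarted) {
      true
    } else {
      false
    }

  // Combined shape: one condition, one call site for the action.
  def shouldPost(live: Boolean, withinListenerThread: Boolean, e: Event): Boolean =
    !live || !withinListenerThread || e != QueryStarted
}
```

Under these assumptions the two functions agree for every input, so the listener bus could guard a single `postToAll(e)` with the combined condition.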

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,161 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
-import scala.collection.mutable
+import scala.collection.immutable.Queue
+
+import com.fasterxml.jackson.annotation.JsonIgnore
 
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
  * TODO: Add support for history server.

Review comment:
       This TODO can be deleted.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryHistoryServerPlugin.scala
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.config.Status.ASYNC_TRACKING_ENABLED
+import org.apache.spark.scheduler.SparkListener
+import org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus
+import org.apache.spark.sql.streaming.ui.{StreamingQueryStatusListener, StreamingQueryTab}
+import org.apache.spark.status.{AppHistoryServerPlugin, ElementTrackingStore}
+import org.apache.spark.ui.SparkUI
+
+class StreamingQueryHistoryServerPlugin extends AppHistoryServerPlugin {
+
+  override def createListeners(conf: SparkConf, store: ElementTrackingStore): Seq[SparkListener] = {
+    val listenerBus = new StreamingQueryListenerBus(None, live = false)
+    listenerBus.addListener(new StreamingQueryStatusListener(conf, store))
+    Seq(listenerBus)
+  }
+
+  override def setupUI(ui: SparkUI): Unit = {
+    val replayConf = ui.conf.clone().set(ASYNC_TRACKING_ENABLED, false)

Review comment:
       Do we need `replayConf` here? I think `ASYNC_TRACKING_ENABLED` has already taken effect in `FsHistoryProvider.rebuildAppStore`.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-643930255


   Merged build finished. Test FAILed.



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-643929926


   **[Test build #124023 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124023/testReport)** for PR 28781 at commit [`c721d8c`](https://github.com/apache/spark/commit/c721d8c7299d503f1010cc8626ffb77d492c1bda).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `trait TimestampFormatterHelper extends TimeZoneAwareExpression `
     * `case class WidthBucket(`
     * `trait PredicateHelper extends Logging `
     * `case class ProcessingTimeTrigger(intervalMs: Long) extends Trigger `
     * `case class ContinuousTrigger(intervalMs: Long) extends Trigger `



[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-736978616


   **[Test build #132018 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132018/testReport)** for PR 28781 at commit [`e2758d7`](https://github.com/apache/spark/commit/e2758d76a05a9793147b95d83e63118eee5f2d4f).

















[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-686367937


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/128246/
   Test FAILed.










[GitHub] [spark] xuanyuanking commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

xuanyuanking commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-684146375


   @uncleGen Sorry for the late reply. Could you please resolve the conflicts so we can continue this work?



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-667085443


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126871/
   Test FAILed.



[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-735730256


   **[Test build #131981 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131981/testReport)** for PR 28781 at commit [`9531ac9`](https://github.com/apache/spark/commit/9531ac99292b7f7bb829eabcbacae41e2b2f3cce).










[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-735514876


   **[Test build #131947 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131947/testReport)** for PR 28781 at commit [`0e7db4b`](https://github.com/apache/spark/commit/0e7db4b2af33cf11f48f5930d28c42e9bf439ce0).
    * This patch **fails to build**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-734786651


   **[Test build #131883 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131883/testReport)** for PR 28781 at commit [`3bf1c40`](https://github.com/apache/spark/commit/3bf1c40672f7d35a71cb72981d0b52f21dc6dbcd).
    * This patch **fails to build**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] uncleGen commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r439171707



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryHistoryServerPlugin.scala
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.config.Status.ASYNC_TRACKING_ENABLED
+import org.apache.spark.scheduler.SparkListener
+import org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus
+import org.apache.spark.sql.streaming.ui.{StreamingQueryStatusListener, StreamingQueryTab}
+import org.apache.spark.status.{AppHistoryServerPlugin, ElementTrackingStore}
+import org.apache.spark.ui.SparkUI
+
+class StreamingQueryHistoryServerPlugin extends AppHistoryServerPlugin {
+
+  override def createListeners(conf: SparkConf, store: ElementTrackingStore): Seq[SparkListener] = {
+    val listenerBus = new StreamingQueryListenerBus(None, live = false)
+    listenerBus.addListener(new StreamingQueryStatusListener(conf, store))
+    Seq(listenerBus)
+  }
+
+  override def setupUI(ui: SparkUI): Unit = {
+    val replayConf = ui.conf.clone().set(ASYNC_TRACKING_ENABLED, false)

Review comment:
       Makes sense; I'll remove it.


























[GitHub] [spark] uncleGen commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r533229138



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -20,102 +20,144 @@ package org.apache.spark.sql.streaming.ui
 import java.util.UUID
 import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
 import scala.collection.mutable
 
+import com.fasterxml.jackson.annotation.JsonIgnore
+
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQueryData], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  // Events from the same query run will never be processed concurrently, so it's safe to
+  // access `progressIds` without any protection.
+  private val queryToProgress = new ConcurrentHashMap[UUID, mutable.Queue[String]]()
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val view = store.view(classOf[StreamingQueryData]).index("active").first(false).last(false)
+    val inactiveQueries = KVUtils.viewToSeq(view, Int.MaxValue)(_ => true)
+    val numInactiveQueries = inactiveQueries.size
+    if (numInactiveQueries <= inactiveQueryStatusRetention) {
+      return
+    }
+    val toDelete = inactiveQueries.sortBy(_.endTimestamp.get)
+      .take(numInactiveQueries - inactiveQueryStatusRetention)
+    val runIds = toDelete.map { e =>
+      store.delete(e.getClass, e.runId)
+      e.runId.toString
+    }
+    // Delete wrappers in one pass, as deleting them for each summary is slow
+    store.removeAllByIndexValues(classOf[StreamingQueryProgressWrapper], "runId", runIds)
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    store.write(new StreamingQueryData(
+      event.name,
+      event.id,
+      event.runId,
+      isActive = true,
+      None,
+      startTimestamp
+    ), checkTriggers = true)
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
-  }
-
-  override def onQueryTerminated(
-      event: StreamingQueryListener.QueryTerminatedEvent): Unit = synchronized {
-    val queryStatus = activeQueryStatus.remove(event.runId)
-    if (queryStatus != null) {
-      queryStatus.queryTerminated(event)
-      inactiveQueryStatus += queryStatus
-      while (inactiveQueryStatus.length >= inactiveQueryStatusRetention) {
-        inactiveQueryStatus.dequeue()
-      }
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    if (!queryToProgress.containsKey(runId)) {
+      queryToProgress.put(runId, mutable.Queue.empty[String])
+    }
+    val progressIds = queryToProgress.get(runId)
+    progressIds.enqueue(getUniqueId(runId, batchId, timestamp))
+    store.write(new StreamingQueryProgressWrapper(event.progress))
+    while(progressIds.length > streamingProgressRetention) {

Review comment:
       Logic update: use `>` instead of `>=`, so that exactly `streamingProgressRetention` progress updates are retained.
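
To make the boundary condition concrete, here is a minimal standalone sketch. It is not the PR's code: `RetentionSketch` and `trim` are hypothetical names, and a plain queue of strings stands in for the per-run queue of progress ids.

```scala
import scala.collection.mutable

// Hypothetical illustration only; not part of the Spark code base.
object RetentionSketch {
  // Drops the oldest ids until at most `retention` remain.
  // With `>` exactly `retention` ids survive; `>=` would keep one fewer.
  def trim(ids: mutable.Queue[String], retention: Int): Unit = {
    while (ids.nonEmpty && ids.length > retention) {
      ids.dequeue()
    }
  }
}
```

Calling `RetentionSketch.trim` on a queue of three ids with `retention = 2` removes only the oldest id.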

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryStatusStore.scala
##########
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import java.util.UUID
+
+import org.apache.spark.sql.streaming.ui.{StreamingQueryData, StreamingQueryProgressWrapper, StreamingQueryUIData}
+import org.apache.spark.status.KVUtils
+import org.apache.spark.util.kvstore.KVStore
+
+/**
+ * Provides a view of a KVStore with methods that make it easy to query Streaming Query state.
+ * There's no state kept in this class, so it's ok to have multiple instances of it in an
+ * application.
+ */
+class StreamingQueryStatusStore(store: KVStore) {
+
+  def allQueryUIData: Seq[StreamingQueryUIData] = {
+    val view = store.view(classOf[StreamingQueryData]).index("startTimestamp").first(0L)
+    KVUtils.viewToSeq(view, Int.MaxValue)(_ => true).map(makeUIData)
+  }
+
+  // visible for test
+  private[sql] def getQueryProgressData(runId: UUID): Seq[StreamingQueryProgressWrapper] = {
+    val view = store.view(classOf[StreamingQueryProgressWrapper])
+      .index("runId").first(runId.toString).last(runId.toString)
+    KVUtils.viewToSeq(view, Int.MaxValue)(_ => true)
+  }
+
+  private def makeUIData(summary: StreamingQueryData): StreamingQueryUIData = {
+    val runId = summary.runId.toString
+    val view = store.view(classOf[StreamingQueryProgressWrapper])
+      .index("runId").first(runId).last(runId)
+    val recentProgress = KVUtils.viewToSeq(view, Int.MaxValue)(_ => true)
+      .map(_.progress).sortBy(_.timestamp).toArray
+    StreamingQueryUIData(summary, recentProgress)

Review comment:
       Bug fix: read the `StreamingQueryProgressWrapper` entries from the KVStore.

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListenerSuite.scala
##########
@@ -94,11 +107,112 @@ class StreamingQueryStatusListenerSuite extends StreamTest {
     listener.onQueryStarted(startEvent1)
 
     // result checking
-    assert(listener.activeQueryStatus.size() == 1)
-    assert(listener.inactiveQueryStatus.length == 1)
-    assert(listener.activeQueryStatus.containsKey(runId1))
-    assert(listener.activeQueryStatus.get(runId1).id == id)
-    assert(listener.inactiveQueryStatus.head.runId == runId0)
-    assert(listener.inactiveQueryStatus.head.id == id)
+    assert(queryStore.allQueryUIData.count(_.summary.isActive) == 1)
+    assert(queryStore.allQueryUIData.filterNot(_.summary.isActive).length == 1)
+    assert(queryStore.allQueryUIData.filter(_.summary.isActive).exists(_.summary.runId == runId1))
+    assert(queryStore.allQueryUIData.filter(_.summary.isActive).exists(uiData =>
+      uiData.summary.runId == runId1 && uiData.summary.id == id))
+    assert(queryStore.allQueryUIData.filterNot(_.summary.isActive).head.summary.runId == runId0)
+    assert(queryStore.allQueryUIData.filterNot(_.summary.isActive).head.summary.id == id)
+  }
+
+  test("test small retained queries") {

Review comment:
       Added a new unit test.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -20,102 +20,144 @@ package org.apache.spark.sql.streaming.ui
 import java.util.UUID
 import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
 import scala.collection.mutable
 
+import com.fasterxml.jackson.annotation.JsonIgnore
+
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQueryData], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  // Events from the same query run will never be processed concurrently, so it's safe to
+  // access `progressIds` without any protection.
+  private val queryToProgress = new ConcurrentHashMap[UUID, mutable.Queue[String]]()
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val view = store.view(classOf[StreamingQueryData]).index("active").first(false).last(false)
+    val inactiveQueries = KVUtils.viewToSeq(view, Int.MaxValue)(_ => true)
+    val numInactiveQueries = inactiveQueries.size
+    if (numInactiveQueries <= inactiveQueryStatusRetention) {
+      return
+    }

Review comment:
       Logic update: use `STREAMING_UI_RETAINED_QUERIES` to clean up inactive queries.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -20,102 +20,144 @@ package org.apache.spark.sql.streaming.ui
 import java.util.UUID
 import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
 import scala.collection.mutable
 
+import com.fasterxml.jackson.annotation.JsonIgnore
+
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQueryData], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  // Events from the same query run will never be processed concurrently, so it's safe to
+  // access `progressIds` without any protection.
+  private val queryToProgress = new ConcurrentHashMap[UUID, mutable.Queue[String]]()
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val view = store.view(classOf[StreamingQueryData]).index("active").first(false).last(false)
+    val inactiveQueries = KVUtils.viewToSeq(view, Int.MaxValue)(_ => true)
+    val numInactiveQueries = inactiveQueries.size
+    if (numInactiveQueries <= inactiveQueryStatusRetention) {
+      return
+    }
+    val toDelete = inactiveQueries.sortBy(_.endTimestamp.get)
+      .take(numInactiveQueries - inactiveQueryStatusRetention)
+    val runIds = toDelete.map { e =>
+      store.delete(e.getClass, e.runId)
+      e.runId.toString
+    }
+    // Delete wrappers in one pass, as deleting them for each summary is slow
+    store.removeAllByIndexValues(classOf[StreamingQueryProgressWrapper], "runId", runIds)
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    store.write(new StreamingQueryData(
+      event.name,
+      event.id,
+      event.runId,
+      isActive = true,
+      None,
+      startTimestamp
+    ), checkTriggers = true)
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
-  }
-
-  override def onQueryTerminated(
-      event: StreamingQueryListener.QueryTerminatedEvent): Unit = synchronized {
-    val queryStatus = activeQueryStatus.remove(event.runId)
-    if (queryStatus != null) {
-      queryStatus.queryTerminated(event)
-      inactiveQueryStatus += queryStatus
-      while (inactiveQueryStatus.length >= inactiveQueryStatusRetention) {
-        inactiveQueryStatus.dequeue()
-      }
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    if (!queryToProgress.containsKey(runId)) {
+      queryToProgress.put(runId, mutable.Queue.empty[String])
+    }
+    val progressIds = queryToProgress.get(runId)
+    progressIds.enqueue(getUniqueId(runId, batchId, timestamp))
+    store.write(new StreamingQueryProgressWrapper(event.progress))
+    while(progressIds.length > streamingProgressRetention) {
+      val uniqueId = progressIds.dequeue
+      store.delete(classOf[StreamingQueryProgressWrapper], uniqueId)
     }
   }
 
-  def allQueryStatus: Seq[StreamingQueryUIData] = synchronized {
-    activeQueryStatus.values().asScala.toSeq ++ inactiveQueryStatus
+  override def onQueryTerminated(
+      event: StreamingQueryListener.QueryTerminatedEvent): Unit = {
+    val querySummary = store.read(classOf[StreamingQueryData], event.runId)
+    val curTime = System.currentTimeMillis()
+    store.write(new StreamingQueryData(
+      querySummary.name,
+      querySummary.id,
+      querySummary.runId,
+      isActive = false,
+      querySummary.exception,
+      querySummary.startTimestamp,
+      Some(curTime)
+    ), checkTriggers = true)
+    queryToProgress.remove(event.runId)
   }
 }
 
+private[sql] class StreamingQueryData(
+    val name: String,
+    val id: UUID,
+    @KVIndexParam val runId: UUID,
+    @KVIndexParam("active") val isActive: Boolean,
+    val exception: Option[String],
+    @KVIndexParam("startTimestamp") val startTimestamp: Long,
+    val endTimestamp: Option[Long] = None)

Review comment:
       Added `endTimestamp` to help clean up inactive queries.
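
As a rough sketch of how the end timestamp gets recorded on termination, the snippet below uses a simplified, hypothetical `QuerySummary` case class in place of `StreamingQueryData`. The diff above rebuilds the object field by field and writes it back to the store; the `copy` call here is only a compact stand-in for that step.

```scala
// Hypothetical illustration only; not part of the Spark code base.
object TerminationSketch {
  // Simplified stand-in for StreamingQueryData with just the fields used here.
  final case class QuerySummary(
      runId: String,
      isActive: Boolean,
      startTimestamp: Long,
      endTimestamp: Option[Long] = None)

  // Returns a new summary marked inactive, with its end time set to `now`
  // (milliseconds since the epoch). The start time is left unchanged.
  def terminate(summary: QuerySummary, now: Long): QuerySummary =
    summary.copy(isActive = false, endTimestamp = Some(now))
}
```

Because active queries carry `endTimestamp = None`, only terminated queries have a value that cleanup can sort on.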

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListenerSuite.scala
##########
@@ -94,11 +107,112 @@ class StreamingQueryStatusListenerSuite extends StreamTest {
     listener.onQueryStarted(startEvent1)
 
     // result checking
-    assert(listener.activeQueryStatus.size() == 1)
-    assert(listener.inactiveQueryStatus.length == 1)
-    assert(listener.activeQueryStatus.containsKey(runId1))
-    assert(listener.activeQueryStatus.get(runId1).id == id)
-    assert(listener.inactiveQueryStatus.head.runId == runId0)
-    assert(listener.inactiveQueryStatus.head.id == id)
+    assert(queryStore.allQueryUIData.count(_.summary.isActive) == 1)
+    assert(queryStore.allQueryUIData.filterNot(_.summary.isActive).length == 1)
+    assert(queryStore.allQueryUIData.filter(_.summary.isActive).exists(_.summary.runId == runId1))
+    assert(queryStore.allQueryUIData.filter(_.summary.isActive).exists(uiData =>
+      uiData.summary.runId == runId1 && uiData.summary.id == id))
+    assert(queryStore.allQueryUIData.filterNot(_.summary.isActive).head.summary.runId == runId0)
+    assert(queryStore.allQueryUIData.filterNot(_.summary.isActive).head.summary.id == id)
+  }
+
+  test("test small retained queries") {
+    val kvStore = new ElementTrackingStore(new InMemoryStore(), sparkConf)
+    val conf = spark.sparkContext.conf
+    conf.set(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES.key, "2")
+    val listener = new StreamingQueryStatusListener(conf, kvStore)
+    val queryStore = new StreamingQueryStatusStore(kvStore)
+
+    def addNewQuery(): (UUID, UUID) = {
+      val format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'") // ISO8601
+      format.setTimeZone(getTimeZone("UTC"))
+      val id = UUID.randomUUID()
+      val runId = UUID.randomUUID()
+      val startEvent = new StreamingQueryListener.QueryStartedEvent(
+        id, runId, "test1", format.format(new Date(System.currentTimeMillis())))
+      listener.onQueryStarted(startEvent)
+      (id, runId)
+    }
+
+    val (id1, runId1) = addNewQuery()
+    val (id2, runId2) = addNewQuery()
+    val (id3, runId3) = addNewQuery()
+
+    assert(queryStore.allQueryUIData.count(!_.summary.isActive) == 0)
+
+    val terminateEvent1 = new StreamingQueryListener.QueryTerminatedEvent(id1, runId1, None)
+    listener.onQueryTerminated(terminateEvent1)
+    // sleep 100 mills to make sure clean work complete
+    Thread.sleep(100)
+    assert(queryStore.allQueryUIData.count(!_.summary.isActive) == 1)
+    var inactiveQueries = queryStore.allQueryUIData.filter(!_.summary.isActive).map(_.summary.id)
+    assert(inactiveQueries == Seq(id1))
+
+    val terminateEvent2 = new StreamingQueryListener.QueryTerminatedEvent(id2, runId2, None)
+    listener.onQueryTerminated(terminateEvent2)
+    // sleep 100 mills to make sure clean work complete
+    Thread.sleep(100)
+    assert(queryStore.allQueryUIData.count(!_.summary.isActive) == 2)
+    inactiveQueries = queryStore.allQueryUIData.filter(!_.summary.isActive).map(_.summary.id)
+    assert(inactiveQueries == Seq(id1, id2))
+
+    val terminateEvent3 = new StreamingQueryListener.QueryTerminatedEvent(id3, runId3, None)
+    listener.onQueryTerminated(terminateEvent3)
+    // sleep 100 mills to make sure clean work complete
+    Thread.sleep(100)
+    assert(queryStore.allQueryUIData.count(!_.summary.isActive) == 2)
+    inactiveQueries = queryStore.allQueryUIData.filter(!_.summary.isActive).map(_.summary.id)
+    assert(inactiveQueries == Seq(id2, id3))
+  }
+
+  test("test small retained progress") {

Review comment:
       Added a new unit test.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -20,102 +20,144 @@ package org.apache.spark.sql.streaming.ui
 import java.util.UUID
 import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
 import scala.collection.mutable
 
+import com.fasterxml.jackson.annotation.JsonIgnore
+
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQueryData], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  // Events from the same query run will never be processed concurrently, so it's safe to
+  // access `progressIds` without any protection.
+  private val queryToProgress = new ConcurrentHashMap[UUID, mutable.Queue[String]]()
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val view = store.view(classOf[StreamingQueryData]).index("active").first(false).last(false)
+    val inactiveQueries = KVUtils.viewToSeq(view, Int.MaxValue)(_ => true)
+    val numInactiveQueries = inactiveQueries.size
+    if (numInactiveQueries <= inactiveQueryStatusRetention) {
+      return
+    }
+    val toDelete = inactiveQueries.sortBy(_.endTimestamp.get)
+      .take(numInactiveQueries - inactiveQueryStatusRetention)

Review comment:
       New logic: clean up the earliest-terminated inactive queries first.
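
The eviction order can be sketched without a KVStore. The code below is an illustration under stated assumptions, not the PR's implementation: `CleanupSketch`, `QuerySummary` and `selectForEviction` are hypothetical names, and an in-memory `Seq` stands in for the store view of inactive queries.

```scala
// Hypothetical illustration only; not part of the Spark code base.
object CleanupSketch {
  // Simplified stand-in for StreamingQueryData with just the fields used here.
  final case class QuerySummary(runId: String, endTimestamp: Option[Long])

  // Returns the runIds to evict so that at most `retention` inactive queries
  // remain, choosing the queries that ended earliest.
  def selectForEviction(inactive: Seq[QuerySummary], retention: Int): Seq[String] = {
    val excess = inactive.size - retention
    if (excess <= 0) {
      Seq.empty
    } else {
      inactive
        // A missing end time sorts last, so such entries are evicted last.
        .sortBy(_.endTimestamp.getOrElse(Long.MaxValue))
        .take(excess)
        .map(_.runId)
    }
  }
}
```

One deliberate difference from the diff: the sketch uses `getOrElse(Long.MaxValue)` where the diff calls `endTimestamp.get`, which would throw if an inactive entry had no end timestamp.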





[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-735538056







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-644041498







[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713432250







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713489031


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34696/
   Test FAILed.



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-643202806







[GitHub] [spark] uncleGen commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-641876331


   cc @xuanyuanking 



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-641870375







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-737131326







[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-648644282







[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-737042534


   **[Test build #132029 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132029/testReport)** for PR 28781 at commit [`e2758d7`](https://github.com/apache/spark/commit/e2758d76a05a9793147b95d83e63118eee5f2d4f).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642358736







[GitHub] [spark] gengliangwang commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

gengliangwang commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r444689955



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,141 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
-import scala.collection.mutable
+import scala.collection.immutable.Queue
+
+import com.fasterxml.jackson.annotation.JsonIgnore
 
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQuerySummary], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQuerySummary]).index("startTimestamp").first(0L)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_.isActive == false)

Review comment:
       We should have an index over `isActive` here.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryStatusStore.scala
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.sql.streaming.ui.{StreamingQueryProgressWrapper, StreamingQuerySummary, StreamingQueryUIData}
+import org.apache.spark.status.KVUtils
+import org.apache.spark.util.kvstore.KVStore
+
+/**
+ * Provides a view of a KVStore with methods that make it easy to query Streaming Query state.
+ * There's no state kept in this class, so it's ok to have multiple instances of it in an
+ * application.
+ */
+class StreamingQueryStatusStore(store: KVStore) {
+
+  def allQueryUIData: Seq[StreamingQueryUIData] = synchronized {
+    val view = store.view(classOf[StreamingQuerySummary]).index("startTimestamp").first(0L)
+    KVUtils.viewToSeq(view, Int.MaxValue)(_ => true).map(makeUIData)
+  }
+
+  // Visible for testing.
+  private[sql] def activeQueryUIData(): Seq[StreamingQueryUIData] = {
+    allQueryUIData.filter(_.summary.isActive)

Review comment:
       Shall we improve the data structure and build an index over `isActive`, so that we can get all the active queries directly from the kvstore?
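
To illustrate the suggestion, here is a minimal Scala sketch of a secondary index on `isActive`. `IndexedStore` and this simplified `Summary` are hypothetical stand-ins built on plain collections; they are not Spark's `KVStore` or `StreamingQuerySummary`, and the real change would presumably declare the index on the stored class instead.

```scala
import scala.collection.mutable

// Hypothetical, simplified summary; the real class has more fields.
final case class Summary(runId: String, startTimestamp: Long, isActive: Boolean)

// Hypothetical store keyed by runId with one secondary index on `isActive`.
final class IndexedStore {
  private val byKey = mutable.HashMap.empty[String, Summary]
  private val byActive = mutable.HashMap.empty[Boolean, mutable.LinkedHashSet[String]]

  def write(s: Summary): Unit = {
    // Drop the stale index entry if this key was already stored.
    byKey.get(s.runId).foreach(old => byActive.get(old.isActive).foreach(_ -= s.runId))
    byKey(s.runId) = s
    byActive.getOrElseUpdate(s.isActive, mutable.LinkedHashSet.empty[String]) += s.runId
  }

  // Reads only the entries under the requested index value; no full scan and filter.
  def byIsActive(active: Boolean): Seq[Summary] =
    byActive.getOrElse(active, mutable.LinkedHashSet.empty[String]).toSeq.map(byKey)
}
```

With such an index, both the active and the inactive listing become a lookup by index value rather than a filter over every stored summary.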

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryStatusStore.scala
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.sql.streaming.ui.{StreamingQueryProgressWrapper, StreamingQuerySummary, StreamingQueryUIData}
+import org.apache.spark.status.KVUtils
+import org.apache.spark.util.kvstore.KVStore
+
+/**
+ * Provides a view of a KVStore with methods that make it easy to query Streaming Query state.
+ * There's no state kept in this class, so it's ok to have multiple instances of it in an
+ * application.
+ */
+class StreamingQueryStatusStore(store: KVStore) {
+
+  def allQueryUIData: Seq[StreamingQueryUIData] = synchronized {
+    val view = store.view(classOf[StreamingQuerySummary]).index("startTimestamp").first(0L)
+    KVUtils.viewToSeq(view, Int.MaxValue)(_ => true).map(makeUIData)
+  }
+
+  // Visible for testing.
+  private[sql] def activeQueryUIData(): Seq[StreamingQueryUIData] = {
+    allQueryUIData.filter(_.summary.isActive)

Review comment:
       Also, `makeUIData` should be called only after filtering on the `isActive` field.
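
A small Scala sketch of the ordering suggested here: filter on the cheap flag first, then convert only the matching summaries. The `Summary` and `UIData` types and the `conversions` counter are simplified, hypothetical stand-ins used to make the saved work visible; the real `makeUIData` is assumed to be the expensive step.

```scala
// Hypothetical, simplified types; the real classes carry more fields.
final case class Summary(runId: String, isActive: Boolean)
final case class UIData(summary: Summary, recentProgress: Seq[String])

object FilterFirst {
  // Counts how many times the (assumed expensive) conversion runs.
  var conversions = 0

  def makeUIData(s: Summary): UIData = {
    conversions += 1
    UIData(s, Seq.empty) // a real implementation would load progress entries here
  }

  // Filter on the summary flag first, then convert only the survivors.
  def activeUIData(all: Seq[Summary]): Seq[UIData] =
    all.filter(_.isActive).map(makeUIData)
}
```

Compared with `all.map(makeUIData).filter(_.summary.isActive)`, the result is the same, but inactive queries never pay for the conversion.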

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,141 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
-import scala.collection.mutable
+import scala.collection.immutable.Queue
+
+import com.fasterxml.jackson.annotation.JsonIgnore
 
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQuerySummary], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQuerySummary]).index("startTimestamp").first(0L)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_.isActive == false)
+    toDelete.foreach { e =>
+      store.delete(e.getClass, e.runId)
+      store.removeAllByIndexValues(
+        classOf[StreamingQueryProgressWrapper], "runId", e.runId.toString)
+    }
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    val querySummary = new StreamingQuerySummary(
+      event.name,
+      event.id,
+      event.runId,
+      Queue.empty[String],
+      startTimestamp,
+      true,
+      None)
+    store.write(querySummary)
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    val querySummary = store.read(classOf[StreamingQuerySummary], runId)

Review comment:
       Shall we keep a `LiveEntity` in the listener so that we don't need to look up the store here? You can check the implementation of `SQLAppStatusListener`.
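
As a rough sketch of the idea (not the actual `LiveEntity` or `SQLAppStatusListener` code), the listener can hold mutable per-query state in memory and only write snapshots to the store. All names below (`LiveQuery`, `SummarySnapshot`, `SketchListener`, the `write` callback) are hypothetical and use plain Scala collections.

```scala
import scala.collection.mutable

// Immutable value that would be written to the store.
final case class SummarySnapshot(runId: String, progressIds: Vector[String])

// Hypothetical live entity: mutable state owned by the listener.
final class LiveQuery(val runId: String) {
  val progressIds = mutable.Queue.empty[String]
  def snapshot: SummarySnapshot = SummarySnapshot(runId, progressIds.toVector)
}

final class SketchListener(retention: Int, write: SummarySnapshot => Unit) {
  private val live = mutable.HashMap.empty[String, LiveQuery]

  def onStarted(runId: String): Unit = {
    val q = new LiveQuery(runId)
    live(runId) = q
    write(q.snapshot)
  }

  // Updates the in-memory entity and writes a snapshot; the store is never read.
  def onProgress(runId: String, progressId: String): Unit =
    live.get(runId).foreach { q =>
      q.progressIds += progressId
      while (q.progressIds.length > retention) q.progressIds.dequeue()
      write(q.snapshot)
    }

  // Writes the final snapshot and forgets the live state.
  def onTerminated(runId: String): Unit =
    live.remove(runId).foreach(q => write(q.snapshot))
}
```

The trade-off is that the listener now owns state for every active query, which must be dropped on termination as shown.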

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,141 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
-import scala.collection.mutable
+import scala.collection.immutable.Queue
+
+import com.fasterxml.jackson.annotation.JsonIgnore
 
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQuerySummary], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQuerySummary]).index("startTimestamp").first(0L)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_.isActive == false)
+    toDelete.foreach { e =>
+      store.delete(e.getClass, e.runId)
+      store.removeAllByIndexValues(
+        classOf[StreamingQueryProgressWrapper], "runId", e.runId.toString)
+    }
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    val querySummary = new StreamingQuerySummary(
+      event.name,
+      event.id,
+      event.runId,
+      Queue.empty[String],
+      startTimestamp,
+      true,
+      None)
+    store.write(querySummary)
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    val querySummary = store.read(classOf[StreamingQuerySummary], runId)
+    val progressIds =
+      querySummary.progressIds ++ Seq(getUniqueId(runId, batchId, timestamp))
+    store.write(new StreamingQueryProgressWrapper(event.progress))
+    while(progressIds.length >= streamingProgressRetention) {
+      val uniqueId = progressIds.dequeue
+      store.delete(classOf[StreamingQueryProgressWrapper], uniqueId)
+    }
+    store.delete(classOf[StreamingQuerySummary], runId)
+    store.write(new StreamingQuerySummary(
+      querySummary.name,
+      querySummary.id,
+      querySummary.runId,
+      progressIds,
+      querySummary.startTimestamp,
+      querySummary.isActive,
+      querySummary.exception
+    ))
   }
 
   override def onQueryTerminated(
       event: StreamingQueryListener.QueryTerminatedEvent): Unit = synchronized {
-    val queryStatus = activeQueryStatus.remove(event.runId)
-    if (queryStatus != null) {
-      queryStatus.queryTerminated(event)
-      inactiveQueryStatus += queryStatus
-      while (inactiveQueryStatus.length >= inactiveQueryStatusRetention) {
-        inactiveQueryStatus.dequeue()
-      }
-    }
+    val querySummary = store.read(classOf[StreamingQuerySummary], event.runId)
+    store.delete(classOf[StreamingQuerySummary], event.runId)

Review comment:
       ditto, no need to call delete

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryStatusStore.scala
##########
@@ -0,0 +1,60 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.sql.streaming.ui.{StreamingQueryProgressWrapper, StreamingQuerySummary, StreamingQueryUIData}
+import org.apache.spark.status.KVUtils
+import org.apache.spark.util.kvstore.KVStore
+
+/**
+ * Provides a view of a KVStore with methods that make it easy to query Streaming Query state.
+ * There's no state kept in this class, so it's ok to have multiple instances of it in an
+ * application.
+ */
+class StreamingQueryStatusStore(store: KVStore) {
+
+  def allQueryUIData: Seq[StreamingQueryUIData] = synchronized {
+    val view = store.view(classOf[StreamingQuerySummary]).index("startTimestamp").first(0L)
+    KVUtils.viewToSeq(view, Int.MaxValue)(_ => true).map(makeUIData)
+  }
+
+  // Visible for testing.
+  private[sql] def activeQueryUIData(): Seq[StreamingQueryUIData] = {
+    allQueryUIData.filter(_.summary.isActive)
+  }
+
+  // Visible for testing.
+  private[sql] def inactiveQueryUIData(): Seq[StreamingQueryUIData] = {
+    allQueryUIData.filter(!_.summary.isActive)

Review comment:
       ditto

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,141 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
-import scala.collection.mutable
+import scala.collection.immutable.Queue
+
+import com.fasterxml.jackson.annotation.JsonIgnore
 
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQuerySummary], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQuerySummary]).index("startTimestamp").first(0L)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_.isActive == false)
+    toDelete.foreach { e =>
+      store.delete(e.getClass, e.runId)
+      store.removeAllByIndexValues(
+        classOf[StreamingQueryProgressWrapper], "runId", e.runId.toString)
+    }
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    val querySummary = new StreamingQuerySummary(
+      event.name,
+      event.id,
+      event.runId,
+      Queue.empty[String],
+      startTimestamp,
+      true,
+      None)
+    store.write(querySummary)
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    val querySummary = store.read(classOf[StreamingQuerySummary], runId)
+    val progressIds =
+      querySummary.progressIds ++ Seq(getUniqueId(runId, batchId, timestamp))
+    store.write(new StreamingQueryProgressWrapper(event.progress))
+    while(progressIds.length >= streamingProgressRetention) {
+      val uniqueId = progressIds.dequeue
+      store.delete(classOf[StreamingQueryProgressWrapper], uniqueId)
+    }
+    store.delete(classOf[StreamingQuerySummary], runId)

Review comment:
       I don't think we need to call the `delete` method here. Writing an object with the same primary key will overwrite the existing value.
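
A minimal Scala sketch of the overwrite-on-write behavior described in this comment. `TinyStore` and `QuerySummary` are hypothetical stand-ins backed by a plain map; they only mimic the "same key replaces the entry" semantics and are not Spark's `KVStore` API.

```scala
import scala.collection.mutable

// Hypothetical record keyed by runId.
final case class QuerySummary(runId: String, isActive: Boolean)

// Hypothetical key-value store: write is an upsert on the primary key.
final class TinyStore {
  private val data = mutable.LinkedHashMap.empty[String, QuerySummary]
  def write(s: QuerySummary): Unit = data.update(s.runId, s)
  def read(runId: String): QuerySummary = data(runId)
  def count: Int = data.size
}

object OverwriteDemo {
  def run(): TinyStore = {
    val store = new TinyStore
    store.write(QuerySummary("run-1", isActive = true))
    // No delete before the second write: the entry with the same key is replaced.
    store.write(QuerySummary("run-1", isActive = false))
    store
  }
}
```

Under that assumption, a delete followed by a write is redundant work and briefly leaves the entry missing for concurrent readers.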





[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-648635599


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124469/
   Test FAILed.



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713449993


   **[Test build #130087 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130087/testReport)** for PR 28781 at commit [`2023c62`](https://github.com/apache/spark/commit/2023c625923899678572df6a8c853180f8a604dc).



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666049254







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-737072512







[GitHub] [spark] uncleGen commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666972499


   retest this please



[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666934489


   **[Test build #126847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126847/testReport)** for PR 28781 at commit [`aa7a682`](https://github.com/apache/spark/commit/aa7a6828628da713dc17b6f15760eb21fef21378).



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642358736







[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-648623980


   **[Test build #124469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124469/testReport)** for PR 28781 at commit [`584813d`](https://github.com/apache/spark/commit/584813d32c43cb9206f4209143d007d346557828).



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666876537







[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666934489


   **[Test build #126847 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/126847/testReport)** for PR 28781 at commit [`aa7a682`](https://github.com/apache/spark/commit/aa7a6828628da713dc17b6f15760eb21fef21378).



[GitHub] [spark] uncleGen commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r509034111



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,139 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
 import scala.collection.mutable
 
+import com.fasterxml.jackson.annotation.JsonIgnore
+
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQuerySummary], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private val queryToProgress = new mutable.HashMap[UUID, mutable.Queue[String]]()
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQuerySummary]).index("active").first(false).last(false)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_ => true)
+    toDelete.foreach { e =>
+      store.delete(e.getClass, e.runId)

Review comment:
       The number of inactive queries should be limited in production, so there should not be inactive queries to delete on every trigger. When there are, we use `removeAllByIndexValues` to delete all of their query progress entries in one batch.
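
To illustrate the batching argument, here is a minimal sketch that compares per-run deletion with a single batched pass. `ToyStore`, `Progress`, and `BatchDeleteDemo` are hypothetical in-memory stand-ins invented for this example; they are not Spark's `ElementTrackingStore`, and the real store may behave differently.

```scala
import scala.collection.mutable

// Hypothetical progress entry, keyed by `id` and grouped by `runId`.
final case class Progress(id: String, runId: String)

// Hypothetical in-memory stand-in for a KV store; not Spark's ElementTrackingStore.
final class ToyStore {
  private val progresses = mutable.Map.empty[String, Progress]
  var scans: Int = 0 // counts full passes over the progress entries

  def write(p: Progress): Unit = progresses(p.id) = p
  def size: Int = progresses.size

  // One full pass per run: total cost grows with (runs to delete) x (entries).
  def removeByRunId(runId: String): Unit = {
    scans += 1
    val doomed = progresses.values.filter(_.runId == runId).map(_.id).toList
    doomed.foreach(id => progresses.remove(id))
  }

  // One full pass for all runs, in the spirit of `removeAllByIndexValues`.
  def removeAllByRunIds(runIds: Set[String]): Unit = {
    scans += 1
    val doomed = progresses.values.filter(p => runIds(p.runId)).map(_.id).toList
    doomed.foreach(id => progresses.remove(id))
  }
}

object BatchDeleteDemo {
  // Three runs with four progress entries each.
  def populated(): ToyStore = {
    val store = new ToyStore
    for (run <- Seq("r1", "r2", "r3"); batch <- 0 until 4)
      store.write(Progress(s"$run-$batch", run))
    store
  }
}
```

Both paths leave the same entries behind; the batched one just touches the progress entries once instead of once per deleted run.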





[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-643930261


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/124023/
   Test FAILed.



[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-686366899


   **[Test build #128246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128246/testReport)** for PR 28781 at commit [`9709405`](https://github.com/apache/spark/commit/970940581369f1868d6f5ba13464bb80cb383ebb).



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-686367931







[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-648623980


   **[Test build #124469 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124469/testReport)** for PR 28781 at commit [`584813d`](https://github.com/apache/spark/commit/584813d32c43cb9206f4209143d007d346557828).



[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-735523446


   **[Test build #131954 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131954/testReport)** for PR 28781 at commit [`8a7de59`](https://github.com/apache/spark/commit/8a7de597c978c47f0e33d1f646c50ad4225405cd).



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713473724


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/34696/
   



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-643130422







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642158749


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123750/
   Test FAILed.



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-737041583







[GitHub] [spark] gengliangwang commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

gengliangwang commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r444572776



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala
##########
@@ -35,12 +35,12 @@ import org.apache.spark.util.ListenerBus
  * and StreamingQueryManager. So this bus will dispatch events to registered listeners for only
  * those queries that were started in the associated SparkSession.
  */
-class StreamingQueryListenerBus(sparkListenerBus: LiveListenerBus)
+class StreamingQueryListenerBus(sparkListenerBus: Option[LiveListenerBus])

Review comment:
       @uncleGen Could you update the comment for `StreamingQueryListenerBus`?





[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713388249


   Merged build finished. Test FAILed.



[GitHub] [spark] zsxwing commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

zsxwing commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r527140209



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -20,102 +20,145 @@ package org.apache.spark.sql.streaming.ui
 import java.util.UUID
 import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
 import scala.collection.mutable
 
+import com.fasterxml.jackson.annotation.JsonIgnore
+
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQueryData], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private val queryToProgress = new ConcurrentHashMap[UUID, mutable.Queue[String]]()
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQueryData]).index("active").first(false).last(false)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_ => true)
+    val runIds = toDelete.map { e =>
+      store.delete(e.getClass, e.runId)
+      e.runId.toString
+    }
+    // Delete wrappers in one pass, as deleting them for each summary is slow
+    store.removeAllByIndexValues(classOf[StreamingQueryProgressWrapper], "runId", runIds)
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    store.write(new StreamingQueryData(
+      event.name,
+      event.id,
+      event.runId,
+      Array.empty[String],
+      startTimestamp,
+      true,
+      None
+    ))
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    if (!queryToProgress.contains(event.progress.runId)) {
+      queryToProgress.put(event.progress.runId, mutable.Queue.empty[String])
+    }
+    val progressIds = queryToProgress.get(event.progress.runId)

Review comment:
       nit: can we add a comment to explain why we don't need `synchronized` here, such as
   ```
   Events from the same query run will never be processed concurrently, so it's safe to access `progressIds` without any protection.
   ```
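
For context, a rough sketch of the per-run bookkeeping that such a comment would describe. `ProgressIdTracker` is a hypothetical class written for this example, not the PR's listener. It also assumes, as the suggested comment does, that events from one query run are never processed concurrently. Unlike the quoted diff, this sketch keeps at most `retention` ids per run, which is a choice made here for illustration.

```scala
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap
import scala.collection.mutable

// Hypothetical tracker: keeps a bounded number of progress ids per query run.
final class ProgressIdTracker(retention: Int) {
  private val queryToProgress = new ConcurrentHashMap[UUID, mutable.Queue[String]]()

  /** Records `progressId` for `runId` and returns the ids evicted to respect `retention`. */
  def record(runId: UUID, progressId: String): Seq[String] = {
    // Key-based get-or-create. On a ConcurrentHashMap, `contains` is the legacy
    // value lookup (same as `containsValue`); `containsKey` and `computeIfAbsent` test keys.
    val progressIds =
      queryToProgress.computeIfAbsent(runId, (_: UUID) => mutable.Queue.empty[String])
    // Assumption: events from the same query run are never processed concurrently,
    // so the queue itself is accessed without extra locking.
    progressIds.enqueue(progressId)
    val evicted = mutable.ArrayBuffer.empty[String]
    while (progressIds.length > retention) {
      evicted += progressIds.dequeue()
    }
    evicted.toSeq
  }

  /** Ids currently retained for `runId`, oldest first. */
  def retained(runId: UUID): Seq[String] =
    Option(queryToProgress.get(runId)).map(_.toSeq).getOrElse(Seq.empty)
}
```

The map is concurrent because different runs may report from different threads; the queue is plain because, under the stated assumption, only one thread at a time touches a given run.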

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala
##########
@@ -31,16 +32,21 @@ import org.apache.spark.util.ListenerBus
  * Spark listener bus, so that it can receive [[StreamingQueryListener.Event]]s and dispatch them
  * to StreamingQueryListeners.
  *
- * Note that each bus and its registered listeners are associated with a single SparkSession
+ * Note 1: Each bus and its registered listeners are associated with a single SparkSession
  * and StreamingQueryManager. So this bus will dispatch events to registered listeners for only
  * those queries that were started in the associated SparkSession.
+ *
+ * Note 2: To rebuild Structured Streaming UI in SHS, this bus will be registered into
+ * [[ReplayListenerBus]]. We use the `live` argument (true in default) to determine how to process
+ * [[StreamingQueryListener.Event]]. If `live` is false, it means this bus is used to replay all
+ * streaming query event from eventLog.
  */
-class StreamingQueryListenerBus(sparkListenerBus: LiveListenerBus)
+class StreamingQueryListenerBus(sparkListenerBus: Option[LiveListenerBus], live: Boolean = true)

Review comment:
       nit: We don't need the `live` parameter. We can get `live` by checking `sparkListenerBus.nonEmpty`.
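
A minimal sketch of that suggestion, using hypothetical stand-in classes (`FakeLiveBus`, `ReplayAwareBus`) rather than Spark's `LiveListenerBus` and `StreamingQueryListenerBus`:

```scala
// Hypothetical stand-in for a live listener bus; it carries no behavior here.
final class FakeLiveBus

// Hypothetical bus: no separate `live` constructor flag is needed.
final class ReplayAwareBus(liveBus: Option[FakeLiveBus]) {
  // The bus is live exactly when a live listener bus was supplied.
  val live: Boolean = liveBus.nonEmpty
}
```

Deriving the flag from the `Option` removes the chance of passing an inconsistent pair such as `None` together with `live = true`.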

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -20,102 +20,145 @@ package org.apache.spark.sql.streaming.ui
 import java.util.UUID
 import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
 import scala.collection.mutable
 
+import com.fasterxml.jackson.annotation.JsonIgnore
+
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQueryData], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private val queryToProgress = new ConcurrentHashMap[UUID, mutable.Queue[String]]()
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQueryData]).index("active").first(false).last(false)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_ => true)
+    val runIds = toDelete.map { e =>
+      store.delete(e.getClass, e.runId)
+      e.runId.toString
+    }
+    // Delete wrappers in one pass, as deleting them for each summary is slow
+    store.removeAllByIndexValues(classOf[StreamingQueryProgressWrapper], "runId", runIds)
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    store.write(new StreamingQueryData(
+      event.name,
+      event.id,
+      event.runId,
+      Array.empty[String],
+      startTimestamp,
+      true,
+      None
+    ))
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    if (!queryToProgress.contains(event.progress.runId)) {
+      queryToProgress.put(event.progress.runId, mutable.Queue.empty[String])
+    }
+    val progressIds = queryToProgress.get(event.progress.runId)
+    progressIds.enqueue(getUniqueId(runId, batchId, timestamp))
+    store.write(new StreamingQueryProgressWrapper(event.progress))
+    while(progressIds.length >= streamingProgressRetention) {
+      val uniqueId = progressIds.dequeue
+      store.delete(classOf[StreamingQueryProgressWrapper], uniqueId)
+    }
   }
 
   override def onQueryTerminated(
       event: StreamingQueryListener.QueryTerminatedEvent): Unit = synchronized {

Review comment:
       nit: `synchronized` is not needed.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -20,102 +20,145 @@ package org.apache.spark.sql.streaming.ui
 import java.util.UUID
 import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
 import scala.collection.mutable
 
+import com.fasterxml.jackson.annotation.JsonIgnore
+
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQueryData], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private val queryToProgress = new ConcurrentHashMap[UUID, mutable.Queue[String]]()
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQueryData]).index("active").first(false).last(false)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_ => true)
+    val runIds = toDelete.map { e =>
+      store.delete(e.getClass, e.runId)
+      e.runId.toString
+    }
+    // Delete wrappers in one pass, as deleting them for each summary is slow
+    store.removeAllByIndexValues(classOf[StreamingQueryProgressWrapper], "runId", runIds)
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    store.write(new StreamingQueryData(
+      event.name,
+      event.id,
+      event.runId,
+      Array.empty[String],
+      startTimestamp,
+      true,

Review comment:
       nit: `isActive = true`
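
A small sketch of why the named argument helps. `QueryRecord` is a hypothetical, trimmed-down case class made up for this example; the PR's `StreamingQueryData` has more fields.

```scala
// Hypothetical record with a boolean flag next to other positional fields.
final case class QueryRecord(
    name: String,
    startTimestamp: Long,
    isActive: Boolean,
    endTimestamp: Option[Long])

object NamedArgsDemo {
  // Naming the boolean makes the call site self-describing.
  val started: QueryRecord =
    QueryRecord("q1", 1000L, isActive = true, endTimestamp = None)

  // `copy` with named arguments changes only the fields that differ on termination.
  val terminated: QueryRecord =
    started.copy(isActive = false, endTimestamp = Some(2000L))
}
```

A bare `true` or `false` in a long argument list forces the reader to look up the constructor; the named form does not.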

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -20,102 +20,145 @@ package org.apache.spark.sql.streaming.ui
 import java.util.UUID
 import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
 import scala.collection.mutable
 
+import com.fasterxml.jackson.annotation.JsonIgnore
+
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQueryData], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private val queryToProgress = new ConcurrentHashMap[UUID, mutable.Queue[String]]()
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQueryData]).index("active").first(false).last(false)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_ => true)
+    val runIds = toDelete.map { e =>
+      store.delete(e.getClass, e.runId)
+      e.runId.toString
+    }
+    // Delete wrappers in one pass, as deleting them for each summary is slow
+    store.removeAllByIndexValues(classOf[StreamingQueryProgressWrapper], "runId", runIds)
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    store.write(new StreamingQueryData(
+      event.name,
+      event.id,
+      event.runId,
+      Array.empty[String],
+      startTimestamp,
+      true,
+      None
+    ))
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    if (!queryToProgress.contains(event.progress.runId)) {
+      queryToProgress.put(event.progress.runId, mutable.Queue.empty[String])
+    }
+    val progressIds = queryToProgress.get(event.progress.runId)
+    progressIds.enqueue(getUniqueId(runId, batchId, timestamp))
+    store.write(new StreamingQueryProgressWrapper(event.progress))
+    while(progressIds.length >= streamingProgressRetention) {
+      val uniqueId = progressIds.dequeue
+      store.delete(classOf[StreamingQueryProgressWrapper], uniqueId)
+    }
   }
 
   override def onQueryTerminated(
       event: StreamingQueryListener.QueryTerminatedEvent): Unit = synchronized {
-    val queryStatus = activeQueryStatus.remove(event.runId)
-    if (queryStatus != null) {
-      queryStatus.queryTerminated(event)
-      inactiveQueryStatus += queryStatus
-      while (inactiveQueryStatus.length >= inactiveQueryStatusRetention) {
-        inactiveQueryStatus.dequeue()
-      }
-    }
+    val querySummary = store.read(classOf[StreamingQueryData], event.runId)
+    store.write(new StreamingQueryData(
+      querySummary.name,
+      querySummary.id,
+      querySummary.runId,
+      querySummary.progressIds,
+      querySummary.startTimestamp,
+      false,

Review comment:
       nit: `isActive = false`
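
To illustrate the nit, here is a minimal sketch of passing the boolean as a named argument. `QuerySummary` and `NamedArgs` are hypothetical stand-ins, not the PR's classes; only the parameter name `isActive` comes from the review comment.

```scala
import java.util.UUID

// Hypothetical stand-in for the summary class under review. Only `isActive`
// is taken from the review comment; the other fields are illustrative.
final case class QuerySummary(
    name: String,
    runId: UUID,
    startTimestamp: Long,
    isActive: Boolean,
    exception: Option[String])

object NamedArgs {
  // Copies a summary and marks it inactive. Naming the boolean makes the
  // call site readable without looking up the constructor signature.
  def terminated(summary: QuerySummary): QuerySummary =
    QuerySummary(
      summary.name,
      summary.runId,
      summary.startTimestamp,
      isActive = false,
      exception = summary.exception)
}
```

A bare `false` in a long positional argument list gives the reader no hint of what is being switched off, which is presumably the motivation for the nit.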

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -20,102 +20,145 @@ package org.apache.spark.sql.streaming.ui
 import java.util.UUID
 import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
 import scala.collection.mutable
 
+import com.fasterxml.jackson.annotation.JsonIgnore
+
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQueryData], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private val queryToProgress = new ConcurrentHashMap[UUID, mutable.Queue[String]]()
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQueryData]).index("active").first(false).last(false)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_ => true)
+    val runIds = toDelete.map { e =>
+      store.delete(e.getClass, e.runId)
+      e.runId.toString
+    }
+    // Delete wrappers in one pass, as deleting them for each summary is slow
+    store.removeAllByIndexValues(classOf[StreamingQueryProgressWrapper], "runId", runIds)
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    store.write(new StreamingQueryData(
+      event.name,
+      event.id,
+      event.runId,
+      Array.empty[String],
+      startTimestamp,
+      true,
+      None
+    ))
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    if (!queryToProgress.contains(event.progress.runId)) {

Review comment:
       nit: you can use `putIfAbsent`
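
A minimal sketch of the suggestion, assuming the map keeps the shape shown in the diff (`ConcurrentHashMap[UUID, mutable.Queue[String]]`). `ProgressIndex` and `queueFor` are hypothetical names used only for illustration. Note also that `ConcurrentHashMap.contains(Object)` is a legacy method that tests values (it behaves like `containsValue`), so the quoted `contains(runId)` check would not test for the key in any case.

```scala
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap
import scala.collection.mutable

object ProgressIndex {
  private val queryToProgress = new ConcurrentHashMap[UUID, mutable.Queue[String]]()

  // Registers an empty queue for runId only if no queue is mapped yet,
  // then returns whichever queue is currently mapped to that key.
  def queueFor(runId: UUID): mutable.Queue[String] = {
    queryToProgress.putIfAbsent(runId, mutable.Queue.empty[String])
    queryToProgress.get(runId)
  }
}
```

`putIfAbsent` performs the check and the insert as one atomic step, so two threads racing on the same `runId` end up sharing a single queue.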





[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-735799932


   **[Test build #131981 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131981/testReport)** for PR 28781 at commit [`9531ac9`](https://github.com/apache/spark/commit/9531ac99292b7f7bb829eabcbacae41e2b2f3cce).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642455948


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123805/
   Test FAILed.



[GitHub] [spark] uncleGen commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-736341203


   retest this please.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-734786670







[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713359532


   **[Test build #130079 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130079/testReport)** for PR 28781 at commit [`2023c62`](https://github.com/apache/spark/commit/2023c625923899678572df6a8c853180f8a604dc).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642363026







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-735761635







[GitHub] [spark] uncleGen commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r533222079



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryStatusStore.scala
##########
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import java.util.UUID
+
+import org.apache.spark.sql.streaming.ui.{StreamingQueryData, StreamingQueryProgressWrapper, StreamingQueryUIData}
+import org.apache.spark.status.KVUtils
+import org.apache.spark.util.kvstore.KVStore
+
+/**
+ * Provides a view of a KVStore with methods that make it easy to query Streaming Query state.
+ * There's no state kept in this class, so it's ok to have multiple instances of it in an
+ * application.
+ */
+class StreamingQueryStatusStore(store: KVStore) {
+
+  def allQueryUIData: Seq[StreamingQueryUIData] = {
+    val view = store.view(classOf[StreamingQueryData]).index("startTimestamp").first(0L)
+    KVUtils.viewToSeq(view, Int.MaxValue)(_ => true).map(makeUIData)
+  }
+
+  // visible for test
+  private[sql] def getQueryProgressData(runId: UUID): Seq[StreamingQueryProgressWrapper] = {
+    val view = store.view(classOf[StreamingQueryProgressWrapper])
+      .index("runId").first(runId.toString).last(runId.toString)
+    KVUtils.viewToSeq(view, Int.MaxValue)(_ => true)
+  }
+
+  private def makeUIData(summary: StreamingQueryData): StreamingQueryUIData = {
+    val runId = summary.runId.toString
+    val view = store.view(classOf[StreamingQueryProgressWrapper])
+      .index("runId").first(runId).last(runId)
+    val recentProgress = KVUtils.viewToSeq(view, Int.MaxValue)(_ => true)
+      .map(_.progress).sortBy(_.timestamp).toArray
+    StreamingQueryUIData(summary, recentProgress)

Review comment:
       Bugfix update: fetch the `StreamingQueryProgressWrapper` entries from the KVStore properly.
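
The lookup in `makeUIData` above amounts to "select the progress entries of one run, then order them by timestamp". A minimal in-memory sketch of that logic follows; it does not use the KVStore API, and `ProgressRecord` and `ProgressLookup` are hypothetical names introduced only for illustration.

```scala
import java.util.UUID

// Hypothetical, simplified record. The wrapper in the diff holds a full
// StreamingQueryProgress; only the fields needed here are modeled.
final case class ProgressRecord(runId: UUID, batchId: Long, timestamp: String)

object ProgressLookup {
  // Keeps only the records of one run and orders them by timestamp.
  // Assumes ISO-8601 UTC strings of identical format, which sort
  // chronologically when compared as plain strings.
  def recentProgress(all: Seq[ProgressRecord], runId: UUID): Seq[ProgressRecord] =
    all.filter(_.runId == runId).sortBy(_.timestamp)
}
```

In the diff, the filtering step is delegated to the store through the `runId` index (`first(runId).last(runId)`), so only the sort happens on the caller's side.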





[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-736998496







[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713357122


   **[Test build #130075 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130075/testReport)** for PR 28781 at commit [`2023c62`](https://github.com/apache/spark/commit/2023c625923899678572df6a8c853180f8a604dc).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642465030







[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666048818







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713489023


   Merged build finished. Test FAILed.



[GitHub] [spark] gengliangwang commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

gengliangwang commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r444712389



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,141 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
-import scala.collection.mutable
+import scala.collection.immutable.Queue
+
+import com.fasterxml.jackson.annotation.JsonIgnore
 
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQuerySummary], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQuerySummary]).index("startTimestamp").first(0L)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_.isActive == false)
+    toDelete.foreach { e =>
+      store.delete(e.getClass, e.runId)
+      store.removeAllByIndexValues(
+        classOf[StreamingQueryProgressWrapper], "runId", e.runId.toString)
+    }
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    val querySummary = new StreamingQuerySummary(
+      event.name,
+      event.id,
+      event.runId,
+      Queue.empty[String],
+      startTimestamp,
+      true,
+      None)
+    store.write(querySummary)
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    val querySummary = store.read(classOf[StreamingQuerySummary], runId)
+    val progressIds =
+      querySummary.progressIds ++ Seq(getUniqueId(runId, batchId, timestamp))
+    store.write(new StreamingQueryProgressWrapper(event.progress))

Review comment:
       I am not very familiar with Streaming, but for progress events in the history server, is it possible to avoid some unnecessary updates to the KVStore?
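
One possible way to cut the per-event work, sketched under assumptions: keep the progress IDs of each query in memory and touch the store only to write the new progress entry and delete evicted ones, instead of reading and rewriting the query summary on every event. Another revision quoted in this thread takes a similar route with its `queryToProgress` map. `RetainedProgress` is a hypothetical helper, not code from the PR, and it retains exactly `retention` entries, which is a choice made for this sketch.

```scala
import scala.collection.mutable

// Hypothetical helper: tracks the progress IDs of one query in memory and
// reports which IDs should be deleted from the store once the retention
// limit is exceeded.
final class RetainedProgress(retention: Int) {
  require(retention > 0, "retention must be positive")

  private val ids = mutable.Queue.empty[String]

  // Records a new ID and returns the evicted IDs, oldest first.
  def add(id: String): Seq[String] = {
    ids.enqueue(id)
    val evicted = mutable.ArrayBuffer.empty[String]
    while (ids.length > retention) {
      evicted += ids.dequeue()
    }
    evicted.toSeq
  }

  // Snapshot of the IDs currently retained, oldest first.
  def current: Seq[String] = ids.toList
}
```

A listener using such a helper would issue one write per progress event plus one delete per evicted ID, and would leave the query summary untouched until the query terminates.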





[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-643133365


   **[Test build #123907 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123907/testReport)** for PR 28781 at commit [`38b9506`](https://github.com/apache/spark/commit/38b9506ff7ca47c48be7f6af83b83812bbcfd97c).



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-648781386







[GitHub] [spark] uncleGen commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r444673200



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryHistoryServerPlugin.scala
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.config.Status.ASYNC_TRACKING_ENABLED
+import org.apache.spark.scheduler.SparkListener
+import org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus
+import org.apache.spark.sql.streaming.ui.{StreamingQueryStatusListener, StreamingQueryTab}
+import org.apache.spark.status.{AppHistoryServerPlugin, ElementTrackingStore}
+import org.apache.spark.ui.SparkUI
+
+class StreamingQueryHistoryServerPlugin extends AppHistoryServerPlugin {
+
+  override def createListeners(conf: SparkConf, store: ElementTrackingStore): Seq[SparkListener] = {
+    val listenerBus = new StreamingQueryListenerBus(None)
+    listenerBus.addListener(new StreamingQueryStatusListener(conf, store))
+    Seq(listenerBus)
+  }
+
+  override def setupUI(ui: SparkUI): Unit = {
+    val replayConf = ui.conf.clone.clone().set(ASYNC_TRACKING_ENABLED, false)
+    val store = new ElementTrackingStore(ui.store.store, replayConf)
+    val streamingQueryStatusStore = new StreamingQueryStatusStore(store)

Review comment:
       Hmm, you are right. `ui.store.store` is enough.





[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666649293







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666968287


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/126847/
   Test FAILed.



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-686367918


   **[Test build #128246 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/128246/testReport)** for PR 28781 at commit [`9709405`](https://github.com/apache/spark/commit/970940581369f1868d6f5ba13464bb80cb383ebb).
    * This patch **fails RAT tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-686367631







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666973707







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-644219054







[GitHub] [spark] xuanyuanking commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

xuanyuanking commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r533373596



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListenerSuite.scala
##########
@@ -112,4 +115,104 @@ class StreamingQueryStatusListenerSuite extends StreamTest {
     assert(queryStore.allQueryUIData.filterNot(_.summary.isActive).head.summary.runId == runId0)
     assert(queryStore.allQueryUIData.filterNot(_.summary.isActive).head.summary.id == id)
   }
+
+  test("test small retained queries") {
+    val kvStore = new ElementTrackingStore(new InMemoryStore(), sparkConf)
+    val conf = spark.sparkContext.conf
+    conf.set(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES.key, "2")
+    val listener = new StreamingQueryStatusListener(conf, kvStore)
+    val queryStore = new StreamingQueryStatusStore(kvStore)
+
+    def addNewQuery(): (UUID, UUID) = {
+      val format = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'") // ISO8601
+      format.setTimeZone(getTimeZone("UTC"))
+      val id = UUID.randomUUID()
+      val runId = UUID.randomUUID()
+      val startEvent = new StreamingQueryListener.QueryStartedEvent(
+        id, runId, "test1", format.format(new Date(System.currentTimeMillis())))
+      listener.onQueryStarted(startEvent)
+      (id, runId)
+    }
+
+    val (id1, runId1) = addNewQuery()
+    val (id2, runId2) = addNewQuery()
+    val (id3, runId3) = addNewQuery()
+
+    assert(queryStore.allQueryUIData.count(!_.summary.isActive) == 0)
+
+    val terminateEvent1 = new StreamingQueryListener.QueryTerminatedEvent(id1, runId1, None)
+    listener.onQueryTerminated(terminateEvent1)
+    // sleep 100 mills to make sure clean work complete
+    Thread.sleep(100)

Review comment:
       The sleep will make the tests flaky. Try changing them to a conditional wait such as `eventually(timeout(10.seconds))`.
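
To illustrate the suggestion, here is a minimal sketch of a conditional wait in plain Scala. It is not ScalaTest's `eventually` and not code from this PR; `ConditionalWait` and `waitUntil` are hypothetical names used only to show the idea of polling a condition up to a deadline instead of sleeping for a fixed time.

```scala
import scala.annotation.tailrec

// Hypothetical helper (not part of Spark or ScalaTest): polls a condition
// until it holds or a deadline passes, instead of sleeping a fixed amount.
object ConditionalWait {

  /** Re-evaluates `condition` every `intervalMs` until it is true or `timeoutMs` elapses. */
  def waitUntil(timeoutMs: Long, intervalMs: Long = 10L)(condition: => Boolean): Boolean = {
    val deadline = System.nanoTime() + timeoutMs * 1000000L

    @tailrec
    def loop(): Boolean =
      if (condition) true                           // condition met: stop waiting
      else if (System.nanoTime() >= deadline) false // timed out: report failure
      else { Thread.sleep(intervalMs); loop() }     // back off briefly, then retry

    loop()
  }
}
```

A test would then assert on the result, for example `assert(ConditionalWait.waitUntil(10000L)(cleanupFinished))`, where `cleanupFinished` stands for whatever check the test needs. The test returns as soon as the condition holds and only fails after the full timeout, which is why this tends to be both faster and less flaky than a fixed `Thread.sleep(100)`.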

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -20,102 +20,144 @@ package org.apache.spark.sql.streaming.ui
 import java.util.UUID
 import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
 import scala.collection.mutable
 
+import com.fasterxml.jackson.annotation.JsonIgnore
+
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
- * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQueryData], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  // Events from the same query run will never be processed concurrently, so it's safe to
+  // access `progressIds` without any protection.
+  private val queryToProgress = new ConcurrentHashMap[UUID, mutable.Queue[String]]()
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val view = store.view(classOf[StreamingQueryData]).index("active").first(false).last(false)
+    val inactiveQueries = KVUtils.viewToSeq(view, Int.MaxValue)(_ => true)
+    val numInactiveQueries = inactiveQueries.size
+    if (numInactiveQueries <= inactiveQueryStatusRetention) {
+      return
+    }
+    val toDelete = inactiveQueries.sortBy(_.endTimestamp.get)
+      .take(numInactiveQueries - inactiveQueryStatusRetention)
+    val runIds = toDelete.map { e =>
+      store.delete(e.getClass, e.runId)
+      e.runId.toString
+    }
+    // Delete wrappers in one pass, as deleting them for each summary is slow
+    store.removeAllByIndexValues(classOf[StreamingQueryProgressWrapper], "runId", runIds)
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    store.write(new StreamingQueryData(
+      event.name,
+      event.id,
+      event.runId,
+      isActive = true,
+      None,
+      startTimestamp
+    ), checkTriggers = true)
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
-  }
-
-  override def onQueryTerminated(
-      event: StreamingQueryListener.QueryTerminatedEvent): Unit = synchronized {
-    val queryStatus = activeQueryStatus.remove(event.runId)
-    if (queryStatus != null) {
-      queryStatus.queryTerminated(event)
-      inactiveQueryStatus += queryStatus
-      while (inactiveQueryStatus.length >= inactiveQueryStatusRetention) {
-        inactiveQueryStatus.dequeue()
-      }
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    if (!queryToProgress.containsKey(runId)) {
+      queryToProgress.put(runId, mutable.Queue.empty[String])
+    }
+    val progressIds = queryToProgress.get(runId)
+    progressIds.enqueue(getUniqueId(runId, batchId, timestamp))
+    store.write(new StreamingQueryProgressWrapper(event.progress))
+    while(progressIds.length > streamingProgressRetention) {

Review comment:
       Code style nit: `while (`





[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-648643664


   **[Test build #124471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124471/testReport)** for PR 28781 at commit [`584813d`](https://github.com/apache/spark/commit/584813d32c43cb9206f4209143d007d346557828).



[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-734784088


   **[Test build #131883 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/131883/testReport)** for PR 28781 at commit [`3bf1c40`](https://github.com/apache/spark/commit/3bf1c40672f7d35a71cb72981d0b52f21dc6dbcd).



[GitHub] [spark] xuanyuanking commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

xuanyuanking commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r438704385



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/StreamingQueryHistoryServerPlugin.scala
##########
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.ui
+
+import org.apache.spark.SparkConf
+import org.apache.spark.internal.config.Status.ASYNC_TRACKING_ENABLED
+import org.apache.spark.scheduler.SparkListener
+import org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus
+import org.apache.spark.sql.streaming.ui.{StreamingQueryStatusListener, StreamingQueryTab}
+import org.apache.spark.status.{AppHistoryServerPlugin, ElementTrackingStore}
+import org.apache.spark.ui.SparkUI
+
+class StreamingQueryHistoryServerPlugin extends AppHistoryServerPlugin {
+
+  override def createListeners(conf: SparkConf, store: ElementTrackingStore): Seq[SparkListener] = {
+    val listenerBus = new StreamingQueryListenerBus(None)
+    listenerBus.addListener(new StreamingQueryStatusListener(conf, store))
+    Seq(listenerBus)
+  }
+
+  override def setupUI(ui: SparkUI): Unit = {
+    val replayConf = ui.conf.clone().set(ASYNC_TRACKING_ENABLED, false)
+    val store = new ElementTrackingStore(ui.store.store, replayConf)
+    val streamingQueryStatusStore = new StreamingQueryStatusStore(store)

Review comment:
       But for `StreamingQueryStatusStore`, `ui.store.store` is enough, right?





[GitHub] [spark] xuanyuanking commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

xuanyuanking commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-644838996


   Also cc @cloud-fan @rednaxelafx @gengliangwang to take a look.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-735554962







[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-735817110







[GitHub] [spark] uncleGen commented on a change in pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on a change in pull request #28781:
URL: https://github.com/apache/spark/pull/28781#discussion_r437986587



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,161 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
-import scala.collection.mutable
+import scala.collection.immutable.Queue
+
+import com.fasterxml.jackson.annotation.JsonIgnore
 
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
  * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQueryUIData], inactiveQueryStatusRetention) { count =>

Review comment:
       Add a trigger that cleans up inactive query data in the store according to `StaticSQLConf.STREAMING_UI_RETAINED_QUERIES`.
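
As a rough illustration of what such a retention trigger has to decide, the sketch below picks which inactive queries to evict once their number exceeds a limit. It runs against a plain `Seq` rather than the KVStore; `RetentionSketch`, `QuerySummary`, and `selectForEviction` are hypothetical names, and evicting the earliest-terminated queries first is an assumption of this sketch rather than something the comment above specifies.

```scala
import java.util.UUID

// Hypothetical, simplified model of retention-based cleanup; the real
// listener reads and deletes entries through an ElementTrackingStore.
object RetentionSketch {

  // Simplified stand-in for the per-query summary kept in the store.
  final case class QuerySummary(runId: UUID, isActive: Boolean, endTimestamp: Option[Long])

  /** Returns the runIds to evict so that at most `retained` inactive queries remain. */
  def selectForEviction(all: Seq[QuerySummary], retained: Int): Seq[UUID] = {
    val inactive = all.filterNot(_.isActive)
    val excess = inactive.size - retained
    if (excess <= 0) {
      Seq.empty // under the limit: nothing to clean up
    } else {
      // Assumption: evict the queries that terminated earliest.
      inactive.sortBy(_.endTimestamp.getOrElse(Long.MaxValue)).take(excess).map(_.runId)
    }
  }
}
```

Active queries are never candidates here, so the limit only bounds how many terminated queries stay visible in the UI.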

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala
##########
@@ -35,12 +35,12 @@ import org.apache.spark.util.ListenerBus
  * and StreamingQueryManager. So this bus will dispatch events to registered listeners for only
  * those queries that were started in the associated SparkSession.
  */
-class StreamingQueryListenerBus(sparkListenerBus: LiveListenerBus)
+class StreamingQueryListenerBus(sparkListenerBus: Option[LiveListenerBus])

Review comment:
       Make `sparkListenerBus` optional. When loaded by the History Server, Spark uses the `ReplayListenerBus` instead of the `LiveListenerBus`. In the live UI, a streaming query posts its events to `StreamingQueryListenerBus`, which forwards them to `LiveListenerBus`. `StreamingQueryListenerBus` also subscribes to `LiveListenerBus` events, so it gets the posted events back on a different thread. In the history UI, `StreamingQueryListenerBus` subscribes to the `ReplayListenerBus` and processes those events directly through `onOtherEvent()`.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala
##########
@@ -110,7 +116,15 @@ class StreamingQueryListenerBus(sparkListenerBus: LiveListenerBus)
       listener: StreamingQueryListener,
       event: StreamingQueryListener.Event): Unit = {
     def shouldReport(runId: UUID): Boolean = {
-      activeQueryRunIds.synchronized { activeQueryRunIds.contains(runId) }
+      // When loaded by Spark History Server, we should process all event coming from replay
+      // listener bus.
+      if (sparkListenerBus.isEmpty) {

Review comment:
       In the live UI, streaming queries post events to `StreamingQueryListenerBus`, which tracks the status of each query. In the history UI, however, `StreamingQueryListenerBus` subscribes to the `ReplayListenerBus`, so it should process all events.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingQueryListenerBus.scala
##########
@@ -95,7 +95,13 @@ class StreamingQueryListenerBus(sparkListenerBus: LiveListenerBus)
         // synchronously and the ones attached to LiveListenerBus asynchronously. Therefore,
         // we need to ignore QueryStartedEvent if this method is called within SparkListenerBus
         // thread
-        if (!LiveListenerBus.withinListenerThread.value || !e.isInstanceOf[QueryStartedEvent]) {
+        //
+        // When loaded by Spark History Server, we should process all event coming from replay
+        // listener bus.
+        if (sparkListenerBus.isEmpty) {

Review comment:
       Check whether `sparkListenerBus` is defined to determine whether to process all events.
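
The three comments above on `StreamingQueryListenerBus` describe one rule: with a live bus, report only events for queries started in this session; without one (replay in the History Server), report everything. A minimal sketch of that rule follows. `LiveBusStub` and `QueryEventFilter` are hypothetical names introduced for illustration and are not Spark classes; only `shouldReport` and `activeQueryRunIds` echo identifiers visible in the diff.

```scala
import java.util.UUID
import scala.collection.mutable

// Hypothetical stand-in for LiveListenerBus; only its presence or absence matters here.
final class LiveBusStub

// Sketch of the filtering rule: None means "replay mode" (History Server).
final class QueryEventFilter(liveBus: Option[LiveBusStub]) {
  private val activeQueryRunIds = mutable.HashSet.empty[UUID]

  /** Records a query run started in this session (live mode bookkeeping). */
  def queryStarted(runId: UUID): Unit =
    activeQueryRunIds.synchronized { activeQueryRunIds += runId }

  /** Decides whether an event for `runId` should be dispatched to listeners. */
  def shouldReport(runId: UUID): Boolean =
    if (liveBus.isEmpty) {
      true // replay mode: every event from the event log is processed
    } else {
      // live mode: only runs that this session started are reported
      activeQueryRunIds.synchronized { activeQueryRunIds.contains(runId) }
    }
}
```

Modelling the live bus as an `Option` keeps a single class for both modes, at the cost of an `isEmpty` check at each decision point; whether a separate replay-only class would be cleaner is a design question this sketch does not settle.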

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/streaming/ui/StreamingQueryStatusListener.scala
##########
@@ -18,104 +18,161 @@
 package org.apache.spark.sql.streaming.ui
 
 import java.util.UUID
-import java.util.concurrent.ConcurrentHashMap
 
-import scala.collection.JavaConverters._
-import scala.collection.mutable
+import scala.collection.immutable.Queue
+
+import com.fasterxml.jackson.annotation.JsonIgnore
 
 import org.apache.spark.SparkConf
 import org.apache.spark.sql.internal.StaticSQLConf
 import org.apache.spark.sql.streaming.{StreamingQueryListener, StreamingQueryProgress}
+import org.apache.spark.sql.streaming.ui.StreamingQueryProgressWrapper._
 import org.apache.spark.sql.streaming.ui.UIUtils.parseProgressTimestamp
+import org.apache.spark.status.{ElementTrackingStore, KVUtils}
+import org.apache.spark.status.KVUtils.KVIndexParam
+import org.apache.spark.util.kvstore.KVIndex
 
 /**
  * A customized StreamingQueryListener used in structured streaming UI, which contains all
  * UI data for both active and inactive query.
  * TODO: Add support for history server.
  */
-private[sql] class StreamingQueryStatusListener(conf: SparkConf) extends StreamingQueryListener {
-
-  /**
-   * We use runId as the key here instead of id in active query status map,
-   * because the runId is unique for every started query, even it its a restart.
-   */
-  private[ui] val activeQueryStatus = new ConcurrentHashMap[UUID, StreamingQueryUIData]()
-  private[ui] val inactiveQueryStatus = new mutable.Queue[StreamingQueryUIData]()
+private[sql] class StreamingQueryStatusListener(
+    conf: SparkConf,
+    store: ElementTrackingStore) extends StreamingQueryListener {
 
   private val streamingProgressRetention =
     conf.get(StaticSQLConf.STREAMING_UI_RETAINED_PROGRESS_UPDATES)
   private val inactiveQueryStatusRetention = conf.get(StaticSQLConf.STREAMING_UI_RETAINED_QUERIES)
 
+  store.addTrigger(classOf[StreamingQueryUIData], inactiveQueryStatusRetention) { count =>
+    cleanupInactiveQueries(count)
+  }
+
+  private def cleanupInactiveQueries(count: Long): Unit = {
+    val countToDelete = count - inactiveQueryStatusRetention
+    if (countToDelete <= 0) {
+      return
+    }
+
+    val view = store.view(classOf[StreamingQueryUIData]).index("startTimestamp").first(0L)
+    val toDelete = KVUtils.viewToSeq(view, countToDelete.toInt)(_.isActive == false)
+    toDelete.foreach { e =>
+      store.delete(e.getClass(), e.runId)
+      store.removeAllByIndexValues(
+        classOf[StreamingQueryProgressWrapper], "runId", e.runId.toString)
+    }
+  }
+
   override def onQueryStarted(event: StreamingQueryListener.QueryStartedEvent): Unit = {
     val startTimestamp = parseProgressTimestamp(event.timestamp)
-    activeQueryStatus.putIfAbsent(event.runId,
-      new StreamingQueryUIData(event.name, event.id, event.runId, startTimestamp))
+    val querySummary = new StreamingQueryUIData(
+      event.name,
+      event.id,
+      event.runId,
+      Queue.empty[String],
+      startTimestamp,
+      true,
+      None)
+    store.write(querySummary)
   }
 
   override def onQueryProgress(event: StreamingQueryListener.QueryProgressEvent): Unit = {
-    val batchTimestamp = parseProgressTimestamp(event.progress.timestamp)
-    val queryStatus = activeQueryStatus.getOrDefault(
-      event.progress.runId,
-      new StreamingQueryUIData(event.progress.name, event.progress.id, event.progress.runId,
-        batchTimestamp))
-    queryStatus.updateProcess(event.progress, streamingProgressRetention)
+    val runId = event.progress.runId
+    val batchId = event.progress.batchId
+    val timestamp = event.progress.timestamp
+    val querySummary = store.read(classOf[StreamingQueryUIData], runId)
+    val progressIdQueue =
+      querySummary.progressIdQueue ++ Seq(getUniqueId(runId, batchId, timestamp))
+    store.write(new StreamingQueryProgressWrapper(event.progress))
+    while(progressIdQueue.length >= streamingProgressRetention) {
+      val uniqueId = progressIdQueue.dequeue
+      store.delete(classOf[StreamingQueryProgressWrapper], uniqueId)
+    }
+    store.delete(classOf[StreamingQueryUIData], runId)
+    store.write(new StreamingQueryUIData(
+      querySummary.name,
+      querySummary.id,
+      querySummary.runId,
+      progressIdQueue,
+      querySummary.startTimestamp,
+      querySummary.isActive,
+      querySummary.exception
+    ))
   }
 
   override def onQueryTerminated(
       event: StreamingQueryListener.QueryTerminatedEvent): Unit = synchronized {
-    val queryStatus = activeQueryStatus.remove(event.runId)
-    if (queryStatus != null) {
-      queryStatus.queryTerminated(event)
-      inactiveQueryStatus += queryStatus
-      while (inactiveQueryStatus.length >= inactiveQueryStatusRetention) {
-        inactiveQueryStatus.dequeue()
-      }
-    }
-  }
-
-  def allQueryStatus: Seq[StreamingQueryUIData] = synchronized {
-    activeQueryStatus.values().asScala.toSeq ++ inactiveQueryStatus
+    val querySummary = store.read(classOf[StreamingQueryUIData], event.runId)
+    store.delete(classOf[StreamingQueryUIData], event.runId)
+    store.write(new StreamingQueryUIData(
+      querySummary.name,
+      querySummary.id,
+      querySummary.runId,
+      querySummary.progressIdQueue,
+      querySummary.startTimestamp,
+      false,
+      querySummary.exception
+    ))
   }
 }
 
 /**
  * This class contains all message related to UI display, each instance corresponds to a single
  * [[org.apache.spark.sql.streaming.StreamingQuery]].
  */
-private[ui] class StreamingQueryUIData(
+private[sql] class StreamingQueryUIData(
     val name: String,
     val id: UUID,
-    val runId: UUID,
-    val startTimestamp: Long) {
+    @KVIndexParam val runId: UUID,
+    val progressIdQueue: Queue[String],
+    val startTimestamp: Long,
+    val isActive: Boolean,
+    val exception: Option[String]) {
 
-  /** Holds the most recent query progress updates. */
-  private val progressBuffer = new mutable.Queue[StreamingQueryProgress]()
+  private var storeOption: Option[ElementTrackingStore] = None
 
-  private var _isActive = true
-  private var _exception: Option[String] = None
+  @JsonIgnore @KVIndex("startTimestamp")
+  private def startTimestampIndex: Long = startTimestamp
 
-  def isActive: Boolean = synchronized { _isActive }
-
-  def exception: Option[String] = synchronized { _exception }
-
-  def queryTerminated(event: StreamingQueryListener.QueryTerminatedEvent): Unit = synchronized {
-    _isActive = false
-    _exception = event.exception
+  def recentProgress: Array[StreamingQueryProgress] = {
+    storeOption.map { store => progressIdQueue.map { uniqueId =>
+      store.read(classOf[StreamingQueryProgressWrapper], uniqueId).progress
+    }.toArray }.getOrElse(Array.empty)
   }
 
-  def updateProcess(
-      newProgress: StreamingQueryProgress, retentionNum: Int): Unit = progressBuffer.synchronized {
-    progressBuffer += newProgress
-    while (progressBuffer.length >= retentionNum) {
-      progressBuffer.dequeue()
+  def lastProgress: StreamingQueryProgress = {
+    if (progressIdQueue.nonEmpty) {
+      storeOption.map(_.read(classOf[StreamingQueryProgressWrapper], progressIdQueue.last).progress)
+        .orNull
+    } else {
+      null
     }
   }
 
-  def recentProgress: Array[StreamingQueryProgress] = progressBuffer.synchronized {
-    progressBuffer.toArray
+  def setKVStore(store: ElementTrackingStore): StreamingQueryUIData = {
+    storeOption = Option(store)
+    this
   }
+}
 
-  def lastProgress: StreamingQueryProgress = progressBuffer.synchronized {
-    progressBuffer.lastOption.orNull
+private[sql] class StreamingQueryProgressWrapper(val progress: StreamingQueryProgress) {
+  @KVIndexParam("batchId") val batchId: Long = progress.batchId
+  @KVIndexParam("runId") val runId: String = progress.runId.toString
+
+  @JsonIgnore @KVIndex
+  def uniqueId: String = getUniqueId(progress.runId, progress.batchId, progress.timestamp)
+}
+
+private[sql] object StreamingQueryProgressWrapper {
+  /**
+   * Adding `timestamp` into unique id to support reporting `empty` query progress
+   * when no data comes.
+   */
+  def getUniqueId(
+      runId: UUID,
+      batchId: Long,
+      timestamp: String): String = {
+    s"${runId}_${batchId}_$timestamp"

Review comment:
       This is the unique ID of a streaming query progress entry in the KVStore. It combines `runId`, `batchId`, and `timestamp`. The `timestamp` distinguishes empty progress reports that share the same `batchId`.
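
A small standalone sketch of why the timestamp is needed in the key: it mirrors the string format of `getUniqueId` from the diff above, wrapped in a hypothetical `ProgressKey` object so it can run on its own. The UUID and timestamps in the usage note are made-up sample values.

```scala
import java.util.UUID

// Hypothetical wrapper; the key format follows getUniqueId in the diff above.
object ProgressKey {

  /** Builds a store key from the run, the batch, and the progress timestamp. */
  def uniqueId(runId: UUID, batchId: Long, timestamp: String): String =
    s"${runId}_${batchId}_$timestamp"
}
```

With this format, two progress reports for the same run and the same `batchId`, which is what happens when no new data arrives, still get different keys as long as their timestamps differ. For example, `ProgressKey.uniqueId(runId, 7L, "2020-06-10T09:12:18.000Z")` and `ProgressKey.uniqueId(runId, 7L, "2020-06-10T09:12:28.000Z")` are distinct strings, so the second write does not overwrite the first.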





[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-644041498







[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642465030







[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-737112790


   **[Test build #132029 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132029/testReport)** for PR 28781 at commit [`e2758d7`](https://github.com/apache/spark/commit/e2758d76a05a9793147b95d83e63118eee5f2d4f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-737025797


   **[Test build #132018 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/132018/testReport)** for PR 28781 at commit [`e2758d7`](https://github.com/apache/spark/commit/e2758d76a05a9793147b95d83e63118eee5f2d4f).
    * This patch **fails SparkR unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713388262


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/34684/



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-644217416


   **[Test build #124051 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124051/testReport)** for PR 28781 at commit [`f280157`](https://github.com/apache/spark/commit/f2801571c222f42b56c661a0693039482d8ad2fd).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-642360652


   **[Test build #123806 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123806/testReport)** for PR 28781 at commit [`6199c81`](https://github.com/apache/spark/commit/6199c8136fe1b732c02b1edb1bf8f73dba88abf5).



[GitHub] [spark] AmplabJenkins commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

AmplabJenkins commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666846461







[GitHub] [spark] uncleGen commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

uncleGen commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-666284455


   retest this please.



[GitHub] [spark] SparkQA commented on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA commented on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713431407


   **[Test build #130079 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130079/testReport)** for PR 28781 at commit [`2023c62`](https://github.com/apache/spark/commit/2023c625923899678572df6a8c853180f8a604dc).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA removed a comment on pull request #28781: [SPARK-31953][SS] Add Spark Structured Streaming History Server Support

Posted by GitBox <gi...@apache.org>.

SparkQA removed a comment on pull request #28781:
URL: https://github.com/apache/spark/pull/28781#issuecomment-713359532


   **[Test build #130079 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130079/testReport)** for PR 28781 at commit [`2023c62`](https://github.com/apache/spark/commit/2023c625923899678572df6a8c853180f8a604dc).

