Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/11/16 19:45:22 UTC

[GitHub] [spark] yliou opened a new pull request #34622: [WIP] SPARK-37340 Display StageIds in Operators for SQL UI

yliou opened a new pull request #34622:
URL: https://github.com/apache/spark/pull/34622


   
   ### What changes were proposed in this pull request?
   Add an explicit stageId-to-operator mapping to the SQL UI. This is a more general version of https://issues.apache.org/jira/browse/SPARK-30209, and the mapping is computed with the following algorithm:
    1. Read the SparkPlanGraph to get every node's name and its accumulator IDs.
    2. Get each stage's accumulator IDs.
    3. Map each operator to every stage whose accumulator IDs (step 2) have a non-empty intersection with the operator's accumulator IDs (step 1).
    4. Attach the resulting stage IDs to the corresponding SparkPlanGraphNodes for rendering in the SQL UI.
   As a result, operators without max metric values can also show stage IDs in the UI. In some cases no operator-to-stage mapping is made, because no stage has accumulator IDs that overlap with the operator's accumulator IDs. Links at the top of the page to the succeeded jobs and completed stages that ran as part of the selected query are also provided.
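   The four steps above can be sketched in a few lines of Scala. This is a minimal, self-contained illustration under stated assumptions, not the code in this PR: `PlanNode`, `StageInfo`, and `mapNodesToStages` are hypothetical names, and plain collections stand in for what the real change reads from `SparkPlanGraph` and the status store.

   ```scala
   // Hypothetical stand-ins for a SparkPlanGraph node and a stage's metrics.
   final case class PlanNode(id: Long, name: String, accumulatorIds: Set[Long])
   final case class StageInfo(stageId: Int, accumulatorIds: Set[Long])

   object StageMapping {
     // Steps 1-3: map each node to every stage whose accumulator IDs share
     // at least one ID with the node's accumulator IDs.
     def mapNodesToStages(nodes: Seq[PlanNode], stages: Seq[StageInfo]): Map[Long, Seq[Int]] =
       nodes.map { node =>
         val matching = stages.collect {
           case s if s.accumulatorIds.exists(node.accumulatorIds.contains) => s.stageId
         }.distinct.sorted
         node.id -> matching
       }.filter { case (_, stageIds) => stageIds.nonEmpty }.toMap // no overlap => no entry
   }
   ```

   Because each operator is checked against every stage, a node whose accumulators are updated in two stages (for example, an exchange whose write-side and read-side metrics are reported by different stages) maps to both stage IDs, while a node with no overlapping accumulators gets no entry and therefore no `Stages:` label.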
   
   
   ### Why are the changes needed?
   It makes debugging easier and quicker, and makes it easier to navigate from a SQL query to the jobs and stages that executed it.
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. `Succeeded Jobs:` and `Completed Stages:` are listed at the top of the page, and `Stages:` is shown in some of the operator boxes.
   <img width="697" alt="Screen Shot 2021-11-16 at 11 35 51 AM" src="https://user-images.githubusercontent.com/16739760/142054791-8229d142-41cd-4706-a53e-7abb51e5901c.png">
   
   
   
   ### How was this patch tested?
   Manually tested.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #34622: [WIP] SPARK-37340 Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-970624555


   **[Test build #145290 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145290/testReport)** for PR 34622 at commit [`3489292`](https://github.com/apache/spark/commit/3489292aa0b25c0933270ce805301b71b864026b).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-982295998


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50214/
   




[GitHub] [spark] SparkQA commented on pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-982291693


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50214/
   




[GitHub] [spark] yliou commented on a change in pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
yliou commented on a change in pull request #34622:
URL: https://github.com/apache/spark/pull/34622#discussion_r824289886



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListenerSuite.scala
##########
@@ -550,6 +550,10 @@ class SQLAppStatusListenerSuite extends SharedSparkSession with JsonTestUtils
 
     assertJobs(statusStore.execution(0), completed = 0 to 1)
     assert(statusStore.execution(0).get.stages === (0 to 3).toSet)
+
+    // Check stage and attemptID are gathered correctly.
+    val stageAttempt = statusStore.getStageAttempt(executionId)

Review comment:
       Done.






[GitHub] [spark] tgravescs commented on pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
tgravescs commented on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-1063288915


   So, a couple of questions and concerns.
   
   1. I'm not sure having the list of Completed Stages at the top of the page helps, and I'm concerned that it could be a very long list. You can simply click on the job or go to the Stages page to get there. You could also make the Stages link in the operator box clickable.
   2. What does this list for exchanges where the exchange crosses 2 stages? Similarly for hash aggregates.
   3. Have you run this on more than just a couple of local jobs, i.e. on large jobs or in production?
   
   I'll try to get this built and try it out locally.




[GitHub] [spark] SparkQA commented on pull request #34622: [WIP] SPARK-37340 Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-970799668


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49763/
   




[GitHub] [spark] yliou commented on a change in pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
yliou commented on a change in pull request #34622:
URL: https://github.com/apache/spark/pull/34622#discussion_r824289668



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala
##########
@@ -53,6 +53,12 @@ case class SparkPlanGraph(
       case node => Seq(node)
     }
   }
+
+  def getAllIds: Seq[Long] = {
+    allNodes.map {

Review comment:
       Done.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala
##########
@@ -179,6 +185,9 @@ class SparkPlanGraphNode(
       // Note: whitespace between two "\n"s is to create an empty line between the name of
       // SparkPlan and metrics. If removing it, it won't display the empty line in UI.
       builder ++= "<br><br>"
+      if (!stagesGraph.getOrElse(id, List()).isEmpty) {

Review comment:
       Done.

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala
##########
@@ -179,6 +185,9 @@ class SparkPlanGraphNode(
       // Note: whitespace between two "\n"s is to create an empty line between the name of
       // SparkPlan and metrics. If removing it, it won't display the empty line in UI.
       builder ++= "<br><br>"
+      if (!stagesGraph.getOrElse(id, List()).isEmpty) {
+        builder ++= "Stages: " + stagesGraph.getOrElse(id, List()).mkString(",") + "\n"

Review comment:
       Done.
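   The hunk above looks up `stagesGraph` twice for the same node. A single lookup expresses the same intent. The sketch below assumes `stagesGraph` is a `Map[Long, Seq[Int]]` keyed by node ID (its declared type is not visible in the hunk), and `appendStagesLabel` is a hypothetical helper, not necessarily what the reviewer asked for.

   ```scala
   object StagesLabel {
     // Appends "Stages: a,b,...\n" only when the node has at least one mapped stage,
     // using one Map lookup instead of two getOrElse calls.
     def appendStagesLabel(builder: StringBuilder, stagesGraph: Map[Long, Seq[Int]], id: Long): Unit =
       stagesGraph.get(id).filter(_.nonEmpty).foreach { stageIds =>
         builder ++= s"Stages: ${stageIds.mkString(",")}\n"
       }
   }
   ```

   A node that is missing from the map and a node mapped to an empty list are treated the same way: nothing is appended.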






[GitHub] [spark] yliou commented on a change in pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
yliou commented on a change in pull request #34622:
URL: https://github.com/apache/spark/pull/34622#discussion_r824289452



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/ExecutionPage.scala
##########
@@ -128,6 +136,10 @@ class ExecutionPage(parent: SQLTab) extends WebUIPage("execution") with Logging
   private def jobURL(request: HttpServletRequest, jobId: Long): String =
     "%s/jobs/job/?id=%s".format(UIUtils.prependBaseUri(request, parent.basePath), jobId)
 
+  private def stageURL(request: HttpServletRequest, stageId: Int, attemptId: Int): String =
+    "%s/stages/stage/?id=%s&attempt=%s".format(

Review comment:
       Made the change.
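   The format string in `stageURL` produces one link per (stage, attempt) pair. A minimal sketch with the servlet dependency removed: `basePath` stands in for the value of `UIUtils.prependBaseUri(request, parent.basePath)`, the arguments passed to `.format(` are assumed because the hunk is cut off there, and `completedStageLinks` is a hypothetical helper that assumes pairs are ordered `(stageId, attemptId)`, matching the `List[(Int, Int)]` returned by `getStageAttempt`.

   ```scala
   object StageLinks {
     // Same format string as the hunk; basePath replaces the request-based prefix.
     def stageURL(basePath: String, stageId: Int, attemptId: Int): String =
       "%s/stages/stage/?id=%s&attempt=%s".format(basePath, stageId, attemptId)

     // Hypothetical: one link per (stageId, attemptId) pair, e.g. for a
     // "Completed Stages:" list at the top of the execution page.
     def completedStageLinks(basePath: String, stageAttempts: Seq[(Int, Int)]): Seq[String] =
       stageAttempts.map { case (stageId, attemptId) => stageURL(basePath, stageId, attemptId) }
   }
   ```

   Including the attempt ID in the link matters when a stage is retried: each attempt has its own page under `/stages/stage/`.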






[GitHub] [spark] yliou commented on a change in pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
yliou commented on a change in pull request #34622:
URL: https://github.com/apache/spark/pull/34622#discussion_r824289601



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusStore.scala
##########
@@ -79,8 +79,26 @@ class SQLAppStatusStore(
   def planGraph(executionId: Long): SparkPlanGraph = {
     store.read(classOf[SparkPlanGraphWrapper], executionId).toSparkPlanGraph()
   }
+
+  def getStageAttempt(executionId: Long): List[(Int, Int)] = {

Review comment:
       I renamed the method.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #34622: [WIP] SPARK-37340 Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-970927347








[GitHub] [spark] SparkQA commented on pull request #34622: [WIP] SPARK-37340 Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-970912862


   **[Test build #145290 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145290/testReport)** for PR 34622 at commit [`3489292`](https://github.com/apache/spark/commit/3489292aa0b25c0933270ce805301b71b864026b).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `case class StageAttempt(`
     * `case class GraphNodeToStages(`




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34622: [WIP] SPARK-37340 Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-971142241


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145293/
   




[GitHub] [spark] yliou edited a comment on pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
yliou edited a comment on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-1064680050


   Thanks for the feedback comments so far.
   
   1. I originally added the list of completed stages for convenience on the SQL UI. Do you think it's worth removing the list of Completed Stages?
   2. In this case, multiple stages will show up in the operator box. It would look like this:
   <img width="252" alt="image" src="https://user-images.githubusercontent.com/16739760/157783822-5f312c73-a7cf-4b31-9817-b3538dbe074d.png">
   3. I've run this in production at Workday.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #34622: [WIP] SPARK-37340 Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-970694797


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49760/
   




[GitHub] [spark] SparkQA commented on pull request #34622: [WIP] SPARK-37340 Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-971132484


   **[Test build #145293 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145293/testReport)** for PR 34622 at commit [`5734754`](https://github.com/apache/spark/commit/573475491216d2d13680a300e8669b993ee73463).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds the following public classes _(experimental)_:
     * `case class StageAttempt(`
     * `case class GraphNodeToStages(`




[GitHub] [spark] tgravescs commented on pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
tgravescs commented on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-976749417


   Yes, it would be nice to have the actual stageIds in the UI. I'll need to look closer at the logic, though, which likely won't be until next week.




[GitHub] [spark] yliou commented on pull request #34622: SPARK-37340 Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
yliou commented on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-975808167


   cc @tgravescs, would this feature be of interest?




[GitHub] [spark] yliou commented on pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
yliou commented on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-1058561177


   @tgravescs do you have time to take a quick look?




[GitHub] [spark] yliou commented on pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
yliou commented on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-1064680050


   Thanks for the feedback comments so far.
   
   1. I originally added the list of completed stages for convenience on the SQL UI. I'll look into making the Stages link in the operator box clickable.
   2. In this case, multiple stages will show up in the operator box. It would look like this:
   <img width="252" alt="image" src="https://user-images.githubusercontent.com/16739760/157783822-5f312c73-a7cf-4b31-9817-b3538dbe074d.png">
   3. I've run this in production at Workday.




[GitHub] [spark] SparkQA commented on pull request #34622: [WIP] SPARK-37340 Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-970738101


   **[Test build #145293 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145293/testReport)** for PR 34622 at commit [`5734754`](https://github.com/apache/spark/commit/573475491216d2d13680a300e8669b993ee73463).




[GitHub] [spark] SparkQA removed a comment on pull request #34622: [WIP] SPARK-37340 Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-970624555


   **[Test build #145290 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145290/testReport)** for PR 34622 at commit [`3489292`](https://github.com/apache/spark/commit/3489292aa0b25c0933270ce805301b71b864026b).




[GitHub] [spark] SparkQA commented on pull request #34622: [WIP] SPARK-37340 Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-970902882


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49763/
   




[GitHub] [spark] SparkQA commented on pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-982266377


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/50214/
   




[GitHub] [spark] HyukjinKwon commented on pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-976018777


   cc @sarutak and @gengliangwang FYI




[GitHub] [spark] AmplabJenkins commented on pull request #34622: [WIP] SPARK-37340 Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-970927348






[GitHub] [spark] SparkQA removed a comment on pull request #34622: [WIP] SPARK-37340 Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-970738101


   **[Test build #145293 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/145293/testReport)** for PR 34622 at commit [`5734754`](https://github.com/apache/spark/commit/573475491216d2d13680a300e8669b993ee73463).


[GitHub] [spark] SparkQA commented on pull request #34622: [WIP] SPARK-37340 Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-970688361


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49760/
   


[GitHub] [spark] AmplabJenkins commented on pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-982295998


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/50214/
   


[GitHub] [spark] AmplabJenkins commented on pull request #34622: [WIP] SPARK-37340 Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-970694797


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/49760/
   


[GitHub] [spark] AmplabJenkins commented on pull request #34622: [WIP] SPARK-37340 Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-971142241


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/145293/
   


[GitHub] [spark] SparkQA commented on pull request #34622: [WIP] SPARK-37340 Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #34622:
URL: https://github.com/apache/spark/pull/34622#issuecomment-970658135


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/49760/
   


[GitHub] [spark] yliou commented on a change in pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
yliou commented on a change in pull request #34622:
URL: https://github.com/apache/spark/pull/34622#discussion_r824269389



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala
##########
@@ -138,6 +139,30 @@ class SQLAppStatusListener(
     }
   }
 
+  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
+    if (!isSQLStage(stageCompleted.stageInfo.stageId)) {
+      return
+    }
+    val stageNum = stageCompleted.stageInfo.stageId
+    val attemptID = stageCompleted.stageInfo.attemptNumber()
+
+    // gets the executionID that finished the stage
+    val liveExecution = liveExecutions.values().asScala
+    val execID = liveExecution.filter(_.stages.contains(stageNum)).head.executionId

Review comment:
       Yes, I haven't seen a case that suggests otherwise.




[GitHub] [spark] martin-g commented on a change in pull request #34622: [SPARK-37340][UI] Display StageIds in Operators for SQL UI

Posted by GitBox <gi...@apache.org>.
martin-g commented on a change in pull request #34622:
URL: https://github.com/apache/spark/pull/34622#discussion_r819314166



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListener.scala
##########
@@ -138,6 +139,30 @@ class SQLAppStatusListener(
     }
   }
 
+  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
+    if (!isSQLStage(stageCompleted.stageInfo.stageId)) {
+      return
+    }
+    val stageNum = stageCompleted.stageInfo.stageId
+    val attemptID = stageCompleted.stageInfo.attemptNumber()
+
+    // gets the executionID that finished the stage
+    val liveExecution = liveExecutions.values().asScala
+    val execID = liveExecution.filter(_.stages.contains(stageNum)).head.executionId

Review comment:
       Is this filter guaranteed to return a non-empty collection? If it can be empty, `.head` will throw.
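
A sketch of a non-throwing alternative to `filter(...).head`, assuming only that each live execution exposes an `executionId` and a set of stage IDs. `LiveExecution`, `StageLookup`, and `executionIdFor` are hypothetical names for illustration, not Spark APIs. `Iterable.find` returns an `Option`, so a stage that no live execution claims yields `None` instead of a `NoSuchElementException` from `.head`.

```scala
// Hypothetical, simplified stand-in for the listener's live-execution entries;
// it keeps only the two fields this lookup needs.
final case class LiveExecution(executionId: Long, stages: Set[Int])

object StageLookup {
  // Returns the ID of the first live execution whose stage set contains `stageId`,
  // or None when no execution claims it. Unlike filter(...).head, find never
  // throws on an empty result, so the caller has to handle None explicitly.
  def executionIdFor(executions: Iterable[LiveExecution], stageId: Int): Option[Long] =
    executions.find(_.stages.contains(stageId)).map(_.executionId)
}
```

In the listener, the `None` branch could simply return early, in the same way the existing `isSQLStage` guard does.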

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLAppStatusStore.scala
##########
@@ -79,8 +79,26 @@ class SQLAppStatusStore(
   def planGraph(executionId: Long): SparkPlanGraph = {
     store.read(classOf[SparkPlanGraphWrapper], executionId).toSparkPlanGraph()
   }
+
+  def getStageAttempt(executionId: Long): List[(Int, Int)] = {

Review comment:
       ```suggestion
     def getStageAttempts(executionId: Long): List[(Int, Int)] = {
   ```

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala
##########
@@ -179,6 +185,9 @@ class SparkPlanGraphNode(
       // Note: whitespace between two "\n"s is to create an empty line between the name of
       // SparkPlan and metrics. If removing it, it won't display the empty line in UI.
       builder ++= "<br><br>"
+      if (!stagesGraph.getOrElse(id, List()).isEmpty) {

Review comment:
       ```suggestion
         if (!stagesGraph.getOrElse(id, Nil).isEmpty) {
   ```

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala
##########
@@ -179,6 +185,9 @@ class SparkPlanGraphNode(
       // Note: whitespace between two "\n"s is to create an empty line between the name of
       // SparkPlan and metrics. If removing it, it won't display the empty line in UI.
       builder ++= "<br><br>"
+      if (!stagesGraph.getOrElse(id, List()).isEmpty) {
+        builder ++= "Stages: " + stagesGraph.getOrElse(id, List()).mkString(",") + "\n"

Review comment:
       ```suggestion
           builder ++= "Stages: " + stagesGraph.getOrElse(id, Nil).mkString(",") + "\n"
   ```
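
Both suggestions still look up `id` in `stagesGraph` twice. A possible single-lookup variant, assuming `stagesGraph` is a `Map[Long, Seq[Int]]` from plan-node ID to stage IDs (its declared type is not shown in this hunk) and using a plain `StringBuilder` in place of the node's label builder. `StagesLabel` and `appendStages` are hypothetical names used only for illustration.

```scala
object StagesLabel {
  // Appends "Stages: a,b,c\n" for node `id` only when it maps to at least one stage.
  // Assumption: stagesGraph maps a plan-node ID to the IDs of the stages it ran in.
  def appendStages(builder: StringBuilder, stagesGraph: Map[Long, Seq[Int]], id: Long): Unit =
    stagesGraph.get(id).filter(_.nonEmpty).foreach { stages =>
      builder ++= "Stages: " + stages.mkString(",") + "\n"
    }
}
```

`Option.filter(_.nonEmpty)` treats a missing key and an empty list the same way, which matches the `getOrElse(id, Nil).isEmpty` guard.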

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/ExecutionPage.scala
##########
@@ -128,6 +136,10 @@ class ExecutionPage(parent: SQLTab) extends WebUIPage("execution") with Logging
   private def jobURL(request: HttpServletRequest, jobId: Long): String =
     "%s/jobs/job/?id=%s".format(UIUtils.prependBaseUri(request, parent.basePath), jobId)
 
+  private def stageURL(request: HttpServletRequest, stageId: Int, attemptId: Int): String =
+    "%s/stages/stage/?id=%s&attempt=%s".format(

Review comment:
       nit: since you use `String.format()`, you can use `%d` for the Int parameters, but it doesn't really matter.
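
A sketch of the same URL builder with `%d` for the two `Int` arguments. Here `basePath` is a hypothetical plain-string stand-in for `UIUtils.prependBaseUri(request, parent.basePath)`, so the example needs no servlet request; `StageLinks` is likewise an illustrative name.

```scala
object StageLinks {
  // Builds the stage-detail link; %d formats the Int arguments directly.
  // Assumption: basePath is the already-prefixed UI base URI.
  def stageURL(basePath: String, stageId: Int, attemptId: Int): String =
    "%s/stages/stage/?id=%d&attempt=%d".format(basePath, stageId, attemptId)
}
```

One small benefit of `%d` over `%s` is that a non-integral argument fails fast with `java.util.IllegalFormatConversionException` instead of being formatted silently.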

##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/ui/SQLAppStatusListenerSuite.scala
##########
@@ -550,6 +550,10 @@ class SQLAppStatusListenerSuite extends SharedSparkSession with JsonTestUtils
 
     assertJobs(statusStore.execution(0), completed = 0 to 1)
     assert(statusStore.execution(0).get.stages === (0 to 3).toSet)
+
+    // Check stage and attemptID are gathered correctly.
+    val stageAttempt = statusStore.getStageAttempt(executionId)

Review comment:
       ```suggestion
       val stageAttempts = statusStore.getStageAttempt(executionId)
   ```

##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SparkPlanGraph.scala
##########
@@ -53,6 +53,12 @@ case class SparkPlanGraph(
       case node => Seq(node)
     }
   }
+
+  def getAllIds: Seq[Long] = {
+    allNodes.map {

Review comment:
       ```suggestion
       allNodes.map(_.id)
   ```
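
For reference, the suggested one-liner applied to a minimal, hypothetical node type. The real `SparkPlanGraphNode` carries more fields, such as the plan name and metrics; `PlanNode` and `PlanIds` exist only for this illustration.

```scala
// Hypothetical minimal node type with just enough fields to show the mapping.
final case class PlanNode(id: Long, name: String)

object PlanIds {
  // One ID per node, in the same order as allNodes.
  def getAllIds(allNodes: Seq[PlanNode]): Seq[Long] = allNodes.map(_.id)
}
```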



