Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/12/29 00:06:42 UTC

[GitHub] [spark] linhongliu-db opened a new pull request, #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

linhongliu-db opened a new pull request, #39268:
URL: https://github.com/apache/spark/pull/39268

   ### What changes were proposed in this pull request?
   This PR proposes to group all sub-executions together in SQL UI if they belong to the same root execution.
   
   We can have some follow-up improvements after this PR:
   1. Add links to SQL page and Job page to indicate the root execution ID.
   2. Better handling for the root execution missing case (e.g. eviction due to retaining limit). In this PR, the sub-executions will be displayed ungrouped.
   
   This feature is controlled by the conf `spark.ui.sql.group.sub.execution.enabled`, and the default value is `true`.
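
   The grouping rule described above (a sub-execution is grouped only when its root execution is present; otherwise it is displayed ungrouped) can be sketched in plain Scala, independent of Spark's UI classes. All names below are illustrative, not Spark's actual API:

   ```scala
   import scala.collection.mutable

   // Illustrative stand-in for SQLExecutionUIData: for a root execution,
   // rootExecutionId == executionId.
   case class Exec(executionId: Long, rootExecutionId: Long)

   // Group sub-executions under their root. If the root is missing
   // (e.g. evicted due to the retaining limit), the sub-execution
   // stays top-level, ungrouped.
   def group(execs: Seq[Exec]): Map[Long, Seq[Exec]] = {
     val roots =
       execs.filter(e => e.executionId == e.rootExecutionId).map(_.executionId).toSet
     val subs = mutable.HashMap.empty[Long, mutable.ArrayBuffer[Exec]]
     val topLevel = mutable.ArrayBuffer.empty[Exec]
     execs.foreach { e =>
       if (e.executionId != e.rootExecutionId && roots.contains(e.rootExecutionId)) {
         subs.getOrElseUpdate(e.rootExecutionId, mutable.ArrayBuffer.empty) += e
       } else {
         topLevel += e
       }
     }
     topLevel.map { e =>
       e.executionId -> subs.getOrElse(e.executionId, mutable.ArrayBuffer.empty).toSeq
     }.toMap
   }
   ```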
   
   ### Why are the changes needed?
   Better user experience.
   
   In PR #39220, the CTAS query triggers a sub-execution to perform the data insertion, but the current UI displays the two executions separately, which may confuse users.
   In addition, this change should also help structured streaming cases.
   
   
   ### Does this PR introduce _any_ user-facing change?
   TODO
   
   ### How was this patch tested?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1061928098


##########
core/src/main/scala/org/apache/spark/internal/config/UI.scala:
##########
@@ -229,4 +229,11 @@ private[spark] object UI {
     .stringConf
     .transform(_.toUpperCase(Locale.ROOT))
     .createWithDefault("LOCAL")
+
+  val UI_SQL_GROUP_SUB_EXECUTION_ENABLED = ConfigBuilder("spark.ui.sql.groupSubExecutionEnabled")

Review Comment:
   Maybe the following is better, because it introduces no new config namespace.
   ```scala
   + spark.ui.sql.groupSubExecutionEnabled
   - spark.ui.groupSQLSubExecutionEnabled
   ```





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1061939747


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:
##########
@@ -140,6 +166,7 @@ object SQLExecution {
     } finally {
       executionIdToQueryExecution.remove(executionId)
       sc.setLocalProperty(EXECUTION_ID_KEY, oldExecutionId)
+      unsetRootExecutionId(sc, executionId.toString)

Review Comment:
   If we need to define a new method, shall we define it to accept `Long` directly?





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1061933315


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:
##########
@@ -55,6 +56,28 @@ object SQLExecution {
     }
   }
 
+  /**
+   * Track the "root" SQL Execution Id for nested/sub queries.
+   * For the root execution, rootExecutionId == executionId.
+   */
+  private def setRootExecutionId(sc: SparkContext, executionId: String): Unit = {

Review Comment:
   According to the logic, this method does not `set root execution id`. It would be better to revise the method name. 





[GitHub] [spark] linhongliu-db commented on pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
linhongliu-db commented on PR #39268:
URL: https://github.com/apache/spark/pull/39268#issuecomment-1367707382

   The test failed at the Python linter; it should be caused by some "connect" PRs.
   ```
   annotations failed mypy checks:
   python/pyspark/sql/connect/client.py:25: error: Skipping analyzing "grpc_status": module is installed, but missing library stubs or py.typed marker  [import]
   python/pyspark/sql/connect/client.py:25: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
   python/pyspark/sql/connect/client.py:30: error: Skipping analyzing "google.rpc": module is installed, but missing library stubs or py.typed marker  [import]
   Found 2 errors in 1 file (checked 381 source files)
   ```
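
   For reference, mypy errors of this kind ("missing library stubs or py.typed marker") are usually resolved either by installing type stubs for the package or by a per-module override in the mypy config. An illustrative override (module patterns taken from the errors above):

   ```
   [mypy-grpc_status.*]
   ignore_missing_imports = True

   [mypy-google.rpc.*]
   ignore_missing_imports = True
   ```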




[GitHub] [spark] cloud-fan commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1058716626


##########
core/src/main/scala/org/apache/spark/internal/config/UI.scala:
##########
@@ -229,4 +229,11 @@ private[spark] object UI {
     .stringConf
     .transform(_.toUpperCase(Locale.ROOT))
     .createWithDefault("LOCAL")
+
+  val UI_SQL_GROUP_SUB_EXECUTION_ENABLED = ConfigBuilder("spark.ui.sql.group.sub.execution.enabled")
+    .doc("Whether to group sub executions together in SQL UI when they belong to the same " +
+      "root execution")
+    .version("3.4.0")
+    .booleanConf
+    .createWithDefault(false)

Review Comment:
   shall we turn it on by default?





[GitHub] [spark] zhengruifeng commented on pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
zhengruifeng commented on PR #39268:
URL: https://github.com/apache/spark/pull/39268#issuecomment-1367719878

   @cloud-fan I cannot reproduce this failure in my local env.
   
   The latest mypy check on master also succeeded: https://github.com/apache/spark/actions/runs/3804744443/jobs/6472202104




[GitHub] [spark] cloud-fan commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1058717124


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala:
##########
@@ -26,40 +26,65 @@ import scala.xml.{Node, NodeSeq}
 
 import org.apache.spark.JobExecutionStatus
 import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config.UI.UI_SQL_GROUP_SUB_EXECUTION_ENABLED
 import org.apache.spark.sql.errors.QueryExecutionErrors
 import org.apache.spark.ui.{PagedDataSource, PagedTable, UIUtils, WebUIPage}
 import org.apache.spark.util.Utils
 
 private[ui] class AllExecutionsPage(parent: SQLTab) extends WebUIPage("") with Logging {
 
   private val sqlStore = parent.sqlStore
+  private val groupSubExecutionEnabled = parent.conf.get(UI_SQL_GROUP_SUB_EXECUTION_ENABLED)
 
   override def render(request: HttpServletRequest): Seq[Node] = {
     val currentTime = System.currentTimeMillis()
     val running = new mutable.ArrayBuffer[SQLExecutionUIData]()
     val completed = new mutable.ArrayBuffer[SQLExecutionUIData]()
     val failed = new mutable.ArrayBuffer[SQLExecutionUIData]()
+    val executionIdToSubExecutions =
+      new mutable.HashMap[Long, mutable.ArrayBuffer[SQLExecutionUIData]]()
 
     sqlStore.executionsList().foreach { e =>
-      if (e.errorMessage.isDefined) {
-        if (e.errorMessage.get.isEmpty) {
-          completed += e
+      def processExecution(e: SQLExecutionUIData): Unit = {
+        if (e.errorMessage.isDefined) {
+          if (e.errorMessage.get.isEmpty) {
+            completed += e
+          } else {
+            failed += e
+          }
+        } else if (e.completionTime.isEmpty) {
+          running += e
         } else {
-          failed += e
+          // When `completionTime` is present, it means the query execution is completed and
+          // `errorMessage` should be present as well. However, events generated by old versions of
+          // Spark do not have the `errorMessage` field. We have to check the status of this query
+          // execution's jobs.
+          val isFailed = e.jobs.exists { case (_, status) => status == JobExecutionStatus.FAILED }
+          if (isFailed) {
+            failed += e
+          } else {
+            completed += e
+          }
+        }
+      }
+      // group the sub execution only if the root execution will be displayed (i.e. not missing)
+      if (groupSubExecutionEnabled &&
+        e.executionId != e.rootExecutionId &&
+        executionIdToSubExecutions.contains(e.rootExecutionId)) {
+        executionIdToSubExecutions.get(e.rootExecutionId).foreach { executions =>

Review Comment:
   ```scala
   executionIdToSubExecutions(e.rootExecutionId) += e
   ```
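
   This works because `mutable.HashMap.apply` returns the stored buffer, which can then be mutated in place. A minimal standalone illustration (guarding with `contains` first, as the surrounding code already does, since `apply` throws for absent keys):

   ```scala
   import scala.collection.mutable

   val m = mutable.HashMap(1L -> mutable.ArrayBuffer.empty[String])
   // Equivalent to m.get(1L).foreach(_ += "a"), but shorter.
   if (m.contains(1L)) m(1L) += "a"
   ```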





[GitHub] [spark] linhongliu-db commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
linhongliu-db commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1065369984


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:
##########
@@ -55,6 +56,28 @@ object SQLExecution {
     }
   }
 
+  /**
+   * Track the "root" SQL Execution Id for nested/sub queries.
+   * For the root execution, rootExecutionId == executionId.
+   */
+  private def setRootExecutionId(sc: SparkContext, executionId: String): Unit = {
+    // The current execution is the root execution if the root execution ID is null
+    if (sc.getLocalProperty(EXECUTION_ROOT_ID_KEY) == null) {
+      sc.setLocalProperty(EXECUTION_ROOT_ID_KEY, executionId)
+    }
+  }
+
+  /**
+   * Unset the "root" SQL Execution Id once the "root" SQL execution completes.
+   */
+  private def unsetRootExecutionId(sc: SparkContext, executionId: String): Unit = {

Review Comment:
   On second thought, this function wrapper is misleading and doesn't make things clearer, so I inlined it into the main function. Thanks for the suggestion.
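
   For illustration only (not Spark's actual code), the root-id bookkeeping discussed above amounts to a set-if-absent / clear-if-owner pattern over the local properties. A sketch with a plain map and a hypothetical key value:

   ```scala
   import scala.collection.mutable

   // Hypothetical stand-ins for SparkContext local properties and the key value.
   val localProps = mutable.HashMap.empty[String, String]
   val EXECUTION_ROOT_ID_KEY = "spark.sql.execution.root.id" // illustrative value

   // Before an execution runs: claim the root slot only if it is unset,
   // so nested executions inherit the outermost execution's id.
   def claimRoot(executionId: Long): Boolean = {
     if (!localProps.contains(EXECUTION_ROOT_ID_KEY)) {
       localProps(EXECUTION_ROOT_ID_KEY) = executionId.toString
       true // this execution is the root
     } else {
       false // a root is already running; this is a sub-execution
     }
   }

   // After the execution finishes: clear the slot only if this execution
   // owns it, so a sub-execution's cleanup cannot erase the root's id.
   def releaseRoot(executionId: Long): Unit = {
     if (localProps.get(EXECUTION_ROOT_ID_KEY).contains(executionId.toString)) {
       localProps.remove(EXECUTION_ROOT_ID_KEY)
     }
   }
   ```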



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala:
##########
@@ -43,6 +43,8 @@ case class SparkListenerSQLAdaptiveSQLMetricUpdates(
 @DeveloperApi
 case class SparkListenerSQLExecutionStart(
     executionId: Long,
+    // if the execution is a root, then rootExecutionId == executionId
+    rootExecutionId: Long,

Review Comment:
   done





[GitHub] [spark] linhongliu-db commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
linhongliu-db commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1061141956


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala:
##########
@@ -290,35 +330,112 @@ private[ui] class ExecutionPagedTable(
       }
     }
 
-    <tr>
-      <td>
-        {executionUIData.executionId.toString}
-      </td>
-      <td>
-        {descriptionCell(executionUIData)}
-      </td>
-      <td sorttable_customkey={submissionTime.toString}>
-        {UIUtils.formatDate(submissionTime)}
-      </td>
-      <td sorttable_customkey={duration.toString}>
-        {UIUtils.formatDuration(duration)}
-      </td>
-      {if (showRunningJobs) {
+    def executionLinks(executionData: Seq[Long]): Seq[Node] = {
+      val details = if (executionData.nonEmpty) {
+        val onClickScript = "this.parentNode.parentNode.nextElementSibling.nextElementSibling" +
+          ".classList.toggle('collapsed')"
+        <span onclick={onClickScript} class="expand-details">
+          +details
+        </span>
+      } else {
+        Nil
+      }
+
+      <div>{
+        executionData.map { executionId =>
+          <a href={executionURL(executionId)}>[{executionId.toString}]</a>
+        }
+        }</div> ++ details

Review Comment:
   done, changed to 
   ```
         <div>
           {
             executionData.map { executionId =>
               <a href={executionURL(executionId)}>[{executionId.toString}]</a>
             }
           }
         </div> ++ details
   ```





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1061927117


##########
core/src/main/resources/org/apache/spark/ui/static/webui.css:
##########
@@ -187,6 +187,18 @@ pre {
   display: none;
 }
 
+.sub-execution-list {
+  font-size:0.9rem;

Review Comment:
   ```css
   - font-size:0.9rem;
   + font-size: 0.9rem;
   ```





[GitHub] [spark] gengliangwang commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1062032939


##########
sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLExecutionUIDataSerializer.scala:
##########
@@ -74,6 +74,7 @@ class SQLExecutionUIDataSerializer extends ProtobufSerDe {
 
     new SQLExecutionUIData(
       executionId = ui.getExecutionId,
+      rootExecutionId = ui.getExecutionId,

Review Comment:
   This should be `ui.getRootExecutionId` after updating the protobuf definition.





[GitHub] [spark] linhongliu-db commented on pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
linhongliu-db commented on PR #39268:
URL: https://github.com/apache/spark/pull/39268#issuecomment-1370500688

   @dongjoon-hyun @ulysses-you @cloud-fan, I updated the PR to address all the comments. Could you take another look?




[GitHub] [spark] wangyum commented on pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by "wangyum (via GitHub)" <gi...@apache.org>.
wangyum commented on PR #39268:
URL: https://github.com/apache/spark/pull/39268#issuecomment-1637437594

   @linhongliu-db It seems this patch makes CTAS miss the child info on the UI: https://issues.apache.org/jira/browse/SPARK-44213




[GitHub] [spark] linhongliu-db commented on pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
linhongliu-db commented on PR #39268:
URL: https://github.com/apache/spark/pull/39268#issuecomment-1367003716

   cc @cloud-fan @HeartSaVioR 




[GitHub] [spark] linhongliu-db commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
linhongliu-db commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1061140864


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:
##########
@@ -55,6 +56,28 @@ object SQLExecution {
     }
   }
 
+  /**
+   * Track the "root" SQL Execution Id for nested/sub queries.
+   * For the root execution, rootExecutionId == executionId.
+   */
+  private def setRootExecutionId(sc: SparkContext, executionId: String): Unit = {
+    // The current execution is the root execution if the root execution ID is null
+    if (sc.getLocalProperty(EXECUTION_ROOT_ID_KEY) == null) {
+      sc.setLocalProperty(EXECUTION_ROOT_ID_KEY, executionId)
+    }
+  }
+
+  /**
+   * Unset the "root" SQL Execution Id once the "root" SQL execution completes.
+   */
+  private def unsetRootExecutionId(sc: SparkContext, executionId: String): Unit = {

Review Comment:
   Yes, it's only used once. I personally think that using a function can better explain the logic, since it's not a no-brainer.



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala:
##########
@@ -26,40 +26,63 @@ import scala.xml.{Node, NodeSeq}
 
 import org.apache.spark.JobExecutionStatus
 import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config.UI.UI_SQL_GROUP_SUB_EXECUTION_ENABLED
 import org.apache.spark.sql.errors.QueryExecutionErrors
 import org.apache.spark.ui.{PagedDataSource, PagedTable, UIUtils, WebUIPage}
 import org.apache.spark.util.Utils
 
 private[ui] class AllExecutionsPage(parent: SQLTab) extends WebUIPage("") with Logging {
 
   private val sqlStore = parent.sqlStore
+  private val groupSubExecutionEnabled = parent.conf.get(UI_SQL_GROUP_SUB_EXECUTION_ENABLED)
 
   override def render(request: HttpServletRequest): Seq[Node] = {
     val currentTime = System.currentTimeMillis()
     val running = new mutable.ArrayBuffer[SQLExecutionUIData]()
     val completed = new mutable.ArrayBuffer[SQLExecutionUIData]()
     val failed = new mutable.ArrayBuffer[SQLExecutionUIData]()
+    val executionIdToSubExecutions =
+      new mutable.HashMap[Long, mutable.ArrayBuffer[SQLExecutionUIData]]()
 
     sqlStore.executionsList().foreach { e =>
-      if (e.errorMessage.isDefined) {
-        if (e.errorMessage.get.isEmpty) {
-          completed += e
+      def processExecution(e: SQLExecutionUIData): Unit = {
+        if (e.errorMessage.isDefined) {
+          if (e.errorMessage.get.isEmpty) {
+            completed += e
+          } else {
+            failed += e
+          }
+        } else if (e.completionTime.isEmpty) {
+          running += e
         } else {
-          failed += e
+          // When `completionTime` is present, it means the query execution is completed and
+          // `errorMessage` should be present as well. However, events generated by old versions of
+          // Spark do not have the `errorMessage` field. We have to check the status of this query

Review Comment:
   we have this one: https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/execution/ui/AllExecutionsPageSuite.scala#L69





[GitHub] [spark] linhongliu-db commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
linhongliu-db commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1061141617


##########
core/src/main/scala/org/apache/spark/internal/config/UI.scala:
##########
@@ -229,4 +229,11 @@ private[spark] object UI {
     .stringConf
     .transform(_.toUpperCase(Locale.ROOT))
     .createWithDefault("LOCAL")
+
+  val UI_SQL_GROUP_SUB_EXECUTION_ENABLED = ConfigBuilder("spark.ui.sql.group.sub.execution.enabled")

Review Comment:
   changed to `spark.ui.sql.groupSubExecutionEnabled` but I'm glad to take any other naming suggestions. :)



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala:
##########
@@ -250,32 +283,39 @@ private[ui] class ExecutionPagedTable(
   override def goButtonFormPath: String =
     s"$parameterPath&$executionTag.sort=$encodedSortColumn&$executionTag.desc=$desc#$tableHeaderId"
 
-  override def headers: Seq[Node] = {
-    // Information for each header: title, sortable, tooltip
-    val executionHeadersAndCssClasses: Seq[(String, Boolean, Option[String])] =
-      Seq(
-        ("ID", true, None),
-        ("Description", true, None),
-        ("Submitted", true, None),
-        ("Duration", true, Some("Time from query submission to completion (or if still executing," +
-          "time since submission)"))) ++ {
-        if (showRunningJobs && showSucceededJobs && showFailedJobs) {
-          Seq(
-            ("Running Job IDs", true, None),
-            ("Succeeded Job IDs", true, None),
-            ("Failed Job IDs", true, None))
-        } else if (showSucceededJobs && showFailedJobs) {
-          Seq(
-            ("Succeeded Job IDs", true, None),
-            ("Failed Job IDs", true, None))
-        } else {
-          Seq(("Job IDs", true, None))
-        }
+  // Information for each header: title, sortable, tooltip
+  private val headerInfo: Seq[(String, Boolean, Option[String])] = {
+    Seq(
+      ("ID", true, None),
+      ("Description", true, None),
+      ("Submitted", true, None),
+      ("Duration", true, Some("Time from query submission to completion (or if still executing," +
+        "time since submission)"))) ++ {

Review Comment:
   done





[GitHub] [spark] dongjoon-hyun commented on pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on PR #39268:
URL: https://github.com/apache/spark/pull/39268#issuecomment-1377645285

   Thank you, @linhongliu-db and @cloud-fan .




[GitHub] [spark] linhongliu-db commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
linhongliu-db commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1059122041


##########
core/src/main/scala/org/apache/spark/internal/config/UI.scala:
##########
@@ -229,4 +229,11 @@ private[spark] object UI {
     .stringConf
     .transform(_.toUpperCase(Locale.ROOT))
     .createWithDefault("LOCAL")
+
+  val UI_SQL_GROUP_SUB_EXECUTION_ENABLED = ConfigBuilder("spark.ui.sql.group.sub.execution.enabled")
+    .doc("Whether to group sub executions together in SQL UI when they belong to the same " +
+      "root execution")
+    .version("3.4.0")
+    .booleanConf
+    .createWithDefault(false)

Review Comment:
   yes, we should





[GitHub] [spark] linhongliu-db commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
linhongliu-db commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1061142058


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala:
##########
@@ -290,35 +330,112 @@ private[ui] class ExecutionPagedTable(
       }
     }
 
-    <tr>
-      <td>
-        {executionUIData.executionId.toString}
-      </td>
-      <td>
-        {descriptionCell(executionUIData)}
-      </td>
-      <td sorttable_customkey={submissionTime.toString}>
-        {UIUtils.formatDate(submissionTime)}
-      </td>
-      <td sorttable_customkey={duration.toString}>
-        {UIUtils.formatDuration(duration)}
-      </td>
-      {if (showRunningJobs) {
+    def executionLinks(executionData: Seq[Long]): Seq[Node] = {
+      val details = if (executionData.nonEmpty) {
+        val onClickScript = "this.parentNode.parentNode.nextElementSibling.nextElementSibling" +
+          ".classList.toggle('collapsed')"
+        <span onclick={onClickScript} class="expand-details">
+          +details
+        </span>
+      } else {
+        Nil
+      }
+
+      <div>{
+        executionData.map { executionId =>
+          <a href={executionURL(executionId)}>[{executionId.toString}]</a>
+        }
+        }</div> ++ details
+    }
+
+    val baseRow: Seq[Node] = {
+      <tr>
         <td>
-          {jobLinks(executionTableRow.runningJobData)}
+          {executionUIData.executionId.toString}
         </td>
-      }}
-      {if (showSucceededJobs) {
         <td>
-          {jobLinks(executionTableRow.completedJobData)}
+          {descriptionCell(executionUIData)}
         </td>
-      }}
-      {if (showFailedJobs) {
-        <td>
-          {jobLinks(executionTableRow.failedJobData)}
+        <td sorttable_customkey={submissionTime.toString}>
+          {UIUtils.formatDate(submissionTime)}
         </td>
-      }}
-    </tr>
+        <td sorttable_customkey={duration.toString}>
+          {UIUtils.formatDuration(duration)}
+        </td>
+        {if (showRunningJobs) {
+          <td>
+            {jobLinks(executionTableRow.runningJobData)}
+          </td>
+        }}
+        {if (showSucceededJobs) {
+          <td>
+            {jobLinks(executionTableRow.completedJobData)}
+          </td>
+        }}
+        {if (showFailedJobs) {
+          <td>
+            {jobLinks(executionTableRow.failedJobData)}
+          </td>
+        }}
+        {if (showSubExecutions) {
+          <td>
+            {executionLinks(executionTableRow.subExecutionData.map(_.executionUIData.executionId))}
+          </td>
+        }}
+      </tr>
+    }
+
+    val subRow: Seq[Node] = {if (executionTableRow.subExecutionData.nonEmpty) {
+      <tr></tr>
+        <tr class="sub-execution-list collapsed">

Review Comment:
   done



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala:
##########
@@ -290,35 +330,112 @@ private[ui] class ExecutionPagedTable(
       }
     }
 
-    <tr>
-      <td>
-        {executionUIData.executionId.toString}
-      </td>
-      <td>
-        {descriptionCell(executionUIData)}
-      </td>
-      <td sorttable_customkey={submissionTime.toString}>
-        {UIUtils.formatDate(submissionTime)}
-      </td>
-      <td sorttable_customkey={duration.toString}>
-        {UIUtils.formatDuration(duration)}
-      </td>
-      {if (showRunningJobs) {
+    def executionLinks(executionData: Seq[Long]): Seq[Node] = {
+      val details = if (executionData.nonEmpty) {
+        val onClickScript = "this.parentNode.parentNode.nextElementSibling.nextElementSibling" +
+          ".classList.toggle('collapsed')"
+        <span onclick={onClickScript} class="expand-details">
+          +details
+        </span>
+      } else {
+        Nil
+      }
+
+      <div>{
+        executionData.map { executionId =>
+          <a href={executionURL(executionId)}>[{executionId.toString}]</a>
+        }
+        }</div> ++ details
+    }
+
+    val baseRow: Seq[Node] = {
+      <tr>
         <td>
-          {jobLinks(executionTableRow.runningJobData)}
+          {executionUIData.executionId.toString}
         </td>
-      }}
-      {if (showSucceededJobs) {
         <td>
-          {jobLinks(executionTableRow.completedJobData)}
+          {descriptionCell(executionUIData)}
         </td>
-      }}
-      {if (showFailedJobs) {
-        <td>
-          {jobLinks(executionTableRow.failedJobData)}
+        <td sorttable_customkey={submissionTime.toString}>
+          {UIUtils.formatDate(submissionTime)}
         </td>
-      }}
-    </tr>
+        <td sorttable_customkey={duration.toString}>
+          {UIUtils.formatDuration(duration)}
+        </td>
+        {if (showRunningJobs) {
+          <td>
+            {jobLinks(executionTableRow.runningJobData)}
+          </td>
+        }}
+        {if (showSucceededJobs) {
+          <td>
+            {jobLinks(executionTableRow.completedJobData)}
+          </td>
+        }}
+        {if (showFailedJobs) {
+          <td>
+            {jobLinks(executionTableRow.failedJobData)}
+          </td>
+        }}
+        {if (showSubExecutions) {
+          <td>
+            {executionLinks(executionTableRow.subExecutionData.map(_.executionUIData.executionId))}
+          </td>
+        }}
+      </tr>
+    }
+
+    val subRow: Seq[Node] = {if (executionTableRow.subExecutionData.nonEmpty) {
+      <tr></tr>
+        <tr class="sub-execution-list collapsed">
+          <td></td>
+          <td colspan={s"${headerInfo.length - 1}"}>
+            <table class="table table-bordered table-sm table-cell-width-limited">
+              <thead>
+                <tr>
+                  {headerInfo.dropRight(1).map(info => <th>{info._1}</th>)}
+                </tr>
+              </thead>
+              <tbody>
+                {
+                executionTableRow.subExecutionData.map { rowData =>

Review Comment:
   done by using the style 2





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1061933315


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:
##########
@@ -55,6 +56,28 @@ object SQLExecution {
     }
   }
 
+  /**
+   * Track the "root" SQL Execution Id for nested/sub queries.
+   * For the root execution, rootExecutionId == executionId.
+   */
+  private def setRootExecutionId(sc: SparkContext, executionId: String): Unit = {

Review Comment:
   According to the logic, this method does not set the root execution id if it is already set. It would be better to revise the method name to reflect that.
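   For readers following along, the guarded behavior under discussion can be sketched standalone as follows (a plain mutable map stands in for SparkContext local properties, and `setRootExecutionIdIfAbsent` is an illustrative rename, not a name from the PR):

```scala
import scala.collection.mutable

// Stand-in for SparkContext local properties (a simplification for illustration).
val localProps = mutable.Map[String, String]()
val EXECUTION_ROOT_ID_KEY = "spark.sql.execution.root.id"

// Only sets the root execution id when none is present yet, which is why a
// name such as `setRootExecutionIdIfAbsent` describes the logic more accurately.
def setRootExecutionIdIfAbsent(executionId: String): Unit = {
  if (!localProps.contains(EXECUTION_ROOT_ID_KEY)) {
    localProps(EXECUTION_ROOT_ID_KEY) = executionId
  }
}

setRootExecutionIdIfAbsent("1") // first call: "1" becomes the root
setRootExecutionIdIfAbsent("2") // nested execution: the root stays "1"
assert(localProps(EXECUTION_ROOT_ID_KEY) == "1")
```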





[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1061928098


##########
core/src/main/scala/org/apache/spark/internal/config/UI.scala:
##########
@@ -229,4 +229,11 @@ private[spark] object UI {
     .stringConf
     .transform(_.toUpperCase(Locale.ROOT))
     .createWithDefault("LOCAL")
+
+  val UI_SQL_GROUP_SUB_EXECUTION_ENABLED = ConfigBuilder("spark.ui.sql.groupSubExecutionEnabled")

Review Comment:
   Maybe the following is better, because it introduces no new config namespace:
   ```scala
   - spark.ui.sql.groupSubExecutionEnabled
   + spark.ui.groupSQLSubExecutionEnabled
   ```





[GitHub] [spark] dongjoon-hyun commented on pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on PR #39268:
URL: https://github.com/apache/spark/pull/39268#issuecomment-1376552025

   Could you update your PR, @linhongliu-db? The Apache Spark 3.4 feature freeze is approaching.
   
   Also, cc @xinrong-meng as the Apache Spark 3.4 release manager.




[GitHub] [spark] linhongliu-db commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
linhongliu-db commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1061140998


##########
core/src/main/scala/org/apache/spark/internal/config/UI.scala:
##########
@@ -229,4 +229,11 @@ private[spark] object UI {
     .stringConf
     .transform(_.toUpperCase(Locale.ROOT))
     .createWithDefault("LOCAL")
+
+  val UI_SQL_GROUP_SUB_EXECUTION_ENABLED = ConfigBuilder("spark.ui.sql.group.sub.execution.enabled")
+    .doc("Whether to group sub executions together in SQL UI when they belong to the same " +
+      "root execution")
+    .version("3.4.0")
+    .booleanConf
+    .createWithDefault(false)

Review Comment:
   done



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala:
##########
@@ -26,40 +26,65 @@ import scala.xml.{Node, NodeSeq}
 
 import org.apache.spark.JobExecutionStatus
 import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config.UI.UI_SQL_GROUP_SUB_EXECUTION_ENABLED
 import org.apache.spark.sql.errors.QueryExecutionErrors
 import org.apache.spark.ui.{PagedDataSource, PagedTable, UIUtils, WebUIPage}
 import org.apache.spark.util.Utils
 
 private[ui] class AllExecutionsPage(parent: SQLTab) extends WebUIPage("") with Logging {
 
   private val sqlStore = parent.sqlStore
+  private val groupSubExecutionEnabled = parent.conf.get(UI_SQL_GROUP_SUB_EXECUTION_ENABLED)
 
   override def render(request: HttpServletRequest): Seq[Node] = {
     val currentTime = System.currentTimeMillis()
     val running = new mutable.ArrayBuffer[SQLExecutionUIData]()
     val completed = new mutable.ArrayBuffer[SQLExecutionUIData]()
     val failed = new mutable.ArrayBuffer[SQLExecutionUIData]()
+    val executionIdToSubExecutions =
+      new mutable.HashMap[Long, mutable.ArrayBuffer[SQLExecutionUIData]]()
 
     sqlStore.executionsList().foreach { e =>
-      if (e.errorMessage.isDefined) {
-        if (e.errorMessage.get.isEmpty) {
-          completed += e
+      def processExecution(e: SQLExecutionUIData): Unit = {
+        if (e.errorMessage.isDefined) {
+          if (e.errorMessage.get.isEmpty) {
+            completed += e
+          } else {
+            failed += e
+          }
+        } else if (e.completionTime.isEmpty) {
+          running += e
         } else {
-          failed += e
+          // When `completionTime` is present, it means the query execution is completed and
+          // `errorMessage` should be present as well. However, events generated by old versions of
+          // Spark do not have the `errorMessage` field. We have to check the status of this query
+          // execution's jobs.
+          val isFailed = e.jobs.exists { case (_, status) => status == JobExecutionStatus.FAILED }
+          if (isFailed) {
+            failed += e
+          } else {
+            completed += e
+          }
+        }
+      }
+      // group the sub execution only if the root execution will be displayed (i.e. not missing)
+      if (groupSubExecutionEnabled &&
+        e.executionId != e.rootExecutionId &&
+        executionIdToSubExecutions.contains(e.rootExecutionId)) {
+        executionIdToSubExecutions.get(e.rootExecutionId).foreach { executions =>

Review Comment:
   done
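   For reference, the status classification in the quoted diff can be sketched standalone as follows (`ExecUI` and `classify` are illustrative stand-ins for `SQLExecutionUIData` and the `processExecution` helper; only the decision logic is modeled):

```scala
// A simplified model of the classification: an execution with a defined
// errorMessage is completed (empty message) or failed (non-empty message);
// one without a completionTime is still running; otherwise fall back to
// per-job status, since old event logs lack the errorMessage field.
case class ExecUI(
    errorMessage: Option[String],
    completionTime: Option[Long],
    anyJobFailed: Boolean)

def classify(e: ExecUI): String = e.errorMessage match {
  case Some(msg) => if (msg.isEmpty) "completed" else "failed"
  case None if e.completionTime.isEmpty => "running"
  case None => if (e.anyJobFailed) "failed" else "completed"
}

assert(classify(ExecUI(Some(""), Some(1L), anyJobFailed = false)) == "completed")
assert(classify(ExecUI(Some("boom"), Some(1L), anyJobFailed = false)) == "failed")
assert(classify(ExecUI(None, None, anyJobFailed = false)) == "running")
assert(classify(ExecUI(None, Some(1L), anyJobFailed = true)) == "failed")
```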



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:
##########
@@ -140,6 +166,7 @@ object SQLExecution {
     } finally {
       executionIdToQueryExecution.remove(executionId)
       sc.setLocalProperty(EXECUTION_ID_KEY, oldExecutionId)
+      unsetRootExecutionId(sc, oldExecutionId)

Review Comment:
   done





[GitHub] [spark] cloud-fan commented on pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #39268:
URL: https://github.com/apache/spark/pull/39268#issuecomment-1367712743

   @zhengruifeng @HyukjinKwon are you aware of anything about this python failure?




[GitHub] [spark] cloud-fan commented on pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #39268:
URL: https://github.com/apache/spark/pull/39268#issuecomment-1367724579

   @linhongliu-db can you rebase your branch and try again?




[GitHub] [spark] gengliangwang commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1062032402


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala:
##########
@@ -43,6 +43,8 @@ case class SparkListenerSQLAdaptiveSQLMetricUpdates(
 @DeveloperApi
 case class SparkListenerSQLExecutionStart(
     executionId: Long,
+    // if the execution is a root, then rootExecutionId == executionId
+    rootExecutionId: Long,

Review Comment:
   We need to refactor the code change in https://github.com/apache/spark/blob/master/core/src/main/protobuf/org/apache/spark/status/protobuf/store_types.proto#L387





[GitHub] [spark] linhongliu-db commented on pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
linhongliu-db commented on PR #39268:
URL: https://github.com/apache/spark/pull/39268#issuecomment-1370440036

   working on the comments




[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1061935560


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:
##########
@@ -55,6 +56,28 @@ object SQLExecution {
     }
   }
 
+  /**
+   * Track the "root" SQL Execution Id for nested/sub queries.
+   * For the root execution, rootExecutionId == executionId.
+   */
+  private def setRootExecutionId(sc: SparkContext, executionId: String): Unit = {
+    // The current execution is the root execution if the root execution ID is null
+    if (sc.getLocalProperty(EXECUTION_ROOT_ID_KEY) == null) {
+      sc.setLocalProperty(EXECUTION_ROOT_ID_KEY, executionId)
+    }
+  }
+
+  /**
+   * Unset the "root" SQL Execution Id once the "root" SQL execution completes.
+   */
+  private def unsetRootExecutionId(sc: SparkContext, executionId: String): Unit = {

Review Comment:
   This method is also misleading because we set `EXECUTION_ROOT_ID_KEY` to null only when it's equal to `executionId`.
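   The conditional clearing being pointed out can be sketched standalone like this (a plain mutable map stands in for SparkContext local properties; `clearRootExecutionIdIfRoot` is an illustrative rename, not a name from the PR):

```scala
import scala.collection.mutable

// Stand-in for SparkContext local properties (a simplification for illustration).
val localProps = mutable.Map[String, String]()
val EXECUTION_ROOT_ID_KEY = "spark.sql.execution.root.id"

// Clears the root id only when the completing execution *is* the root, which
// is why the reviewer finds the unconditional-sounding name misleading.
def clearRootExecutionIdIfRoot(executionId: String): Unit = {
  if (localProps.get(EXECUTION_ROOT_ID_KEY).contains(executionId)) {
    localProps.remove(EXECUTION_ROOT_ID_KEY)
  }
}

localProps(EXECUTION_ROOT_ID_KEY) = "1"
clearRootExecutionIdIfRoot("2") // a sub-execution finishing: the root id is kept
assert(localProps.contains(EXECUTION_ROOT_ID_KEY))
clearRootExecutionIdIfRoot("1") // the root finishing: the key is removed
assert(!localProps.contains(EXECUTION_ROOT_ID_KEY))
```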





[GitHub] [spark] cloud-fan closed pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution
URL: https://github.com/apache/spark/pull/39268




[GitHub] [spark] linhongliu-db commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
linhongliu-db commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1065370092


##########
sql/core/src/main/scala/org/apache/spark/status/protobuf/sql/SQLExecutionUIDataSerializer.scala:
##########
@@ -74,6 +74,7 @@ class SQLExecutionUIDataSerializer extends ProtobufSerDe {
 
     new SQLExecutionUIData(
       executionId = ui.getExecutionId,
+      rootExecutionId = ui.getExecutionId,

Review Comment:
   done



##########
sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceSuite.scala:
##########
@@ -82,6 +82,7 @@ object SqlResourceSuite {
 
     new SQLExecutionUIData(
       executionId = 0,
+      rootExecutionId = 0,

Review Comment:
   done





[GitHub] [spark] linhongliu-db commented on pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
linhongliu-db commented on PR #39268:
URL: https://github.com/apache/spark/pull/39268#issuecomment-1377675103

   Thank you everyone for reviewing this!




[GitHub] [spark] cloud-fan commented on pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #39268:
URL: https://github.com/apache/spark/pull/39268#issuecomment-1377212933

   The failed `YarnClusterSuite` is definitely unrelated. I'm merging it to master, thanks!




[GitHub] [spark] linhongliu-db commented on pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
linhongliu-db commented on PR #39268:
URL: https://github.com/apache/spark/pull/39268#issuecomment-1376639807

   @dongjoon-hyun working on it




[GitHub] [spark] dongjoon-hyun commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1059565579


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala:
##########
@@ -26,40 +26,63 @@ import scala.xml.{Node, NodeSeq}
 
 import org.apache.spark.JobExecutionStatus
 import org.apache.spark.internal.Logging
+import org.apache.spark.internal.config.UI.UI_SQL_GROUP_SUB_EXECUTION_ENABLED
 import org.apache.spark.sql.errors.QueryExecutionErrors
 import org.apache.spark.ui.{PagedDataSource, PagedTable, UIUtils, WebUIPage}
 import org.apache.spark.util.Utils
 
 private[ui] class AllExecutionsPage(parent: SQLTab) extends WebUIPage("") with Logging {
 
   private val sqlStore = parent.sqlStore
+  private val groupSubExecutionEnabled = parent.conf.get(UI_SQL_GROUP_SUB_EXECUTION_ENABLED)
 
   override def render(request: HttpServletRequest): Seq[Node] = {
     val currentTime = System.currentTimeMillis()
     val running = new mutable.ArrayBuffer[SQLExecutionUIData]()
     val completed = new mutable.ArrayBuffer[SQLExecutionUIData]()
     val failed = new mutable.ArrayBuffer[SQLExecutionUIData]()
+    val executionIdToSubExecutions =
+      new mutable.HashMap[Long, mutable.ArrayBuffer[SQLExecutionUIData]]()
 
     sqlStore.executionsList().foreach { e =>
-      if (e.errorMessage.isDefined) {
-        if (e.errorMessage.get.isEmpty) {
-          completed += e
+      def processExecution(e: SQLExecutionUIData): Unit = {
+        if (e.errorMessage.isDefined) {
+          if (e.errorMessage.get.isEmpty) {
+            completed += e
+          } else {
+            failed += e
+          }
+        } else if (e.completionTime.isEmpty) {
+          running += e
         } else {
-          failed += e
+          // When `completionTime` is present, it means the query execution is completed and
+          // `errorMessage` should be present as well. However, events generated by old versions of
+          // Spark do not have the `errorMessage` field. We have to check the status of this query

Review Comment:
   Just a question: do we have test coverage for event logs generated by old versions of Spark, to validate this code path?



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala:
##########
@@ -290,35 +330,112 @@ private[ui] class ExecutionPagedTable(
       }
     }
 
-    <tr>
-      <td>
-        {executionUIData.executionId.toString}
-      </td>
-      <td>
-        {descriptionCell(executionUIData)}
-      </td>
-      <td sorttable_customkey={submissionTime.toString}>
-        {UIUtils.formatDate(submissionTime)}
-      </td>
-      <td sorttable_customkey={duration.toString}>
-        {UIUtils.formatDuration(duration)}
-      </td>
-      {if (showRunningJobs) {
+    def executionLinks(executionData: Seq[Long]): Seq[Node] = {
+      val details = if (executionData.nonEmpty) {
+        val onClickScript = "this.parentNode.parentNode.nextElementSibling.nextElementSibling" +
+          ".classList.toggle('collapsed')"
+        <span onclick={onClickScript} class="expand-details">
+          +details
+        </span>
+      } else {
+        Nil
+      }
+
+      <div>{
+        executionData.map { executionId =>
+          <a href={executionURL(executionId)}>[{executionId.toString}]</a>
+        }
+        }</div> ++ details
+    }
+
+    val baseRow: Seq[Node] = {
+      <tr>
         <td>
-          {jobLinks(executionTableRow.runningJobData)}
+          {executionUIData.executionId.toString}
         </td>
-      }}
-      {if (showSucceededJobs) {
         <td>
-          {jobLinks(executionTableRow.completedJobData)}
+          {descriptionCell(executionUIData)}
         </td>
-      }}
-      {if (showFailedJobs) {
-        <td>
-          {jobLinks(executionTableRow.failedJobData)}
+        <td sorttable_customkey={submissionTime.toString}>
+          {UIUtils.formatDate(submissionTime)}
         </td>
-      }}
-    </tr>
+        <td sorttable_customkey={duration.toString}>
+          {UIUtils.formatDuration(duration)}
+        </td>
+        {if (showRunningJobs) {
+          <td>
+            {jobLinks(executionTableRow.runningJobData)}
+          </td>
+        }}
+        {if (showSucceededJobs) {
+          <td>
+            {jobLinks(executionTableRow.completedJobData)}
+          </td>
+        }}
+        {if (showFailedJobs) {
+          <td>
+            {jobLinks(executionTableRow.failedJobData)}
+          </td>
+        }}
+        {if (showSubExecutions) {
+          <td>
+            {executionLinks(executionTableRow.subExecutionData.map(_.executionUIData.executionId))}
+          </td>
+        }}
+      </tr>
+    }
+
+    val subRow: Seq[Node] = {if (executionTableRow.subExecutionData.nonEmpty) {
+      <tr></tr>
+        <tr class="sub-execution-list collapsed">
+          <td></td>
+          <td colspan={s"${headerInfo.length - 1}"}>
+            <table class="table table-bordered table-sm table-cell-width-limited">
+              <thead>
+                <tr>
+                  {headerInfo.dropRight(1).map(info => <th>{info._1}</th>)}
+                </tr>
+              </thead>
+              <tbody>
+                {
+                executionTableRow.subExecutionData.map { rowData =>

Review Comment:
   If you don't mind, shall we follow one of the previous indentation styles, please?
   **Style 1:**
   https://github.com/apache/spark/blob/6aac6428aae89915c5634b6a9659aff3d450f173/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala#L132-L140
   
   **Style 2:**
   https://github.com/apache/spark/blob/6aac6428aae89915c5634b6a9659aff3d450f173/sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala#L316-L320



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala:
##########
@@ -250,32 +283,39 @@ private[ui] class ExecutionPagedTable(
   override def goButtonFormPath: String =
     s"$parameterPath&$executionTag.sort=$encodedSortColumn&$executionTag.desc=$desc#$tableHeaderId"
 
-  override def headers: Seq[Node] = {
-    // Information for each header: title, sortable, tooltip
-    val executionHeadersAndCssClasses: Seq[(String, Boolean, Option[String])] =
-      Seq(
-        ("ID", true, None),
-        ("Description", true, None),
-        ("Submitted", true, None),
-        ("Duration", true, Some("Time from query submission to completion (or if still executing," +
-          "time since submission)"))) ++ {
-        if (showRunningJobs && showSucceededJobs && showFailedJobs) {
-          Seq(
-            ("Running Job IDs", true, None),
-            ("Succeeded Job IDs", true, None),
-            ("Failed Job IDs", true, None))
-        } else if (showSucceededJobs && showFailedJobs) {
-          Seq(
-            ("Succeeded Job IDs", true, None),
-            ("Failed Job IDs", true, None))
-        } else {
-          Seq(("Job IDs", true, None))
-        }
+  // Information for each header: title, sortable, tooltip
+  private val headerInfo: Seq[(String, Boolean, Option[String])] = {
+    Seq(
+      ("ID", true, None),
+      ("Description", true, None),
+      ("Submitted", true, None),
+      ("Duration", true, Some("Time from query submission to completion (or if still executing," +
+        "time since submission)"))) ++ {

Review Comment:
   We need a space. `"time` -> `" time`.
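Applying that fix, the concatenated tooltip would look like the following minimal sketch (`durationTooltip` is just an illustrative name for the string):

```scala
// Sketch of the tooltip string with the missing space restored at the start of
// the continuation line, so the rendered text reads "... executing, time ...".
object TooltipFix {
  val durationTooltip: String =
    "Time from query submission to completion (or if still executing," +
      " time since submission)"
}
```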



##########
core/src/main/scala/org/apache/spark/internal/config/UI.scala:
##########
@@ -229,4 +229,11 @@ private[spark] object UI {
     .stringConf
     .transform(_.toUpperCase(Locale.ROOT))
     .createWithDefault("LOCAL")
+
+  val UI_SQL_GROUP_SUB_EXECUTION_ENABLED = ConfigBuilder("spark.ui.sql.group.sub.execution.enabled")
+    .doc("Whether to group sub executions together in SQL UI when they belong to the same " +
+      "root execution")
+    .version("3.4.0")
+    .booleanConf
+    .createWithDefault(false)

Review Comment:
   It's okay to enable this by default. Please update the PR.



##########
core/src/main/scala/org/apache/spark/internal/config/UI.scala:
##########
@@ -229,4 +229,11 @@ private[spark] object UI {
     .stringConf
     .transform(_.toUpperCase(Locale.ROOT))
     .createWithDefault("LOCAL")
+
+  val UI_SQL_GROUP_SUB_EXECUTION_ENABLED = ConfigBuilder("spark.ui.sql.group.sub.execution.enabled")

Review Comment:
   This PR introduces 4 config namespace groups like the following. Shall we simplify the config namespace?
   ```
   spark.ui.sql.*
   spark.ui.sql.group.*
   spark.ui.sql.group.sub.*
   spark.ui.sql.group.sub.execution.*
   ```
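To see the reviewer's point concretely, here is a stand-alone sketch (not from the PR; `intermediateNamespaces` and the flattened key name below are illustrative assumptions) that enumerates the intermediate namespace groups a dotted config key introduces:

```scala
// Enumerate the namespace groups between the "spark.ui" prefix and the leaf
// segment of a dotted config key. A flatter key introduces none of them.
object ConfigKeySketch {
  def intermediateNamespaces(key: String): Seq[String] = {
    val parts = key.split('.')
    // Every proper prefix longer than "spark.ui" and shorter than the full key.
    (3 until parts.length).map(i => parts.take(i).mkString(".") + ".*")
  }
}
```

For `spark.ui.sql.group.sub.execution.enabled` this yields exactly the four groups listed above, while a hypothetical flattened key such as `spark.ui.groupSQLSubExecutionEnabled` yields none.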



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:
##########
@@ -55,6 +56,28 @@ object SQLExecution {
     }
   }
 
+  /**
+   * Track the "root" SQL Execution Id for nested/sub queries.
+   * For the root execution, rootExecutionId == executionId.
+   */
+  private def setRootExecutionId(sc: SparkContext, executionId: String): Unit = {
+    // The current execution is the root execution if the root execution ID is null
+    if (sc.getLocalProperty(EXECUTION_ROOT_ID_KEY) == null) {
+      sc.setLocalProperty(EXECUTION_ROOT_ID_KEY, executionId)
+    }
+  }
+
+  /**
+   * Unset the "root" SQL Execution Id once the "root" SQL execution completes.
+   */
+  private def unsetRootExecutionId(sc: SparkContext, executionId: String): Unit = {

Review Comment:
   Do we have any other usage for these methods, `setRootExecutionId` and `unsetRootExecutionId`? Each seems to be used only once.
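Since each helper is called once, one option is to inline the bookkeeping into the execution wrapper. A minimal stand-alone sketch of that shape, using a plain map in place of `SparkContext` local properties (names and structure here are assumptions, not the PR's code):

```scala
// Root-execution-ID bookkeeping inlined into a single wrapper: the first
// execution becomes the root, nested executions inherit its ID, and only
// the root clears the key when it completes.
object RootExecutionIdSketch {
  private val EXECUTION_ROOT_ID_KEY = "spark.sql.execution.root.id"
  private val localProps = scala.collection.mutable.Map.empty[String, String]

  def currentRoot: Option[String] = localProps.get(EXECUTION_ROOT_ID_KEY)

  def withExecutionId[T](executionId: String)(body: => T): T = {
    // The current execution is the root if no root ID is set yet.
    val isRoot = !localProps.contains(EXECUTION_ROOT_ID_KEY)
    if (isRoot) localProps(EXECUTION_ROOT_ID_KEY) = executionId
    try body
    finally if (isRoot) localProps.remove(EXECUTION_ROOT_ID_KEY)
  }
}
```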



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala:
##########
@@ -290,35 +330,112 @@ private[ui] class ExecutionPagedTable(
       }
     }
 
-    <tr>
-      <td>
-        {executionUIData.executionId.toString}
-      </td>
-      <td>
-        {descriptionCell(executionUIData)}
-      </td>
-      <td sorttable_customkey={submissionTime.toString}>
-        {UIUtils.formatDate(submissionTime)}
-      </td>
-      <td sorttable_customkey={duration.toString}>
-        {UIUtils.formatDuration(duration)}
-      </td>
-      {if (showRunningJobs) {
+    def executionLinks(executionData: Seq[Long]): Seq[Node] = {
+      val details = if (executionData.nonEmpty) {
+        val onClickScript = "this.parentNode.parentNode.nextElementSibling.nextElementSibling" +
+          ".classList.toggle('collapsed')"
+        <span onclick={onClickScript} class="expand-details">
+          +details
+        </span>
+      } else {
+        Nil
+      }
+
+      <div>{
+        executionData.map { executionId =>
+          <a href={executionURL(executionId)}>[{executionId.toString}]</a>
+        }
+        }</div> ++ details
+    }
+
+    val baseRow: Seq[Node] = {
+      <tr>
         <td>
-          {jobLinks(executionTableRow.runningJobData)}
+          {executionUIData.executionId.toString}
         </td>
-      }}
-      {if (showSucceededJobs) {
         <td>
-          {jobLinks(executionTableRow.completedJobData)}
+          {descriptionCell(executionUIData)}
         </td>
-      }}
-      {if (showFailedJobs) {
-        <td>
-          {jobLinks(executionTableRow.failedJobData)}
+        <td sorttable_customkey={submissionTime.toString}>
+          {UIUtils.formatDate(submissionTime)}
         </td>
-      }}
-    </tr>
+        <td sorttable_customkey={duration.toString}>
+          {UIUtils.formatDuration(duration)}
+        </td>
+        {if (showRunningJobs) {
+          <td>
+            {jobLinks(executionTableRow.runningJobData)}
+          </td>
+        }}
+        {if (showSucceededJobs) {
+          <td>
+            {jobLinks(executionTableRow.completedJobData)}
+          </td>
+        }}
+        {if (showFailedJobs) {
+          <td>
+            {jobLinks(executionTableRow.failedJobData)}
+          </td>
+        }}
+        {if (showSubExecutions) {
+          <td>
+            {executionLinks(executionTableRow.subExecutionData.map(_.executionUIData.executionId))}
+          </td>
+        }}
+      </tr>
+    }
+
+    val subRow: Seq[Node] = {if (executionTableRow.subExecutionData.nonEmpty) {
+      <tr></tr>
+        <tr class="sub-execution-list collapsed">

Review Comment:
   This `<tr>` doesn't need additional indentation here. Could you align the indentation with the previous `<tr>` at line 389?



##########
sql/core/src/main/scala/org/apache/spark/sql/execution/ui/AllExecutionsPage.scala:
##########
@@ -290,35 +330,112 @@ private[ui] class ExecutionPagedTable(
       }
     }
 
-    <tr>
-      <td>
-        {executionUIData.executionId.toString}
-      </td>
-      <td>
-        {descriptionCell(executionUIData)}
-      </td>
-      <td sorttable_customkey={submissionTime.toString}>
-        {UIUtils.formatDate(submissionTime)}
-      </td>
-      <td sorttable_customkey={duration.toString}>
-        {UIUtils.formatDuration(duration)}
-      </td>
-      {if (showRunningJobs) {
+    def executionLinks(executionData: Seq[Long]): Seq[Node] = {
+      val details = if (executionData.nonEmpty) {
+        val onClickScript = "this.parentNode.parentNode.nextElementSibling.nextElementSibling" +
+          ".classList.toggle('collapsed')"
+        <span onclick={onClickScript} class="expand-details">
+          +details
+        </span>
+      } else {
+        Nil
+      }
+
+      <div>{
+        executionData.map { executionId =>
+          <a href={executionURL(executionId)}>[{executionId.toString}]</a>
+        }
+        }</div> ++ details

Review Comment:
   indentation?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] ulysses-you commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
ulysses-you commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1058845420


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/SQLExecution.scala:
##########
@@ -140,6 +166,7 @@ object SQLExecution {
     } finally {
       executionIdToQueryExecution.remove(executionId)
       sc.setLocalProperty(EXECUTION_ID_KEY, oldExecutionId)
+      unsetRootExecutionId(sc, oldExecutionId)

Review Comment:
   should it be `unsetRootExecutionId(sc, executionId)` ? 
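One plausible reading of why the argument matters (a stand-alone sketch, not the PR's code: it assumes `unsetRootExecutionId` only clears the key when the passed ID matches the stored root ID, and uses a plain map in place of `SparkContext` local properties):

```scala
// If the unset helper is guarded by an ID comparison, passing oldExecutionId
// (the parent's ID, or null at the root) would never match the stored root
// ID, so the key would never be cleared; passing the current executionId does.
object UnsetRootSketch {
  val KEY = "spark.sql.execution.root.id"
  val localProps = scala.collection.mutable.Map.empty[String, String]

  def unsetRootExecutionId(executionId: String): Unit =
    if (localProps.get(KEY).contains(executionId)) localProps.remove(KEY)
}
```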





[GitHub] [spark] cloud-fan commented on pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on PR #39268:
URL: https://github.com/apache/spark/pull/39268#issuecomment-1367058299

   also cc @ulysses-you 




[GitHub] [spark] gengliangwang commented on a diff in pull request #39268: [SPARK-41752][SQL][UI] Group nested executions under the root execution

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on code in PR #39268:
URL: https://github.com/apache/spark/pull/39268#discussion_r1062033642


##########
sql/core/src/test/scala/org/apache/spark/status/api/v1/sql/SqlResourceSuite.scala:
##########
@@ -82,6 +82,7 @@ object SqlResourceSuite {
 
     new SQLExecutionUIData(
       executionId = 0,
+      rootExecutionId = 0,

Review Comment:
   For testing purposes, let's use a value different from `executionId`.
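The rationale can be shown with a tiny stand-in for the UI data class (hypothetical `ExecData`, not Spark's `SQLExecutionUIData`): with distinct values, a regression that conflates the two fields fails immediately.

```scala
// Stand-in with the two fields the suite populates; distinct values let a test
// detect code that accidentally returns executionId where rootExecutionId is
// expected (identical values would mask that bug).
case class ExecData(executionId: Long, rootExecutionId: Long)

object DistinctIdsSketch {
  // A test asserting on rootExecutionId only has teeth if the values differ.
  def idsAreDistinguishable(d: ExecData): Boolean =
    d.rootExecutionId != d.executionId
}
```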


