Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/29 08:58:22 UTC

[GitHub] [spark] wangshuo128 commented on a change in pull request #31968: [SPARK-34873][SQL] Avoid wrapping in withNewExecutionId twice when running SQL with side effects

wangshuo128 commented on a change in pull request #31968:
URL: https://github.com/apache/spark/pull/31968#discussion_r603123418



##########
File path: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala
##########
@@ -223,11 +224,18 @@ class Dataset[T] private[sql](
   @transient private[sql] val logicalPlan: LogicalPlan = {
     // For various commands (like DDL) and queries with side effects, we force query execution
     // to happen right away to let these side effects take place eagerly.
+    def eagerRun(plan: LogicalPlan): LogicalPlan = {
+      val relation =
+        LocalRelation(plan.output, withAction("command", queryExecution)(_.executeCollect()))

Review comment:
       Yes.
   IIUC, for showing SQL queries and their associated jobs in the Spark UI, tracking only the root SQL execution is enough.
   But in general, I think it's better to track all the execution IDs and let the event listeners decide how to handle the different events for a single SQL query.
   So another option would be to keep tracking and sending all SQL-execution-related events to listeners, and handle only the first event of a SQL query in the UI listeners.
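   A minimal sketch of that second option (not from this PR; `rootExecutionIdOf` is a hypothetical hook, since `SparkListenerSQLExecutionStart` currently carries no field relating a nested execution to its root query, so that link would have to be added first):

```scala
import scala.collection.mutable

import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent}
import org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart

// Sketch: all SQL execution events are still emitted; the UI listener alone
// decides to act only on the first execution it sees per root query.
class RootExecutionUIListener(
    rootExecutionIdOf: SparkListenerSQLExecutionStart => Long) // hypothetical mapping
  extends SparkListener {

  private val seenRoots = mutable.Set[Long]()

  override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
    case e: SparkListenerSQLExecutionStart =>
      // The first execution seen for a root query drives the UI; nested
      // executions (e.g. from eager command execution) are skipped here but
      // remain visible to every other registered listener.
      if (seenRoots.add(rootExecutionIdOf(e))) {
        handleRootExecution(e)
      }
    case _ => // leave other events to the existing listeners
  }

  private def handleRootExecution(e: SparkListenerSQLExecutionStart): Unit = {
    // UI-facing handling for the root execution would go here.
  }
}
```

   This keeps the event stream complete for non-UI consumers while the UI still shows one entry per query.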




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org