Posted to commits@spark.apache.org by we...@apache.org on 2023/10/10 06:36:39 UTC

[spark] branch master updated: [SPARK-45205][SQL] CommandResultExec to override iterator methods to avoid triggering multiple jobs

This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new c9c99222e82 [SPARK-45205][SQL] CommandResultExec to override iterator methods to avoid triggering multiple jobs
c9c99222e82 is described below

commit c9c99222e828d556552694dfb48c75bf0703a2c4
Author: yorksity <yo...@outlook.com>
AuthorDate: Tue Oct 10 14:36:23 2023 +0800

    [SPARK-45205][SQL] CommandResultExec to override iterator methods to avoid triggering multiple jobs
    
    ### What changes were proposed in this pull request?
    
    After SPARK-35378, statements such as 'show partitions test' became slower. The cause is that execution moved from ExecutedCommandExec to CommandResultExec, but ExecutedCommandExec originally implemented the following method:
    
    override def executeToIterator(): Iterator[InternalRow] = sideEffectResult.iterator
    
    CommandResultExec does not override this method, so calling hasNext on the returned iterator creates a job, which increases execution time.
    
    ### Why are the changes needed?
    
    Improves the performance of commands such as show partitions and show tables.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No
    
    ### How was this patch tested?
    
    Existing tests should cover this.
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No
    
    Closes #43270 from yorksity/SPARK-45205.
    
    Authored-by: yorksity <yo...@outlook.com>
    Signed-off-by: Wenchen Fan <we...@databricks.com>
---
 .../main/scala/org/apache/spark/sql/execution/CommandResultExec.scala   | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/CommandResultExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/CommandResultExec.scala
index 5f38278d2dc..45e3e41ab05 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/CommandResultExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/CommandResultExec.scala
@@ -81,6 +81,8 @@ case class CommandResultExec(
     unsafeRows
   }
 
+  override def executeToIterator(): Iterator[InternalRow] = unsafeRows.iterator
+
   override def executeTake(limit: Int): Array[InternalRow] = {
     val taken = unsafeRows.take(limit)
     longMetric("numOutputRows").add(taken.size)
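
The effect described in the commit message can be illustrated with a minimal, Spark-free sketch. Every name below (PlanNode, DefaultNode, LocalNode, jobsLaunched, jobsFor) is hypothetical and only stands in for the real classes; the counter merely simulates a job launch. It assumes, as the commit message states, that the inherited iterator path creates a job while the overriding one serves rows that are already held locally.

```scala
// Hypothetical stand-ins for illustration; none of these are Spark classes.
object IteratorOverrideSketch {
  // Counts how many simulated "jobs" have been launched.
  var jobsLaunched = 0

  trait PlanNode {
    def rows: Seq[String]

    // Simulated distributed execution path: each call counts as one job.
    def execute(): Seq[String] = {
      jobsLaunched += 1
      rows
    }

    // Default behaviour: iterating goes through the job-launching path.
    def executeToIterator(): Iterator[String] = execute().iterator
  }

  // Keeps the inherited iterator, so iterating costs one simulated job.
  class DefaultNode(val rows: Seq[String]) extends PlanNode

  // Overrides the iterator to return the locally held rows directly.
  class LocalNode(val rows: Seq[String]) extends PlanNode {
    override def executeToIterator(): Iterator[String] = rows.iterator
  }

  // Drains the iterator and reports how many simulated jobs that caused.
  def jobsFor(node: PlanNode): Int = {
    val before = jobsLaunched
    node.executeToIterator().foreach(_ => ())
    jobsLaunched - before
  }

  def main(args: Array[String]): Unit = {
    val data = Seq("p=1", "p=2")
    println(jobsFor(new DefaultNode(data))) // prints 1
    println(jobsFor(new LocalNode(data)))   // prints 0
  }
}
```

Both nodes return the same rows; only the path taken to produce the iterator differs, which is why the change needs no new user-facing behaviour.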

