Posted to commits@spark.apache.org by gu...@apache.org on 2023/06/06 00:27:54 UTC

[spark] branch branch-3.4 updated: Revert "[SPARK-43911][SQL] Use toSet to deduplicate the iterator data to prevent the creation of large Array"

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.4 by this push:
     new 7b304035669 Revert "[SPARK-43911][SQL] Use toSet to deduplicate the iterator data to prevent the creation of large Array"
7b304035669 is described below

commit 7b3040356698f7215b0fe44992401e671448338b
Author: Hyukjin Kwon <gu...@apache.org>
AuthorDate: Tue Jun 6 09:27:35 2023 +0900

    Revert "[SPARK-43911][SQL] Use toSet to deduplicate the iterator data to prevent the creation of large Array"
    
    This reverts commit 93709918affba4846a30cbae8692a6a328b5a448.
---
 .../scala/org/apache/spark/sql/execution/SubqueryBroadcastExec.scala    | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/SubqueryBroadcastExec.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/SubqueryBroadcastExec.scala
index 80f863515d4..22d042ccefb 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/SubqueryBroadcastExec.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/SubqueryBroadcastExec.scala
@@ -93,7 +93,7 @@ case class SubqueryBroadcastExec(
         val rows = if (broadcastRelation.keyIsUnique) {
           keyIter.toArray[InternalRow]
         } else {
-          keyIter.toSet[InternalRow].toArray
+          keyIter.toArray[InternalRow].distinct
         }
         val beforeBuild = System.nanoTime()
         longMetric("collectTime") += (beforeBuild - beforeCollect) / 1000000
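
The commit message gives no reason for the revert, so the following is only a minimal sketch of how the two expressions in the diff differ. It uses plain Int keys in place of InternalRow, and the names DedupSketch, viaSet and viaDistinct are hypothetical and do not appear in Spark.

```scala
// Sketch only: plain Scala collections stand in for the InternalRow iterator.
object DedupSketch {
  // Reverted expression: deduplicate through a Set, then copy into an Array.
  // The resulting element order is not guaranteed.
  def viaSet(keys: Iterator[Int]): Array[Int] = keys.toSet[Int].toArray

  // Restored expression: materialize the full Array first, then drop duplicates.
  // distinct keeps the first occurrence of each element, in encounter order.
  def viaDistinct(keys: Iterator[Int]): Array[Int] = keys.toArray[Int].distinct

  def main(args: Array[String]): Unit = {
    val data = Seq(3, 1, 3, 2, 1)
    val a = viaSet(data.iterator)
    val b = viaDistinct(data.iterator)
    // Both arrays hold the same distinct keys.
    assert(a.toSet == b.toSet)
    // Only the distinct-based version promises first-occurrence order.
    assert(b.sameElements(Array(3, 1, 2)))
    println(b.mkString(","))
  }
}
```

Both forms return the same set of keys. The Set-based form avoids holding an intermediate Array that still contains duplicates, while the distinct-based form preserves the order in which keys were first seen.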

