Posted to reviews@spark.apache.org by "beliefer (via GitHub)" <gi...@apache.org> on 2023/07/14 10:14:52 UTC

[GitHub] [spark] beliefer commented on a diff in pull request #41860: [SPARK-44307][SQL] Add Bloom filter for left outer join even if the left side table is smaller than broadcast threshold.

beliefer commented on code in PR #41860:
URL: https://github.com/apache/spark/pull/41860#discussion_r1263561129


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala:
##########
@@ -315,6 +320,10 @@ object InjectRuntimeFilter extends Rule[LogicalPlan] with PredicateHelper with J
       case join @ ExtractEquiJoinKeys(joinType, leftKeys, rightKeys, _, _, left, right, hint) =>
         var newLeft = left
         var newRight = right
+        // Whether it is a shuffle join or not should be based on the actual left and
+        // right table. For some join like left outer join, it will be a shuffle join
+        // even if left side table size is smaller than broadcast threshold.
+        val isShuffleJoin = isProbablyShuffleJoin(left, right, hint, joinType)

Review Comment:
   I think we can pass `joinType` into `extractBeneficialFilterCreatePlan`, `isProbablyShuffleJoin` and `probablyHasShuffle` instead. That keeps the join-type handling inside the existing helpers, so we avoid breaking the code structure.
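
To illustrate the idea, here is a rough, self-contained sketch rather than the actual Spark code. `Plan`, the three-member `JoinType`, `broadcastThreshold`, and `RuntimeFilterSketch` are simplified stand-ins I made up for this example, and the join hint is left out entirely. It only shows how a `joinType` parameter could let the helper decide which side is even eligible to be the broadcast side.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's real classes.
sealed trait JoinType
case object Inner extends JoinType
case object LeftOuter extends JoinType
case object RightOuter extends JoinType

// Hypothetical plan that only carries an estimated size.
final case class Plan(sizeInBytes: Long)

object RuntimeFilterSketch {
  // Assumed threshold (10 MB), chosen only for this example.
  val broadcastThreshold: Long = 10L * 1024 * 1024

  private def canBroadcastBySize(plan: Plan): Boolean =
    plan.sizeInBytes <= broadcastThreshold

  // The left side can be the build (broadcast) side for inner and right outer joins.
  private def canBuildLeft(joinType: JoinType): Boolean = joinType match {
    case Inner | RightOuter => true
    case _ => false
  }

  // The right side can be the build (broadcast) side for inner and left outer joins.
  private def canBuildRight(joinType: JoinType): Boolean = joinType match {
    case Inner | LeftOuter => true
    case _ => false
  }

  // A side counts as broadcastable only if the join type allows building on it
  // AND it is small enough. Otherwise the join is probably a shuffle join.
  def isProbablyShuffleJoin(left: Plan, right: Plan, joinType: JoinType): Boolean = {
    val broadcastLeft = canBuildLeft(joinType) && canBroadcastBySize(left)
    val broadcastRight = canBuildRight(joinType) && canBroadcastBySize(right)
    !broadcastLeft && !broadcastRight
  }
}
```

With a small left table and a large right table, this sketch reports a probable shuffle join for `LeftOuter` but not for `Inner`, which is the case the PR is targeting. The call site in `InjectRuntimeFilter` would then not need to compute a separate flag up front.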



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org