Posted to reviews@spark.apache.org by "maheshk114 (via GitHub)" <gi...@apache.org> on 2023/07/17 03:45:27 UTC

[GitHub] [spark] maheshk114 commented on a diff in pull request #41860: [SPARK-44307][SQL] Add Bloom filter for left outer join even if the left side table is smaller than broadcast threshold.

maheshk114 commented on code in PR #41860:
URL: https://github.com/apache/spark/pull/41860#discussion_r1264813906


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/InjectRuntimeFilter.scala:
##########
@@ -315,6 +320,10 @@ object InjectRuntimeFilter extends Rule[LogicalPlan] with PredicateHelper with J
       case join @ ExtractEquiJoinKeys(joinType, leftKeys, rightKeys, _, _, left, right, hint) =>
         var newLeft = left
         var newRight = right
+        // Whether it is a shuffle join or not should be based on the actual left and
+        // right table. For some join like left outer join, it will be a shuffle join
+        // even if left side table size is smaller than broadcast threshold.
+        val isShuffleJoin = isProbablyShuffleJoin(left, right, hint, joinType)

Review Comment:
   > I think we can pass joinType into `extractBeneficialFilterCreatePlan`, `isProbablyShuffleJoin` and `probablyHasShuffle`. Then we can avoid breaking the code structure.
   @beliefer 
   Whether a join is a shuffle join depends on the join type and on the sizes of the tables on the left and right sides of the join. In `extractBeneficialFilterCreatePlan`, the two joined tables are swapped to check whether either of them can be used to build the Bloom filter, so inside that method it is hard to tell which table is the actual left side and which is the actual right side. That's why the shuffle-join decision is made outside of it. Please let me know if this makes sense.
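
   A minimal, self-contained sketch of the point above, not the code in this PR. `Table`, the simplified `JoinType` hierarchy, `broadcastThreshold`, `describeFilter` and the three-argument `isProbablyShuffleJoin` are all hypothetical stand-ins, and the broadcast rule is a simplifying assumption (an outer join may only broadcast its non-preserved side). It shows the shuffle decision being made once on the real left and right sides and then handed to a helper that is called with the sides swapped:

   ```scala
   // Hypothetical model for illustration; these are not Spark's own types.
   sealed trait JoinType
   case object Inner extends JoinType
   case object LeftOuter extends JoinType
   case object RightOuter extends JoinType

   final case class Table(name: String, sizeInBytes: Long)

   object ShuffleJoinSketch {
     // Assumed broadcast threshold (10 MB), chosen only for the example.
     val broadcastThreshold: Long = 10L * 1024 * 1024

     // Assumption: an outer join can only broadcast its non-preserved side.
     def canBroadcastLeft(jt: JoinType): Boolean = jt match {
       case Inner | RightOuter => true
       case LeftOuter => false
     }

     def canBroadcastRight(jt: JoinType): Boolean = jt match {
       case Inner | LeftOuter => true
       case RightOuter => false
     }

     // Decided once, using the actual left and right sides of the join.
     def isProbablyShuffleJoin(left: Table, right: Table, jt: JoinType): Boolean = {
       val leftOk = canBroadcastLeft(jt) && left.sizeInBytes <= broadcastThreshold
       val rightOk = canBroadcastRight(jt) && right.sizeInBytes <= broadcastThreshold
       !(leftOk || rightOk)
     }

     // Stand-in for a helper that is invoked twice with the sides swapped.
     // It only sees "application" and "creation" sides, so it cannot know
     // which one was the left side; the precomputed flag is passed in.
     def describeFilter(
         application: Table,
         creation: Table,
         isShuffleJoin: Boolean): Option[String] =
       if (isShuffleJoin) {
         Some(s"filter built from ${creation.name}, applied to ${application.name}")
       } else {
         None
       }
   }
   ```

   Under these assumptions, `isProbablyShuffleJoin(small, large, LeftOuter)` is true when only `small` is under the threshold, because the left side of a left outer join cannot be broadcast, while the same tables under `Inner` would not be treated as a shuffle join. The resulting flag can then be passed unchanged to both `describeFilter(small, large, flag)` and `describeFilter(large, small, flag)`.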
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

